# Plume ASR
[![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)
> Generates text from audio containing speech
---
# Table of Contents
* [Prerequisites](#prerequisites)
* [Features](#features)
* [Installation](#installation)
* [Usage](#usage)
# Prerequisites
```bash
sudo apt install libsndfile-dev ffmpeg
```
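To confirm that the prerequisites are visible before installing, a minimal sketch using only the Python standard library is shown below. The helper name `check_prerequisites` is hypothetical and not part of Plume. Note that `find_library` only locates the runtime shared library, not the development headers.

```python
import shutil
from ctypes.util import find_library


def check_prerequisites():
    """Return a dict mapping each system prerequisite to whether it was found."""
    return {
        # ffmpeg is a command-line tool, so look for it on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libsndfile is a shared library; find_library searches the usual locations.
        "libsndfile": find_library("sndfile") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```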
# Features
* ASR using Jasper (from [NemoToolkit](https://github.com/NVIDIA/NeMo))
* ASR using Wav2Vec2 (from [fairseq](https://github.com/pytorch/fairseq))
# Installation
To install the package and its dependencies, run:
```bash
python setup.py install
```
Or, with pip:
```bash
pip install ".[all]"
```
The installation should work on Python 3.6 or newer. It is untested on Python 2.7.
# Usage
### Library
> Jasper
```python
from plume.models.jasper_nemo.asr import JasperASR
asr_model = JasperASR(
    "/path/to/model_config_yaml",
    "/path/to/encoder_checkpoint",
    "/path/to/decoder_checkpoint",
)  # Loads the models
TEXT = asr_model.transcribe(wav_data) # Returns the text spoken in the wav
```
> Wav2Vec2
```python
from plume.models.wav2vec2.asr import Wav2Vec2ASR
asr_model = Wav2Vec2ASR(
    "/path/to/ctc_checkpoint",
    "/path/to/w2v_checkpoint",
    "/path/to/target_dictionary",
)  # Loads the models
TEXT = asr_model.transcribe(wav_data) # Returns the text spoken in the wav
```
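Both examples above pass `wav_data` to `transcribe` without showing where it comes from. The sketch below shows one way to get it from a file on disk. It assumes that `transcribe` accepts the contents of a WAV file as `bytes`, which this README does not state, so check the model class you use. The helper name `transcribe_file` is hypothetical and not part of Plume.

```python
from pathlib import Path


def transcribe_file(asr_model, wav_path):
    """Read a WAV file and pass its contents to asr_model.transcribe().

    asr_model is any object with a transcribe(wav_data) method,
    such as a JasperASR or Wav2Vec2ASR instance.
    """
    # Assumption: transcribe() expects the whole WAV file as bytes.
    wav_data = Path(wav_path).read_bytes()
    return asr_model.transcribe(wav_data)
```

With a model loaded as shown above, `transcribe_file(asr_model, "/path/to/audio.wav")` would return the transcribed text under that assumption.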
### Command Line
```
$ plume
```
### Pretrained Models
* **Jasper**: https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3
* **Wav2Vec2**: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md