2. add framework suffix for models
3. change black max columns to 79
4. add tests
5. integrate vad, encrypt and refactor manifest, regentity, extended_path, audio, parallel utils
6. added ui utils for encrypted preview
7. wip marblenet model
8. added transformers based wav2vec2 inference
9. update readme and manifest
10. add deploy setup target
Files changed:

- src/plume
- tests/plume
- .flake8
- .gitignore
- LICENSE
- MANIFEST.in
- Notes.md
- README.md
- pyproject.toml
- setup.py
- tox.ini
# Plume ASR

Generates text from audio containing speech.
## Table of Contents

- [Prerequisites](#prerequisites)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Pretrained Models](#pretrained-models)
## Prerequisites

The following system packages provide audio decoding support:

    # apt install libsndfile-dev ffmpeg
## Features

- ASR using Jasper (from the NVIDIA NeMo toolkit)
- ASR using Wav2Vec2 (from fairseq)
## Installation

To install the package and its dependencies, run:

    python setup.py install

or with pip:

    pip install .[all]

The installation should work on Python 3.6 or newer; it is untested on Python 2.7.
## Usage

### Library

#### Jasper

    from plume.models.jasper_nemo.asr import JasperASR

    # Load the model from its config and checkpoints
    asr_model = JasperASR("/path/to/model_config_yaml", "/path/to/encoder_checkpoint", "/path/to/decoder_checkpoint")
    text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav
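The snippet above passes `wav_data` without defining it. A minimal sketch of loading it with the `soundfile` library (backed by the libsndfile prerequisite) follows; whether `transcribe` expects raw samples or a file path is an assumption here, so adjust to the actual API:

    # Hypothetical preparation of wav_data for the example above.
    # Assumes transcribe() accepts raw audio samples as a float array.
    import soundfile as sf

    wav_data, sample_rate = sf.read("/path/to/utterance.wav", dtype="float32")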
#### Wav2Vec2

    from plume.models.wav2vec2.asr import Wav2Vec2ASR

    # Load the model from its checkpoints and target dictionary
    asr_model = Wav2Vec2ASR("/path/to/ctc_checkpoint", "/path/to/w2v_checkpoint", "/path/to/target_dictionary")
    text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav
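Since both model classes expose the same `transcribe` call, transcribing many files is just a loop. A sketch, reusing the `soundfile`-based loading from the Jasper example (the raw-samples input format remains an assumption):

    from pathlib import Path

    import soundfile as sf

    from plume.models.wav2vec2.asr import Wav2Vec2ASR

    asr_model = Wav2Vec2ASR("/path/to/ctc_checkpoint", "/path/to/w2v_checkpoint", "/path/to/target_dictionary")

    # Transcribe every wav file in a directory
    for wav_path in sorted(Path("/path/to/wavs").glob("*.wav")):
        wav_data, sample_rate = sf.read(str(wav_path), dtype="float32")
        print(wav_path.name, asr_model.transcribe(wav_data))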
### Command Line

    $ plume
## Pretrained Models

- Jasper: https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3
- Wav2Vec2: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md