# Plume ASR

Generates text from audio containing speech.
## Table of Contents

- [Prerequisites](#prerequisites)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Pretrained Models](#pretrained-models)
## Prerequisites

```
# apt install libsndfile-dev ffmpeg
```
## Features

- ASR using Jasper (from the NeMo Toolkit)
- ASR using Wav2Vec2 (from fairseq)
## Installation

To install the package and its dependencies, run:

```
python setup.py install
```

or, with pip:

```
pip install .[all]
```

Installation should work on Python 3.6 or newer; it is untested on Python 2.7.
## Usage

### Library

#### Jasper

```python
from plume.models.jasper_nemo.asr import JasperASR

# Load the model from its config and checkpoints
asr_model = JasperASR(
    "/path/to/model_config_yaml",
    "/path/to/encoder_checkpoint",
    "/path/to/decoder_checkpoint",
)
text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav
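```

In the snippet above, `wav_data` is assumed to already hold the audio. A minimal sketch of loading it, assuming `transcribe` accepts raw samples read from a mono WAV file as a NumPy array (it may instead expect a file path, so check the model class you are using):

```python
import soundfile as sf  # backed by the libsndfile prerequisite

# Read the recording into a NumPy float array along with its sample rate.
# ASR models are usually trained on 16 kHz mono audio, so convert or
# resample the file first if yours differs.
wav_data, sample_rate = sf.read("/path/to/speech.wav")
```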
#### Wav2Vec2

```python
from plume.models.wav2vec2.asr import Wav2Vec2ASR

# Load the model from its checkpoints and target dictionary
asr_model = Wav2Vec2ASR(
    "/path/to/ctc_checkpoint",
    "/path/to/w2v_checkpoint",
    "/path/to/target_dictionary",
)
text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav
```
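Both classes expose the same `transcribe` interface, so either backend can sit behind a common helper. A hypothetical sketch for transcribing a directory of WAV files, under the same input assumptions as above:

```python
from pathlib import Path

import soundfile as sf

def transcribe_dir(asr_model, wav_dir):
    """Transcribe every .wav file under wav_dir with the given ASR model."""
    results = {}
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        wav_data, _sample_rate = sf.read(str(wav_path))
        results[wav_path.name] = asr_model.transcribe(wav_data)
    return results
```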
### Command Line

```
$ plume
```
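The available subcommands depend on the installed version; `plume --help` should list them, as is standard for Python command-line tools.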
## Pretrained Models

- Jasper: https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3
- Wav2Vec2: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md