Plume ASR
Generates text from audio containing speech
Prerequisites
# apt install libsndfile-dev ffmpeg
Features
- ASR using Jasper (from the NeMo Toolkit)
- ASR using Wav2Vec2 (from fairseq)
Installation
To install the package and its dependencies, run:
python setup.py install
or with pip
pip install .[all]
The installation should work on Python 3.6 or newer; Python 2.7 is untested.
Usage
Library
Jasper
from plume.models.jasper_nemo.asr import JasperASR
asr_model = JasperASR("/path/to/model_config_yaml","/path/to/encoder_checkpoint","/path/to/decoder_checkpoint") # Loads the models
TEXT = asr_model.transcribe(wav_data) # Returns the text spoken in the wav
Wav2Vec2
from plume.models.wav2vec2.asr import Wav2Vec2ASR
asr_model = Wav2Vec2ASR("/path/to/ctc_checkpoint","/path/to/w2v_checkpoint","/path/to/target_dictionary") # Loads the models
TEXT = asr_model.transcribe(wav_data) # Returns the text spoken in the wav
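Both examples above pass `wav_data` to `transcribe` without showing where it comes from. As a minimal sketch, assuming the models expect 16 kHz mono PCM audio normalized to floats in [-1.0, 1.0] (a common convention for ASR checkpoints, not something this README states), the Python standard library's `wave` module is enough to prepare such input. The helper names `write_test_wav` and `read_wav_floats` are hypothetical, introduced here for illustration only:

```python
import math
import struct
import wave

def write_test_wav(path, rate=16000, freq=440.0, seconds=1.0):
    # Hypothetical helper: write a 440 Hz tone as a 16 kHz mono
    # 16-bit PCM WAV file, useful as a smoke-test input.
    n = int(rate * seconds)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n)
        )
        w.writeframes(frames)

def read_wav_floats(path):
    # Read a 16-bit PCM WAV back and normalize samples to [-1.0, 1.0].
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in samples]
```

One would then call, for example, `asr_model.transcribe(read_wav_floats("speech.wav"))`, converting the list to whatever array type the chosen model actually requires.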
Command Line
$ plume
Pretrained Models
- Jasper: https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3
- Wav2Vec2: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md