# Plume ASR


Generates text from audio containing speech.


## Table of Contents

- [Prerequisites](#prerequisites)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Pretrained Models](#pretrained-models)

## Prerequisites

On Debian/Ubuntu, install the system libraries Plume depends on (the `#` prompt indicates a root shell):

```sh
# apt install libsndfile-dev ffmpeg
```

## Features

## Installation

To install the package and its dependencies, run:

```sh
python setup.py install
```

or, with pip:

```sh
pip install .[all]
```

Installation should work on Python 3.6 or newer; Python 2.7 is untested.
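For development, an editable install (standard pip behavior, not specific to Plume) keeps the package importable straight from the source tree:

```sh
pip install -e ".[all]"
```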

## Usage

### Library

#### Jasper

```python
from plume.models.jasper_nemo.asr import JasperASR

# Load the model from its config and checkpoints
asr_model = JasperASR(
    "/path/to/model_config_yaml",
    "/path/to/encoder_checkpoint",
    "/path/to/decoder_checkpoint",
)
text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav
```

#### Wav2Vec2

```python
from plume.models.wav2vec2.asr import Wav2Vec2ASR

# Load the model from its checkpoints and target dictionary
asr_model = Wav2Vec2ASR(
    "/path/to/ctc_checkpoint",
    "/path/to/w2v_checkpoint",
    "/path/to/target_dictionary",
)
text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav
```
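Both snippets leave `wav_data` undefined. Here is a minimal sketch for producing it with the `soundfile` library (a Python binding for the libsndfile prerequisite above); that `transcribe` accepts the raw sample array and that the models expect 16 kHz mono audio are assumptions, not confirmed by this README:

```python
import soundfile as sf

# Read the audio into a float32 numpy array along with its sample rate.
wav_data, sample_rate = sf.read("utterance.wav", dtype="float32")

# Many ASR checkpoints expect 16 kHz mono input; convert beforehand
# (e.g. with ffmpeg) if the recording differs.
assert sample_rate == 16000, "resample the audio to 16 kHz first"

text = asr_model.transcribe(wav_data)
print(text)
```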

### Command Line

```sh
$ plume
```
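Assuming the CLI follows the usual `--help` convention, the available subcommands and options can be listed with:

```sh
$ plume --help
```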

## Pretrained Models

- Jasper: https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3
- Wav2Vec2: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md