mirror of https://github.com/malarinv/plume-asr.git synced 2026-03-07 20:02:34 +00:00

Plume ASR


Generates text from audio containing speech



Prerequisites

On Debian/Ubuntu, install the required system libraries first:

# apt install libsndfile-dev ffmpeg

Features

Installation

To install the package and its dependencies, run:

python setup.py install

or with pip:

pip install .[all]

The installation should work on Python 3.6 or newer; it is untested on Python 2.7.
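As a quick sanity check before installing, you can verify the interpreter version from Python (the 3.6 floor is stated above; the error message wording here is just illustrative):

```python
import sys

# plume-asr targets Python 3.6+; fail fast on older interpreters.
if sys.version_info < (3, 6):
    raise RuntimeError(
        "Python %d.%d detected; plume-asr requires Python 3.6 or newer"
        % sys.version_info[:2]
    )
print("Python %d.%d is supported" % sys.version_info[:2])
```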

Usage

Library

Jasper

from plume.models.jasper_nemo.asr import JasperASR

# Load the model from its config and checkpoints
asr_model = JasperASR(
    "/path/to/model_config_yaml",
    "/path/to/encoder_checkpoint",
    "/path/to/decoder_checkpoint",
)
text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav

Wav2Vec2

from plume.models.wav2vec2.asr import Wav2Vec2ASR

# Load the model from its checkpoints and target dictionary
asr_model = Wav2Vec2ASR(
    "/path/to/ctc_checkpoint",
    "/path/to/w2v_checkpoint",
    "/path/to/target_dictionary",
)
text = asr_model.transcribe(wav_data)  # Returns the text spoken in the wav
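Both snippets above assume a `wav_data` value is already in hand. As a minimal sketch using only the standard library (whether `transcribe()` expects raw file bytes or a decoded sample array is an assumption to verify against the plume source), here is one way to produce a 16 kHz mono test WAV and read its bytes:

```python
import math
import struct
import wave

# Write a one-second 440 Hz mono tone at 16 kHz, 16-bit PCM, so the
# example is self-contained; in practice open a real recording instead.
with wave.open("example.wav", "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)       # 16-bit samples
    wav_file.setframerate(16000)
    samples = [
        int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / 16000))
        for n in range(16000)
    ]
    wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read the raw file contents back; passing these bytes (rather than a
# decoded array) to transcribe() is an assumption -- check locally.
with open("example.wav", "rb") as f:
    wav_data = f.read()

print(len(wav_data))
```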

Command Line

Running the plume command lists the available subcommands:

$ plume

Pretrained Models

Jasper: https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3
Wav2Vec2: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md

License: MIT