# Plume ASR

[Code style: black](https://github.com/python/black)

> Generates text from audio containing speech

---
# Table of Contents

* [Prerequisites](#prerequisites)
* [Features](#features)
* [Installation](#installation)
* [Usage](#usage)
# Prerequisites

Install the system audio dependencies first (Debian/Ubuntu shown):
```bash
sudo apt install libsndfile-dev ffmpeg
```
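To confirm that both system dependencies are visible before installing, a small check like the following may help. It is a minimal sketch and not part of Plume: the helper name `missing_prerequisites` is hypothetical, and it assumes `ffmpeg` is expected on `PATH` and that libsndfile can be found through the system's shared-library lookup.

```python
import ctypes.util
import shutil


def missing_prerequisites():
    """Return the names of system prerequisites that could not be located."""
    missing = []
    # ffmpeg is a command-line tool, so look for it on PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # libsndfile is a shared library; find_library returns None when it is not found.
    if ctypes.util.find_library("sndfile") is None:
        missing.append("libsndfile")
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All prerequisites found")
```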
# Features

* ASR using Jasper (from [NeMo Toolkit](https://github.com/NVIDIA/NeMo))
* ASR using Wav2Vec2 (from [fairseq](https://github.com/pytorch/fairseq))
# Installation

To install the package and its dependencies, run:
```bash
python setup.py install
```
or with pip:
```bash
pip install ".[all]"
```
The installation should work on Python 3.6 or newer. It has not been tested on Python 2.7.
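A quick way to check the interpreter requirement, and whether the install succeeded, is sketched below. The function name `check_install` is hypothetical; the package name `plume` is taken from the import paths used in the Usage section.

```python
import importlib.util
import sys


def check_install(package="plume", minimum=(3, 6)):
    """Return (python_ok, package_found) for the current interpreter."""
    # Compare the running interpreter against the minimum supported version.
    python_ok = sys.version_info >= minimum
    # find_spec returns None when a top-level package is not installed.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_install()
    print("Python version OK:", python_ok)
    print("plume importable:", package_found)
```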
# Usage

### Library

> Jasper
```python
from plume.models.jasper_nemo.asr import JasperASR

# Load the model from its config and checkpoints
asr_model = JasperASR("/path/to/model_config_yaml", "/path/to/encoder_checkpoint", "/path/to/decoder_checkpoint")
# wav_data holds the audio to transcribe; the call returns the spoken text
text = asr_model.transcribe(wav_data)
```
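The snippet above leaves `wav_data` undefined. One way to prepare it is to read the frames with the standard-library `wave` module. This assumes `transcribe` accepts the raw PCM bytes of a recording, which is an assumption: check the model class for the exact input format and sample rate it expects. The helper name `read_wav_frames` is hypothetical.

```python
import wave


def read_wav_frames(path):
    """Read a WAV file and return (raw_frames, sample_rate, channels, sample_width)."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()  # bytes per sample; 2 means 16-bit
        raw_frames = wav_file.readframes(wav_file.getnframes())
    return raw_frames, sample_rate, channels, sample_width
```

With this helper, `wav_data, rate, channels, width = read_wav_frames("/path/to/audio.wav")` yields the bytes to pass to `asr_model.transcribe(wav_data)`, along with the parameters needed to check that the recording matches what the model was trained on.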
> Wav2Vec2
```python
from plume.models.wav2vec2.asr import Wav2Vec2ASR

# Load the model from its checkpoints and target dictionary
asr_model = Wav2Vec2ASR("/path/to/ctc_checkpoint", "/path/to/w2v_checkpoint", "/path/to/target_dictionary")
# wav_data holds the audio to transcribe; the call returns the spoken text
text = asr_model.transcribe(wav_data)
```
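Both classes are used through the same `transcribe(wav_data)` call, so code that consumes them can stay independent of the backend. The following is a minimal sketch under that assumption; `transcribe_all` is a hypothetical helper, not part of Plume, and it expects each clip to already be in whatever form the chosen model accepts.

```python
def transcribe_all(asr_model, clips):
    """Transcribe every clip with any model exposing transcribe(wav_data).

    clips maps a clip name to audio already in the form the model expects.
    Returns a dict mapping each clip name to its transcript.
    """
    return {name: asr_model.transcribe(wav_data) for name, wav_data in clips.items()}
```

Because the helper relies only on the `transcribe` method, either a `JasperASR` or a `Wav2Vec2ASR` instance could be passed as `asr_model` without changing the calling code.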
### Command Line
```
$ plume
```
### Pretrained Models

**Jasper**

https://ngc.nvidia.com/catalog/models/nvidia:multidataset_jasper10x5dr/files?version=3

**Wav2Vec2**

https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md