mirror of
https://github.com/malarinv/tacotron2
synced 2026-03-08 01:32:35 +00:00
151eef9466c228b1af6730d0ecef5169680c324c
Tacotron 2 (without wavenet)
PyTorch implementation of Tacotron 2, as described in Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions.
This implementation includes distributed and fp16 support and uses the LJSpeech dataset.
Distributed and FP16 support relies on work by Christian Sarofeen and NVIDIA's frameworks team.
Pre-requisites
- NVIDIA GPU + CUDA cuDNN
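Before going further, it may help to confirm that the GPU stack is visible from Python. The following is a minimal sketch, not part of this repo: the helper name `check_gpu_prerequisites` is hypothetical, and it assumes PyTorch is already installed (if it is not, every entry is simply reported as False).

```python
def check_gpu_prerequisites():
    """Report whether PyTorch, CUDA and cuDNN are usable from this interpreter.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    status = {"torch": False, "cuda": False, "cudnn": False}
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return status

    status["torch"] = True
    # True only if a CUDA-capable device and a working driver are found.
    status["cuda"] = bool(torch.cuda.is_available())
    # True only if this PyTorch build was compiled with cuDNN support.
    status["cudnn"] = bool(torch.backends.cudnn.is_available())
    return status


if __name__ == "__main__":
    for name, ok in check_gpu_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If `cuda` or `cudnn` is reported as missing, training on GPU will most likely not work until the driver or the PyTorch build is fixed.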
Setup
- Download and extract the LJ Speech dataset
- Clone this repo:
  git clone https://github.com/NVIDIA/tacotron2.git
- CD into this repo:
  cd tacotron2
- Update .wav paths:
  sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' *.txt
- Install PyTorch 0.4
- Install Python requirements or use the Docker container (tbd)
  - Install Python requirements:
    pip install -r requirements.txt
  - OR
  - Docker container (tbd)
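For systems without `sed`, the path update step can be approximated in Python. This is a rough sketch rather than something shipped with the repo: the function name `update_wav_paths` is hypothetical, and it assumes the filelists are UTF-8 text files in which the literal placeholder `DUMMY` stands in for the directory containing the .wav files.

```python
from pathlib import Path


def update_wav_paths(filelist_dir, wav_dir, placeholder="DUMMY"):
    """Replace `placeholder` with `wav_dir` in every *.txt file of `filelist_dir`.

    Mirrors `sed -i -- 's,DUMMY,<wav_dir>,g' *.txt`: all occurrences are
    replaced and files are rewritten in place. Returns the names of the
    files that were actually modified.
    """
    changed = []
    for path in sorted(Path(filelist_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        new_text = text.replace(placeholder, wav_dir)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Run from the repo root, with the dataset extracted to ljs_dataset_folder.
    print(update_wav_paths(".", "ljs_dataset_folder/wavs"))
```

Like the `sed` command, this rewrites the files in place, so running it a second time is a no-op once no `DUMMY` placeholders remain.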
Training
- python train.py --output_directory=outdir --log_directory=logdir
- (OPTIONAL) tensorboard --logdir=outdir/logdir
Multi-GPU (distributed) and FP16 Training
python -m multiproc train.py --output_directory=/outdir --log_directory=/logdir --hparams=distributed_run=True
Inference
- jupyter notebook --ip=127.0.0.1 --port=31337
- load inference.ipynb
Related repos
nv-wavenet: Faster than real-time wavenet inference
Acknowledgements
This implementation uses code from, or is inspired by, the following repos: Ryuichi Yamamoto, Keith Ito, Prem Seetharaman.
We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang and Zongheng Yang.
