mirror of https://github.com/malarinv/tacotron2
adding readme and license
commit
1874b9a08f
@@ -0,0 +1,25 @@
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@@ -0,0 +1,53 @@
# Tacotron 2 (without wavenet)

PyTorch implementation of [Natural TTS Synthesis By Conditioning
Wavenet On Mel Spectrogram Predictions](https://arxiv.org/pdf/1712.05884.pdf).

This implementation includes **distributed** and **fp16** support
and uses the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/).

Distributed and FP16 support relies on work by Christian Sarofeen and NVIDIA's
frameworks team.

## Pre-requisites
1. NVIDIA GPU + CUDA + cuDNN
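
Once PyTorch is installed (Setup, step 5), a quick check along the following lines can confirm that the GPU stack is visible. This is a minimal sketch and not part of the repository: the helper name `gpu_prerequisites` is made up for illustration, and it only reports what PyTorch itself detects.

```python
def gpu_prerequisites():
    """Report whether PyTorch, CUDA and cuDNN are usable on this machine.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing further to probe.
        return {"torch": False, "cuda": False, "cudnn": False}
    return {
        "torch": True,
        "cuda": bool(torch.cuda.is_available()),
        "cudnn": bool(torch.backends.cudnn.is_available()),
    }


if __name__ == "__main__":
    for name, ok in gpu_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If any entry prints `missing`, training on GPU is unlikely to work until the driver, CUDA, or cuDNN installation is fixed.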

## Setup
1. Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
2. Clone this repo: `git clone https://github.com/NVIDIA/tacotron2.git`
3. Change into the repo directory: `cd tacotron2`
4. Update .wav paths: `sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' *.txt`
5. Install [PyTorch 0.4](https://github.com/pytorch/pytorch)
6. Install Python requirements or use the Docker container (tbd)
    - Install Python requirements: `pip install -r requirements.txt`
    - **OR**
    - Docker container `(tbd)`
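
For readers without `sed` (for example on Windows), step 4 can be approximated in Python. The sketch below is an illustration, not code from the repository: the function name `point_filelists_at_wavs` is hypothetical, and it assumes the file lists are the `.txt` files sitting directly in the repo directory, as the `*.txt` glob in step 4 suggests.

```python
from pathlib import Path


def point_filelists_at_wavs(repo_dir, wav_dir, placeholder="DUMMY"):
    """Replace `placeholder` with `wav_dir` in every .txt file directly under repo_dir.

    Mirrors the effect of: sed -i -- 's,DUMMY,<wav_dir>,g' *.txt
    Returns the names of the files that were rewritten.
    """
    changed = []
    for path in sorted(Path(repo_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if placeholder in text:
            path.write_text(text.replace(placeholder, wav_dir), encoding="utf-8")
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Run from inside the cloned repo; the wav folder is the one used in step 4.
    print(point_filelists_at_wavs(".", "ljs_dataset_folder/wavs"))
```

Like the `sed` command, this rewrites files in place, so it is worth running it on a clean checkout where the change can be reviewed with `git diff`.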

## Training
1. `python train.py --output_directory=outdir --log_directory=logdir`
2. (OPTIONAL) `tensorboard --logdir=outdir/logdir`

## Multi-GPU (distributed) and FP16 Training
1. `python -m multiproc train.py --output_directory=/outdir --log_directory=/logdir --hparams=distributed_run=True`
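
The `--hparams` flag takes `name=value` overrides for the training hyperparameters. To illustrate the general idea, here is a minimal sketch of how such a string could be applied on top of a set of defaults. It is not the repository's parser: `parse_overrides` is a hypothetical helper, and apart from `distributed_run` the names and values in `DEFAULTS` are assumptions chosen only for the example.

```python
# Illustrative defaults; only `distributed_run` appears in the README command.
DEFAULTS = {"distributed_run": False, "batch_size": 48, "learning_rate": 1e-3}


def parse_overrides(spec, defaults):
    """Apply 'name=value,name=value' overrides to a copy of `defaults`.

    Each value is cast to the type of the corresponding default.
    """
    hparams = dict(defaults)
    if not spec:
        return hparams
    for item in spec.split(","):
        name, sep, raw = item.partition("=")
        name, raw = name.strip(), raw.strip()
        if not sep or name not in hparams:
            raise ValueError(f"unknown or malformed override: {item!r}")
        default = hparams[name]
        if isinstance(default, bool):
            # bool("False") is True in Python, so booleans are parsed by name.
            if raw.lower() not in ("true", "false"):
                raise ValueError(f"expected True or False for {name!r}, got {raw!r}")
            hparams[name] = raw.lower() == "true"
        else:
            hparams[name] = type(default)(raw)
    return hparams


if __name__ == "__main__":
    print(parse_overrides("distributed_run=True", DEFAULTS))
```

Rejecting unknown names, as this sketch does, catches typos in the override string early instead of silently training with the defaults.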

## Inference
1. `jupyter notebook --ip=127.0.0.1 --port=31337`
2. Load `inference.ipynb`

## Related repos
[nv-wavenet](https://github.com/NVIDIA/nv-wavenet/): Faster than real-time
wavenet inference

## Acknowledgements
This implementation is inspired by or uses code from the following repos:
[Ryuichi Yamamoto](https://github.com/r9y9/tacotron_pytorch), [Keith
Ito](https://github.com/keithito/tacotron/), [Prem
Seetharaman](https://github.com/pseeth/pytorch-stft).

We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen,
Yuxuan Wang and Zongheng Yang.