## Setup
- `. env/bin/activate` to activate the virtualenv.
## Data Generation
- Update `OUTPUT_NAME` in `speech_samplegen.py` to set the name of the dataset folder that will be created.
- `python speech_samplegen.py` generates the variants of the audio samples (see the sketch below).
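A minimal sketch of the generation step, assuming `OUTPUT_NAME` is a module-level constant in `speech_samplegen.py` and that the generated variants end up under `./outputs/OUTPUT_NAME/` (the folder the SPPAS alignment step below points at):

```python
# Hypothetical driver for the generation step. The subprocess call mirrors the
# README's command; the output location is an assumption based on the -w path
# used by the SPPAS step in Data Preprocessing.
import subprocess

OUTPUT_NAME = "stress_dataset_v1"  # hypothetical name; edit OUTPUT_NAME in speech_samplegen.py to match

# Run the sample generator from the repo root, exactly as the README does.
subprocess.run(["python", "speech_samplegen.py"], check=True)

print(f"expecting generated audio variants under ./outputs/{OUTPUT_NAME}/")
```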
## Data Preprocessing
- `python speech_data.py` creates the training/testing data from the generated samples.
- Run `fix_csv(OUTPUT_NAME)` once to create the fixed index of the generated dataset (see the sketch below).
- Run `generate_sppas_trans(OUTPUT_NAME)` once to create the SPPAS transcription (wav + txt) data.
- Run `$ (SPPAS_DIR)/bin/annotation.py -l eng -e csv --ipus --tok --phon --align -w ./outputs/OUTPUT_NAME/` once to create the phoneme alignment csv files for all variants.
- `create_seg_phonpair_tfrecords(OUTPUT_NAME)` creates the tfrecords files with the phoneme-level pairs of right/wrong stresses.
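A sketch of the preprocessing calls, assuming `fix_csv`, `generate_sppas_trans`, and `create_seg_phonpair_tfrecords` are importable from `speech_data.py` (the README names the calls but not the module they live in):

```python
# Hypothetical preprocessing driver; the import location is an assumption,
# the call order and signatures follow the README.
from speech_data import fix_csv, generate_sppas_trans, create_seg_phonpair_tfrecords

OUTPUT_NAME = "stress_dataset_v1"  # hypothetical name; must match the generation step

fix_csv(OUTPUT_NAME)               # one-time: build the fixed index of the generated dataset
generate_sppas_trans(OUTPUT_NAME)  # one-time: write the wav + txt pairs for SPPAS

# Run the SPPAS annotation.py command from the list above before this step.
create_seg_phonpair_tfrecords(OUTPUT_NAME)  # tfrecords of right/wrong stress phoneme pairs
```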
## Training
- `python speech_model.py` trains the model with the generated training data.
- `train_siamese(OUTPUT_NAME)` trains the siamese model with the generated dataset (see the sketch below).
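A sketch of the training call, assuming `train_siamese` is importable from `speech_model.py` (the README names only the call):

```python
# Hypothetical training driver; importing train_siamese from speech_model.py
# is an assumption.
from speech_model import train_siamese

OUTPUT_NAME = "stress_dataset_v1"  # hypothetical name used throughout the pipeline

train_siamese(OUTPUT_NAME)  # trains the siamese model on the generated dataset
```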
## Testing
- `python speech_test.py` tests the trained model with the test dataset.
- `evaluate_siamese(TEST_RECORD_FILE, audio_group=OUTPUT_NAME, weights=WEIGHTS_FILE_NAME)` evaluates the trained siamese model; `TEST_RECORD_FILE` will be under the `outputs` directory and `WEIGHTS_FILE_NAME` will be under the `models` directory (pick the most recent weights file). See the sketch below.
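A sketch of the evaluation call, assuming `evaluate_siamese` is importable from `speech_test.py`; the file paths are hypothetical placeholders for the test tfrecord under `outputs/` and the newest weights file under `models/`:

```python
# Hypothetical evaluation driver; the import location and both paths are
# assumptions, the argument names follow the README.
from speech_test import evaluate_siamese

OUTPUT_NAME = "stress_dataset_v1"                              # hypothetical dataset name
TEST_RECORD_FILE = "outputs/stress_dataset_v1/test.tfrecord"   # hypothetical; lives under outputs/
WEIGHTS_FILE_NAME = "models/siamese_weights_latest.h5"         # hypothetical; pick the most recent file in models/

evaluate_siamese(TEST_RECORD_FILE, audio_group=OUTPUT_NAME, weights=WEIGHTS_FILE_NAME)
```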