updated README to include testing

Merge branch 'master' of /home/ilml/Public/Repos/speech_scoring
Added README.md describing the workflow
2017-12-29 16:21:38 +05:30 · 2017-12-29 13:15:51 +05:30 · 2017-12-29 13:14:37 +05:30
2 changed files with 30 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,23 @@
+### Setup
+`. env/bin/activate` to activate the virtualenv.
+
+### Data Generation
+* update `OUTPUT_NAME` in *speech_samplegen.py* to set the name of the dataset folder that will be created
+* `python speech_samplegen.py` generates variants of audio samples
+
+### Data Preprocessing
+* `python speech_data.py` creates the training-testing data from the generated samples.
+* run `fix_csv(OUTPUT_NAME)` once to create the fixed index of the generated dataset
+* run `generate_sppas_trans(OUTPUT_NAME)` once to create the SPPAS transcription (wav+txt) data
+* run `$ (SPPAS_DIR)/bin/annotation.py -l eng -e csv --ipus --tok --phon --align -w ./outputs/OUTPUT_NAME/` once to create the phoneme alignment csv files for all variants.
+* `create_seg_phonpair_tfrecords(OUTPUT_NAME)` creates the tfrecords files
+ with the phoneme-level pairs of right/wrong stresses
+
+### Training
+* `python speech_model.py` trains the model with the training data generated.
+* `train_siamese(OUTPUT_NAME)` trains the siamese model with the generated dataset.
+
+### Testing
+* `python speech_test.py` tests the trained model with the test dataset
+* `evaluate_siamese(TEST_RECORD_FILE, audio_group=OUTPUT_NAME, weights=WEIGHTS_FILE_NAME)`
+  The TEST_RECORD_FILE is under the outputs directory and the WEIGHTS_FILE_NAME is under the models directory; pick the most recent weights file.
--- a/speech_samplegen.py
+++ b/speech_samplegen.py
@@ -216,6 +216,9 @@ def generate_audio_for_text_list(text_list):
    closer()

 def generate_audio_for_stories():
+    '''
+    Generates the audio sample variants for the list of words in the stories
+    '''
    # story_file = './inputs/all_stories_hs.json'
    story_file = './inputs/all_stories.json'
    stories_data = json.load(open(story_file))
@@ -225,6 +228,10 @@ def generate_audio_for_stories():
    generate_audio_for_text_list(text_list)

 def generate_test_audio_for_stories(sample_count=0):
+    '''
+    Picks a list of words from the wordlist that are not in story words
+    and generates the variants
+    '''
    story_file = './inputs/all_stories_hs.json'
    # story_file = './inputs/all_stories.json'
    stories_data = json.load(open(story_file))
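
The docstring added to `generate_test_audio_for_stories` describes picking words from the wordlist that do not appear in the story words. The diff does not show how that selection is done, so the following is only a minimal sketch of the idea. The helper name `pick_unseen_words`, its parameters, the case-insensitive comparison, and the reading of `sample_count=0` as "keep every candidate" are all assumptions for illustration, not the repository's code.

```python
import random


def pick_unseen_words(wordlist, story_words, sample_count=0, seed=None):
    """Return words from `wordlist` that do not occur in `story_words`.

    Hypothetical helper, not taken from speech_samplegen.py.
    Assumption: sample_count <= 0 means "return all candidates".
    """
    # Compare case-insensitively so "Dog" and "dog" count as the same word.
    story_keys = {w.lower() for w in story_words}

    candidates = []
    emitted = set()
    for word in wordlist:
        key = word.lower()
        # Skip story words and words already emitted (duplicates in wordlist).
        if key in story_keys or key in emitted:
            continue
        emitted.add(key)
        candidates.append(word)

    if sample_count <= 0 or sample_count >= len(candidates):
        return candidates
    # A seeded Random instance keeps the sample reproducible across runs.
    return random.Random(seed).sample(candidates, sample_count)
```

Passing a fixed `seed` makes the sampled test words repeatable, which can help when test audio is regenerated later.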
Author	SHA1	Message	Date
Malar Kannan	225a720f18	updated README to include testing	2017-12-29 16:21:38 +05:30
Malar Kannan	b267b89a44	Merge branch 'master' of /home/ilml/Public/Repos/speech_scoring	2017-12-29 13:15:51 +05:30
Malar Kannan	eb10b577ae	Added README.md describing the workflow	2017-12-29 13:14:37 +05:30
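
The Testing section of the README added in this commit says to pick the most recent weights file from the models directory. One way to automate that step is sketched below. The helper name `latest_weights_file`, the default `./models` path, and the `*.h5` pattern are assumptions; the commit does not state the weights file extension or the exact directory layout.

```python
import glob
import os


def latest_weights_file(models_dir="./models", pattern="*.h5"):
    """Return the most recently modified file matching `pattern`.

    Hypothetical helper; the directory and extension are assumptions.
    Returns None when no file matches.
    """
    paths = glob.glob(os.path.join(models_dir, pattern))
    if not paths:
        return None
    # The file with the largest modification time is the newest one.
    return max(paths, key=os.path.getmtime)
```

The returned path could then be supplied as the `weights` argument of `evaluate_siamese`, assuming that argument accepts what this helper returns. The README calls the value `WEIGHTS_FILE_NAME`, so it may expect a bare file name instead of a full path.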