0. Generate samples of phoneme-similarity variants.
1. Create spectrograms for each word using 150 ms windows with 50 ms overlap.
2. Train an RNN that maps the spectrograms to an output vector.
3. Train a NN to output True/False depending on whether the RNN output is acceptable (implementation detail: a Siamese network).
4. Validate with real-world samples.
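A minimal sketch of step 1, assuming mono audio already cut to one word and reading "150ms windows with 50ms overlap" as Hann-windowed 150 ms frames whose starts are 100 ms apart. The helper name `word_spectrogram` is hypothetical. Each frame becomes one row of FFT magnitudes, so a word turns into a sequence of frames that an RNN (step 2) can consume.

```python
import numpy as np

def word_spectrogram(samples, sample_rate, win_ms=150, overlap_ms=50):
    """Magnitude spectrogram of one word clip (hypothetical helper).

    Each frame is win_ms long and consecutive frames share overlap_ms,
    so frame starts are (win_ms - overlap_ms) apart.
    """
    samples = np.asarray(samples, dtype=float)
    win = int(round(sample_rate * win_ms / 1000))
    hop = win - int(round(sample_rate * overlap_ms / 1000))
    if len(samples) < win:
        # Zero-pad clips shorter than one window so they yield one frame.
        samples = np.pad(samples, (0, win - len(samples)))
    n_frames = 1 + (len(samples) - win) // hop
    window = np.hanning(win)  # taper each frame to reduce spectral leakage
    frames = np.stack([samples[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    # Rows are time frames; columns are frequency bins from 0 to sample_rate/2.
    return np.abs(np.fft.rfft(frames, axis=1))

# Usage: a 1 s, 1 kHz test tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = word_spectrogram(np.sin(2 * np.pi * 1000 * t), sr)
print(spec.shape)  # (9, 1201)
```

At 16 kHz a 150 ms frame is 2400 samples and the hop is 1600 samples, which is where the 9 frames of 1201 bins come from. A log or mel scaling of these magnitudes is a common next step before the RNN.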
2017-10-26 12:36:14 +00:00
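A hedged sketch of the distance side of step 3, assuming the RNN from step 2 already produces fixed-length embeddings for both branches of the Siamese pair. It uses the standard contrastive loss as a swapped-in choice; the function names, `margin` and `threshold` are hypothetical, and a plain distance threshold stands in for the True/False network.

```python
import numpy as np

def embedding_distance(a, b):
    # Euclidean distance along the last axis (works for single vectors or batches).
    return np.linalg.norm(np.asarray(a, float) - np.asarray(b, float), axis=-1)

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Mean contrastive loss over a batch of embedding pairs.

    same[i] = 1 pulls pair i together (loss d^2);
    same[i] = 0 pushes it apart until d >= margin (loss max(0, margin - d)^2).
    """
    d = embedding_distance(emb_a, emb_b)
    same = np.asarray(same, dtype=float)
    loss = same * d ** 2 + (1.0 - same) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(loss))

def is_acceptable(ref_emb, attempt_emb, threshold=0.5):
    # Hypothetical True/False rule: accept if the attempt is close to the reference.
    return bool(embedding_distance(ref_emb, attempt_emb) < threshold)
```

Under the distance scheme in the notes, `threshold` would sit between typical same-word distances and the medium distances of mispronunciations, while `margin` should be large enough that very different words end up beyond it.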
The same word spoken by multiple people, etc., should have a low distance. Two very different words (you can use the similarity measure from the speech_recognition repo) should have a high distance.
A wrong pronunciation should have a medium distance from the right pronunciation.
I also had good results getting non-English voices to speak English words to produce "wrong" pronunciations, so those will be subtly different too.
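The similarity measure from the speech_recognition repo is not reproduced here. As a hypothetical stand-in for picking "very different" word pairs, a normalized edit distance over phoneme sequences gives 0 for identical pronunciations and values near 1 for unrelated words. The function names and the ARPAbet-style symbols in the tests are illustrative assumptions.

```python
def phoneme_edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def normalized_distance(a, b):
    # Scale to [0, 1] by the longer sequence so word length does not dominate.
    longest = max(len(a), len(b))
    return phoneme_edit_distance(a, b) / longest if longest else 0.0
```

For example, `K AE T` vs `K AH T` ("cat"/"cut") scores 1/3, while `K AE T` vs `D AO G` ("cat"/"dog") scores 1.0, so only the second pair would count as a high-distance pair.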