0. Generate samples of phonetically similar variants of each word.
1. For each word, create spectrograms from 150 ms windows with 50 ms overlap.
2. Train an RNN to output a vector (an embedding) from the spectrograms.
3. Train an NN to output True/False based on the acceptability of the RNN output -> Siamese network (implementation detail).
4. Validate with real-world samples.
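
Step 1 can be sketched roughly as follows. This is a minimal sketch, not the exact preprocessing used: it assumes 16 kHz mono audio and a Hann taper, and it reads "150 ms windows with 50 ms overlap" as a 150 ms window advanced by 100 ms each time. The function name `window_spectrograms` and its parameters are hypothetical.

```python
import numpy as np

def window_spectrograms(signal, sample_rate=16000, window_ms=150, overlap_ms=50):
    """Split a 1-D signal into overlapping windows and return magnitude spectra.

    Returns an array of shape (num_windows, window_len // 2 + 1).
    The 16 kHz sample rate and the Hann taper are assumptions, not from the notes.
    """
    signal = np.asarray(signal, dtype=float)
    window_len = int(sample_rate * window_ms / 1000)              # 2400 samples at 16 kHz
    hop_len = window_len - int(sample_rate * overlap_ms / 1000)   # 1600 samples at 16 kHz
    if len(signal) < window_len:
        # Zero-pad short clips so that at least one window is produced.
        signal = np.pad(signal, (0, window_len - len(signal)))
    num_windows = 1 + (len(signal) - window_len) // hop_len
    taper = np.hanning(window_len)
    frames = np.stack([
        signal[i * hop_len : i * hop_len + window_len] * taper
        for i in range(num_windows)
    ])
    # One magnitude spectrum per window; the stacked rows form the spectrogram.
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a one-second clip at 16 kHz this yields 9 windows of 1201 frequency bins each; trailing samples that do not fill a whole window are dropped.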

The same word spoken by multiple people, etc., will have a low distance. Two words that are very different (you can use the similarity measure given in the speech_recognition repo) will have a high distance.
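
One common way to train a Siamese network toward this low/high distance behaviour is a contrastive loss. The notes do not say which loss was used, so the sketch below is an assumption: it uses Euclidean distance between embeddings, and the name `contrastive_loss` and the `margin` value are hypothetical.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for a batch of embedding pairs.

    emb_a, emb_b: arrays of shape (batch, dim), e.g. RNN outputs for two clips
    same: array of shape (batch,), 1.0 if both clips are the same word, else 0.0
    margin: hypothetical distance beyond which different words stop being penalised
    """
    emb_a, emb_b, same = (np.asarray(x, dtype=float) for x in (emb_a, emb_b, same))
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    pull = same * dist ** 2                                     # same word: shrink the distance
    push = (1.0 - same) * np.maximum(0.0, margin - dist) ** 2   # different words: push past the margin
    return float(np.mean(pull + push))
```

Pairs of the same word are penalised for any distance, while pairs of different words are penalised only when they sit closer than `margin`.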

A wrong pronunciation will have a medium distance from the right pronunciation.
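
The low / medium / high distance idea can be sketched as two thresholds on the embedding distance. This is only an illustration: the function name `judge_pronunciation` and the threshold values `low` and `high` are hypothetical placeholders that would need tuning on real-world samples.

```python
import numpy as np

def judge_pronunciation(reference, attempt, low=0.5, high=1.5):
    """Bucket the distance between two embedding vectors into three bands.

    low and high are placeholder thresholds, not values from the notes.
    """
    reference = np.asarray(reference, dtype=float)
    attempt = np.asarray(attempt, dtype=float)
    dist = float(np.linalg.norm(reference - attempt))
    if dist < low:
        return "same word, acceptable"      # low distance
    if dist < high:
        return "same word, mispronounced"   # medium distance
    return "different word"                 # high distance
```

For example, with the default thresholds, `judge_pronunciation([0, 0], [1, 0])` falls in the middle band because the distance is 1.0.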

I also had good experience getting non-English voices to speak the English words to get "wrong" pronunciations, so those will be subtly different too.