speech-scoring

Generate samples of phonemically similar variants.
Create spectrograms for each word using 150 ms windows with 50 ms overlap.
Train an RNN to output a vector from the spectrograms.
Train a NN to output True/False based on the acceptability of the RNN output. -> Siamese network (implementation detail)
Validate with real-world samples.
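
The spectrogram step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes that "50 ms overlap" means consecutive 150 ms windows share 50 ms (a 100 ms hop), that a Hann window and a magnitude FFT are acceptable, and that short clips are zero-padded to one full window. The function names `frame_signal` and `spectrogram` are hypothetical.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms=150, overlap_ms=50):
    """Split a 1-D signal into overlapping frames.

    Assumption: consecutive windows share `overlap_ms` of audio, so the
    hop between window starts is window_ms - overlap_ms (100 ms here).
    """
    signal = np.asarray(signal, dtype=float)
    window_len = int(sample_rate * window_ms / 1000)
    hop_len = int(sample_rate * (window_ms - overlap_ms) / 1000)
    # Assumption: clips shorter than one window are zero-padded.
    if len(signal) < window_len:
        signal = np.pad(signal, (0, window_len - len(signal)))
    n_frames = 1 + (len(signal) - window_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(window_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def spectrogram(signal, sample_rate, window_ms=150, overlap_ms=50):
    """Magnitude spectrogram with shape (n_frames, n_frequency_bins)."""
    frames = frame_signal(signal, sample_rate, window_ms, overlap_ms)
    # Taper each frame with a Hann window to reduce spectral leakage.
    windowed = frames * np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, axis=1))
```

With these assumptions, a 1-second clip sampled at 16 kHz yields 9 frames of 1201 frequency bins each, which would form the input sequence for the RNN.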