0. Generate samples of phoneme-similarity variants.
1. Create spectrograms for each word, using 150 ms windows with 50 ms overlap.
2. Train an RNN to output a vector from the spectrograms.
3. Train a NN to output True/False based on the acceptability of the RNN output. -> Siamese network (implementation detail)
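
Step 1 above could be sketched roughly as follows. This is only an illustration under assumptions: "50ms overlap" is read as consecutive windows sharing 50 ms (a 100 ms hop), the function names `frame_signal`, `magnitude_spectrum` and `spectrogram` are hypothetical, and a naive DFT stands in for whatever FFT routine the real pipeline uses.

```python
import cmath
import math


def frame_signal(samples, sample_rate, window_ms=150, overlap_ms=50):
    """Split samples into fixed-length windows that overlap by overlap_ms.

    Assumption: the hop between window starts is window_ms - overlap_ms.
    Trailing samples that do not fill a whole window are dropped.
    """
    window = sample_rate * window_ms // 1000
    hop = sample_rate * (window_ms - overlap_ms) // 1000
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must both be positive")
    return [
        samples[start:start + window]
        for start in range(0, len(samples) - window + 1, hop)
    ]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of one frame.

    Naive O(n^2) DFT, kept only so the sketch has no dependencies.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, sample_rate, window_ms=150, overlap_ms=50):
    """One magnitude spectrum per window: a list of rows (time x frequency)."""
    frames = frame_signal(samples, sample_rate, window_ms, overlap_ms)
    return [magnitude_spectrum(frame) for frame in frames]


if __name__ == "__main__":
    # Hypothetical input: 450 ms of a 100 Hz tone sampled at 1 kHz.
    sr = 1000
    tone = [math.sin(2 * math.pi * 100 * t / sr) for t in range(450)]
    spec = spectrogram(tone, sr)
    print(len(spec), "windows x", len(spec[0]), "frequency bins")
```

Each row of the result would be one time step fed to the RNN in step 2. In practice a library routine such as `scipy.signal.spectrogram` (with `nperseg` and `noverlap` set from the window and overlap lengths) would replace the naive DFT.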