a7a25e9b07
1. using dataname args for update/fill annotations
2. rename to dump_ui
2020-06-10 14:55:59 +05:30
6d149d282d
1. added a data extraction type argument
2. cleanup/refactor
2020-06-09 19:16:24 +05:30
8db1be0083
refactor validation process arguments and logging
2020-06-05 16:32:08 +05:30
bca227a7d7
1. removed the transcriber_pretrained/speller from utils
2. introduced get_mongo_coll to get the collection object directly from mongo uri
3. removed processing of correction entries to remove space/upper casing
2020-06-04 17:49:16 +05:30
e3a01169c2
skipping invalid data points
2020-06-02 17:21:30 +05:30
9f9cb62b60
show duration on validation of dataset
2020-05-28 11:35:31 +05:30
de21952349
1. refactored wav chunk processing method
2. renamed streamlit to validation_ui
2020-05-28 11:18:39 +05:30
d87369c8fe
don't load audio for annotation-only ui; keep spoken as text for normal asr validation
2020-05-27 15:57:42 +05:30
41af0a87de
respect verbose flag
2020-05-27 15:54:16 +05:30
6f395af10d
fix skipping null audio and add more verbose logs
2020-05-27 15:49:58 +05:30
a38789d0c3
added option to disable plots during validation
2020-05-27 15:43:03 +05:30
7ff2db3e2e
cleanup rev recycle
2020-05-27 15:33:22 +05:30
1acf9e403c
1. added support for mono/dual channel rev transcripts
2. handle errors when extracting datapoints from rev meta data
3. added support for annotation-only task when dumping ui data
2020-05-27 15:19:25 +05:30
1f2bedc156
1. enabled silence stripping in chunks when recycling audio from asr logs
2. limited asr recycling to the first 1 min of audio to get reliable alignments, ignoring the agent channel
3. added rev recycler for generating asr dataset from rev transcripts and audio
4. updated pydub dependency for silence stripping fn and removed hardcoded threadpool worker count
2020-05-27 14:22:44 +05:30
fca9c1aeb3
refactored module structure
2020-05-21 19:13:44 +05:30