Malar Kannan
7472b6457d
handling non-pnr cases without parens in text data
2020-06-16 11:02:53 +05:30
Malar Kannan
120302aad3
added support for name/dates/cities call data extraction and more logs
2020-06-15 10:24:38 +05:30
Malar Kannan
a7a25e9b07
1. using dataname args for update/fill annotations
2. rename to dump_ui
2020-06-10 14:55:59 +05:30
Malar Kannan
6d149d282d
1. added a data extraction type argument
2. cleanup/refactor
2020-06-09 19:16:24 +05:30
Malar Kannan
8db1be0083
refactor validation process arguments and logging
2020-06-05 16:32:08 +05:30
Malar Kannan
bca227a7d7
1. removed the transcriber_pretrained/speller from utils
2. introduced get_mongo_coll to get the collection object directly from mongo uri
3. removed processing of correction entries to remove space/upper casing
2020-06-04 17:49:16 +05:30
Malar Kannan
e3a01169c2
skipping invalid data points
2020-06-02 17:21:30 +05:30
Malar Kannan
3a5ce069ab
parallelize data loading from remote
2020-05-29 12:14:14 +05:30
Malar Kannan
9f9cb62b60
show duration on validation of dataset
2020-05-28 11:35:31 +05:30
Malar Kannan
de21952349
1. refactored wav chunk processing method
2. renamed streamlit to validation_ui
2020-05-28 11:18:39 +05:30
Malar Kannan
d87369c8fe
don't load audio for annotation only ui and keep spoken as text for normal asr validation
2020-05-27 15:57:42 +05:30
Malar Kannan
41af0a87de
respect verbose flag
2020-05-27 15:54:16 +05:30
Malar Kannan
6f395af10d
fix skipping null audio and add more verbose logs
2020-05-27 15:49:58 +05:30
Malar Kannan
a38789d0c3
added option to disable plots during validation
2020-05-27 15:43:03 +05:30
Malar Kannan
7ff2db3e2e
cleanup rev recycle
2020-05-27 15:33:22 +05:30
Malar Kannan
1acf9e403c
1. added support for mono/dual channel rev transcripts
2. handle errors when extracting datapoints from rev meta data
3. added support for annotation only task when dumping ui data
2020-05-27 15:19:25 +05:30
Malar Kannan
1f2bedc156
1. enabled silence stripping in chunks when recycling audio from asr logs
2. limit asr recycling to 1 min of start audio to get reliable alignments and ignoring agent channel
3. added rev recycler for generating asr dataset from rev transcripts and audio
4. update pydub dependency for silence stripping fn and removing threadpool hardcoded worker count
2020-05-27 14:22:44 +05:30
Malar Kannan
fca9c1aeb3
refactored module structure
2020-05-21 19:13:44 +05:30
Malar Kannan
2d5b720284
1. added utility command to export call logs
2. mongo conn accepts port
2020-05-21 10:43:26 +05:30
Malar Kannan
8e79bbb571
1. implement dataset augmentation and validation in process
2. added option to skip 'incorrect' annotations in validation data
3. added confirmation on clearing mongo collection
4. added an option to navigate to a given text in the validation ui
5. added a dataset and remote option to trainer to load dataset from directory and remote rpyc service
2020-05-20 11:16:22 +05:30
Malar Kannan
83db445a6f
1. added training utils with custom data loaders with remote rpyc dataservice support
2. fix validation correction dump path
3. cache dataset for precaching before training to memory
4. update dependencies
2020-05-14 15:39:44 +05:30
Malar Kannan
d4aef4088d
1. clean-up unused data process code
2. fix invalid sample no from mongo
3. data loader service return remote netref
2020-05-13 14:02:46 +05:30
Malar Kannan
fdccea6b23
unlink temporary files after transcribing
2020-05-12 23:38:31 +05:30
Malar Kannan
c06a0814b9
1. added a tool to extract asr data from gcp transcripts logs
2. implement a function to export all call logs in a mongodb to a caller-id based yaml file
3. clean-up leaderboard duration logic
4. added a wip dataloader service
5. made the asr_data_writer util more generic with verbose flags and unique filename
6. added extendedpath util class with json support and mongo_conn function to connect to a mongo node
7. refactored the validation post processing to dump a ui config for validation
8. included utility functions to correct, fill, update and clear annotations from mongodb data
9. refactored the ui logic to be more generic for any asr data
10. updated setup.py dependencies to support the above features
2020-05-12 23:38:06 +05:30
Malar Kannan
a7da729c0b
add validation ui and post processing to correct using validation data
2020-05-06 12:18:34 +05:30
Malar Kannan
aae03a6ae4
refresh to next entry on submit and comment out mongo clearing code for safety :P
2020-04-29 22:52:46 +05:30
Malar Kannan
4fd05a56d0
1. refactored streamlit code
2. fixed issues in data manifest handling
2020-04-29 17:22:45 +05:30
Malar Kannan
41074a1bca
1. added streamlit based validation ui with mongodb datastore integration
2. fix asr wrong sample rate inference
3. update requirements
2020-04-29 14:26:11 +05:30
Malar Kannan
61048f855e
implement call audio data recycler for asr
2020-04-27 10:53:14 +05:30
Malar Kannan
2c15b00da3
fix module packaging issue
2020-04-08 20:45:38 +05:30
Malar Kannan
d22a99a4f6
1. integrated data generator using google tts
2. added training script
2020-04-08 18:53:49 +05:30
Malar Kannan
f7ebd8e90a
refactored arg parsing to take server cli args
2020-03-27 15:55:56 +05:30
Malar Kannan
604d0bc87f
added rpyc server
2020-03-18 15:20:00 +05:30
Malar Kannan
4f4371c944
fixed wav header issue
2020-03-18 15:13:21 +05:30
Malar Kannan
880dd8bf6a
jasper asr first commit
2020-03-16 14:22:24 +05:30
Malar Kannan
7a320bb250
Initial commit
2020-03-16 14:21:51 +05:30