Commit Graph

17 Commits (ae5586be7224d6bb283d8abfcf7b67110ca94ab7)

Author SHA1 Message Date
Malar Kannan ae5586be72 added evaluation command 2020-07-09 14:36:51 +05:30
Malar Kannan 069392d098 1. added a test generator and slu evaluator
2. ui dump now includes gcp results
3. showing default options for more args in validation process commands
2020-06-29 14:24:56 +05:30
Malar Kannan 515e9c1037 1. split extract all data types in one shot with --extraction-type all flag
2. add notes about diffing split extracted and original data
3. add an nlu conv generator to generate conv data based on nlu utterances and entities
4. add task uid support for dumping corrections
5. abstracted generate date fn
2020-06-25 11:03:09 +05:30
Malar Kannan 7dbb04dcbf 1. added conv data generator
2. more utils
2020-06-16 15:38:07 +05:30
Malar Kannan 3a5ce069ab parallelize data loading from remote 2020-05-29 12:14:14 +05:30
Malar Kannan 9f9cb62b60 show duration on validation of dataset 2020-05-28 11:35:31 +05:30
Malar Kannan 1f2bedc156 1. enabled silence stripping in chunks when recycling audio from asr logs
2. limit asr recycling to the first 1 min of audio to get reliable alignments, ignoring the agent channel
3. added rev recycler for generating asr dataset from rev transcripts and audio
4. update pydub dependency for silence stripping fn and remove hardcoded threadpool worker count
2020-05-27 14:22:44 +05:30
Malar Kannan fca9c1aeb3 refactored module structure 2020-05-21 19:13:44 +05:30
Malar Kannan 83db445a6f 1. added training utils with custom data loaders with remote rpyc dataservice support
2. fix validation correction dump path
3. cache dataset for precaching before training to memory
4. update dependencies
2020-05-14 15:39:44 +05:30
Malar Kannan c06a0814b9 1. added a tool to extract asr data from gcp transcripts logs
2. implement a function to export all call logs in a mongodb to a caller-id based yaml file
3. clean-up leaderboard duration logic
4. added a wip dataloader service
5. made the asr_data_writer util more generic with verbose flags and unique filename
6. added extendedpath util class with json support and mongo_conn function to connect to a mongo node
7. refactored the validation post processing to dump a ui config for validation
8. included utility functions to correct, fill, update and clear annotations from mongodb data
9. refactored the ui logic to be more generic for any asr data
10. updated setup.py dependencies to support the above features
2020-05-12 23:38:06 +05:30
Malar Kannan 41074a1bca 1. added streamlit based validation ui with mongodb datastore integration
2. fix asr wrong sample rate inference
3. update requirements
2020-04-29 14:26:11 +05:30
Malar Kannan 61048f855e implement call audio data recycler for asr 2020-04-27 10:53:14 +05:30
Malar Kannan 2c15b00da3 fix module packaging issue 2020-04-08 20:45:38 +05:30
Malar Kannan d22a99a4f6 1. integrated data generator using google tts
2. added training script
2020-04-08 18:53:49 +05:30
Malar Kannan f7ebd8e90a refactored arg parsing to take server cli args 2020-03-27 15:55:56 +05:30
Malar Kannan 4f4371c944 fixed wav header issue 2020-03-18 15:13:21 +05:30
Malar Kannan 880dd8bf6a jasper asr first commit 2020-03-16 14:22:24 +05:30