jasper-asr

mirror of https://github.com/malarinv/jasper-asr.git synced 2026-03-08 10:32:35 +00:00

Author	SHA1	Message	Date
Malar Kannan	fa89775f86	1. add a new streamlit ui to preview manifest 2. implement rpyc transcription client for files	2020-08-07 12:00:33 +05:30
Malar Kannan	ae5586be72	added evaluation command	2020-07-09 14:36:51 +05:30
Malar Kannan	069392d098	1. added a test generator and slu evaluator 2. ui dump now includes gcp results 3. showing default option for more args validation process commands	2020-06-29 14:24:56 +05:30
Malar Kannan	515e9c1037	1. split extract all data types in one shot with --extraction-type all flag 2. add notes about diffing split extracted and original data 3. add a nlu conv generator to generate conv data based on nlu utterances and entities 4. add task uid support for dumping corrections 5. abstracted generate date fn	2020-06-25 11:03:09 +05:30
Malar Kannan	e76ccda5dd	1. fix update-correction to use ui_dump instead of manifest 2. update training params no of checkpoints on chpk frequency	2020-06-19 14:16:04 +05:30
Malar Kannan	000853b600	1. added option to strip silent chunks 2. computing caller quality based on task-id of corrections	2020-06-17 21:42:20 +05:30
Malar Kannan	ac0e04c226	stripping silence on call chunk	2020-06-17 19:43:25 +05:30
Malar Kannan	62eefb9294	fix 11st to 11th in ordinal	2020-06-17 19:30:12 +05:30
Malar Kannan	8e238c254e	1. added start delay arg in call recycler 2. implement ui_dump/manifest writer in call_recycler itself 3. refactored call data point plotter 4. added sample-ui task-ui on the validation process 5. implemented call-quality stats using corrections from mongo 6. support deleting cursors on mongo 7. implement multiple task support on validation ui based on task_id mongo field	2020-06-17 19:11:15 +05:30
Malar Kannan	7dbb04dcbf	1. added conv data generator 2. more utils	2020-06-16 15:38:07 +05:30
Malar Kannan	7472b6457d	handling non-pnr cases without parens in text data	2020-06-16 11:02:53 +05:30
Malar Kannan	120302aad3	added support for name/dates/cities call data extraction and more logs	2020-06-15 10:24:38 +05:30
Malar Kannan	a7a25e9b07	1. using dataname args for update/fill annotations 2. rename to dump_ui	2020-06-10 14:55:59 +05:30
Malar Kannan	6d149d282d	1. added a data extraction type argument 2. cleanup/refactor	2020-06-09 19:16:24 +05:30
Malar Kannan	8db1be0083	refactor validation process arguments and logging	2020-06-05 16:32:08 +05:30
Malar Kannan	bca227a7d7	1. removed the transcriber_pretrained/speller from utils 2. introduced get_mongo_coll to get the collection object directly from mongo uri 3. removed processing of correction entries to remove space/upper casing	2020-06-04 17:49:16 +05:30
Malar Kannan	e3a01169c2	skipping invalid data points	2020-06-02 17:21:30 +05:30
Malar Kannan	3a5ce069ab	parallelize data loading from remote	2020-05-29 12:14:14 +05:30
Malar Kannan	9f9cb62b60	show duration on validation of dataset	2020-05-28 11:35:31 +05:30
Malar Kannan	de21952349	1. refactored wav chunk processing method 2. renamed streamlit to validation_ui	2020-05-28 11:18:39 +05:30
Malar Kannan	d87369c8fe	don't load audio for annotation only ui and keep spoken as text for normal asr validation	2020-05-27 15:57:42 +05:30
Malar Kannan	41af0a87de	respect verbose flag	2020-05-27 15:54:16 +05:30
Malar Kannan	6f395af10d	fix skipping null audio and add more verbose logs	2020-05-27 15:49:58 +05:30
Malar Kannan	a38789d0c3	added option to disable plots during validation	2020-05-27 15:43:03 +05:30
Malar Kannan	7ff2db3e2e	cleanup rev recycle	2020-05-27 15:33:22 +05:30
Malar Kannan	1acf9e403c	1. added support for mono/dual channel rev transcripts 2. handle errors when extracting datapoints from rev meta data 3. added support for annotation only task when dumping ui data	2020-05-27 15:19:25 +05:30
Malar Kannan	1f2bedc156	1. enabled silence stripping in chunks when recycling audio from asr logs 2. limit asr recycling to 1 min of start audio to get reliable alignments and ignoring agent channel 3. added rev recycler for generating asr dataset from rev transcripts and audio 4. update pydub dependency for silence stripping fn and removing threadpool hardcoded worker count	2020-05-27 14:22:44 +05:30
Malar Kannan	fca9c1aeb3	refactored module structure	2020-05-21 19:13:44 +05:30
Malar Kannan	2d5b720284	1. added utility command to export call logs 2. mongo conn accepts port	2020-05-21 10:43:26 +05:30
Malar Kannan	8e79bbb571	1. implement dataset augmentation and validation in process 2. added option to skip 'incorrect' annotations in validation data 3. added confirmation on clearing mongo collection 4. added an option to navigate to a given text in the validation ui 5. added a dataset and remote option to trainer to load dataset from directory and remote rpyc service	2020-05-20 11:16:22 +05:30
Malar Kannan	83db445a6f	1. added training utils with custom data loaders with remote rpyc dataservice support 2. fix validation correction dump path 3. cache dataset for precaching before training to memory 4. update dependencies	2020-05-14 15:39:44 +05:30
Malar Kannan	d4aef4088d	1. clean-up unused data process code 2. fix invalid sample no from mongo 3. data loader service return remote netref	2020-05-13 14:02:46 +05:30
Malar Kannan	fdccea6b23	unlink temporary files after transcribing	2020-05-12 23:38:31 +05:30
Malar Kannan	c06a0814b9	1. added a tool to extract asr data from gcp transcripts logs 2. implement a function to export all call logs in a mongodb to a caller-id based yaml file 3. clean-up leaderboard duration logic 4. added a wip dataloader service 5. made the asr_data_writer util more generic with verbose flags and unique filename 6. added extendedpath util class with json support and mongo_conn function to connect to a mongo node 7. refactored the validation post processing to dump a ui config for validation 8. included utility functions to correct, fill update and clear annotations from mongodb data 9. refactored the ui logic to be more generic for any asr data 10. updated setup.py dependencies to support the above features	2020-05-12 23:38:06 +05:30
Malar Kannan	a7da729c0b	add validation ui and post processing to correct using validation data	2020-05-06 12:18:34 +05:30
Malar Kannan	aae03a6ae4	refresh to next entry on submit and comment out mongo clearing code for safety :P	2020-04-29 22:52:46 +05:30
Malar Kannan	4fd05a56d0	1. refactored streamlit code 2. fixed issues in data manifest handling	2020-04-29 17:22:45 +05:30
Malar Kannan	41074a1bca	1. added streamlit based validation ui with mongodb datastore integration 2. fix asr wrong sample rate inference 3. update requirements	2020-04-29 14:26:11 +05:30
Malar Kannan	61048f855e	implement call audio data recycler for asr	2020-04-27 10:53:14 +05:30
Malar Kannan	2c15b00da3	fix module packaging issue	2020-04-08 20:45:38 +05:30
Malar Kannan	d22a99a4f6	1. integrated data generator using google tts 2. added training script	2020-04-08 18:53:49 +05:30
Malar Kannan	f7ebd8e90a	refactored arg parsing to take server cli args	2020-03-27 15:55:56 +05:30
Malar Kannan	604d0bc87f	added rpyc server	2020-03-18 15:20:00 +05:30
Malar Kannan	4f4371c944	fixed wav header issue	2020-03-18 15:13:21 +05:30
Malar Kannan	880dd8bf6a	jasper asr first commit	2020-03-16 14:22:24 +05:30
Malar Kannan	7a320bb250	Initial commit	2020-03-16 14:21:51 +05:30

46 Commits