A quick guide to Amazon's 40-plus papers at ICASSP

As usual at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), a plurality of Amazon’s accepted papers concentrate on automatic speech recognition, with a particular emphasis this year on personalized speech recognition. The topics of acoustic-event detection, keyword spotting, and signal processing are also well represented.

But as is also usual, some of the Amazon papers range farther afield, to topics such as commonsense reasoning, self-learning, query rewriting, and general machine learning techniques. Below is a quick guide to Amazon’s more than 40 papers at the conference.

Acoustic-event classification

FedRPO: Federated relaxed Pareto optimization for acoustic event classification
Meng Feng, Chieh-Chi Kao, Qingming Tang, Amit Solomon, Viktor Rozgic, Chao Wang

Multiscale audio spectrogram transformer for efficient audio classification
Wentao Zhu, Mohamed Omar

Transformer-based bioacoustic sound event detection on few-shot learning tasks
Liwen You, Erika Pelaez Coyotl, Suren Gunturu, Maarten van Segbroeck

Weight-sharing supernet for searching specialized acoustic event classification networks across device constraints
Guan-Ting Lin, Qingming Tang, Chieh-Chi Kao, Viktor Rozgic, Chao Wang

Automatic speech recognition

Cross-utterance ASR rescoring with graph-based label propagation
Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Shally Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

Dynamic chunk convolution for unified streaming and non-streaming conformer ASR
Xilai Li, Goeric Huybrechts, Srikanth Ronanki, Jeff Farris, Sravan Bodapati

Domain adaptation with external off-policy acoustic catalogs for scalable contextual end-to-end automated speech recognition
David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Gated contextual adapters for selective contextual biasing in neural transducers
Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant Strimel, Feng-Ju (Claire) Chang, Ariya Rastrow, Nathan Susanj, Athanasios Mouchtaris

Mask the bias: Improving domain-adaptive generalization of CTC-based ASR with internal language model estimation
Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jason Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff

On-the-fly text retrieval for end-to-end ASR adaptation
Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko

Robust acoustic and semantic contextual biasing in neural transducers for speech recognition
Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant Strimel, Ross McGowan, Athanasios Mouchtaris

Code generation

Conversational text-to-SQL: An odyssey into state-of-the-art and challenges ahead
Sree Hari Krishnan Parthasarathi, Lu Zeng, Dilek Hakkani-Tür

Commonsense reasoning

CLICKER: Attention-based cross-lingual commonsense knowledge transfer
Ruolin Su, Zhongkai Sun, Sixing Lu, Chengyuan Ma, Chenlei Guo

Continual learning

Quantifying catastrophic forgetting in continual federated learning
Christophe Dupuy, Jimit Majmudar, Jixuan Wang, Tanya Roosta, Rahul Gupta, Clement Chung, Jie Ding, Salman Avestimehr

Endpoint detection

Adaptive endpointing with deep contextual multi-armed bandits
Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

Towards accurate and real-time end-of-speech estimation
Yifeng Fan, Colin Vaz, Di He, Jahn Heymann, Viet Anh Trinh, Zhe Zhang, Venkatesh Ravichandran

Keyword spotting

Dual-attention neural transducers for efficient wake word spotting in speech recognition
Saumya Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Anastasios Alexandridis, Grant Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann

Fixed-point quantization aware training for on-device keyword-spotting
Sashank Macha, Om Oza, Alex Escott, Francesco Caliva, Robbie Armitano, Santosh Kumar Cheekatmalla, Sree Hari Krishnan Parthasarathi, Yuzong Liu

Self-supervised speech representation learning for keyword-spotting with light-weight transformers
Chenyang Gao, Yue Gu, Francesco Caliva, Yuzong Liu

Small-footprint slimmable networks for keyword spotting
Zuhaib Akhtar, Mohammad Omar Khursheed, Dongsu Du, Yuzong Liu

Language learning

Phonetic RNN-transducer for mispronunciation diagnosis
Daniel Zhang, Soumya Saha, Sarah Campbell

Machine learning

Prune then distill: Dataset distillation with importance sampling
Anirudh Sundar, Gokce Keskin, Chander Chandak, I-Fan Chen, Pegah Ghahremani, Shalini Ghosh

Role of bias terms in dot-product attention
Mahdi Namazifar, Devamanyu Hazarika, Dilek Hakkani-Tür

Natural-language understanding

Distill-quantize-tune: Leveraging large teachers for low-footprint efficient multilingual NLU on edge
Pegah Kharazmi, Zhewei Zhao, Clement Chung, Samridhi Choudhary

Pyramid dynamic inference: Encouraging faster inference via early exit boosting
Ershad Banijamali, Pegah Kharazmi, Sepehr Eghbali, Jixuan Wang, Clement Chung, Samridhi Choudhary

Personalized speech recognition

Dialog act guided contextual adapter for personalized speech recognition
Feng-Ju (Claire) Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant Strimel, Ross McGowan

PROCTER: Pronunciation-aware contextual adapter for personalized speech recognition in neural transducers
Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

Slot-triggered contextual biasing for personalized speech recognition using neural transducers
Sibo Tong, Philip Harding, Simon Wiesler

Query rewriting

KG-ECO: Knowledge graph enhanced entity correction for query rewriting
Jason Cai, Mingda Li, Ziyan Jiang, Eunah Cho, Zheng Chen, Yang Liu, Xing Fan, Chenlei Guo

Self-learning

Federated self-learning with weak supervision for speech recognition
Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

Self-healing through error detection, attribution, and retraining
Ansel MacLaughlin, Anna Rumshisky, Rinat Khaziev, Anil Ramakrishna, Yuval Merhav, Rahul Gupta

Signal processing

A framework for unified real-time personalized and non-personalized speech enhancement
Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

Robust self-supervised learning for human activity recognition
Cong Xu, Yuhang Li, Dae Lee, Andrew Park, Hongda Mao, Huyen Do, Jonathan Chung, Dinesh Nair

Generative modeling based manifold learning for adaptive filtering guidance
Karim Helwani, Paris Smaragdis, Michael M. Goodwin

SPADE: Self-supervised pretraining for acoustic disentanglement
John Harvill, Jarred Barber, Arun Nair, Ramin Pishehvar

Spoken-language understanding

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders
Jixuan Wang, Martin Radfar, Kai Wei, Clement Chung

Exploring subgroup performance in end-to-end speech models
Alkis Koudounas, Eliana Pastor, Giuseppe Attanasio, Vittorio Mazzia, Manuel Giollo, Thomas Gueudre, Luca Cagliero, Luca de Alfaro, Elena Baralis, Daniele Amberti

Multilingual end-to-end spoken language understanding for ultra-low footprint applications
Markus Mueller, Anastasios Alexandridis, Zach Trozenski, Joel Whiteman, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann

Text-to-speech

Framewise WaveGAN: High speed adversarial vocoder in time domain with very low computational complexity
Ahmed Mustafa, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Mike Goodwin

Modeling low-resource accents without accent-specific TTS frontend
Georgi Tinchev, Marta Czarnowska, Kamil Deja

Video

ModEFormer: Modality-preserving embedding for audio-video synchronization using transformers
Akash Gupta, Rohun Tripathi, Wondong Jang

Multiscale compositional constraints for representation learning on videos
Georgios Paraskevopoulos, Chandrashekhar Lavania, Lovish Chum, Shiva Sundaram

Voice communication

Low-bitrate redundancy coding of speech using a rate-distortion-optimized variational autoencoder
Jean-Marc Valin, Jan Buethe, Ahmed Mustafa
