Research Experience

Instruction-following Speech Language Models I am currently working on developing instruction-following speech language models in collaboration with NVIDIA. We have proposed DeSTA [9, 12], a scalable and robust framework for training these general-purpose speech systems. I have also co-authored papers on evaluation benchmarks [7] and systems [10, 11] in this research direction, and I have experience fine-tuning large-scale language models with NeMo and Megatron-LM.
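
As an illustration of the general idea behind this line of work (a minimal sketch, not the DeSTA recipe itself), the snippet below prepends projected speech-encoder features to an instruction's token embeddings before feeding the combined sequence to a decoder-only language model. The names `speech_encoder`, `llm`, and the dimensions are hypothetical placeholders, not components from the cited papers.

```python
import torch
import torch.nn as nn

class SpeechPrefixLM(nn.Module):
    """Illustrative sketch: treat projected speech features as a prefix
    to the instruction's token embeddings for a decoder-only LM."""

    def __init__(self, speech_encoder, llm, speech_dim=1024, llm_dim=4096):
        super().__init__()
        self.speech_encoder = speech_encoder        # e.g. a frozen self-supervised speech model
        self.llm = llm                              # a decoder-only language model
        self.proj = nn.Linear(speech_dim, llm_dim)  # lightweight modality adapter

    def forward(self, speech, instruction_embeds):
        # (batch, frames, speech_dim) -> (batch, frames, llm_dim)
        speech_embeds = self.proj(self.speech_encoder(speech))
        # Concatenate the speech prefix with the instruction token embeddings
        # and let the LM generate the response conditioned on both.
        inputs = torch.cat([speech_embeds, instruction_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```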

Automatic Speech Recognition I have focused on improving the recognition accuracy of non-autoregressive ASR systems by injecting linguistic knowledge from pre-trained language models through cross-modal alignment [4] and knowledge distillation [5]. I have experience training ASR systems with ESPnet and pre-training a Mandarin wav2vec 2.0 model with fairseq.
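
As a generic sketch of the knowledge-distillation idea (not the exact objective used in [5]), the snippet below computes a standard soft-label distillation loss between a student ASR model's token distribution and a teacher language model's distribution. It assumes the two models share the same vocabulary; the temperature value is an arbitrary example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: KL divergence between the teacher's and the
    student's temperature-scaled output distributions."""
    # Temperature-scaled log-probabilities for the student.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Temperature-scaled probabilities for the (frozen) teacher.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence, rescaled by T^2 as is conventional in distillation.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Example with random logits over a shared 5000-token vocabulary.
student_logits = torch.randn(8, 32, 5000)   # (batch, length, vocab)
teacher_logits = torch.randn(8, 32, 5000)
loss = distillation_loss(student_logits, teacher_logits)
```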