Ke-Han Lu
I’m Ke-Han Lu, a second-year Ph.D. student at National Taiwan University, advised by Prof. Hung-Yi Lee. My research focuses on multimodal language models, particularly on cross-modal alignment and utilizing large language models to enhance multimodal understanding.
- Lead author of the DeSTA series: DeSTA2.5-Audio, DeSTA2, DeSTA, and Speech-IFEval
- Core contributor to Dynamic-SUPERB (Phase 1 & Phase 2)
Selected Publications
For the full publication list, please refer to my Google Scholar page.
arXiv preprint
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Ke-Han Lu et al.
@article{lu2025desta25audio,
  title={DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment},
  author={Lu, Ke-Han and Chen, Zhehuai and Fu, Szu-Wei and Yang, Chao-Han Huck and Huang, Sung-Feng and Yang, Chih-Kai and Yu, Chee-En and Chen, Chun-Wei and Chen, Wei-Chih and Huang, Chien-yu and others},
  journal={arXiv preprint arXiv:2507.02768},
  year={2025}
}
Interspeech 2025
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
Ke-Han Lu, Chun-Yi Kuan, Hung-Yi Lee
@inproceedings{lu2025speechifeval,
  title={{Speech-IFEval}: Evaluating instruction-following and quantifying catastrophic forgetting in speech-aware language models},
  author={Lu, Ke-Han and Kuan, Chun-Yi and Lee, Hung-yi},
  booktitle={Interspeech 2025},
  year={2025}
}
ICLR 2025
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Chien-Yu Huang et al.
@inproceedings{huang2025dynamicsuperb,
  title={Dynamic-{SUPERB} Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks},
  author={Chien-yu Huang and others},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=s7lzZpAW7T}
}
ICASSP 2025
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
@inproceedings{Lu2025Developing,
  title={Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data},
  author={Lu, Ke-Han and Chen, Zhehuai and Fu, Szu-Wei and Yang, Chao-Han Huck and Balam, Jagadeesh and Ginsburg, Boris and Wang, Yu-Chiang Frank and Lee, Hung-Yi},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025},
  pages={1-5},
  doi={10.1109/ICASSP49660.2025.10889444}
}
Technical Report
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu*, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee
@article{yang2024building,
  title={Building a Taiwanese Mandarin spoken language model: A first attempt},
  author={Yang, Chih-Kai and Fu, Yu-Kuan and Li, Chen-An and Lin, Yi-Cheng and Lin, Yu-Xiang and Chen, Wei-Chih and Chung, Ho Lam and Kuan, Chun-Yi and Huang, Wei-Ping and Lu, Ke-Han and others},
  journal={arXiv preprint arXiv:2411.07111},
  year={2024}
}
Interspeech 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
@inproceedings{lu24c_interspeech,
  title={DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment},
  author={Ke-Han Lu and Zhehuai Chen and Szu-Wei Fu and He Huang and Boris Ginsburg and Yu-Chiang Frank Wang and Hung-yi Lee},
  booktitle={Interspeech 2024},
  year={2024},
  pages={4159--4163},
  doi={10.21437/Interspeech.2024-457}
}
ICASSP 2024
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
@inproceedings{huang2024dynamic,
  title={{Dynamic-SUPERB}: Towards a dynamic, collaborative, and comprehensive instruction-tuning benchmark for speech},
  author={Huang, Chien-yu and Lu, Ke-Han and Wang, Shih-Heng and Hsiao, Chi-Yuan and Kuan, Chun-Yi and Wu, Haibin and Arora, Siddhant and Chang, Kai-Wei and Shi, Jiatong and Peng, Yifan and others},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={12136--12140},
  year={2024},
  organization={IEEE}
}
IEEE SLT 2022
A Context-aware Knowledge Transferring Strategy for CTC-based ASR
Ke-Han Lu, Kuan-Yu Chen
@inproceedings{lu2023context,
  title={A context-aware knowledge transferring strategy for {CTC}-based {ASR}},
  author={Lu, Ke-Han and Chen, Kuan-Yu},
  booktitle={2022 IEEE Spoken Language Technology Workshop (SLT)},
  pages={60--67},
  year={2023},
  organization={IEEE}
}
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Non-autoregressive ASR Modeling using Pre-trained Language Models for Chinese Speech Recognition
Fu-Hao Yu, Kuan-Yu Chen, Ke-Han Lu
@article{yu2022non,
  title={Non-autoregressive {ASR} modeling using pre-trained language models for Chinese speech recognition},
  author={Yu, Fu-Hao and Chen, Kuan-Yu and Lu, Ke-Han},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={1474--1482},
  year={2022},
  publisher={IEEE}
}
Poster spotlight, VQA workshop, CVPR 2021
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen
@article{lu2021transformer,
  title={A transformer-based cross-modal fusion model with adversarial training for {VQA} Challenge 2021},
  author={Lu, Ke-Han and Fang, Bo-Han and Chen, Kuan-Yu},
  journal={arXiv preprint arXiv:2106.13033},
  year={2021}
}
Education
- Ph.D. student in Communication Engineering, National Taiwan University
- Feb 2024 - Present
- Supervisor: Prof. Hung-Yi Lee
- M.S. in Computer Science and Information Engineering, National Taiwan University of Science and Technology
- Sep 2020 - Feb 2023
- Supervisor: Prof. Kuan-Yu Chen
- B.S. in Computer Science and Information Engineering, National Taiwan University of Science and Technology
- Sep 2016 - Jun 2020
Honors
- NVIDIA Academic Grant Program
- NSTC Graduate Research Fellowship (NSTC-GRF)
- 16th TaiwanTech Outstanding Youth Award
Skills
- Programming: Python, PyTorch, JavaScript, LaTeX
- Software and tools: Linux, Docker, Git, NeMo, Megatron-LM, ESPnet, Hugging Face Transformers, fairseq
- Languages: Mandarin (native), English (fluent)