![Ke-Han Lu](/avatar.jpg)

I’m Ke-Han Lu, a second-year Ph.D. student at National Taiwan University, advised by Prof. [Hung-Yi Lee](https://speech.ee.ntu.edu.tw/~hylee/index.php). My research focuses on multimodal language models, particularly cross-modal alignment and leveraging large language models to enhance multimodal understanding.

[![Google Scholar](https://img.shields.io/badge/Google%20Scholar-4285F4?logo=google-scholar&logoColor=white)](https://scholar.google.com/citations?user=YODHqGkAAAAJ) [![GitHub](https://img.shields.io/badge/kehanlu-181717?logo=github&logoColor=white)](https://github.com/kehanlu) [![X](https://img.shields.io/badge/kehan_lu-000000?logo=x&logoColor=white)](https://x.com/kehan_lu) [![LinkedIn](https://img.shields.io/badge/LinkedIn-0A66C2?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kehanlu) [![CV](https://img.shields.io/badge/Open%20to%20work-Check%20my%20CV-green)](/cv.pdf)

- Lead author of the DeSTA series: [DeSTA2.5-Audio](https://arxiv.org/abs/2507.02768), [DeSTA-2](https://arxiv.org/abs/2409.20007), [DeSTA-1](https://arxiv.org/abs/2406.18871), and [Speech-IFEval](https://arxiv.org/abs/2505.19037)
- Core contributor to Dynamic-SUPERB ([Phase 1](https://arxiv.org/abs/2309.09510) & [Phase 2](https://arxiv.org/abs/2411.05361))

## Selected Publications

For the full publication list, please refer to my [Google Scholar](https://scholar.google.com/citations?user=YODHqGkAAAAJ) page.
- **How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation** (arXiv preprint)
  Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee
  [\[Paper\]](https://arxiv.org/abs/2603.19195)
- **DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment** (IEEE Transactions on Audio, Speech and Language Processing)
  Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, Wei-Ping Huang, En-Pei Hu, Tzu-Quan Lin, Yuan-Kuei Wu, Kuan-Po Huang, Hsiao-Ying Huang, Huang-Cheng Chou, Kai-Wei Chang, Cheng-Han Chiang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
  [\[Paper\]](https://arxiv.org/abs/2507.02768) [\[GitHub\]](https://github.com/kehanlu/DeSTA2.5-Audio) [\[Huggingface\]](https://huggingface.co/collections/DeSTA-ntu/desta25-audio-686a6b9e71afd92e1dd87486)
- **Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models** (InterSpeech 2025)
  Ke-Han Lu, Chun-Yi Kuan, Hung-Yi Lee
  [\[Paper\]](https://arxiv.org/abs/2505.19037) [\[GitHub\]](https://github.com/kehanlu/Speech-IFEval)
- **Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks** (ICLR 2025)
  Chien-Yu Huang et al.
  [\[Paper\]](https://openreview.net/forum?id=s7lzZpAW7T) [\[GitHub\]](https://github.com/dynamic-superb/dynamic-superb)
- **Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data** (ICASSP 2025)
  Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
  [\[Paper\]](https://arxiv.org/abs/2409.20007) [\[GitHub\]](https://github.com/kehanlu/DeSTA2/)
- **Building a Taiwanese Mandarin Spoken Language Model: A First Attempt** (Technical Report)
  Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu\*, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee
  [\[Paper\]](https://arxiv.org/abs/2411.07111)
- **DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment** (InterSpeech 2024)
  Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
  [\[Paper\]](https://arxiv.org/abs/2406.18871)
- **Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech** (ICASSP 2024)
  Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
  [\[Paper\]](https://arxiv.org/abs/2309.09510) [\[GitHub\]](https://github.com/dynamic-superb/dynamic-superb)
- **A Context-aware Knowledge Transferring Strategy for CTC-based ASR** (IEEE SLT 2022)
  Ke-Han Lu, Kuan-Yu Chen
  [\[Paper\]](https://arxiv.org/abs/2210.06244) [\[GitHub\]](https://github.com/kehanlu/mandarin-wav2vec2)
- **Non-autoregressive ASR Modeling using Pre-trained Language Models for Chinese Speech Recognition** (IEEE/ACM Transactions on Audio, Speech, and Language Processing)
  Fu-Hao Yu, Kuan-Yu Chen, Ke-Han Lu
  [\[Paper\]](https://ieeexplore.ieee.org/document/9755057/)
- **A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021** (Poster spotlight, VQA workshop, CVPR 2021)
  Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen
  [\[Paper\]](https://arxiv.org/abs/2106.13033) [\[Video\]](https://youtu.be/dYeS8T19ves) [\[Leaderboard\]](https://visualqa.org/roe.html)

## Education

- **Ph.D. student in Communication Engineering**, **National Taiwan University**
  - Feb 2024 - Present
  - Supervisor: Prof. Hung-Yi Lee
- **M.S. in Computer Science and Information Engineering**, **National Taiwan University of Science and Technology**
- **B.S. in Computer Science and Information Engineering**, **National Taiwan University of Science and Technology**
  - Sep 2016 - Jun 2020