Ke-Han Lu
I’m Ke-Han Lu, a second-year Ph.D. student at National Taiwan University, advised by Prof. Hung-Yi Lee. My research focuses on multimodal language models, particularly on cross-modal alignment and utilizing large language models to enhance multimodal understanding.
- Lead author of the DeSTA series: DeSTA2.5-Audio, DeSTA2, DeSTA, and Speech-IFEval
- Core contributor to Dynamic-SUPERB (Phase 1 & Phase 2)
Selected Publications
For the full publication list, please refer to my Google Scholar page.
arXiv preprint
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Ke-Han Lu et al.
@article{lu2025desta25audio,
  title={DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment},
  author={Lu, Ke-Han and Chen, Zhehuai and Fu, Szu-Wei and Yang, Chao-Han Huck and Huang, Sung-Feng and Yang, Chih-Kai and Yu, Chee-En and Chen, Chun-Wei and Chen, Wei-Chih and Huang, Chien-yu and others},
  journal={arXiv preprint arXiv:2507.02768},
  year={2025}
}
Interspeech 2025
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
Ke-Han Lu, Chun-Yi Kuan, Hung-Yi Lee
@inproceedings{lu2025speechifeval,
  title={{Speech-IFEval}: Evaluating instruction-following and quantifying catastrophic forgetting in speech-aware language models},
  author={Lu, Ke-Han and Kuan, Chun-Yi and Lee, Hung-yi},
  booktitle={Interspeech 2025},
  year={2025}
}
ICLR 2025
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Chien-Yu Huang et al.
@inproceedings{huang2025dynamicsuperb,
  title={Dynamic-{SUPERB} Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks},
  author={Chien-yu Huang and others},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=s7lzZpAW7T}
}
ICASSP 2025
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
@inproceedings{Lu2025Developing,
  title={Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data},
  author={Lu, Ke-Han and Chen, Zhehuai and Fu, Szu-Wei and Yang, Chao-Han Huck and Balam, Jagadeesh and Ginsburg, Boris and Wang, Yu-Chiang Frank and Lee, Hung-Yi},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025},
  pages={1-5},
  doi={10.1109/ICASSP49660.2025.10889444}
}
Technical Report
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu*, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee
@article{yang2024building,
  title={Building a Taiwanese Mandarin spoken language model: A first attempt},
  author={Yang, Chih-Kai and Fu, Yu-Kuan and Li, Chen-An and Lin, Yi-Cheng and Lin, Yu-Xiang and Chen, Wei-Chih and Chung, Ho Lam and Kuan, Chun-Yi and Huang, Wei-Ping and Lu, Ke-Han and others},
  journal={arXiv preprint arXiv:2411.07111},
  year={2024}
}
Interspeech 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
@inproceedings{lu24c_interspeech,
  title={DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment},
  author={Ke-Han Lu and Zhehuai Chen and Szu-Wei Fu and He Huang and Boris Ginsburg and Yu-Chiang Frank Wang and Hung-yi Lee},
  booktitle={Interspeech 2024},
  year={2024},
  pages={4159--4163},
  doi={10.21437/Interspeech.2024-457}
}
ICASSP 2024
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
@inproceedings{huang2024dynamic,
  title={{Dynamic-SUPERB}: Towards a dynamic, collaborative, and comprehensive instruction-tuning benchmark for speech},
  author={Huang, Chien-yu and Lu, Ke-Han and Wang, Shih-Heng and Hsiao, Chi-Yuan and Kuan, Chun-Yi and Wu, Haibin and Arora, Siddhant and Chang, Kai-Wei and Shi, Jiatong and Peng, Yifan and others},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={12136--12140},
  year={2024},
  organization={IEEE}
}
IEEE SLT 2022
A Context-aware Knowledge Transferring Strategy for CTC-based ASR
Ke-Han Lu, Kuan-Yu Chen
@inproceedings{lu2023context,
  title={A context-aware knowledge transferring strategy for {CTC}-based {ASR}},
  author={Lu, Ke-Han and Chen, Kuan-Yu},
  booktitle={2022 IEEE Spoken Language Technology Workshop (SLT)},
  pages={60--67},
  year={2023},
  organization={IEEE}
}
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Non-autoregressive ASR Modeling using Pre-trained Language Models for Chinese Speech Recognition
Fu-Hao Yu, Kuan-Yu Chen, Ke-Han Lu
@article{yu2022non,
  title={Non-autoregressive {ASR} modeling using pre-trained language models for Chinese speech recognition},
  author={Yu, Fu-Hao and Chen, Kuan-Yu and Lu, Ke-Han},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={1474--1482},
  year={2022},
  publisher={IEEE}
}
Poster spotlight, VQA workshop, CVPR 2021
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen
@article{lu2021transformer,
  title={A transformer-based cross-modal fusion model with adversarial training for {VQA} Challenge 2021},
  author={Lu, Ke-Han and Fang, Bo-Han and Chen, Kuan-Yu},
  journal={arXiv preprint arXiv:2106.13033},
  year={2021}
}
Education
- Ph.D. student in Communication Engineering, National Taiwan University
- Feb 2024 - Present
- Supervisor: Prof. Hung-Yi Lee
- M.S. in Computer Science and Information Engineering, National Taiwan University of Science and Technology
- Sep 2020 - Feb 2023
- Supervisor: Prof. Kuan-Yu Chen
- B.S. in Computer Science and Information Engineering, National Taiwan University of Science and Technology
- Sep 2016 - Jun 2020
Honors
- NVIDIA Academic Grant Program
- NSTC Graduate Research Fellowship (NSTC-GRF)
- 16th TaiwanTech Outstanding Youth Award
Skills
- Programming: Python, PyTorch, JavaScript, LaTeX
- Software and tools: Linux, Docker, Git, NeMo, Megatron-LM, ESPnet, Hugging Face Transformers, fairseq
- Languages: Mandarin (native), English (fluent)