
Ke-Han Lu



I’m Ke-Han Lu, a second-year Ph.D. student at National Taiwan University, advised by Prof. Hung-Yi Lee. My research focuses on multimodal language models, particularly cross-modal alignment and the use of large language models to enhance multimodal understanding.


Selected Publications

For the full publication list, please refer to my Google Scholar profile.

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Ke-Han Lu et al.
arXiv preprint · Paper · GitHub · Hugging Face · BibTeX
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
Ke-Han Lu, Chun-Yi Kuan, Hung-Yi Lee
Interspeech 2025 · Paper · GitHub · BibTeX
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Chien-Yu Huang et al.
ICLR 2025 · Paper · GitHub · BibTeX
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
ICASSP 2025 · Paper · GitHub · BibTeX
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu*, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee
(*Co-first author) Technical report, arXiv preprint · Paper · BibTeX
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
Interspeech 2024 · Paper · BibTeX
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
ICASSP 2024 · Paper · GitHub · BibTeX
A Context-aware Knowledge Transferring Strategy for CTC-based ASR
Ke-Han Lu, Kuan-Yu Chen
IEEE SLT 2022 · Paper · GitHub · BibTeX
Non-autoregressive ASR Modeling using Pre-trained Language Models for Chinese Speech Recognition
Fu-Hao Yu, Kuan-Yu Chen, Ke-Han Lu
IEEE/ACM Transactions on Audio, Speech, and Language Processing · Paper · BibTeX
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen
Poster spotlight, VQA Workshop, CVPR 2021 · Paper · Video · Leaderboard · BibTeX

Education

  • National Taiwan University
    • Ph.D. in Communication Engineering
      • Feb 2024 - Present
  • National Taiwan University of Science and Technology
    • M.S. in Computer Science and Information Engineering
      • Sep 2020 - Feb 2023
  • National Taiwan University of Science and Technology
    • B.S. in Computer Science and Information Engineering
      • Sep 2016 - Jun 2020

Honors

  • NVIDIA Academic Grant Program
  • NSTC Graduate Research Fellowship (NSTC-GRF)
  • 16th TaiwanTech Outstanding Youth Award

Skills

  • Programming: Python, PyTorch, JavaScript, LaTeX
  • Software and tools: Linux, Docker, Git, NeMo, Megatron-LM, ESPnet, Hugging Face Transformers, fairseq
  • Languages: Mandarin (native), English (fluent)