
Ke-Han Lu


Google Scholar · GitHub · X

I am Ke-Han Lu, a second-year Ph.D. student at National Taiwan University, working under the supervision of Prof. Hung-Yi Lee. My research interests lie in the field of multimodal language models, with a particular focus on cross-modal alignment and on leveraging powerful language models to enhance multimodal systems.

Publications

  1. Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
    ICLR 2025 · Paper
    Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter-Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Cheng-Hsiu Hsieh, Yi-Cheng Lin, Chee-En Yu, I Chiu, Heitor R Guimarães, Jionghao Han, Tzu-Quan Lin, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Chun Wei Chen, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Chi An Fu, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsi-Che Lin, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Ankita Pasad, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chao-Han Huck Yang, Chieh-Chi Yang, Jia Qi Yip, Shao-Xiang Yuan, Vahid Noroozi, Zhehuai Chen, Haibin Wu, Karen Livescu, David Harwath, Shinji Watanabe, Hung-yi Lee
  2. Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
    ICASSP 2025 · Paper · GitHub
    Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
  3. SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning
    ICASSP 2025 · Paper
    Chien-yu Huang, Min-Han Shih, Ke-Han Lu, Chi-Yuan Hsiao, Hung-yi Lee
  4. Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
    arXiv preprint · Paper
    Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu*, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee
  5. Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
    IEEE SLT 2024 · Paper
    Chun-Yi Kuan, Chih-Kai Yang, Wei-Ping Huang, Ke-Han Lu, Hung-yi Lee
  6. DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
    InterSpeech 2024 · Paper
    Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
  7. HypR: A Comprehensive Study for ASR Hypothesis Revising with a Reference Corpus
    InterSpeech 2024 · Paper · GitHub
    Yi-Wei Wang, Ke-Han Lu, Kuan-Yu Chen
  8. Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
    ICASSP 2024 · Paper · GitHub
    Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
  9. Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-Text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision
    ICASSP Workshop 2024 · Paper
    Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee
  10. A Context-aware Knowledge Transferring Strategy for CTC-based ASR
    IEEE SLT 2022 · Paper · GitHub
    Ke-Han Lu, Kuan-Yu Chen
  11. Non-autoregressive ASR Modeling using Pre-trained Language Models for Chinese Speech Recognition
    IEEE/ACM Transactions on Audio, Speech, and Language Processing · Paper
    Fu-Hao Yu, Kuan-Yu Chen, Ke-Han Lu
  12. A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
    Poster spotlight, VQA Workshop, CVPR 2021 · Paper · Video · Leaderboard
    Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen
  13. ntust-nlp-2 at ROCLING-2021 Shared Task: BERT-based semantic analyzer with word-level information
    ROCLING 2021: Conference on Computational Linguistics and Speech Processing · Paper
    Ke-Han Lu, Kuan-Yu Chen
  14. A Preliminary Study of Formosa Speech Recognition Challenge 2020 – Taiwanese ASR
    International Journal of Computational Linguistics and Chinese Language Processing · Paper
    Fu-Hao Yu, Ke-Han Lu, Yi-Wei Wang, Wei-Zhe Chang, Wei-Kai Huang, Kuan-Yu Chen

Education

  • National Taiwan University
    • Ph.D. in Communication Engineering
      • Feb 2024 - Present
  • National Taiwan University of Science and Technology
    • M.S. in Computer Science and Information Engineering
      • Sep 2020 - Feb 2023
  • National Taiwan University of Science and Technology
    • B.S. in Computer Science and Information Engineering
      • Sep 2016 - Jun 2020

Awards

  • NSTC Graduate Research Fellowship (NSTC-GRF)
  • 16th TaiwanTech Outstanding Youth Award

Skills

  • Programming: Python, PyTorch, JavaScript, LaTeX
  • Software and tools: Linux, Docker, Git, NeMo, Megatron-LM, ESPnet, Hugging Face Transformers, fairseq
  • Languages: Mandarin (native), English (fluent)