Zhaowei Li

Master's student @ Fudan University

lizhaowei126@gmail.com


# About Me

Hi! I am a final-year M.S. student at Fudan University. Currently, I am interning at ByteDance.

My research interests focus on Multi-Modal Large Language Models and Multi-Modal Agents.

I expect to graduate with a master's degree in June 2025. I am open to academic collaboration opportunities. Please feel free to contact me at lizhaowei126@gmail.com if you are interested!

# News

  • [2024.11] We released TinyGroundingGPT, a lightweight large language model with fine-grained visual understanding ability.

  • [2024.9] Our SpeechAlign was accepted to NeurIPS 2024!

  • [2024.8] We released UnifiedMLLM, a large language model that handles multiple modalities and tasks in a unified representation.

  • [2024.5] We released QCRD, a general method for distilling contrastive rationale knowledge from LLMs into small language models.

  • [2024.5] Our GroundingGPT was accepted to ACL 2024! See you in Thailand!

  • [2024.4] We released SpeechAlign, the first work to apply RLHF to align speech language models with human preferences!

  • [2024.1] We released GroundingGPT, the first end-to-end multi-modal grounding model.

  • [2024.1] We released SpeechAgents, the first multi-modal multi-agent system.

# Research

(*: Equal contribution)

GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Zhaowei Li, Qi Xu, Dong Zhang, Hang Song, Yiqing Cai, Qi Qi, Ran Zhou, Junting Pan, Zefeng Li, Van Tu Vu, Zhida Huang, Tao Wang

[ACL 2024] [code] [demo]

GroundingGPT is the first end-to-end large language model that supports multimodal grounding and understanding tasks.

SpeechAlign: Aligning Speech Generation to Human Preferences

Dong Zhang*, Zhaowei Li*, Shimin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

[NeurIPS 2024] [code] [demo]

SpeechAlign is the first work to apply RLHF to align speech language models with human preferences, and it proposes an effective iterative self-improvement strategy that converts weak speech language models into stronger ones.

UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model

Zhaowei Li, Wei Wang, Yiqing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang

[Preprint] [code]

UnifiedMLLM is a large language model that handles multiple modalities and tasks in a unified representation.

Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models

Wei Wang*, Zhaowei Li*, Qi Xu, Linfeng Li, Yiqing Cai, Botian Jiang, Hang Song, Xincan Hu, Pengyu Wang, Li Xiao

[Preprint]

TinyGroundingGPT is an effective and efficient large language model with advanced fine-grained visual understanding ability.

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng Qiu

[Preprint] [code] [demo]

SpeechAgents is the first multi-modal multi-agent system.

QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models

Wei Wang, Zhaowei Li, Qi Xu, Yiqing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao

[Preprint]

QCRD is a general method for distilling contrastive rationale knowledge from LLMs into small language models.

# Education

  • Fudan University Sept 2022 - Jun 2025
    M.S. in Electronic Engineering

  • Fudan University Sept 2018 - Jun 2022
    B.S. in Electronic Engineering

# Internship

  • ByteDance E-commerce Jul 2023 - Now
    Research on multi-modal large language models