Research Interests
My primary research interest lies in audio understanding and generation. I aim to build intelligent audio systems that can:
(1) Understand: deeply perceive and reason about complex acoustic environments with reliability and efficiency;
(2) Generate: synthesize high-fidelity and controllable audio that aligns with human intent;
(3) Interact: engage in natural and seamless interactions to assist humans in everyday scenarios.
I am currently seeking PhD and industry opportunities for 2027!
News
[April 2026] FineLAP, MeanAudio, and SAC have been accepted by ACL 2026.
[January 2026] TinyMU has been accepted by ICASSP 2026.
[November 2025] I started my internship at ByteDance Seed.
[September 2025] MMAR has been accepted by NeurIPS 2025.
[July 2025] I started my internship at Tencent Hunyuan.
[May 2025] Two papers have been accepted by ACL 2025.
[January 2025] DRCap has been accepted by ICASSP 2025.
Publications
Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models
Xiquan Li, Junxi Liu, Wenxi Chen, Haina Zhu, Ziyang Ma, Xie Chen
arXiv, 2026
paper /
code /
demo
A flow-matching-based text-to-audio model enhanced with online RL via GRPO.
FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining
Xiquan Li, Xuenan Xu, Ziyang Ma, Wenxi Chen, Haolin He, Qiuqiang Kong, Xie Chen
ACL, 2026 (Main)
paper /
code /
dataset
A contrastively pretrained audio-language model that excels at both clip- and frame-level audio understanding tasks.
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
Xiquan Li*, Junxi Liu*, Yuzhe Liang, Zhikang Niu, Wenxi Chen, Xie Chen
ACL, 2026 (Main)
paper /
code /
demo
A fast and faithful text-to-audio generator that incorporates MeanFlow for single-step synthesis.
TinyMU: A Compact Audio-Language Model for Music Understanding
Xiquan Li, Aurian Quelennec, Slim Essid
ICASSP, 2026
paper /
code /
dataset
A compact audio-language model with strong music understanding and reasoning ability.
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
Xiquan Li, Wenxi Chen, Ziyang Ma, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Qiuqiang Kong, Xie Chen
ICASSP, 2025 (Oral)
paper /
code
A zero-shot audio captioning system with strong in-domain and cross-domain performance.
Education
Shanghai Jiao Tong University, Shanghai, China
M.E. in Information Engineering, Sep. 2024 - Mar. 2027
Telecom Paris, Palaiseau, France
M.E. in Information Engineering, Sep. 2023 - Jun. 2026
Shanghai Jiao Tong University, Shanghai, China
B.E. in Information Engineering, Dual Degree in French, Sep. 2020 - Jun. 2024
Experience
Seed Speech Group, ByteDance, Shanghai, China
Intern, Nov. 2025 - Present
Hunyuan Team, Tencent, Shanghai, China
Intern, Jun. 2025 - Nov. 2025
ADASP Group, Telecom Paris, Palaiseau, France
Research Intern, Sep. 2024 - Jun. 2025
Advisor: Slim Essid
DSP Lab, The Chinese University of Hong Kong, Hong Kong, China
Research Assistant, Jun. 2024 - Sep. 2024
Advisor: Qiuqiang Kong
X-LANCE Lab, Shanghai Jiao Tong University, Shanghai, China
Research Intern, Jan. 2023 - Present
Advisor: Xie Chen
Misc
Apart from research, I enjoy skiing, playing soccer, and working out. Check out some of my favorite ski moments here :)
Updated April 2026
Template adapted from Here