

Hello there, I'm Dengming Zhang, a master's student at Zhejiang University.
My primary research focuses on Multimodal Large Models under low-data and low-compute constraints. On the low-data side, I study how to merge multiple domain-specialized, fine-tuned expert LLMs into one generalist model using only 1–5 samples while retaining SOTA-level performance (ICLR 2026). On the low-compute side, I explore how to equip vision foundation models with audio using a single RTX 4090 and how to bring audio-visual affective understanding to a SOTA level. I am also interested in Generative AI (Image/Music), Affective Computing, Meta-learning, and HCI.
I also like to combine scientific research with engineering implementation, and I have extensive experience in front-end development, back-end development, and cluster DevOps. Some of the open-source projects that I lead or contribute to can be found on my GitHub.
Seeking PhD opportunities for Fall 2026
Grouped by research direction.

Unsupervised expert alignment and importance-guided layer chunking merge multiple fine-tuned experts into one generalist model.

Audio-visual emotion understanding by teaching vision-language models to align sight and sound for artistic emotion.

Dual-scale attention meta-learning for personalized, dynamic music emotion recognition.

Discriminant space optimization improves few-shot bearing fault diagnosis with meta-learning.

Style-strength control and evaluation to improve style alignment in image creation.

Controllable text rendering with typography and style controls.

Generates music from video with multiple time-varying conditioning signals.

Decomposes spatial and temporal cues to improve controllable video-to-music generation.

Research internship on Model Merging (Expert Merging)[1].

Work on Game Character Material Generation with animation
