Emotion-LLaMA
Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Get Started · View on GitHub · Try Demo
📢 News
[2025.07.09] 🔥 We release the MERR dataset construction strategy at MER-Factory!
[2024.09.27] 🎉 Our Emotion-LLaMA has been accepted at NeurIPS 2024!
[2024.09.07] 🔥 We achieved third place in the MER-OV track of the MER2024 Challenge, with Emotion-LLaMA as the highest-scoring individual model.
[2024.07.10] 🏆 Building on Emotion-LLaMA, we won the championship in the MER-Noise track of the MER2024 Challenge.
[2024.06.12] 🔥 We have deployed an online demo of Emotion-LLaMA on Hugging Face.
🌟 Overview
Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions.
To address these issues, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications.
Additionally, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning multimodal features into a shared space and applying instruction tuning to a modified LLaMA model, Emotion-LLaMA significantly enhances both emotion recognition and emotional reasoning capabilities.
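To make the fusion step concrete, here is a minimal PyTorch-style sketch of the general idea: per-modality features are projected into the language model's embedding space and prepended to the embedded text prompt. The encoder dimensions, module names, and simple linear adapters are illustrative assumptions rather than the exact Emotion-LLaMA implementation; see the GitHub repository for the real code.

```python
import torch
import torch.nn as nn


class MultimodalProjector(nn.Module):
    """Toy sketch: project audio/visual features into a shared LLM token space.

    All dimensions and the plain linear adapters are hypothetical placeholders,
    not the actual Emotion-LLaMA architecture.
    """

    def __init__(self, audio_dim=1024, visual_dim=768, llm_dim=4096):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, llm_dim)    # audio encoder features -> LLM space
        self.visual_proj = nn.Linear(visual_dim, llm_dim)  # facial/frame features -> LLM space

    def forward(self, audio_feat, visual_feat, text_embeds):
        # audio_feat: (B, T_a, audio_dim), visual_feat: (B, T_v, visual_dim),
        # text_embeds: (B, T_t, llm_dim) -- instruction tokens already embedded by the LLM.
        audio_tokens = self.audio_proj(audio_feat)
        visual_tokens = self.visual_proj(visual_feat)
        # Prepend modality tokens to the text prompt; the fused sequence is then
        # passed to the (instruction-tuned) LLaMA backbone.
        return torch.cat([audio_tokens, visual_tokens, text_embeds], dim=1)


if __name__ == "__main__":
    projector = MultimodalProjector()
    fused = projector(
        torch.randn(1, 8, 1024),   # placeholder audio features
        torch.randn(1, 16, 768),   # placeholder visual features
        torch.randn(1, 32, 4096),  # placeholder embedded instruction text
    )
    print(fused.shape)  # torch.Size([1, 56, 4096])
```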

✨ Key Features
- Multimodal Integration: Seamlessly combines audio, visual, and textual inputs for comprehensive emotion understanding
- MERR Dataset: Comprehensive dataset with 28,618 coarse-grained and 4,487 fine-grained annotations
- State-of-the-art Performance: Achieves top scores on multiple benchmarks including EMER, MER2023, and DFEW
- Instruction Tuning: Enhanced emotional reasoning capabilities through multi-task instruction tuning (an illustrative sample format is sketched after this list)
- Open Source: Fully available for research purposes with comprehensive documentation
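As referenced in the Instruction Tuning feature above, the snippet below sketches what a multi-task instruction-tuning sample and prompt template might look like. The field names, file path, and wording are hypothetical illustrations, not the actual MERR annotation schema or Emotion-LLaMA prompts.

```python
# Hypothetical multi-task instruction sample; field names and path are illustrative only.
sample = {
    "video": "samples/clip_00001.mp4",   # placeholder clip path
    "task": "emotion_reasoning",         # e.g. recognition vs. reasoning sub-task
    "instruction": (
        "Analyze the speaker's facial expression, vocal tone, and words, "
        "then name the most likely emotion and explain the supporting clues."
    ),
    "response": (
        "The furrowed brows, tense voice, and accusatory wording suggest the speaker is angry."
    ),
}


def build_prompt(s: dict) -> str:
    """Format one sample into an instruction-tuning prompt (illustrative template)."""
    return (
        f"[Task: {s['task']}]\n"
        f"<video>{s['video']}</video>\n"
        f"{s['instruction']}\n"
        f"### Answer: {s['response']}"
    )


print(build_prompt(sample))
```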
📊 Performance Highlights
MER2023 Challenge
- F1 Score: 0.9036 (Audio, Visual, Text)
- Best performance among all compared methods
EMER Dataset
- Clue Overlap: 7.83 (Best among MLLMs)
- Label Overlap: 6.25 (Best among MLLMs)
MER2024 Challenge
- Championship in MER-Noise track
- 3rd Place in the MER-OV track (highest-scoring individual model)
🔬 Research Paper
More details about Emotion-LLaMA are available in our NeurIPS 2024 paper.
If you find our work helpful for your research, please consider giving a star ⭐ and citing our paper:
@inproceedings{NEURIPS2024_c7f43ada,
  author    = {Cheng, Zebang and Cheng, Zhi-Qi and He, Jun-Yan and Wang, Kai and Lin, Yuxiang and Lian, Zheng and Peng, Xiaojiang and Hauptmann, Alexander},
  booktitle = {Advances in Neural Information Processing Systems},
  title     = {Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning},
  year      = {2024}
}
📧 Contact
Feel free to contact us if you have any questions or suggestions!
- GitHub Issues: Report issues or ask questions
- Paper: Read our NeurIPS 2024 paper
- Demo: Try the online demo