Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

  1Shenzhen Technology University   2Carnegie Mellon University   3Alibaba Group   4National University of Singapore   5Chinese Academy of Sciences

Abstract

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address these limitations, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotion recognition and reasoning capabilities. Extensive evaluations show that Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on the MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on the DFEW dataset.


Demo Presentation

MERR Dataset


Comparison of emotional datasets. The table presents a comparative analysis of several key emotional datasets, including DFEW, MER2023, EMER, and MERR. It highlights the unique features and contributions of each dataset, such as the range of emotion categories, availability of multimodal annotations, and dataset size. This comparison underscores the significance of the MERR dataset in advancing multimodal emotion recognition and reasoning research.

Framework


Architecture of Emotion-LLaMA, which integrates audio, visual, and text inputs for multimodal emotional recognition and reasoning.

Multiview Multimodal Encoder. To capture emotional cues in the audio and visual modalities, we leverage the HuBERT model as our audio encoder \(\mathcal{E}^{aud}\) and a multiview visual encoder \(\mathcal{E}^{vis}\). HuBERT extracts a comprehensive auditory representation \(u^{aud}\) from the input audio signal and has shown remarkable performance on emotion recognition tasks.
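As a rough illustration of this step, the snippet below extracts an utterance-level auditory feature with HuBERT via Hugging Face Transformers and torchaudio. The checkpoint name, the 16 kHz resampling, and the mean pooling over time are our own simplifying assumptions for the sketch, not necessarily the exact Emotion-LLaMA configuration.

```python
# Minimal sketch: utterance-level auditory feature u^aud from HuBERT.
# Checkpoint, resampling rate, and pooling are illustrative assumptions.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, HubertModel

CKPT = "facebook/hubert-large-ls960-ft"  # assumed checkpoint for illustration
extractor = AutoFeatureExtractor.from_pretrained(CKPT)
hubert = HubertModel.from_pretrained(CKPT).eval()

waveform, sr = torchaudio.load("clip.wav")                        # (channels, samples)
waveform = torchaudio.functional.resample(waveform, sr, 16000).mean(dim=0)  # mono, 16 kHz

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = hubert(**inputs).last_hidden_state                   # (1, T, hidden_dim)

u_aud = hidden.mean(dim=1)                                         # utterance-level u^aud
```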

We use a vision preprocessor to unify the vision modalities, including facial sequences and the peak frame extracted from the input video. Three visual encoders \(\mathcal{E}^{vis} = \{\mathcal{E}^{vis}_{glo}, \mathcal{E}^{vis}_{loc}, \mathcal{E}^{vis}_{temp}\}\) are employed to extract complementary multi-view visual emotional features (a code sketch of this multi-view encoding follows the list):

  • Local Encoder: A ViT-structured model pre-trained with the MAE scheme extracts static facial expression features. The facial sequence \(V\) is fed into the local encoder, and the output frame-wise features are fused by average pooling, producing the local visual feature \(u^{vis}_{loc} = \mathrm{AVG}(\mathcal{E}^{vis}_{loc}(V))\).
  • Temporal Encoder: A VideoMAE model produces the temporal feature \(u^{vis}_{temp} = \mathcal{E}^{vis}_{temp}(V)\) of the facial sequence, learning facial dynamics that indicate emotional states and offering a dynamic temporal view of human emotion.
  • Global Encoder: A ViT-structured model, EVA, initialized with official pre-trained weights, produces the visual feature \(u^{vis}_{glo} = \mathcal{E}^{vis}_{glo}(\mathit{Frame}_{\mathit{peak}})\), capturing not only facial expressions but also background context.
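
Below is a minimal PyTorch sketch of how the three views could be combined and projected into the language model's token space. The encoder modules are passed in as arguments (stand-ins for the MAE, VideoMAE, and EVA backbones), and the equal output dimension across encoders plus simple linear projections are our own simplifying assumptions rather than the released implementation.

```python
# Minimal sketch of multi-view visual encoding with hypothetical backbones.
import torch
import torch.nn as nn

class MultiviewVisualEncoder(nn.Module):
    def __init__(self, local_enc: nn.Module, temporal_enc: nn.Module,
                 global_enc: nn.Module, dim: int, llm_dim: int):
        super().__init__()
        self.local_enc = local_enc        # MAE-pretrained ViT (frame-wise)
        self.temporal_enc = temporal_enc  # VideoMAE (clip-level)
        self.global_enc = global_enc      # EVA (peak frame + context)
        # linear projections aligning each view into the LLM token space (assumed)
        self.proj = nn.ModuleDict({
            k: nn.Linear(dim, llm_dim) for k in ("loc", "temp", "glo")
        })

    def forward(self, frames: torch.Tensor, peak_frame: torch.Tensor):
        # frames: (B, T, C, H, W) facial sequence; peak_frame: (B, C, H, W)
        B, T = frames.shape[:2]
        per_frame = self.local_enc(frames.flatten(0, 1))            # (B*T, dim)
        u_loc = per_frame.view(B, T, -1).mean(dim=1)                # AVG over frames
        u_temp = self.temporal_enc(frames)                          # (B, dim)
        u_glo = self.global_enc(peak_frame)                         # (B, dim)
        tokens = torch.stack([self.proj["loc"](u_loc),
                              self.proj["temp"](u_temp),
                              self.proj["glo"](u_glo)], dim=1)      # (B, 3, llm_dim)
        return tokens                                                # visual tokens for the LLM
```

In Emotion-LLaMA, the resulting visual tokens are aligned in a shared space with the audio feature and text tokens before being fed to the instruction-tuned LLaMA model, as described in the abstract.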

Multimodal Emotion Recognition and Reasoning


Detailed examples of multimodal emotion recognition and reasoning performed by the Emotion-LLaMA model. The figure showcases the model's core capabilities in accurately identifying emotions from multimodal data and generating human-like explanations for its predictions. These examples demonstrate the model's proficiency in capturing subtle emotional cues, integrating information across modalities, and providing meaningful insights into its decision-making process.


Detailed examples of general tasks performed by the Emotion-LLaMA model. The figure illustrates the model's versatility and robustness in handling tasks beyond emotion recognition, such as face detection and question answering. These examples highlight the model's ability to process and understand visual and textual information, enabling its application in a wide range of scenarios.

Comparisons with SOTA MLLMs


To illustrate the qualitative performance of Emotion-LLaMA, we present a detailed comparison of emotion reasoning results across different models. The table displays the emotion reasoning results of the four highest-scoring models. The video shows a person smiling while questioning another individual; the smile conveys dissatisfaction rather than genuine happiness, suggesting an angry emotional state. Accurate emotion reasoning for this sample requires integrating information from multiple modalities. PandaGPT and Valley captured the correct visual features but failed to incorporate information from the other modalities, incorrectly classifying the emotion as happy. In contrast, VideoChat-Embed eventually reached the correct inference, but its reasoning was compromised by hallucinations. Emotion-LLaMA went a step further, recognizing the person's tone of voice and combining subtle facial expressions with multimodal information for accurate emotion reasoning. This example demonstrates our model's superiority in understanding and integrating emotional cues from multiple modalities, resulting in more precise and contextually relevant emotion recognition.