Citation
How to cite Emotion-LLaMA in your research.
Table of Contents
- Main Paper (NeurIPS 2024)
- MER2024 Challenge Paper
- MERR Dataset
- Related Work
- Acknowledgements
- Using This Work
- Star History
- Contact for Citations
- Next Steps
Main Paper (NeurIPS 2024)
If you find our work helpful for your research, please consider giving a star ⭐ and citing our paper:
BibTeX
@inproceedings{NEURIPS2024_c7f43ada,
author = {Cheng, Zebang and Cheng, Zhi-Qi and He, Jun-Yan and Wang, Kai and Lin, Yuxiang and Lian, Zheng and Peng, Xiaojiang and Hauptmann, Alexander},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
pages = {110805--110853},
publisher = {Curran Associates, Inc.},
title = {Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning},
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/c7f43ada17acc234f568dc66da527418-Paper-Conference.pdf},
volume = {37},
year = {2024}
}
APA Style
Cheng, Z., Cheng, Z.-Q., He, J.-Y., Wang, K., Lin, Y., Lian, Z., Peng, X., & Hauptmann, A. (2024). Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning. In Advances in Neural Information Processing Systems (Vol. 37, pp. 110805-110853). Curran Associates, Inc.
IEEE Style
Z. Cheng et al., “Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning,” in Advances in Neural Information Processing Systems, vol. 37, pp. 110805-110853, 2024.
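LaTeX Usage
To cite this paper from a LaTeX manuscript, save the BibTeX entry above to a .bib file and reference it by its key. A minimal sketch, assuming the entry is stored in references.bib (the file name and the plain bibliography style are placeholders, not requirements of this project):
% main.tex: minimal sketch; assumes the BibTeX entry above is saved in references.bib
\documentclass{article}
\begin{document}
Emotion-LLaMA~\cite{NEURIPS2024_c7f43ada} performs multimodal emotion
recognition and reasoning with instruction tuning.
\bibliographystyle{plain}   % any standard bibliography style works here
\bibliography{references}   % points to references.bib
\end{document}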
MER2024 Challenge Paper
If you use our Conv-Attention enhancement or reference our MER2024 challenge work:
BibTeX
@inproceedings{10.1145/3689092.3689404,
author = {Cheng, Zebang and Tu, Shuyuan and Huang, Dawei and Li, Minghan and Peng, Xiaojiang and Cheng, Zhi-Qi and Hauptmann, Alexander G.},
title = {SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition},
year = {2024},
isbn = {9798400712036},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3689092.3689404},
doi = {10.1145/3689092.3689404},
abstract = {This paper presents our winning approach for the MER-NOISE and MER-OV tracks of the MER2024 Challenge on multimodal emotion recognition. Our system leverages the advanced emotional understanding capabilities of Emotion-LLaMA to generate high-quality annotations for unlabeled samples, addressing the challenge of limited labeled data. To enhance multimodal fusion while mitigating modality-specific noise, we introduce Conv-Attention, a lightweight and efficient hybrid framework. Extensive experimentation validates the effectiveness of our approach. In the MER-NOISE track, our system achieves a state-of-the-art weighted average F-score of 85.30\%, surpassing the second and third-place teams by 1.47\% and 1.65\%, respectively. For the MER-OV track, our utilization of Emotion-LLaMA for open-vocabulary annotation yields an 8.52\% improvement in average accuracy and recall compared to GPT-4V, securing the highest score among all participating large multimodal models.},
booktitle = {Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing},
pages = {78--87},
numpages = {10},
keywords = {mer2024, noise robustness, open-vocabulary recognition},
location = {Melbourne VIC, Australia},
series = {MRAC '24}
}
APA Style
Cheng, Z., Tu, S., Huang, D., Li, M., Peng, X., Cheng, Z.-Q., & Hauptmann, A. G. (2024). SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition. In Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (pp. 78-87). Association for Computing Machinery.
MERR Dataset
If you use the MERR dataset in your research, cite the main paper, which introduces the dataset:
@inproceedings{NEURIPS2024_c7f43ada,
author = {Cheng, Zebang and Cheng, Zhi-Qi and He, Jun-Yan and Wang, Kai and Lin, Yuxiang and Lian, Zheng and Peng, Xiaojiang and Hauptmann, Alexander},
booktitle = {Advances in Neural Information Processing Systems},
title = {Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning},
year = {2024}
}
For the MER-Factory pipeline:
- Visit MER-Factory GitHub
- Check MER-Factory Documentation
Related Work
MiniGPT-v2
Our work builds upon MiniGPT-v2. If you use the MiniGPT-v2 components:
@article{chen2023minigptv2,
title={MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning},
author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechu and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},
journal={arXiv preprint arXiv:2310.09478},
year={2023}
}
Website: MiniGPT-v2
AffectGPT
For emotion reasoning evaluation methodology:
@article{lian2023affectgpt,
title={AffectGPT: Explainable Multimodal Emotion Recognition},
author={Lian, Zheng and Sun, Licai and Sun, Haiyang and Chen, Kang and Wen, Zhuofan and Gu, Hao and Tao, Jinming and Niu, Mingyu and Liu, Bin and Tao, Jianhua},
journal={arXiv preprint arXiv:2306.15401},
year={2023}
}
Website: AffectGPT
LLaVA
For vision-language understanding:
@inproceedings{liu2023llava,
title={Visual Instruction Tuning},
author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
booktitle={NeurIPS},
year={2023}
}
Website: LLaVA
Acknowledgements
We would like to acknowledge the following projects and datasets that made this work possible:
Models and Frameworks
- LLaMA-2: Meta AI’s large language model
- MiniGPT-v2: Vision-language multi-task learning framework
- HuBERT: Audio feature extraction
- EVA: Visual representation learning
- MAE: Masked autoencoders for visual learning
- VideoMAE: Video masked autoencoders
Datasets
- MER2023: Multimodal Emotion Recognition Challenge 2023
- MER2024: Multimodal Emotion Recognition Challenge 2024
- EMER: Emotion reasoning evaluation dataset
- DFEW: Dynamic Facial Expression in the Wild
Tools and Libraries
- PyTorch: Deep learning framework
- Hugging Face Transformers: NLP and multimodal models
- Gradio: Demo interface
- OpenFace: Facial action unit detection
Using This Work
For Academic Research
When citing this work in academic publications, follow this checklist (a LaTeX sketch combining the citations follows the list):
- ✅ Cite the main NeurIPS 2024 paper
- ✅ If using MERR dataset, acknowledge the dataset
- ✅ If using Conv-Attention, cite the MER2024 paper
- ✅ Acknowledge the base datasets (MER2023, etc.)
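As a concrete instance of this checklist, a paper that uses both Emotion-LLaMA and the Conv-Attention enhancement could cite both entries in one sentence. A hedged sketch, assuming both BibTeX entries from this page are already in the references.bib file set up under Main Paper (NeurIPS 2024):
% Fragment for the document body; both citation keys come from the BibTeX blocks above.
Our experiments build on Emotion-LLaMA~\cite{NEURIPS2024_c7f43ada} and the
Conv-Attention fusion framework from its MER2024 challenge
system~\cite{10.1145/3689092.3689404}.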
For Commercial Use
Please review the license for commercial use restrictions.
Attribution Example
This work uses Emotion-LLaMA (Cheng et al., NeurIPS 2024), a multimodal
emotion recognition model with instruction tuning capabilities, trained
on the MERR dataset for enhanced emotion reasoning.
Star History
Support our project by giving it a star on GitHub!
Contact for Citations
If you have questions about citing this work:
- GitHub Issues: Ask a question
- Email: Contact the corresponding author
- Paper: Read the NeurIPS 2024 paper
Next Steps
- Review the license information
- Explore the main documentation
- Visit our GitHub repository