Getting Started
This guide will help you set up Emotion-LLaMA on your system and get started with multimodal emotion recognition.
Table of Contents
- System Requirements
- Quick Installation
- Preparing Pre-trained Models
- Project Structure
- Verification
- Next Steps
- Troubleshooting
System Requirements
Hardware Requirements
- GPU: NVIDIA GPU with at least 24GB VRAM (for training)
- RAM: 32GB or more recommended
- Storage: At least 50GB free space for models and datasets
Software Requirements
- Operating System: Linux (Ubuntu 18.04+), Windows 10/11, or macOS
- Python: 3.8 or later
- CUDA: 11.0 or later (for GPU acceleration)
- Conda: Anaconda or Miniconda
Quick Installation
1. Clone the Repository
git clone https://github.com/ZebangCheng/Emotion-LLaMA.git
cd Emotion-LLaMA
2. Create Conda Environment
conda env create -f environment.yaml
conda activate llama
The environment setup may take several minutes depending on your internet connection.
3. Install Additional Dependencies
pip install moviepy==1.0.3
pip install soundfile==0.12.1
pip install opencv-python==4.7.0.72
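To confirm these extra packages resolve correctly inside the conda environment, a quick import check such as the following should print the pinned versions (this is only a sanity check, not part of the official setup):

python -c "import moviepy, soundfile, cv2; print(moviepy.__version__, soundfile.__version__, cv2.__version__)"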
Preparing Pre-trained Models
Llama-2-7b-chat-hf
Download the Llama-2-7b-chat-hf model from Hugging Face:
https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
Save the model to Emotion-LLaMA/checkpoints/Llama-2-7b-chat-hf/
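If you prefer to script the download instead of fetching files by hand, a minimal sketch using the huggingface_hub library is shown below (it assumes huggingface_hub is installed, that your account has been granted access to the gated Llama-2 repository, and that you are logged in via huggingface-cli login). The local_dir matches the path expected by the config in the next step:

# download_llama.py -- minimal sketch; assumes access to the gated Llama-2 repo
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-2-7b-chat-hf",
    local_dir="checkpoints/Llama-2-7b-chat-hf",
)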
Configure Model Path
Specify the path to Llama-2 in the model config file (minigpt4/configs/models/minigpt_v2.yaml):
# Set Llama-2-7b-chat-hf path
llama_model: "/path/to/Emotion-LLaMA/checkpoints/Llama-2-7b-chat-hf"
MiniGPT-v2 Checkpoint
Specify the path to MiniGPT-v2 in the training config file (train_configs/Emotion-LLaMA_finetune.yaml):
# Set MiniGPT-v2 path
ckpt: "/path/to/Emotion-LLaMA/checkpoints/minigptv2_checkpoint.pth"
Replace /path/to/ with your actual installation path.
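As a quick sanity check that both config files point at real locations, a small script along these lines can be run from the project root (the YAML key placement shown is an assumption based on the stock MiniGPT-v2 config layout; adjust it if your copies differ):

# check_paths.py -- hypothetical sanity check; key locations are assumptions
import os
import yaml

for cfg_file, key in [
    ("minigpt4/configs/models/minigpt_v2.yaml", "llama_model"),
    ("train_configs/Emotion-LLaMA_finetune.yaml", "ckpt"),
]:
    with open(cfg_file) as f:
        cfg = yaml.safe_load(f)
    path = cfg.get("model", {}).get(key, "")
    print(cfg_file, "->", path, "exists:", os.path.exists(path))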
HuBERT-large Model (For Demo)
Download the HuBERT-large model from Hugging Face:
https://huggingface.co/TencentGameMate/chinese-hubert-large
Save to Emotion-LLaMA/checkpoints/transformer/chinese-hubert-large/
Specify the path in minigpt4/conversation/conversation.py:
# Set HuBERT-large model path
model_file = "checkpoints/transformer/chinese-hubert-large"
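To verify the audio encoder loads from that local path, a short sketch using the transformers library (assumed to be available in the conda environment) is:

# verify_hubert.py -- minimal sketch; assumes transformers is installed
from transformers import HubertModel

model = HubertModel.from_pretrained("checkpoints/transformer/chinese-hubert-large")
print("HuBERT loaded, hidden size:", model.config.hidden_size)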
Project Structure
The complete project structure including datasets and checkpoints:
📦 Dataset (External)
└── 📦 Emotion
    └── 📁 MER2023
        ├── 📁 video                      # Raw video files
        ├── 📁 HL-UTT                     # HuBERT features
        ├── 📁 mae_340_UTT                # MAE features
        ├── 📁 maeV_399_UTT               # VideoMAE features
        ├── 📄 transcription_en_all.csv   # Video transcriptions
        ├── 📄 MERR_coarse_grained.txt    # 28,618 coarse-grained annotations
        ├── 📄 MERR_coarse_grained.json
        ├── 📄 MERR_fine_grained.txt      # 4,487 fine-grained annotations
        └── 📄 MERR_fine_grained.json
📦 Emotion-LLaMA (Project Root)
├── 📁 checkpoints/                       # Pre-trained models
│   ├── 📁 Llama-2-7b-chat-hf/            # Base LLaMA model
│   ├── 📁 save_checkpoint/               # Trained checkpoints
│   │   ├── 📁 stage2/
│   │   │   ├── checkpoint_best.pth       # Best model from stage 2
│   │   │   └── log.txt                   # Training logs
│   │   └── Emoation_LLaMA.pth            # Demo model
│   ├── 📁 transformer/
│   │   └── 📁 chinese-hubert-large/      # Audio encoder
│   └── minigptv2_checkpoint.pth          # MiniGPT-v2 base
├── 📁 minigpt4/                          # Core model implementation
│   ├── 📁 configs/                       # Model and dataset configs
│   ├── 📁 datasets/                      # Dataset loaders
│   ├── 📁 models/                        # Model architectures
│   ├── 📁 processors/                    # Data processors
│   └── 📁 conversation/                  # Conversation templates
├── 📁 train_configs/                     # Training configurations
│   ├── Emotion-LLaMA_finetune.yaml       # Stage 1 config
│   └── minigptv2_tuning_stage_2.yaml     # Stage 2 config
├── 📁 eval_configs/                      # Evaluation configurations
│   ├── demo.yaml                         # Demo config
│   ├── eval_emotion.yaml                 # MER2023 eval config
│   └── eval_emotion_EMER.yaml            # EMER eval config
├── 📁 examples/                          # Example video clips
├── 📁 images/                            # Documentation images
├── 📁 docs/                              # Documentation site (this site!)
├── 📄 train.py                           # Training script
├── 📄 eval_emotion.py                    # Evaluation script (MER2023)
├── 📄 eval_emotion_EMER.py               # Evaluation script (EMER)
├── 📄 app.py                             # Gradio demo
├── 📄 app_EmotionLlamaClient.py          # API client
├── 📄 environment.yaml                   # Conda environment
└── 📄 requirements.txt                   # Python dependencies
Key Directories
Checkpoints: Stores all pre-trained models and trained checkpoints
- Download Llama-2-7b-chat-hf from Hugging Face
- Save trained models from training runs
Dataset: External dataset directory (apply for access)
- Must be organized as shown above
- Features should be pre-extracted for faster training
Documentation: This documentation site
- Built with Jekyll and Just the Docs theme
- Deployed to GitHub Pages
Verification
To verify your installation, you can run a quick test:
python -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available())"
Expected output:
PyTorch version: 2.x.x
CUDA available: True
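If you also want to confirm your GPU meets the 24GB VRAM guideline for training, a slightly longer check (a sketch, assuming at least one CUDA device is visible to PyTorch) is:

# check_gpu.py -- sketch; assumes a CUDA-capable NVIDIA GPU is visible
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("VRAM: %.1f GB" % (props.total_memory / 1024**3))
else:
    print("No CUDA device detected; training requires an NVIDIA GPU.")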
Next Steps
- Try the Demo - Run Emotion-LLaMA online or locally
- Explore the Dataset - Learn about the MERR dataset
- Train Your Model - Train Emotion-LLaMA from scratch
- Run Evaluations - Evaluate model performance
Troubleshooting
Common Issues
Issue: CUDA out of memory error
- Solution: Reduce batch size in configuration files or use gradient accumulation
Issue: Missing dependencies
- Solution: Reinstall the conda environment:
conda env remove -n llama && conda env create -f environment.yaml
Issue: Model download fails
- Solution: Check your internet connection and Hugging Face access permissions
For more help, please open an issue on GitHub.