Although we ran into several unexpected difficulties over the past few months (such as a shortage of computing resources and manpower), we have continued to maintain this repo and to deliver new work to the community. In this release (202410), we provide two new models:
| Model | Egoschema | Perception-Test | MVBench | VideoMME | MSVC (Caption) | ActivityNet-QA |
|---|---|---|---|---|---|---|
| VideoLLaMA2-7B-16F | 51.7 | 51.4 | 54.6 | 47.9/50.3 | 2.53/2.59 | 50.2/3.3 |
| VideoLLaMA2.1-7B-16F | 53.1 | 54.9 | 57.3 | 54.9/56.4 | 2.87/2.81 | 53.0/3.4 |
- VideoLLaMA2.1-7B-AV
  - Trained from VideoLLaMA2.1-7B-16F
  - Includes more audio-visual joint training data (from AVInstruct) and more pure-text data
  - Uses improved training recipes (e.g., in our experiments, smaller batch sizes in audio-related training consistently gave better results)
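
To make the comparison in the benchmark table easier to read, the following minimal sketch computes the per-benchmark gain of VideoLLaMA2.1-7B-16F over VideoLLaMA2-7B-16F. The scores are copied from the table. Only the single-valued columns are included, because the meaning of the two numbers in each `a/b` cell is not stated here. The `scores` dictionary and the `gains` helper are illustrative names and not part of the repo.

```python
# Scores copied from the benchmark table (single-valued columns only).
scores = {
    "VideoLLaMA2-7B-16F": {
        "Egoschema": 51.7,
        "Perception-Test": 51.4,
        "MVBench": 54.6,
    },
    "VideoLLaMA2.1-7B-16F": {
        "Egoschema": 53.1,
        "Perception-Test": 54.9,
        "MVBench": 57.3,
    },
}


def gains(old, new):
    """Return new-minus-old for every benchmark, rounded to one decimal."""
    return {bench: round(new[bench] - old[bench], 1) for bench in old}


# Hypothetical helper usage: compare the two 16-frame models.
deltas = gains(scores["VideoLLaMA2-7B-16F"], scores["VideoLLaMA2.1-7B-16F"])

for bench, delta in deltas.items():
    print(f"{bench}: {delta:+.1f}")
```

On these three benchmarks the newer model is ahead by 1.4, 3.5, and 2.7 points respectively.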