| Video | | | |
|---|---|---|---|
| MAMA | The video captures the driver's actions and the car's movement in a dynamic and engaging manner, providing a comprehensive view of the driving experience. | The video shows a person preparing a meal in multiple steps, with various ingredients and utensils being used. | The video shows a step-by-step process of a car being built, starting with the initial design and ending with the final product. |

| Video | | | |
|---|---|---|---|
| MAMA | The video captures a soccer game in progress, showing multiple players on the field, with the focus on the goalie and the soccer ball. | The video captures a basketball game in progress, with multiple players on the court and a crowd of spectators watching the game. | The video shows a person painting a room, with multiple shots of the process, including the initial preparation, the painting itself, and the final result. |

(a) State-of-the-art results on popular VideoQA and text-to-video retrieval (TVR) tasks.
(b) Our MAMA framework can adaptively assign weights to the loss values of training samples.
(c) The less popular the topic of a training sample is, the more improvement MAMA obtains.
@inproceedings{nguyen2024meta,
author = {Nguyen, Thong and Bin, Yi and Wu, Xiaobao and Dong, Xinshuai and Hu, Zhiyuan and Le, Khoi and Nguyen, Cong-Duy and Ng, See-Kiong and Tuan, Luu Anh},
title = {Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2024},
}