Publication
AMEGO: Active Memory from long EGOcentric videos
Project
Interactive World Simulator
Learning to simulate the visual world from large-scale videos. In-context learning for vision data remains underexplored compared with natural language. Prior work studied image in-context learning, requiring models to generate a single…
Tool
LLM2CLIP
LLM2CLIP is a novel approach that leverages the power of LLMs to unlock CLIP's potential. By fine-tuning the LLM in the caption space with contrastive learning, we distill its textual capabilities into the output embeddings,…