1 Apr 2024 • Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig
Specifically, we propose TraveLER, a model that can create a plan to "Traverse" through the video, ask questions about individual frames to "Locate" and store key information, and then "Evaluate" whether there is enough information to answer the question.
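
The Traverse, Locate, and Evaluate stages described above can be pictured as a simple control loop. The following is a minimal sketch, not the authors' implementation: the names `Memory`, `answer_video_question`, `plan`, `locate`, `evaluate`, and `max_rounds` are all hypothetical, and the toy stubs operate on hard-coded captions in place of whatever models would actually carry out each stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Memory:
    """Information collected so far, keyed by frame index (hypothetical structure)."""
    notes: Dict[int, List[str]] = field(default_factory=dict)

    def add(self, frame_idx: int, info: str) -> None:
        self.notes.setdefault(frame_idx, []).append(info)


def answer_video_question(
    num_frames: int,
    question: str,
    plan: Callable[[str, Memory, int], List[int]],
    locate: Callable[[int, str], str],
    evaluate: Callable[[str, Memory], Optional[str]],
    max_rounds: int = 5,  # assumed safety cap, not taken from the source
) -> Optional[str]:
    """Loop over plan -> locate -> evaluate until an answer is produced."""
    memory = Memory()
    for _ in range(max_rounds):
        # Traverse: decide which frames to visit next.
        for idx in plan(question, memory, num_frames):
            # Locate: query an individual frame and store what was found.
            memory.add(idx, locate(idx, question))
        # Evaluate: check whether the stored information suffices.
        answer = evaluate(question, memory)
        if answer is not None:
            return answer
    return None  # not enough information within the round budget


# Toy stand-ins for the three stages (assumptions for illustration only).
CAPTIONS = [
    "a man enters the kitchen",
    "he opens the fridge",
    "he takes out milk",
    "he pours milk into a glass",
]


def toy_plan(question: str, memory: Memory, num_frames: int) -> List[int]:
    # Visit the next two frames that have not been inspected yet.
    unseen = [i for i in range(num_frames) if i not in memory.notes]
    return unseen[:2]


def toy_locate(frame_idx: int, question: str) -> str:
    # Pretend that querying a frame returns its caption.
    return CAPTIONS[frame_idx]


def toy_evaluate(question: str, memory: Memory) -> Optional[str]:
    # Answer only once a stored note mentions what was taken out.
    for infos in memory.notes.values():
        for info in infos:
            if "takes out" in info:
                return info.split("takes out ", 1)[1]
    return None


QUESTION = "What does the man take out of the fridge?"
result = answer_video_question(
    len(CAPTIONS), QUESTION, toy_plan, toy_locate, toy_evaluate
)
print(result)
```

In this toy run the first round inspects frames 0 and 1 and finds nothing sufficient, so the loop plans a second round over frames 2 and 3 before answering. Keeping the three stages as separate callables is only one possible design; it makes the iterative re-planning explicit.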