no code implementations • 19 Jan 2024 • Haibo Wang, Chenghang Lai, Yixuan Sun, Weifeng Ge
GCG learns multiple Gaussian functions to characterize the temporal structure of the video and samples question-critical frames as positive moments to serve as the visual inputs of LMMs.
1 code implementation • ICCV 2023 • Yiwen Huang, Yixuan Sun, Chenghang Lai, Qing Xu, Xiaomei Wang, Xuli Shen, Weifeng Ge
Following the spirit of multiple instance learning (MIL), we decompose the weakly supervised correspondence learning problem into three stages: image-level matching, region-level matching, and pixel-level matching.