no code implementations • 14 Jun 2024 • Vasu Singla, Kaiyu Yue, Sukriti Paul, Reza Shirkavand, Mayuka Jayawardhana, Alireza Ganjdanesh, Heng Huang, Abhinav Bhatele, Gowthami Somepalli, Tom Goldstein
Training large vision-language models requires extensive, high-quality image-text pairs.
1 code implementation • CVPR 2024 • Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim
We present an approach to pose object recognition as next token prediction.
1 code implementation • ECCV 2020 • Kaiyu Yue, Jiangfan Deng, Feng Zhou
However, this introduces two problems: a) The adaptation module brings more parameters into training.
no code implementations • 23 Aug 2020 • Zhida Huang, Kaiyu Yue, Jiangfan Deng, Feng Zhou
Then we perform NMS only on visible bounding boxes to obtain the best-fitting full box at inference time.
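The snippet above states only that NMS runs on visible boxes while the full box is what gets reported. What follows is a minimal Python sketch of that idea. It assumes each detection carries a paired visible box, a full box and one score, and it uses standard greedy NMS with an illustrative IoU threshold of 0.5. `visible_box_nms` is a hypothetical helper and not the authors' code, since the entry lists no code implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def visible_box_nms(visible: List[Box], full: List[Box],
                    scores: List[float], thresh: float = 0.5) -> List[Box]:
    """Hypothetical helper: greedy NMS where overlap is measured on the
    visible boxes only, while each surviving detection reports its full box."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Suppress i only if its *visible* part overlaps a kept detection's.
        if all(iou(visible[i], visible[k]) <= thresh for k in keep):
            keep.append(i)
    return [full[i] for i in keep]
```

For two pedestrians whose full boxes overlap with an IoU of about 0.67 but whose visible parts do not overlap, both full boxes survive. Plain NMS on full boxes at the same threshold would drop the lower-scoring one.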
2 code implementations • NeurIPS 2018 • Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu
The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos.
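For reference, here is a minimal NumPy sketch of the generic non-local operation in its common embedded-Gaussian form. Each output position is a softmax-weighted sum over *all* positions, y_i = Σ_j softmax_j(θ(x_i)·φ(x_j)) g(x_j), added back to the input through a residual projection. Every position therefore sees every other position in a single step, which is what captures long-range dependencies. This is the standard block and not necessarily the variant proposed in this paper. The projection matrices `w_theta`, `w_phi`, `w_g` and `w_out` are hypothetical learned weights.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def non_local(x: np.ndarray, w_theta: np.ndarray, w_phi: np.ndarray,
              w_g: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    """Embedded-Gaussian non-local operation.

    x: (N, C) features at N positions (e.g. T*H*W flattened for video).
    w_theta, w_phi, w_g: (C, C') projections; w_out: (C', C).
    """
    theta = x @ w_theta                       # (N, C') queries
    phi = x @ w_phi                           # (N, C') keys
    g = x @ w_g                               # (N, C') values
    attn = softmax(theta @ phi.T, axis=-1)    # (N, N) pairwise affinities
    y = attn @ g                              # (N, C') aggregate over all positions
    return x + y @ w_out                      # residual connection, (N, C)
```

Note that the (N, N) affinity matrix grows quadratically with the number of space-time positions, so this direct form is costly for long or high-resolution clips.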
no code implementations • ECCV 2018 • Chen Zhu, Xiao Tan, Feng Zhou, Xiao Liu, Kaiyu Yue, Errui Ding, Yi Ma
Specifically, it first summarizes the video by taking a weighted sum of all feature vectors in the feature maps of selected frames with a spatio-temporal soft attention, and then predicts which channels to suppress or enhance from this summary with a learned non-linear transform.
Ranked #12 on Action Recognition on ActivityNet
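What follows is a minimal NumPy sketch of the two steps described in the snippet above, under loudly stated assumptions. The attention logits come from a single hypothetical learned vector `w_att`. The "learned non-linear transform" is assumed to be a two-layer MLP (`w1`, `w2`). Suppression and enhancement are modeled as a tanh-bounded per-channel scale `1 + delta`. These choices are illustrative and not the paper's exact formulation; the entry lists no code implementation.

```python
import numpy as np


def summarize_and_gate(feats: np.ndarray, w_att: np.ndarray,
                       w1: np.ndarray, w2: np.ndarray):
    """feats: (T, H, W, C) feature maps of the selected frames.

    Returns the re-weighted feature maps and the attention weights.
    """
    T, H, W, C = feats.shape
    flat = feats.reshape(-1, C)              # (T*H*W, C) feature vectors

    # Step 1: spatio-temporal soft attention over every position in every frame.
    logits = flat @ w_att                    # (T*H*W,) one score per vector
    a = np.exp(logits - logits.max())
    a /= a.sum()                             # softmax: weights sum to 1
    summary = a @ flat                       # (C,) weighted sum = video summary

    # Step 2 (assumed form): a 2-layer MLP maps the summary to per-channel
    # adjustments in (-1, 1); negative values suppress, positive ones enhance.
    hidden = np.maximum(0.0, summary @ w1)   # ReLU
    delta = np.tanh(hidden @ w2)             # (C,)
    return feats * (1.0 + delta), a          # broadcast over T, H, W
```

Because `delta` lies in (-1, 1), each channel is scaled by a factor between 0 and 2 uniformly across all space-time positions. The channel decision therefore depends on the whole-video summary rather than on any single frame.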