no code implementations • 5 Dec 2023 • Zilin Du, Haoxin Li, Xu Guo, Boyang Li
Comparing our method to direct training on synthetic data, we observed a significant improvement of 24.06% F1 with synthetic text and 26.42% F1 with synthetic images.
no code implementations • 14 Nov 2023 • Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith
Our results demonstrate the effectiveness of human-readable, natural-language IDs in generative retrieval with LMs.
1 code implementation • 11 Jan 2023 • Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith
We propose NarrowBERT, a modified transformer encoder that increases the throughput for masked language model pretraining by more than $2\times$.
1 code implementation • ICCV 2023 • Haoxin Li, Yuan Liu, Hanwang Zhang, Boyang Li
The video background is clearly a source of static bias, but the video foreground, such as the clothing of the actor, can also provide static bias.
no code implementations • CVPR 2021 • Jiaming Zhou, Kun-Yu Lin, Haoxin Li, Wei-Shi Zheng
In this paper, we propose a Graph-based High-order Relation Modeling (GHRM) module to exploit high-order relations in long-term actions for long-term action recognition.
Ranked #5 on Long-video Activity Recognition on Breakfast
1 code implementation • CVPR 2020 • Haoxin Li, Wei-Shi Zheng, Yu Tao, Haifeng Hu, Jian-Huang Lai
We propose to search network structures with a differentiable architecture search mechanism, which learns to construct adaptive structures for different videos to facilitate adaptive interaction modeling.
1 code implementation • 26 Jul 2019 • Shuosen Guan, Haoxin Li, Wei-Shi Zheng
Most current Convolutional Neural Network (CNN)-based methods for optical flow estimation focus on learning optical flow on synthetic datasets with ground truth, which is not practical.
no code implementations • CVPR 2019 • Haoxin Li, Yijun Cai, Wei-Shi Zheng
To exploit the strong relations for egocentric interaction recognition, we introduce a dual relation modeling framework which learns to model the relations between the camera wearer and the interactor based on the individual action representations of the two persons.