no code implementations • 5 Mar 2024 • Hossein Aboutalebi, Hwanjun Song, Yusheng Xie, Arshit Gupta, Justin Sun, Hang Su, Igor Shalyminov, Nikolaos Pappas, Siffi Singh, Saab Mansour
Development of multimodal interactive systems is hindered by the lack of rich, multimodal (text, images) conversational data, which is needed in large quantities for LLMs.
no code implementations • 15 Nov 2023 • Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan
We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models.
no code implementations • 7 Feb 2023 • Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha
Instead of purely relying on the alignment from the noisy data, this paper proposes a novel loss function termed SimCon, which accounts for intra-modal similarities to determine the appropriate set of positive samples to align.
1 code implementation • 6 Feb 2023 • Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li
Recent vision transformer based video models mostly follow the ``image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks.
Ranked #1 on Action Recognition on Diving-48 (using extra training data)
no code implementations • 30 Mar 2022 • Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto
While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning.
1 code implementation • CVPR 2022 • Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha
Accounting for this, we propose a single objective pre-training scheme that requires only text and spatial cues.
1 code implementation • 18 Oct 2021 • Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin, Xiaohui Xie
The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views.
Ranked #19 on 3D Human Pose Estimation on Human3.6M (using extra training data)
1 code implementation • ICCV 2021 • Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha
DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer.
Ranked #3 on Document Image Classification on RVL-CDIP
no code implementations • 6 Dec 2020 • Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Xiaohui Xie
Based on the match algorithm, we propose an efficient pipeline to generate a large-scale multi-view hand mesh (MVHM) dataset with accurate 3D hand mesh and joint labels.
no code implementations • 6 Dec 2020 • Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Xiaohui Xie
Experiments show that our modelachieves surprisingly good results, with 3D estimation ac-curacy on par with the state-of-the-art models trained with3D annotations, highlighting the benefit of the temporalconsistency in constraining 3D prediction models.
no code implementations • 6 Dec 2020 • Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Wei Fan, Xiaohui Xie
Estimating3D hand poses from RGB images is essentialto a wide range of potential applications, but is challengingowing to substantial ambiguity in the inference of depth in-formation from RGB images.
no code implementations • 1 Dec 2020 • Srikar Appalaraju, Yi Zhu, Yusheng Xie, István Fehérvári
Self-supervised representation learning has seen remarkable progress in the last few years.
1 code implementation • 2 Oct 2020 • Zhenyu Wu, Duc Hoang, Shih-Yao Lin, Yusheng Xie, Liangjian Chen, Yen-Yu Lin, Zhangyang Wang, Wei Fan
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
no code implementations • 25 Nov 2018 • Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Hui Tang, Yufan Xue, Xiaohui Xie, Yen-Yu Lin, Wei Fan
Hand pose estimation from a monocular RGB image is an important but challenging task.
no code implementations • 1 Nov 2018 • Sheng Shen, Yaliang Li, Nan Du, Xian Wu, Yusheng Xie, Shen Ge, Tao Yang, Kai Wang, Xingzheng Liang, Wei Fan
Question answering (QA) has achieved promising progress recently.
no code implementations • 20 May 2016 • Yusheng Xie, Nan Du, Wei Fan, Jing Zhai, Weicheng Zhu
In addition, we propose a transformation ranking algorithm that is very stable to large variances in network prior probabilities, a common issue that arises in medical applications of Bayesian networks.