Search Results for author: Yusheng Xie

Found 17 papers, 5 papers with code

MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets

no code implementations • 5 Mar 2024 • Hossein Aboutalebi, Hwanjun Song, Yusheng Xie, Arshit Gupta, Justin Sun, Hang Su, Igor Shalyminov, Nikolaos Pappas, Siffi Singh, Saab Mansour

Development of multimodal interactive systems is hindered by the lack of rich, multimodal (text, images) conversational data, which is needed in large quantities for LLMs.

Image-text matching, Retrieval, +1

Multiple-Question Multiple-Answer Text-VQA

no code implementations • 15 Nov 2023 • Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models.

Denoising, Optical Character Recognition (OCR), +1

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation

no code implementations • 7 Feb 2023 • Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha

Instead of purely relying on the alignment from the noisy data, this paper proposes a novel loss function termed SimCon, which accounts for intra-modal similarities to determine the appropriate set of positive samples to align.

Semantic Segmentation

AIM: Adapting Image Models for Efficient Video Action Recognition

1 code implementation • 6 Feb 2023 • Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li

Recent vision transformer based video models mostly follow the "image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks.

Ranked #1 on Action Recognition on Diving-48 (using extra training data)

Action Classification, Action Recognition, +2

Towards Differential Relational Privacy and its use in Question Answering

no code implementations • 30 Mar 2022 • Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

While bounding general memorization can have detrimental effects on the performance of a trained model, bounding relational memorization (RM) does not prevent effective learning.

Memorization, Question Answering

TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation

1 code implementation • 18 Oct 2021 • Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin, Xiaohui Xie

The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views.

Ranked #19 on 3D Human Pose Estimation on Human3.6M (using extra training data)

3D Human Pose Estimation, 3D Pose Estimation

MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand Pose Estimation

no code implementations • 6 Dec 2020 • Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Xiaohui Xie

Based on the match algorithm, we propose an efficient pipeline to generate a large-scale multi-view hand mesh (MVHM) dataset with accurate 3D hand mesh and joint labels.

3D Hand Pose Estimation

Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos

no code implementations • 6 Dec 2020 • Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Xiaohui Xie

Experiments show that our model achieves surprisingly good results, with 3D estimation accuracy on par with the state-of-the-art models trained with 3D annotations, highlighting the benefit of the temporal consistency in constraining 3D prediction models.

Pose Estimation, Self-Supervised Learning

DGGAN: Depth-image Guided Generative Adversarial Networks for Disentangling RGB and Depth Images in 3D Hand Pose Estimation

no code implementations • 6 Dec 2020 • Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Wei Fan, Xiaohui Xie

Estimating 3D hand poses from RGB images is essential to a wide range of potential applications, but is challenging owing to substantial ambiguity in the inference of depth information from RGB images.

3D Hand Pose Estimation, Generative Adversarial Network

Variational hybridization and transformation for large inaccurate noisy-or networks

no code implementations • 20 May 2016 • Yusheng Xie, Nan Du, Wei Fan, Jing Zhai, Weicheng Zhu

In addition, we propose a transformation ranking algorithm that is very stable to large variances in network prior probabilities, a common issue that arises in medical applications of Bayesian networks.

Variational Inference
