Search Results for author: Xiao Sun

Found 55 papers, 24 papers with code

DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior

no code implementations • 4 Apr 2024 • Yiming Zhang, Zhe Wang, Xinjie Li, Yunchen Yuan, Chengsong Zhang, Xiao Sun, Zhihang Zhong, Jian Wang

Human body restoration plays a vital role in various applications related to the human body.

Benchmarking Image Restoration

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

no code implementations • 3 Apr 2024 • Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun

We present Dive Into the BoundarieS (DIBS), a novel pretraining framework for dense video captioning (DVC), that elaborates on improving the quality of the generated event captions and their associated pseudo event boundaries from unlabeled videos.

Dense Video Captioning

Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence

no code implementations • 28 Mar 2024 • Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng

Neural rendering techniques have significantly advanced 3D human body modeling.

Neural Rendering Quantization

Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption

1 code implementation • 14 Dec 2023 • Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, Tatsuya Harada

The standard Neural Radiance Fields (NeRF) paradigm employs a viewer-centered methodology, entangling the aspects of illumination and material reflectance into emission solely from 3D points.

Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation

1 code implementation • 12 Dec 2023 • Yuchen Yang, Yu Qiao, Xiao Sun

Automatic estimation of 3D human pose from monocular RGB images is a challenging and unsolved problem in computer vision.

3D Pose Estimation

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation

1 code implementation • 14 Nov 2023 • Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma, Jian Wang

Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep t ("time indexing"), which struggles to predict precise object movements.

Object Video Editing +1

ASM: Adaptive Sample Mining for In-The-Wild Facial Expression Recognition

no code implementations • 9 Oct 2023 • Ziyang Zhang, Xiao Sun, Liuwei An, Meng Wang

First, the Adaptive Threshold Learning module generates two thresholds, namely the clean and noisy thresholds, for each category.

Facial Expression Recognition Facial Expression Recognition (FER)

DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision

no code implementations • 13 Sep 2023 • Xiangchen Yin, Zhenda Yu, Xin Gao, Ran Ju, Xiao Sun, Xinyu Zhang

However, it is difficult to restore the lost details in the dark area by relying only on the RGB domain.

Autonomous Driving Low-Light Image Enhancement

PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

1 code implementation • 23 Aug 2023 • Chenrui Zhang, Lin Liu, Jinpeng Wang, Chuyuan Wang, Xiao Sun, Hongyu Wang, Mingchen Cai

Moreover, to enhance stability of the prompt effect evaluation, we propose a novel prompt bagging method involving forward and backward thinking, which is superior to majority voting and is beneficial for both feedback and weight calculation in boosting.

Ensemble Learning Hallucination

Revisiting Neural Retrieval on Accelerators

no code implementations • 6 Jun 2023 • Jiaqi Zhai, Zhaojie Gong, Yueming Wang, Xiao Sun, Zheng Yan, Fu Li, Xing Liu

A key component of retrieval is to model (user, item) similarity, which is commonly represented as the dot product of two learned embeddings.

Information Retrieval Retrieval

Bi-ViT: Pushing the Limit of Vision Transformer Quantization

no code implementations • 21 May 2023 • Yanjing Li, Sheng Xu, Mingbao Lin, Xianbin Cao, Chuanjian Liu, Xiao Sun, Baochang Zhang

Vision transformers (ViTs) quantization offers a promising prospect to facilitate deploying large pre-trained networks on resource-limited devices.

Binarization Quantization

Long-Term Rhythmic Video Soundtracker

1 code implementation • 2 May 2023 • Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao

To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms.

Enhancing Personalized Ranking With Differentiable Group AUC Optimization

no code implementations • 17 Apr 2023 • Xiao Sun, Bo Zhang, Chenrui Zhang, Han Ren, Mingchen Cai

AUC is a common metric for evaluating the performance of a classifier.

Spatial-temporal Transformer for Affective Behavior Analysis

no code implementations • 19 Mar 2023 • Peng Zou, Rui Wang, Kehua Wen, Yasi Peng, Xiao Sun

In-the-wild affective behavior analysis has been an important area of study.

Data Augmentation

Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos

1 code implementation • 18 Mar 2023 • Tao Shu, Xinke Wang, Ruotong Wang, Chuang Chen, Yixin Zhang, Xiao Sun

The continuous improvement of human-computer interaction technology makes it possible to compute emotions.

Sentiment Analysis

Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge

no code implementations • 16 Mar 2023 • Ziyang Zhang, Liuwei An, Zishun Cui, Ao Xu, Tengteng Dong, Yueqi Jiang, Jingyi Shi, Xin Liu, Xiao Sun, Meng Wang

In this paper, we present our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes four sub-challenges of Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection and Emotional Reaction Intensity (ERI) Estimation.

Aleth-NeRF: Low-light Condition View Synthesis with Concealing Fields

1 code implementation • 10 Mar 2023 • Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, Tatsuya Harada

Commonly captured low-light scenes are challenging for most computer vision techniques, including Neural Radiance Fields (NeRF).

Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning

1 code implementation • ICCV 2023 • Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu

Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.

Data Augmentation Quantization +2

Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

no code implementations • 7 Nov 2022 • Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, Xiao Sun, HaoDong Wu, Xuncheng Liu, Weizhan Zhang, Caixia Yan, Haipeng Du, Qinghua Zheng, Qi Wang, Wangdu Chen, Ran Duan, Mengdi Sun, Dan Zhu, Guannan Chen, Hojin Cho, Steve Kim, Shijie Yue, Chenghua Li, Zhengyang Zhuge, Wei Chen, Wenxu Wang, Yufeng Zhou, Xiaochen Cai, Hengxing Cai, Kele Xu, Li Liu, Zehua Cheng, Wenyi Lian, Wenjing Lian

While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices.

Video Super-Resolution

Hybrid Multimodal Fusion for Humor Detection

no code implementations • 24 Sep 2022 • Haojie Xu, Weifeng Liu, Jingwei Liu, Mingzheng Li, Yu Feng, Yasi Peng, Yunwei Shi, Xiao Sun, Meng Wang

Our experiments demonstrate the effectiveness of our proposed model and hybrid fusion strategy on multimodal fusion, and the AUC of our proposed model on the test set is 0.8972.

Humor Detection

Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis

1 code implementation • 5 Aug 2022 • Jia Li, Ziyang Zhang, Junjie Lang, Yueqi Jiang, Liuwei An, Peng Zou, Yangyang Xu, Sheng Gao, Jie Lin, Chunxiao Fan, Xiao Sun, Meng Wang

In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges.

Data Augmentation Humor Detection +1

Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance

1 code implementation • 20 Jul 2022 • Zhihang Zhong, Xiao Sun, Zhirong Wu, Yinqiang Zheng, Stephen Lin, Imari Sato

Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity for each region.

Optical Flow Estimation Quantization

Extreme Masking for Learning Instance and Distributed Visual Representations

1 code implementation • 9 Jun 2022 • Zhirong Wu, Zihang Lai, Xiao Sun, Stephen Lin

The paper presents a scalable approach for learning spatially distributed visual representations over individual tokens and a holistic instance representation simultaneously.

Data Augmentation Representation Learning

Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations

no code implementations • 19 Apr 2022 • Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada

We propose an unsupervised method for 3D geometry-aware representation learning of articulated objects, in which no image-pose pairs or foreground masks are used for training.

Representation Learning

Bringing Rolling Shutter Images Alive with Dual Reversed Distortion

1 code implementation • 12 Mar 2022 • Zhihang Zhong, Mingdeng Cao, Xiao Sun, Zhirong Wu, Zhongyi Zhou, Yinqiang Zheng, Stephen Lin, Imari Sato

In this paper, instead of two consecutive frames, we propose to exploit a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task.

Optical Flow Estimation

A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation

4 code implementations • CVPR 2022 • Yutong Chen, Fangyun Wei, Xiao Sun, Zhirong Wu, Stephen Lin

Concretely, we pretrain the sign-to-gloss visual network on the general domain of human actions and the within-domain of a sign-to-gloss dataset, and pretrain the gloss-to-text translation network on the general domain of a multilingual corpus and the within-domain of a gloss-to-text corpus.

Ranked #2 on Sign Language Translation on CSL-Daily

Sign Language Recognition Sign Language Translation +2

Robust facial expression recognition with global-local joint representation learning

no code implementations • Multimedia Systems 2022 • Chunxiao Fan, Zhenxing Wang, Jia Li, Shanshan Wang, Xiao Sun

In the proposed method, (1) the topological structure information and texture features of regions of interest (ROIs) are modeled as graphs and processed with a graph convolutional network (GCN) to retain the topological features.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition

no code implementations • CVPR 2022 • Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin

Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.

Action Recognition

Towards Tokenized Human Dynamics Representation

1 code implementation • 22 Nov 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin

For human action understanding, a popular research direction is to analyze short video clips with unambiguous semantic content, such as jumping and drinking.

Action Segmentation Action Understanding +3

Self-supervised Discovery of Human Actons from Long Kinematic Videos

no code implementations • 29 Sep 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin

However, methods for understanding short semantic actions cannot be directly translated to long kinematic sequences such as dancing, where it becomes challenging even to semantically label the human movements.

Action Understanding Sentence

ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection

1 code implementation • 9 Sep 2021 • Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, In So Kweon

A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples, resulting in training sets with a long-tailed distribution.

Ranked #41 on Human-Object Interaction Detection on HICO-DET

Human-Object Interaction Detection

4-bit Quantization of LSTM-based Speech Recognition Models

no code implementations • 27 Aug 2021 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation

no code implementations • ICCV 2021 • Ailing Zeng, Xiao Sun, Lei Yang, Nanxuan Zhao, Minhao Liu, Qiang Xu

While the average prediction accuracy has been improved significantly over the years, the performance on hard poses with depth ambiguity, self-occlusion, and complex or rare poses is still far from satisfactory.

Ranked #23 on Skeleton Based Action Recognition on NTU RGB+D 120

3D Human Pose Estimation 3D Pose Estimation +3

Data-driven discovery of interpretable causal relations for deep learning material laws with uncertainty propagation

1 code implementation • 20 May 2021 • Xiao Sun, Bahador Bahmani, Nikolaos N. Vlassis, WaiChing Sun, Yanxun Xu

This paper presents a computational framework that generates ensemble predictive mechanics models with uncertainty quantification (UQ).

Causal Discovery Uncertainty Quantification

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

no code implementations • NeurIPS 2020 • Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.

Neural Articulated Radiance Field

1 code implementation • ICCV 2021 • Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada

We present Neural Articulated Radiance Field (NARF), a novel deformable 3D representation for articulated objects learned from images.

Ultra-Low Precision 4-bit Training of Deep Neural Networks

no code implementations • NeurIPS 2020 • Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan

In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits.

Quantization

SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

1 code implementation • ECCV 2020 • Ailing Zeng, Xiao Sun, Fuyang Huang, Minhao Liu, Qiang Xu, Stephen Lin

With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference.

Ranked #20 on Monocular 3D Human Pose Estimation on Human3.6M

Monocular 3D Human Pose Estimation

Detecting Human-Object Interactions with Action Co-occurrence Priors

1 code implementation • 17 Jul 2020 • Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, In So Kweon

A common problem in the human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples, resulting in training sets with a long-tailed distribution.

Human-Object Interaction Detection

Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation

1 code implementation • ECCV 2020 • Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, Stephen Lin

A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person.

Instance Segmentation Object +5

Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks

no code implementations • NeurIPS 2019 • Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei Zhang, Kailash Gopalakrishnan

Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads.

Image Classification object-detection +1

SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and Segmentation

no code implementations • 6 Nov 2019 • Xiao Sun, Zhouhui Lian, Jianguo Xiao

Point cloud analysis has drawn broad attention due to its increasing demand in various fields.

Data Augmentation General Classification +1

Downhole Track Detection via Multiscale Conditional Generative Adversarial Nets

no code implementations • 17 Apr 2019 • Jia Li, Xing Wei, Guoqiang Yang, Xiao Sun, Changliang Li

A multiscale shared convolution structure is adopted in the discriminator network to further supervise training the generator.

Autonomous Driving Generative Adversarial Network

Reinforcement Learning Based Emotional Editing Constraint Conversation Generation

no code implementations • 17 Apr 2019 • Jia Li, Xiao Sun, Xing Wei, Changliang Li, Jian-Hua Tao

In recent years, the generation of conversation content based on deep neural networks has attracted many researchers.

Multi-Task Learning reinforcement-learning +1

Explicit Spatiotemporal Joint Relation Learning for Tracking Human Pose

no code implementations • 17 Nov 2018 • Xiao Sun, Chuankang Li, Stephen Lin

We present a method for human pose tracking that is based on learning spatiotemporal relationships among joints.

Optical Flow Estimation Pose Estimation +3

An Integral Pose Regression System for the ECCV2018 PoseTrack Challenge

1 code implementation • 17 Sep 2018 • Xiao Sun, Chuankang Li, Stephen Lin

For the ECCV 2018 PoseTrack Challenge, we present a 3D human pose estimation system based mainly on the integral human pose regression method.

Ranked #1 on 3D Human Pose Estimation on CHALL H80K

3D Human Pose Estimation regression

A Syntactically Constrained Bidirectional-Asynchronous Approach for Emotional Conversation Generation

no code implementations • EMNLP 2018 • Jingyuan Li, Xiao Sun

Traditional neural language models tend to generate generic replies with poor logic and no emotion.

Integral Human Pose Regression

2 code implementations • ECCV 2018 • Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, Yichen Wei

State-of-the-art human pose estimation methods are based on heat map representation.

Ranked #23 on Pose Estimation on MPII Human Pose

3D Human Pose Estimation 3D Pose Estimation +2

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach

6 code implementations • ICCV 2017 • Xingyi Zhou, Qi-Xing Huang, Xiao Sun, Xiangyang Xue, Yichen Wei

We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure.

Ranked #1 on 3D Human Pose Estimation on Geometric Pose Affordance

2D Pose Estimation 3D Multi-Person Pose Estimation (absolute) +4

Compositional Human Pose Regression

1 code implementation • ICCV 2017 • Xiao Sun, Jiaxiang Shang, Shuang Liang, Yichen Wei

A central problem is that the structural information in the pose is not well exploited in the previous regression methods.

Ranked #36 on Pose Estimation on MPII Human Pose

3D Human Pose Estimation 3D Pose Estimation +1

Deep Kinematic Pose Regression

no code implementations • 17 Sep 2016 • Xingyi Zhou, Xiao Sun, Wei Zhang, Shuang Liang, Yichen Wei

In this work, we propose to directly embed a kinematic object model into deep neural network learning for general articulated object pose estimation.

Ranked #307 on 3D Human Pose Estimation on Human3.6M