no code implementations • ECCV 2020 • Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann
Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.
no code implementations • ECCV 2020 • Lin Huang, Jianchao Tan, Ji Liu, Junsong Yuan
To address this issue, we connect this structured output learning problem with the structured modeling framework in the sequence transduction field.
no code implementations • ECCV 2020 • Yunpeng Chang, Zhigang Tu, Wei Xie, Junsong Yuan
Because of the ambiguous definition of anomaly and the complexity of real data, anomaly detection in videos is one of the most challenging problems in intelligent video surveillance.
no code implementations • 5 Feb 2025 • Wenhao You, Bryan Hooi, Yiwei Wang, Euijin Choo, Ming-Hsuan Yang, Junsong Yuan, Zi Huang, Yujun Cai
Recent advancements in diffusion models have driven the growth of text-guided image editing tools, enabling precise and iterative modifications of synthesized content.
no code implementations • 5 Feb 2025 • Yao Ding, Zhili Zhang, Aitao Yang, Yaoming Cai, Xiongwu Xiao, Danfeng Hong, Junsong Yuan
A dual-branch graph contrastive learning module is developed, where Gaussian noise perturbations generate augmented views through two multilayer perceptrons (MLPs), and a cross-view contrastive loss enforces structural consistency between views to learn noise-invariant representations.
no code implementations • 4 Sep 2024 • Xuelu Feng, Yunsheng Li, Dongdong Chen, Chunming Qiao, Junsong Yuan, Lu Yuan, Gang Hua
We introduce pluralistic salient object detection (PSOD), a novel task aimed at generating multiple plausible salient segmentation results for a given input image.
1 code implementation • 11 Aug 2024 • Zhigang Tu, Zitao Gao, Zhengbo Zhang, Chunluan Zhou, Junsong Yuan, Bo Du
Falling objects from buildings can cause severe injuries to pedestrians due to the great impact force they exert.
no code implementations • 31 Jul 2024 • Sudhir Yarram, Junsong Yuan
Video extrapolation in space and time (VEST) enables viewers to forecast a 3D scene into the future and view it from novel viewpoints.
1 code implementation • 15 Jul 2024 • Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, JianFeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang
First, to enable dual-modal generation and maximize the information exchange between video and depth generation, we propose a unified dual-modal U-Net, a parameter-sharing framework for joint video and depth denoising, wherein a modality label guides the denoising target, and cross-modal attention enables the mutual information flow.
no code implementations • 12 Jul 2024 • Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu
Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruction.
1 code implementation • 11 Jun 2024 • Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang
Extensive experiments show that our MCM achieves the state-of-the-art video diffusion distillation performance.
no code implementations • 20 Apr 2024 • Yangcen Liu, Ziyi Liu, Yuanhao Zhai, Wen Li, David Doermann, Junsong Yuan
To address this problem, we propose the Generalizable Temporal Action Localization task (GTAL), which focuses on improving the generalizability of action localization methods.
Weakly-supervised Temporal Action Localization
Weakly Supervised Temporal Action Localization
1 code implementation • 18 Mar 2024 • Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua
We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding.
Referring Video Object Segmentation
Semantic Segmentation
1 code implementation • CVPR 2024 • Xianzu Wu, Xianfeng Wu, Tianyu Luan, Yajing Bai, Zhongyuan Lai, Junsong Yuan
While previous studies have demonstrated successful 3D object shape completion with a sufficient number of points, they often fail in scenarios where only a few points, e.g., tens of points, are observed.
no code implementations • CVPR 2024 • Tianyu Luan, Zhong Li, Lele Chen, Xuan Gong, Lichang Chen, Yi Xu, Junsong Yuan
Then, we calculate the Area Under the Curve (AUC) difference between the two spectra, so that each frequency band that captures either the overall or detailed shape is equitably considered.
no code implementations • 26 Jan 2024 • Naresh Kumar Devulapally, Sidharth Anand, Sreyasee Das Bhattacharjee, Junsong Yuan, Yu-Ping Chang
This difficulty is compounded in group settings, where the emotion and its temporal evolution are not only influenced by the individual but also by external contexts like audience reaction and context of the ongoing conversation.
no code implementations • 13 Dec 2023 • Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang
In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.
no code implementations • ACM International Conference on Multimedia 2023 • Sidharth Anand, Naresh Kumar Devulapally, Sreyasee Das Bhattacharjee, Junsong Yuan
Evaluating speaker emotion in conversations is crucial for various applications requiring human-computer interaction.
Ranked #1 on Multimodal Sentiment Analysis on CMU-MOSEI
1 code implementation • 23 Oct 2023 • Zhong Li, Liangchen Song, Zhang Chen, Xiangyu Du, Lele Chen, Junsong Yuan, Yi Xu
A DecomposeNet learns to map each ray to its SVBRDF components: albedo, normal, and roughness.
1 code implementation • ICCV 2023 • Zhang Chen, Zhong Li, Liangchen Song, Lele Chen, Jingyi Yu, Junsong Yuan, Yi Xu
The spatial positions of their neural features are fixed on grid nodes and cannot adapt well to target signals.
1 code implementation • ICCV 2023 • Yuanhao Zhai, Tianyu Luan, David Doermann, Junsong Yuan
To improve the generalization ability, we propose weakly-supervised self-consistency learning (WSCL) to leverage the weakly annotated images.
1 code implementation • ICCV 2023 • Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua
The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.
1 code implementation • 18 Aug 2023 • Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan
In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and employing a curriculum learning strategy to learn atomic action composition.
1 code implementation • 19 Jul 2023 • Qinji Yu, Nan Xi, Junsong Yuan, Ziyu Zhou, Kang Dang, Xiaowei Ding
To tackle the source data-absent problem, we present a novel two-stage source-free domain adaptation (SFDA) framework for medical image segmentation, where only a well-trained source segmentation model and unlabeled target data are available during domain adaptation.
1 code implementation • ICCV 2023 • Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong
In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view.
1 code implementation • CVPR 2023 • Tianyu Luan, Yuanhao Zhai, Jingjing Meng, Zhong Li, Zhang Chen, Yi Xu, Junsong Yuan
To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component.
no code implementations • 18 May 2023 • Liangchen Song, Liangliang Cao, Hongyu Xu, Kai Kang, Feng Tang, Junsong Yuan, Yang Zhao
The proposed framework consists of two significant components: Geometry Guided Diffusion and Mesh Optimization.
no code implementations • CVPR 2023 • Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
no code implementations • 12 Apr 2023 • Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules.
1 code implementation • 15 Mar 2023 • Liangchen Song, Zhong Li, Xuan Gong, Lele Chen, Zhang Chen, Yi Xu, Junsong Yuan
We further propose a simple-yet-effective strategy for tuning the frequency to avoid overfitting few-shot inputs: enforcing consistency among the frequency domain of rendered 2D images.
no code implementations • ICCV 2023 • Nan Xi, Jingjing Meng, Junsong Yuan
To this end, we propose ACoLP, a model of Action-centric Chain-of-Look Prompting for open set video HOI detection.
no code implementations • CVPR 2023 • Libing Zeng, Lele Chen, Wentao Bao, Zhong Li, Yi Xu, Junsong Yuan, Nima Khademi Kalantari
Accurate facial landmark detection on wild images plays an essential role in human-computer interaction, entertainment, and medical applications.
no code implementations • 10 Dec 2022 • Xuan Gong, Liangchen Song, Meng Zheng, Benjamin Planche, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu
To date, little attention has been given to multi-view 3D human mesh estimation, despite its real-life applicability (e.g., motion capture, sports analysis) and robustness to single-view ambiguities.
1 code implementation • 1 Dec 2022 • Jialian Wu, JianFeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang
Specifically, GRiT consists of a visual encoder to extract image features, a foreground object extractor to localize objects, and a text decoder to generate open-set object descriptions.
Ranked #2 on Dense Captioning on Visual Genome
no code implementations • 28 Oct 2022 • Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, Andreas Geiger
Free visual exploration of a real-world 4D spatiotemporal space in VR has been a long-term quest.
no code implementations • 16 Oct 2022 • Xuan Gong, Liangchen Song, Rishi Vedula, Abhishek Sharma, Meng Zheng, Benjamin Planche, Arun Innanje, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu
In this work, we propose a privacy-preserving federated learning (FL) framework that leverages unlabeled public data for one-way offline knowledge distillation.
no code implementations • 21 Sep 2022 • Liangchen Song, Xuan Gong, Benjamin Planche, Meng Zheng, David Doermann, Junsong Yuan, Terrence Chen, Ziyan Wu
We propose to regularize the estimated motion to be predictable.
1 code implementation • 14 Sep 2022 • Junxuan Huang, Yatong An, Lu Cheng, Bai Chen, Junsong Yuan, Chunming Qiao
Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models.
no code implementations • 30 Jul 2022 • Lin Huang, Tomas Hodan, Lingni Ma, Linguang Zhang, Luan Tran, Christopher Twigg, Po-Chen Wu, Junsong Yuan, Cem Keskin, Robert Wang
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
1 code implementation • 20 Jul 2022 • Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, Junsong Yuan
However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement.
Ranked #2 on Visual Object Tracking on OTB-100
no code implementations • 21 Jun 2022 • Nitin Bansal, Pan Ji, Junsong Yuan, Yi Xu
The multi-task learning (MTL) paradigm focuses on jointly learning two or more tasks, aiming for significant improvement w.r.t. the model's generalizability, performance, and training/inference memory footprint.
no code implementations • 20 Mar 2022 • Zhigang Tu, Hongyan Li, Wei Xie, Yuanzhong Liu, Shifu Zhang, Baoxin Li, Junsong Yuan
Video super-resolution is currently one of the most active research topics in computer vision as it plays an important role in many visual applications.
1 code implementation • 12 Mar 2022 • Sudhir Yarram, Jialian Wu, Pan Ji, Yi Xu, Junsong Yuan
To improve the training efficiency, we propose Deformable VisTR, leveraging a spatio-temporal deformable attention module that only attends to a small fixed set of key spatio-temporal sampling points around a reference point.
no code implementations • CVPR 2022 • Jialian Wu, Sudhir Yarram, Hui Liang, Tian Lan, Junsong Yuan, Jayan Eledath, Gerard Medioni
In addition, VisTR is not fully end-to-end learnable in multiple video clips as it requires a hand-crafted data association to link instance tracklets between successive clips.
1 code implementation • CVPR 2022 • Jinlu Zhang, Zhigang Tu, Jianyu Yang, Yujin Chen, Junsong Yuan
Recent transformer-based solutions have been introduced to estimate 3D human pose from a 2D keypoint sequence by considering body joints among all frames globally to learn spatio-temporal correlation.
Ranked #6 on Monocular 3D Human Pose Estimation on Human3.6M
2 code implementations • TIP 2022 • Yuanzhong Liu, Junsong Yuan, Zhigang Tu
Action visual tempo characterizes the dynamics and the temporal scale of an action, which is helpful to distinguish human actions that share high similarities in visual dynamics and appearance.
Ranked #15 on Action Recognition on Something-Something V1
no code implementations • 8 Feb 2022 • Zhigang Tu, Jiaxu Zhang, Hongyan Li, Yujin Chen, Junsong Yuan
In recent years, graph convolutional networks (GCNs) have played an increasingly critical role in skeleton-based human action recognition.
no code implementations • 24 Jan 2022 • Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng Yang, Junsong Yuan
We present a method for reconstructing accurate and consistent 3D hands from a monocular video.
1 code implementation • CVPR 2022 • Suchen Wang, Yueqi Duan, Henghui Ding, Yap-Peng Tan, Kim-Hui Yap, Junsong Yuan
More specifically, we propose a new HOI visual encoder to detect the interacting humans and objects, and map them to a joint feature space to perform interaction recognition.
no code implementations • 22 Oct 2021 • Huan Liu, Junsong Yuan, Chen Wang, Jun Chen
Despite recent improvement of supervised monocular depth estimation, the lack of high quality pixel-wise ground truth annotations has become a major hurdle for further progress.
no code implementations • 8 Aug 2021 • Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu
We introduce the task of open-vocabulary visual instance search (OVIS).
no code implementations • 21 Jun 2021 • Yuanhao Zhai, Le Wang, David Doermann, Junsong Yuan
The base model training encourages the model to produce reliable predictions based on a single modality (i.e., RGB or optical flow); the fusion of these predictions generates a pseudo ground truth, which is in turn used as supervision to train the base models.
no code implementations • 15 May 2021 • Zhong Li, Liangchen Song, Celong Liu, Junsong Yuan, Yi Xu
In this paper, we present an efficient and robust deep learning solution for novel view synthesis of complex scenes.
no code implementations • 30 Mar 2021 • Ziyi Liu, Le Wang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua
To address this challenge, we introduce a framework that learns two feature subspaces respectively for actions and their context.
Action Recognition
Weakly-supervised Temporal Action Localization
no code implementations • 28 Mar 2021 • Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua
In this paper, we introduce an Action-Context Separation Network (ACSNet) that explicitly takes into account context for accurate action localization.
Ranked #8 on Weakly Supervised Action Localization on THUMOS’14
Weakly Supervised Action Localization
1 code implementation • CVPR 2021 • Yujin Chen, Zhigang Tu, Di Kang, Linchao Bao, Ying Zhang, Xuefei Zhe, Ruizhi Chen, Junsong Yuan
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
Ranked #8 on 3D Hand Pose Estimation on HO-3D v3
1 code implementation • CVPR 2021 • Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan
Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking.
Ranked #1 on Instance Segmentation on nuScenes
no code implementations • 15 Feb 2021 • Junxuan Huang, Junsong Yuan, Chunming Qiao
Recent deep networks have achieved good performance on a variety of 3D point classification tasks.
4 code implementations • 1 Feb 2021 • Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang
The outputs from the teacher network are used as soft labels for supervising the training of a new network.
Ranked #36 on Knowledge Distillation on ImageNet
1 code implementation • 10 Jan 2021 • Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, DaCheng Tao
SPAGAN therefore allows for a more informative and intact exploration of the graph structure and further a more effective aggregation of information from distant neighbors into the center node, as compared to node-based GCN methods.
no code implementations • ICCV 2021 • Suchen Wang, Kim-Hui Yap, Henghui Ding, Jiyan Wu, Junsong Yuan, Yap-Peng Tan
In this work, we study the problem of human-object interaction (HOI) detection with large vocabulary object categories.
no code implementations • ICCV 2021 • Bing Li, Chia-Wen Lin, Cheng Zheng, Shan Liu, Junsong Yuan, Bernard Ghanem, C.-C. Jay Kuo
In the second stage, we derive another warping model to refine warping results in less important regions by eliminating serious distortions in shape, disparity and 3D structure.
no code implementations • ICCV 2021 • Liangchen Song, Jialian Wu, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan
This task is confronted with two challenges: how to establish the 3D correspondences from views to the BEV map and how to assemble occupancy information across views.
Ranked #2 on Multiview Detection on CVCS (MODA (1m) metric)
no code implementations • ICCV 2021 • Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann
Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.
no code implementations • ICLR 2021 • Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang
In this paper, we investigate the bias-variance tradeoff brought by distillation with soft labels.
no code implementations • 7 Nov 2020 • Jun Wen, Changjian Shui, Kun Kuang, Junsong Yuan, Zenan Huang, Zhefeng Gong, Nenggan Zheng
To address this issue, we intervene in the learning of feature discriminability using unlabeled target data to guide it to get rid of the domain-specific part and be safely transferable.
no code implementations • ECCV 2020 • Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, Gang Hua
Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and localize all action instances in an untrimmed video under only video-level supervision.
Ranked #12 on Weakly Supervised Action Localization on THUMOS14
Weakly Supervised Action Localization
no code implementations • 30 Sep 2020 • Zhenzhen Wang, Chunyan Xu, Yap-Peng Tan, Junsong Yuan
In this paper, the attention-aware noisy label learning approach ($A^2NL$) is proposed to improve the discriminative capability of the network trained on datasets with potential label noise.
2 code implementations • 14 Aug 2020 • Ye Liu, Junsong Yuan, Chang Wen Chen
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
1 code implementation • 13 Aug 2020 • Jialian Wu, Liangchen Song, Tiancai Wang, Qian Zhang, Junsong Yuan
In the classification tree, as the number of parent class nodes is significantly smaller, their logits are less noisy and can be utilized to suppress the wrong/noisy logits in the fine-grained class nodes.
Ranked #5 on Few-Shot Object Detection on LVIS v1.0 val
no code implementations • 12 Aug 2020 • Jing Tang, Xueyan Tang, Andrew Lim, Kai Han, Chongshou Li, Junsong Yuan
Second, we enhance the modified greedy algorithm to derive a data-dependent upper bound on the optimum.
1 code implementation • 11 Aug 2020 • Xinke Li, Chongshou Li, Zekun Tong, Andrew Lim, Junsong Yuan, Yuwei Wu, Jing Tang, Raymond Huang
Based on it, we formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
no code implementations • 10 Aug 2020 • Zhenzhen Wang, Weixiang Hong, Junsong Yuan
Deep hashing has shown promising results in image retrieval and recognition.
no code implementations • ECCV 2020 • Junwu Weng, Donghao Luo, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xudong Jiang, Junsong Yuan
Motivated by the previous success of Two-Dimensional Convolutional Neural Network (2D CNN) on image recognition, researchers endeavor to leverage it to characterize videos.
1 code implementation • ECCV 2020 • Ping Yu, Yang Zhao, Chunyuan Li, Junsong Yuan, Changyou Chen
Generating long-range skeleton-based human actions has been a challenging problem since small deviations of one frame can cause a malformed action sequence.
Ranked #2 on Human action generation on NTU RGB+D 2D
no code implementations • 28 Jun 2020 • Yujin Chen, Zhigang Tu, Di Kang, Ruizhi Chen, Linchao Bao, Zhengyou Zhang, Junsong Yuan
In this work, we propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
no code implementations • 14 May 2020 • Tianhang Zheng, Sheng Liu, Changyou Chen, Junsong Yuan, Baochun Li, Kui Ren
We first formulate generation of adversarial skeleton actions as a constrained optimization problem by representing or approximating the physiological and physical constraints with mathematical formulations.
1 code implementation • CVPR 2020 • Yancheng Wang, Yang Xiao, Fu Xiong, Wenxiang Jiang, Zhiguo Cao, Joey Tianyi Zhou, Junsong Yuan
Each available 3DV voxel intrinsically involves 3D spatial and motion feature jointly.
no code implementations • 12 Apr 2020 • Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan
Object skeletonization in a single natural image is a challenging problem because there is hardly any prior knowledge about the object.
no code implementations • ECCV 2020 • Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, Mingxiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, Sijia Mei, Yun-hui Liu, Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Philippe Weinzaepfel, Romain Brégier, Grégory Rogez, Vincent Lepetit, Tae-Kyun Kim
To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
no code implementations • 24 Jan 2020 • Wei Wang, Shibo Zhou, Jingxi Li, Xiaohua Li, Junsong Yuan, Zhanpeng Jin
Accurate real-time object recognition from sensory data has long been a crucial and challenging task for autonomous driving.
1 code implementation • AAAI 2019 • Zhenyi Wang, Ping Yu, Yang Zhao, Ruiyi Zhang, Yufan Zhou, Junsong Yuan, Changyou Chen
In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality.
Ranked #4 on Human action generation on NTU RGB+D 2D
2 code implementations • ICCV 2019 • Fu Xiong, Boshen Zhang, Yang Xiao, Zhiguo Cao, Taidong Yu, Joey Tianyi Zhou, Junsong Yuan
For the task of 3D hand and body pose estimation from a depth image, a novel anchor-based approach termed Anchor-to-Joint regression network (A2J) with end-to-end learning ability is proposed.
Ranked #1 on Hand Pose Estimation on K2HPD
no code implementations • 26 Jul 2019 • Bin Jiang, Wenxuan Tu, Chao Yang, Junsong Yuan
The core components of CIFReNet are the Long-skip Refinement Module (LRM) and the Multi-scale Context Integration Module (MCIM).
no code implementations • 24 Jun 2019 • Jun Wen, Nenggan Zheng, Junsong Yuan, Zhefeng Gong, Changyou Chen
By imposing distribution matching on both features and labels (via uncertainty), label distribution mismatching in source and target data is effectively alleviated, encouraging the classifier to produce consistent predictions across domains.
6 code implementations • CVPR 2019 • Chen Wang, Jianfei Yang, Lihua Xie, Junsong Yuan
Convolutional neural networks (CNNs) have enabled the state-of-the-art performance in many computer vision tasks.
2 code implementations • CVPR 2019 • Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan
This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image.
no code implementations • 1 Mar 2019 • Bo Hu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan
Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions.
no code implementations • 21 Feb 2019 • Guilei Hu, Yang Xiao, Zhiguo Cao, Lubin Meng, Zhiwen Fang, Joey Tianyi Zhou, Junsong Yuan
Effective and real-time eyeblink detection has a wide range of applications, such as deception detection, driver fatigue detection, and face anti-spoofing.
3 code implementations • ICCV 2019 • Tianhang Zheng, Changyou Chen, Junsong Yuan, Bo Li, Kui Ren
Our saliency map construction is motivated by point dropping, which is a non-differentiable operator.
no code implementations • 12 Nov 2018 • Jun Wen, Risheng Liu, Nenggan Zheng, Qian Zheng, Zhefeng Gong, Junsong Yuan
In this paper, we present a method for learning domain-invariant local feature patterns and jointly aligning holistic and local feature statistics.
no code implementations • ECCV 2018 • Chunluan Zhou, Junsong Yuan
The full body estimation branch is trained to regress full body regions for positive pedestrian proposals, while the visible part estimation branch is trained to regress visible part regions for both positive and negative pedestrian proposals.
no code implementations • ECCV 2018 • Tan Yu, Junsong Yuan, Chen Fang, Hailin Jin
Product quantization has been widely used in fast image retrieval due to its effectiveness in coding high-dimensional visual features.
no code implementations • ECCV 2018 • Junwu Weng, Mengyuan Liu, Xudong Jiang, Junsong Yuan
This deformable convolution can better utilize contextual joints for action and gesture recognition and is more robust to noisy joints.
no code implementations • ECCV 2018 • Yujun Cai, Liuhao Ge, Jianfei Cai, Junsong Yuan
Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to substantial depth ambiguity and the difficulty of obtaining fully-annotated training data.
no code implementations • ECCV 2018 • Liuhao Ge, Zhou Ren, Junsong Yuan
Convolutional Neural Networks (CNNs)-based methods for 3D hand pose estimation with depth cameras usually take 2D depth images as input and directly regress holistic 3D hand pose.
no code implementations • 23 Jul 2018 • Kang Dang, Chunluan Zhou, Zhigang Tu, Michael Hoy, Justin Dauwels, Junsong Yuan
One major challenge for this task is that when an actor performs an action, different body parts of the actor provide different types of cues for the action category and may receive inconsistent action labeling when they are labeled independently.
no code implementations • CVPR 2018 • Tan Yu, Jingjing Meng, Junsong Yuan
View-based methods have achieved considerable success in 3D object recognition tasks.
no code implementations • CVPR 2018 • Mengyuan Liu, Junsong Yuan
Specifically, the evolution of pose estimation maps can be decomposed as an evolution of heatmaps, e.g., probabilistic maps, and an evolution of estimated 2D human poses, which denote the changes of body shape and body pose, respectively.
Ranked #1 on Multimodal Activity Recognition on UTD-MHAD
no code implementations • CVPR 2018 • Weixiang Hong, Zhenzhen Wang, Ming Yang, Junsong Yuan
In recent years, deep neural nets have triumphed over many computer vision problems, including semantic segmentation, which is a critical task in emerging autonomous driving and medical image diagnostics applications.
1 code implementation • CVPR 2018 • Liuhao Ge, Yujun Cai, Junwu Weng, Junsong Yuan
Convolutional Neural Networks (CNNs) have shown promising results for 3D hand pose estimation in depth images.
Ranked #7 on Hand Pose Estimation on HANDS 2017
no code implementations • CVPR 2018 • Shizheng Wang, Wenjuan Liao, Phil Surman, Zhigang Tu, Yuanjin Zheng, Junsong Yuan
Multi-layer light field displays are a type of computational three-dimensional (3D) display which has recently gained increasing interest for its holographic-like effect and natural compatibility with 2D displays.
1 code implementation • CVPR 2018 • Shanxin Yuan, Guillermo Garcia-Hernando, Bjorn Stenger, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee, Pavlo Molchanov, Jan Kautz, Sina Honari, Liuhao Ge, Junsong Yuan, Xinghao Chen, Guijin Wang, Fan Yang, Kai Akiyama, Yang Wu, Qingfu Wan, Meysam Madadi, Sergio Escalera, Shile Li, Dongheui Lee, Iason Oikonomidis, Antonis Argyros, Tae-Kyun Kim
Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018
Ranked #5 on Hand Pose Estimation on HANDS 2017
2 code implementations • 16 Oct 2017 • Chen Wang, Minh-Chung Hoang, Lihua Xie, Junsong Yuan
We present a novel visual SLAM method for the warehouse robot with a single downward-facing camera using ground textures.
Robotics
no code implementations • ICCV 2017 • Tan Yu, Zhenzhen Wang, Junsong Yuan
Most current visual search systems focus on image-to-image (point-to-point) search such as image and object retrieval.
no code implementations • ICCV 2017 • Jiong Yang, Junsong Yuan
Similar to common object discovery in images or videos, it is of great interest to discover and locate common actions in videos, which can benefit many video analytics applications such as video summarization, search, and understanding.
no code implementations • ICCV 2017 • Chunluan Zhou, Junsong Yuan
Detecting pedestrians that are partially occluded remains a challenging problem due to variations and uncertainties of partial occlusion patterns.
3 code implementations • 12 Sep 2017 • Chen Wang, Le Zhang, Lihua Xie, Junsong Yuan
The cross-correlator plays a significant role in many visual perception tasks, such as object detection and tracking.
no code implementations • CVPR 2017 • Liuhao Ge, Hui Liang, Junsong Yuan, Daniel Thalmann
We propose a simple, yet effective approach for real-time hand pose estimation from single depth images using three-dimensional Convolutional Neural Networks (3D CNNs).
no code implementations • CVPR 2017 • Weixiang Hong, Junsong Yuan, Sreyasee Das Bhattacharjee
We argue that long binary codes ($b \sim O(d)$) are critical to fully utilize the discriminative power of high-dimensional visual features, and can achieve better results in various tasks such as approximate nearest neighbour search.
1 code implementation • CVPR 2017 • Junwu Weng, Chaoqun Weng, Junsong Yuan
Moreover, by identifying key skeleton joints and temporal stages for each action class, our ST-NBNN can capture the essential spatio-temporal patterns that play key roles in recognizing actions, which is not always achievable by using end-to-end models.
no code implementations • CVPR 2017 • Tan Yu, Yuwei Wu, Junsong Yuan
This paper tackles the problem of efficient and effective object instance search in videos.
no code implementations • CVPR 2017 • Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan
Recent advances in the joint processing of images have clearly shown its advantages over individual processing.
no code implementations • CVPR 2016 • Liuhao Ge, Hui Liang, Junsong Yuan, Daniel Thalmann
Articulated hand pose estimation plays an important role in human-computer interaction.
no code implementations • CVPR 2016 • Jingjing Meng, Hongxing Wang, Junsong Yuan, Yap-Peng Tan
This representative selection problem is formulated as a sparse dictionary selection problem, i.e., choosing a few representative object proposals to reconstruct the whole proposal pool.
no code implementations • ICCV 2015 • Kang Dang, Jiong Yang, Junsong Yuan
We propose an efficient online video filtering method, called adaptive exponential smoothing (AES), to refine pixel prediction maps.
no code implementations • CVPR 2015 • Gang Yu, Junsong Yuan
Assuming each action is performed by a human with meaningful motion, both appearance and motion cues are utilized to measure the actionness of the video tubes.
no code implementations • CVPR 2014 • Hongxing Wang, Chaoqun Weng, Junsong Yuan
To find a consensus clustering result that is agreeable to all feature modalities, our objective is to find a universal feature embedding, which not only fits each individual feature modality well, but also unifies different feature modalities by minimizing their pairwise disagreements.
no code implementations • CVPR 2013 • Gangqiang Zhao, Junsong Yuan, Gang Hua
We show that such data-driven, bottom-up co-occurrence information can conveniently be incorporated in LDA with a Gaussian Markov prior, which combines top-down probabilistic topic modeling with bottom-up priors in a unified model.
no code implementations • NeurIPS 2012 • Du Tran, Junsong Yuan
The mapping between a video and a spatio-temporal action trajectory is learned.