no code implementations • 8 Mar 2025 • Zixi Kang, Xinghan Wang, Yadong Mu
Human motion generation holds significant promise in fields such as animation, film production, and robotics.
no code implementations • 31 Jan 2025 • YuChen Lin, Chenguo Lin, Jianjin Xu, Yadong Mu
Comprehensive experiments demonstrate that OmniPhysGS achieves more general and realistic physical dynamics across a broader spectrum of materials, including elastic, viscoelastic, plastic, and fluid substances, as well as interactions between different materials.
1 code implementation • 28 Jan 2025 • Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu
Recent methods for 3D content generation from text or a single image struggle with limited high-quality 3D datasets and with inconsistency in 2D multi-view generation.
1 code implementation • 8 Oct 2024 • Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, Zhouchen Lin
Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage.
1 code implementation • 2 Oct 2024 • Jinghan Li, Zhicheng Sun, Fei Li, Cao Sheng, Jiazhong Yu, Yadong Mu
In the endeavor to make autonomous robots take actions, task planning is a major challenge that requires translating high-level task descriptions into long-horizon action sequences.
no code implementations • 22 Jul 2024 • Kangqi Ma, Hao Dong, Yadong Mu
This paper addresses the challenge of robotic grasping of general objects.
1 code implementation • 10 Jul 2024 • Chenguo Lin, YuChen Lin, Panwang Pan, Xuanyang Zhang, Yadong Mu
The proposed semantic graph prior learns layout appearances and object distributions simultaneously, demonstrating versatility across various downstream tasks in a zero-shot manner.
1 code implementation • 18 Jun 2024 • Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, YongJie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu
Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios.
1 code implementation • 14 Jun 2024 • Zhentao Tan, Yadong Mu
Recently, various optimization problems, such as Mixed Integer Linear Programming Problems (MILPs), have been studied extensively using machine learning.
1 code implementation • 23 May 2024 • Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Liwei Chen, Hao Jiang, Yang Song, Kun Gai, Yadong Mu
Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators.
no code implementations • 25 Apr 2024 • Hongyu Yan, Yadong Mu
To tackle this, we propose an end-to-end model known as the Neural Assembler.
no code implementations • 17 Apr 2024 • Xinghan Wang, Zixi Kang, Yadong Mu
We address these challenges by proposing Text-controlled Motion Mamba (TM-Mamba), a unified model that integrates temporal global context, language query control, and spatial graph topology with only linear memory cost.
1 code implementation • 7 Feb 2024 • Chenguo Lin, Yadong Mu
We introduce InstructScene, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 3D scene synthesis.
1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu
In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.
Ranked #3 on Text-to-Video Generation on MSR-VTT
1 code implementation • CVPR 2024 • Zhicheng Sun, Jinghan Li, Yadong Mu
To address this problem, we exploit three levels of orthogonality in the detection process. First, the objectness and classification heads are disentangled by operating on separate sets of features that are orthogonal to each other in a devised polar coordinate system.
no code implementations • CVPR 2024 • Hao Jiang, Bingfeng Zhou, Yadong Mu
In this paper we propose an innovative halftoning method termed "neural dot-controllable halftoning".
no code implementations • CVPR 2024 • Hanwen Liu, Zhicheng Sun, Yadong Mu
In this way, the personal semantics of these images remain protected even if the images are modified to some extent.
no code implementations • 14 Sep 2023 • Peiran Xu, Yadong Mu
Given a group of images, co-salient object detection (CoSOD) aims to highlight the common salient object in each image.
1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu
Specifically, we introduce a well-designed visual tokenizer that translates the non-linguistic image into a sequence of discrete tokens, like a foreign language that the LLM can read.
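A minimal sketch of the generic idea of turning continuous image features into discrete token ids, assuming a plain nearest-neighbour vector-quantization codebook. The function name `quantize` and the toy codebook are hypothetical; this is not the paper's tokenizer, only the basic lookup step such tokenizers commonly build on.

```python
import numpy as np

def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector (N, D) to the index of its nearest codebook entry (K, D)."""
    # Squared Euclidean distance between every feature and every code, shape (N, K).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # The token id of each feature is the index of the closest code.
    return d2.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
patches = np.array([[0.9, 0.1], [0.1, 0.8], [0.05, -0.02]])
tokens = quantize(patches, codebook)
print(tokens.tolist())  # [1, 2, 0]
```

The resulting integer ids play the role of "words" that a language model can consume alongside text tokens.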
1 code implementation • CVPR 2023 • Zhicheng Sun, Yadong Mu, Gang Hua
Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge.
no code implementations • CVPR 2023 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu
Extensive experimental results show that, without further fine-tuning, ECLIP surpasses existing methods by a large margin on a broad range of downstream tasks, demonstrating strong transferability to real-world E-commerce applications.
1 code implementation • CVPR 2023 • Xinghan Wang, Xin Xu, Yadong Mu
We also show that our Koopman pooling framework can be easily extended to one-shot action recognition when combined with Dynamic Mode Decomposition.
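As background for the Dynamic Mode Decomposition mentioned above, a minimal sketch of standard exact DMD in NumPy: it fits a linear operator mapping each snapshot to the next and returns that operator's eigenvalues. The function name and the toy linear system are hypothetical, and this is not the paper's Koopman pooling layer.

```python
import numpy as np

def dmd_eigenvalues(X: np.ndarray, rank: int) -> np.ndarray:
    """Exact DMD on a snapshot matrix X of shape (n_features, n_steps)."""
    X1, X2 = X[:, :-1], X[:, 1:]  # consecutive pairs (x_t, x_{t+1})
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced operator approximating X2 = A X1, projected onto the leading singular vectors.
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
    return np.linalg.eigvals(A_tilde)

# Toy linear dynamics with known eigenvalues 0.9 and 0.5.
A = np.array([[0.9, 0.2], [0.0, 0.5]])
x = np.array([1.0, 1.0])
snapshots = [x]
for _ in range(10):
    x = A @ x
    snapshots.append(x)
X = np.stack(snapshots, axis=1)
print(np.sort(dmd_eigenvalues(X, rank=2).real))  # close to 0.5 and 0.9
```

Because the toy data are exactly linear, the recovered eigenvalues match those of `A`; on real features DMD gives a best-fit linear approximation instead.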
no code implementations • ICCV 2023 • Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu
Video action segmentation refers to the task of densely assigning each video frame or short segment in an untrimmed video to one of a set of pre-specified action categories.
1 code implementation • 7 Nov 2022 • Xingqian Xu, Shant Navasardyan, Vahram Tadevosyan, Andranik Sargsyan, Yadong Mu, Humphrey Shi
We also verify the effectiveness of our design via ablation studies, which show that the aforementioned challenges, i.e., pattern unawareness, blurry textures, and structure distortion, are noticeably reduced.
Ranked #2 on Image Inpainting on FFHQ 512 x 512
1 code implementation • ACM Multimedia 2022 • Zhicheng Sun, Yadong Mu
The task of lifelong person re-identification aims to match a person across multiple cameras given continuous data streams.
1 code implementation • 27 Sep 2022 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu
Spatio-Temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object depicted by a free-form textual expression.
no code implementations • 8 Jan 2022 • Peijun Bao, Yadong Mu
To this end, we propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases and to force it to ground the query sentence in the true inter-modal relationships.
no code implementations • CVPR 2022 • Yang Jin, Linchao Zhu, Yadong Mu
The main contributions of this work are two-fold: 1) Unlike existing black-box models, the proposed model simultaneously localizes temporal boundaries and recognizes action categories by grounding the logical rules of MLN in videos.
no code implementations • CVPR 2022 • Hao Jiang, Yadong Mu
To address this, the work explores a new solution for video summarization by transferring samples from a correlated task (i.e., video moment localization) equipped with abundant training data.
Ranked #4 on Supervised Video Summarization on SumMe (using extra training data)
no code implementations • 12 Oct 2021 • Xinzhe Zhou, Wei Liu, Yadong Mu
In the most information-rich case, where the environment maps are known and a shortest-path prior is admitted, we observe that given an origin-destination node pair, the internal route can be uniquely determined.
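The observation above, that a shortest-path prior fixes the route once the origin and destination are known, can be illustrated with a minimal sketch, assuming the environment map is a weighted graph whose shortest paths are unique. The graph, node names, and `shortest_route` are hypothetical; ties between equal-cost paths are not handled, which is exactly the case where the route would not be unique.

```python
import heapq

def shortest_route(graph, origin, destination):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, edge_cost)."""
    dist = {origin: 0.0}
    prev = {}
    heap = [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == destination:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter distance was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if destination not in dist:
        return None  # destination unreachable
    # Walk predecessor links back from the destination to recover the route.
    route = [destination]
    while route[-1] != origin:
        route.append(prev[route[-1]])
    return route[::-1]

nav_graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
}
print(shortest_route(nav_graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```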
no code implementations • 11 May 2021 • Guiyu Tian, Wenhao Jiang, Wei Liu, Yadong Mu
To this end, MorphNet jointly optimizes two objectives for sample-adaptive poisoning: a reconstruction loss that preserves the visual similarity between benign and poisoned point clouds, and a classification loss that pushes a modern point-cloud recognition model to misclassify the poisoned sample as a pre-specified target category.
1 code implementation • NeurIPS 2020 • Lu Chi, Borui Jiang, Yadong Mu
FFC is a generic operator that can directly replace vanilla convolutions in a large body of existing networks, without any adjustments and with comparable complexity metrics (e.g., FLOPs).
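To make the idea of a spectral, image-wide receptive field more concrete, here is a minimal NumPy sketch of a Fourier-domain unit in the spirit of FFC's spectral branch. It assumes a real FFT over the spatial axes, one pointwise channel-mixing matrix applied to the stacked real and imaginary parts, and a ReLU; batch normalization, the local branch, and learned parameters are omitted, and `fourier_unit` is a hypothetical name, not the released code.

```python
import numpy as np

def fourier_unit(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """x: (C, H, W) real feature map; weight: (2C, 2C) pointwise mixing matrix."""
    C, H, W = x.shape
    spec = np.fft.rfft2(x, axes=(-2, -1))          # (C, H, W//2+1), complex
    z = np.concatenate([spec.real, spec.imag], 0)  # (2C, H, W//2+1), real
    # A 1x1 convolution in the frequency domain is a channel-mixing matmul applied
    # at every frequency; each frequency bin depends on every spatial position.
    z = np.einsum("oc,chw->ohw", weight, z)
    z = np.maximum(z, 0.0)                         # ReLU
    spec = z[:C] + 1j * z[C:]
    # Back to the spatial domain with the original spatial size.
    return np.fft.irfft2(spec, s=(H, W), axes=(-2, -1))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((8, 8)) * 0.1
y = fourier_unit(x, w)
print(y.shape)  # (4, 8, 8)
```

The output keeps the input's shape, which is what lets such a unit slot in where a convolution was.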
1 code implementation • ICML 2020 • Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang
Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.
1 code implementation • CVPR 2020 • Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.
Ranked #11 on Weakly Supervised Action Localization on ActivityNet-1.2
1 code implementation • MM '19: Proceedings of the 27th ACM International Conference on Multimedia 2019 • Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, Qi Tian
We show its equivalence to conducting residual learning in some spectral domain and carefully re-formulate a variety of neural layers into their spectral forms, such as ReLU or convolutions.
42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)
no code implementations • 2 Aug 2019 • Guoqiang Gong, Liangfeng Zheng, Kun Bai, Yadong Mu
Our proposed TSA-Net demonstrates clearly and consistently better performance and sets a new state of the art on both benchmarks.
no code implementations • 1 Aug 2019 • Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian
In the experiments, we comprehensively compare our method with two-stream and non-local models widely used in video classification.
Ranked #35 on Action Recognition on UCF101
no code implementations • Proceedings of the AAAI Conference on Artificial Intelligence 2019 • Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, Cees G. M. Snoek
Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data.
Ranked #1 on Few-Shot Semantic Segmentation on Pascal5i
39 code implementations • 9 Apr 2019 • Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang
The proposed approach achieves superior results to existing single-model networks on COCO object detection.
Ranked #7 on Semantic Segmentation on LIP val
1 code implementation • 12 Aug 2017 • Lu Chi, Yadong Mu
There are multiple fronts to these endeavors, including object detection on roads, 3-D reconstruction, etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks.
no code implementations • 12 Aug 2016 • Yadong Mu, Zhu Liu
In this paper, we propose a novel algorithm that concurrently performs feature engineering and non-linear supervised hashing function learning.
no code implementations • 14 Mar 2016 • Fumin Shen, Yadong Mu, Wei Liu, Yang Yang, Heng Tao Shen
The optimization proceeds alternately over the binary classifiers and the image hash codes.
no code implementations • 28 Jun 2015 • Yadong Mu, Wei Liu, Wei Fan
Stochastic gradient descent (SGD) is a classical method for building large-scale machine learning models over big data.
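For readers less familiar with the baseline being discussed, a minimal sketch of plain minibatch SGD on a logistic-regression objective. The model, learning rate, batch size, and synthetic data are illustrative assumptions; this is the classical method, not the variant proposed in the paper.

```python
import numpy as np

def sgd_logistic(X, y, lr=0.1, epochs=20, batch_size=8, seed=0):
    """Minibatch SGD on the mean logistic loss; labels y are in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))      # predicted probabilities
            grad = X[idx].T @ (p - y[idx]) / len(idx)  # gradient of the mean log-loss
            w -= lr * grad                             # one stochastic step
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = sgd_logistic(X, y)
acc = ((X @ w > 0) == (y == 1)).mean()
print(acc)
```

Each update touches only a small batch, which is what makes SGD attractive when the full dataset is too large to process per step.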
no code implementations • CVPR 2014 • Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang
This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems.
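A minimal sketch of what compact hash bits can look like, assuming the generic sign-of-random-projection scheme, a standard locality-sensitive hash for angular similarity. The paper's own hashing construction for kernel SVMs is not reproduced here, and `hash_bits` and `hamming` are hypothetical names.

```python
import numpy as np

def hash_bits(X: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """Sign-of-random-projection hashing: one bit per random hyperplane."""
    return (X @ planes.T > 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
planes = rng.standard_normal((64, 16))  # 64 hash bits for 16-dim features
x = rng.standard_normal(16)
near = x + 0.05 * rng.standard_normal(16)  # small perturbation of x
far = -x                                   # opposite direction flips every bit
bx, bn, bf = hash_bits(np.stack([x, near, far]), planes)
print(hamming(bx, bn), hamming(bx, bf))
```

Similar vectors share most bits, so Hamming distance on short codes can stand in for an expensive similarity computation.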
no code implementations • 20 Apr 2013 • Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.