Search Results for author: Yadong Mu

Found 31 papers, 14 papers with code

InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

no code implementations · 7 Feb 2024 · Chenguo Lin, Yadong Mu

We introduce InstructScene, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 3D scene synthesis.

Benchmarking, Indoor Scene Synthesis

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

1 code implementation · 5 Feb 2024 · Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.

Video Understanding, Visual Question Answering

Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion

no code implementations · 14 Sep 2023 · Peiran Xu, Yadong Mu

Given a group of images, co-salient object detection (CoSOD) aims to highlight the common salient object in each image.

Co-Salient Object Detection, Object (+2 more)

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

1 code implementation · 9 Sep 2023 · Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

Specifically, we introduce a well-designed visual tokenizer that translates the non-linguistic image into a sequence of discrete tokens, like a foreign language that the LLM can read.

Language Modelling, Large Language Model (+1 more)
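The Unified Language-Vision Pretraining entry above describes a visual tokenizer that turns an image into discrete tokens an LLM can read. The generic mechanism behind such discrete tokens is vector quantization: each continuous feature is replaced by the index of its nearest codebook vector. The sketch below assumes a fixed toy codebook and made-up features (`quantize`, `codebook`, and `features` are hypothetical names); it is not the paper's tokenizer, which the title describes as dynamic.

```python
import numpy as np

def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector to the index of its nearest codebook entry.

    features: (n, d) array of continuous patch features (hypothetical input).
    codebook: (k, d) array of code vectors (hypothetical, fixed here).
    Returns an (n,) array of integer token ids.
    """
    # Squared Euclidean distance between every feature and every code vector.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # The token id is the index of the closest code vector.
    return dists.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
features = np.array([[0.9, 0.1], [0.1, 0.05], [0.2, 0.8]])
tokens = quantize(features, codebook)
print(tokens.tolist())  # [1, 0, 2]
```

In a learned tokenizer the codebook would be trained jointly with an encoder; here it is fixed so the nearest-neighbour lookup is easy to follow.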

Regularizing Second-Order Influences for Continual Learning

1 code implementation · CVPR 2023 · Zhicheng Sun, Yadong Mu, Gang Hua

Continual learning aims to learn from non-stationary data streams without catastrophically forgetting previous knowledge.

Continual Learning

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

no code implementations · CVPR 2023 · Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Extensive experimental results show that, without further fine-tuning, ECLIP surpasses existing methods by a large margin on a broad range of downstream tasks, demonstrating strong transferability to real-world E-commerce applications.

Video Action Segmentation via Contextually Refined Temporal Keypoints

no code implementations · ICCV 2023 · Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu

Video action segmentation refers to the task of densely assigning each frame or short segment of an untrimmed video to one of a set of pre-specified action categories.

Action Segmentation, Graph Matching (+1 more)

Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition

1 code implementation · CVPR 2023 · Xinghan Wang, Xin Xu, Yadong Mu

We also show that our Koopman pooling framework can be easily extended to one-shot action recognition when combined with Dynamic Mode Decomposition.

Action Recognition, Skeleton Based Action Recognition (+1 more)
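The Neural Koopman Pooling entry above mentions Dynamic Mode Decomposition (DMD). In its basic least-squares form, DMD fits a linear operator A such that x_{t+1} ≈ A x_t from a sequence of state snapshots, and the eigenvalues of A describe growth, decay, and oscillation. The following is a minimal NumPy sketch of that textbook step on a synthetic 2-D trajectory, assuming noise-free data and no SVD rank truncation; it is not the paper's Koopman pooling layer, and the helper name `dmd_operator` is hypothetical.

```python
import numpy as np

def dmd_operator(snapshots: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A with x_{t+1} ≈ A x_t.

    snapshots: (d, T) array whose columns are the states x_0, ..., x_{T-1}.
    """
    X = snapshots[:, :-1]  # x_0 ... x_{T-2}
    Y = snapshots[:, 1:]   # x_1 ... x_{T-1}
    # Minimum-norm least-squares solution of A X = Y via the pseudo-inverse.
    return Y @ np.linalg.pinv(X)

# Synthetic trajectory from a known system: rotation by theta with 0.95 decay.
theta = 0.3
A_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, 0.0])
states = [x]
for _ in range(9):
    x = A_true @ x
    states.append(x)
snapshots = np.stack(states, axis=1)  # shape (2, 10)

A_est = dmd_operator(snapshots)
eigvals = np.linalg.eigvals(A_est)  # complex pair; magnitude = per-step decay
print(np.allclose(A_est, A_true))  # True
```

Because the toy data are exactly linear and the snapshot matrix has full row rank, the fitted operator recovers `A_true`; on real skeleton features the fit would only be approximate.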

Image Completion with Heterogeneously Filtered Spectral Hints

1 code implementation · 7 Nov 2022 · Xingqian Xu, Shant Navasardyan, Vahram Tadevosyan, Andranik Sargsyan, Yadong Mu, Humphrey Shi

Ablation studies further demonstrate the effectiveness of our design: the aforementioned challenges, i.e., pattern unawareness, blurry textures, and structural distortion, are noticeably mitigated.

Image Inpainting

Patch-based Knowledge Distillation for Lifelong Person Re-Identification

1 code implementation · ACM Multimedia 2022 · Zhicheng Sun, Yadong Mu

The task of lifelong person re-identification aims to match a person across multiple cameras given continuous data streams.

Continual Learning, Knowledge Distillation (+1 more)

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

1 code implementation · 27 Sep 2022 · Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Spatio-temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object described by a free-form textual expression.

Spatio-Temporal Video Grounding, Video Grounding

Learning Sample Importance for Cross-Scenario Video Temporal Grounding

no code implementations · 8 Jan 2022 · Peijun Bao, Yadong Mu

To this end, we propose a novel method called Debiased Temporal Language Localizer (DebiasTLL), which prevents the model from naively memorizing the biases and forces it to ground the query sentence in the true inter-modal relationship.


Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer

no code implementations · CVPR 2022 · Hao Jiang, Yadong Mu

To address this, the work explores a new solution for video summarization that transfers samples from a correlated task with abundant training data (i.e., video moment localization).

Video Summarization

Complex Video Action Reasoning via Learnable Markov Logic Network

no code implementations · CVPR 2022 · Yang Jin, Linchao Zhu, Yadong Mu

The main contributions of this work are two-fold: 1) Unlike existing black-box models, the proposed model simultaneously localizes temporal boundaries and recognizes action categories by grounding the logical rules of the MLN in videos.

Action Recognition, Human-Object Interaction Detection (+1 more)

Rethinking the Spatial Route Prior in Vision-and-Language Navigation

no code implementations · 12 Oct 2021 · Xinzhe Zhou, Wei Liu, Yadong Mu

In the most information-rich case, where the environment map is known and a shortest-path prior is assumed, we observe that, given an origin-destination node pair, the route between them can be uniquely determined.

Navigate, Vision and Language Navigation

Poisoning MorphNet for Clean-Label Backdoor Attack to Point Clouds

no code implementations · 11 May 2021 · Guiyu Tian, Wenhao Jiang, Wei Liu, Yadong Mu

To this end, MorphNet jointly optimizes two objectives for sample-adaptive poisoning: a reconstruction loss that preserves the visual similarity between benign and poisoned point clouds, and a classification loss that drives a modern point-cloud recognition model to misclassify the poisoned sample as a pre-specified target category.

Backdoor Attack, Denoising
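The MorphNet entry above combines a reconstruction loss, which keeps poisoned point clouds close to benign ones, with a classification loss toward a target category. The snippet does not name the reconstruction loss; this sketch assumes the symmetric Chamfer distance, a common choice for comparing point clouds, and treats the classification term as a placeholder negative log-likelihood supplied by some hypothetical classifier. `chamfer_distance`, `poisoning_loss`, and `weight` are illustrative names, not the paper's code.

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds p (n, 3) and q (m, 3)."""
    # Pairwise squared Euclidean distances, shape (n, m).
    d = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance from p to q, plus from q to p.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def poisoning_loss(benign: np.ndarray, poisoned: np.ndarray,
                   target_nll: float, weight: float = 1.0) -> float:
    """Hypothetical joint objective: stay close to the benign cloud
    (assumed Chamfer term) while lowering the target-class NLL."""
    return chamfer_distance(benign, poisoned) + weight * target_nll

benign = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
poisoned = benign + np.array([0.0, 0.1, 0.0])  # small uniform shift
print(round(chamfer_distance(benign, poisoned), 4))  # 0.02
```

The `weight` parameter trades off imperceptibility against attack strength; the paper's actual balancing scheme is not given in the snippet.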

Fast Fourier Convolution

1 code implementation · NeurIPS 2020 · Lu Chi, Borui Jiang, Yadong Mu

FFC is a generic operator that can directly replace vanilla convolutions in a large body of existing networks, without any adjustments and with comparable complexity metrics (e.g., FLOPs).

Action Recognition, Keypoint Detection (+1 more)
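The Fast Fourier Convolution entry above describes an operator that can stand in for ordinary convolutions. One way to see how a spectral branch reaches beyond a local window is the Fourier-unit pattern: take a 2-D real FFT, apply a 1x1 convolution (a matrix product over channels) to the stacked real and imaginary parts, apply ReLU, and transform back. The sketch assumes random weights, omits batch normalization and the local branch, and is an illustration rather than the authors' implementation; `fourier_unit` is a hypothetical name.

```python
import numpy as np

def fourier_unit(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Channel mixing in the frequency domain (batch norm omitted).

    x: (C, H, W) real feature map.
    weight: (2 * C_out, 2 * C) pointwise weights acting on the stacked
            real/imaginary parts (hypothetical random values here).
    Returns a (C_out, H, W) real feature map.
    """
    H, W = x.shape[-2:]
    spec = np.fft.rfft2(x, axes=(-2, -1))             # (C, H, W//2 + 1), complex
    stacked = np.concatenate([spec.real, spec.imag])  # (2C, H, W//2 + 1), real
    # A 1x1 convolution is a matrix product over the channel axis.
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    mixed = np.maximum(mixed, 0.0)                    # ReLU on spectral features
    c_out = mixed.shape[0] // 2
    spec_out = mixed[:c_out] + 1j * mixed[c_out:]     # rebuild complex spectrum
    return np.fft.irfft2(spec_out, s=(H, W), axes=(-2, -1))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((8, 8))
y = fourier_unit(x, w)

# Perturb one input pixel and look at the farthest output location.
x2 = x.copy()
x2[0, 0, 0] += 1.0
y2 = fourier_unit(x2, w)
print(y.shape)  # (4, 8, 8)
```

Comparing `y` and `y2` at position (4, 4), far from the perturbed pixel at (0, 0), shows a change there, whereas a single 3x3 convolution would leave that location untouched. This image-wide dependence is what the spectral branch is meant to provide.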

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective

1 code implementation · ICML 2020 · Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang

Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.

Domain Generalization, Representation Learning
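The Informative Dropout entry above separates texture from shape using local self-information. A simple way to estimate it, assumed here, is to score how unusual each pixel's patch is relative to the patches in a surrounding window: a Gaussian kernel density gives p, and the self-information is -log p. Repetitive, texture-like regions resemble their neighbours and receive low scores. The patch radius, window radius, and bandwidth are arbitrary illustrative choices, `local_self_information` is a hypothetical name, and the Dropout-like masking step that would consume these scores is not shown.

```python
import numpy as np

def local_self_information(img: np.ndarray, patch_radius: int = 1,
                           window_radius: int = 2, bandwidth: float = 1.0) -> np.ndarray:
    """Estimate -log p(patch) at each pixel of a 2-D grayscale image.

    p is a Gaussian kernel density of the centre patch against every patch
    in a (2 * window_radius + 1)^2 neighbourhood (including itself).
    """
    H, W = img.shape
    pad = patch_radius + window_radius
    padded = np.pad(img, pad, mode="reflect")
    r = patch_radius
    info = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            center = padded[ci - r:ci + r + 1, cj - r:cj + r + 1]
            sims = []
            for di in range(-window_radius, window_radius + 1):
                for dj in range(-window_radius, window_radius + 1):
                    ni, nj = ci + di, cj + dj
                    nb = padded[ni - r:ni + r + 1, nj - r:nj + r + 1]
                    dist2 = ((center - nb) ** 2).sum()
                    sims.append(np.exp(-dist2 / (2 * bandwidth ** 2)))
            # Low density (unusual patch) -> high self-information.
            info[i, j] = -np.log(np.mean(sims))
    return info

img = np.zeros((9, 9))
img[4, 4] = 1.0  # one distinctive pixel on a flat background
info = local_self_information(img)
```

On this toy image the flat background scores zero, since every nearby patch is identical, while the isolated bright pixel scores clearly higher.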

Fast Non-Local Neural Networks with Spectral Residual Learning

1 code implementation · MM '19: Proceedings of the 27th ACM International Conference on Multimedia, 2019 · Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, Qi Tian

We show its equivalence to conducting residual learning in a spectral domain and carefully reformulate a variety of neural layers, such as ReLU and convolution, into their spectral forms.

Pose Estimation, Video Classification

Deep High-Resolution Representation Learning for Visual Recognition

42 code implementations · 20 Aug 2019 · Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Dichotomous Image Segmentation, Face Alignment (+7 more)

Scale Matters: Temporal Scale Aggregation Network for Precise Action Localization in Untrimmed Videos

no code implementations · 2 Aug 2019 · Guoqiang Gong, Liangfeng Zheng, Kun Bai, Yadong Mu

Our proposed TSA-Net shows clearly and consistently better performance and sets a new state of the art on both benchmarks.

Temporal Action Localization

Two-Stream Video Classification with Cross-Modality Attention

no code implementations · 1 Aug 2019 · Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian

In the experiments, we comprehensively compare our method with two-stream and non-local models widely used in video classification.

Action Classification, Action Recognition (+5 more)

Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues

1 code implementation · 12 Aug 2017 · Lu Chi, Yadong Mu

These endeavors span multiple fronts, including object detection on roads and 3-D reconstruction; in this work, however, we focus on a vision-based model that uses deep networks to map raw input images directly to steering angles.

Autonomous Driving, object-detection (+1 more)

Deep Hashing: A Joint Approach for Image Signature Learning

no code implementations · 12 Aug 2016 · Yadong Mu, Zhu Liu

In this paper, we propose a novel algorithm that concurrently performs feature engineering and non-linear supervised hashing function learning.

Deep Hashing, Feature Engineering (+2 more)

Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization

no code implementations · 28 Jun 2015 · Yadong Mu, Wei Liu, Wei Fan

Stochastic gradient descent (SGD) is a classical method for building large-scale machine learning models over big data.

Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification

no code implementations · CVPR 2014 · Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang

This paper presents a novel algorithm that uses compact hash bits to greatly improve the efficiency of non-linear kernel SVMs in very large-scale visual classification problems.

Classification, General Classification
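The Hash-SVM entry above relies on compact hash bits to speed up kernel SVMs. The snippet does not describe the hashing scheme, so as an assumed stand-in this sketch uses random-hyperplane (sign random projection) hashing: for a random Gaussian hyperplane, the probability that two vectors land on different sides equals the angle between them divided by pi, so the normalized Hamming distance between their codes estimates that angle. This is not Hash-SVM itself, and all names here (`sign_hash`, `estimated_angle`) are hypothetical.

```python
import numpy as np

def sign_hash(x: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """One bit per random hyperplane: True when x lies on its non-negative side."""
    return (planes @ x) >= 0

def estimated_angle(bits_a: np.ndarray, bits_b: np.ndarray) -> float:
    """P[bit differs] = angle / pi for Gaussian hyperplanes, so the
    fraction of differing bits times pi estimates the angle."""
    return float(np.mean(bits_a != bits_b) * np.pi)

rng = np.random.default_rng(0)
d, n_bits = 16, 4096
planes = rng.standard_normal((n_bits, d))  # random hyperplane normals
a = rng.standard_normal(d)
b = a + 0.5 * rng.standard_normal(d)       # a nearby vector

true_angle = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
approx = estimated_angle(sign_hash(a, planes), sign_hash(b, planes))
```

With 4096 bits the estimate typically lands within a few hundredths of a radian of `true_angle`; shorter codes trade accuracy for speed and memory.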

Distributed Low-rank Subspace Segmentation

no code implementations · 20 Apr 2013 · Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan

Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.

Clustering, Event Detection (+4 more)
