Search Results for author: Yandong Li

Found 29 papers, 12 papers with code

Instruct-Imagen: Image Generation with Multi-modal Instruction

no code implementations • 3 Jan 2024 • Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.

Image Generation Retrieval

DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models

no code implementations • 5 Dec 2023 • Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin C. K. Chan, Yandong Li, Yanwu Xu, Kun Zhang, Tingbo Hou

Our extensive experiments demonstrate the superior performance of our method in terms of visual quality, identity preservation, and text control, showcasing its effectiveness in the context of text-guided subject-driven image inpainting.

Image Inpainting

Identity Encoder for Personalized Diffusion

no code implementations • 14 Apr 2023 • Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia

Our approach greatly reduces the overhead for personalized image generation and is more applicable in many potential applications.

Image Enhancement Image Generation

What's in a Name? Beyond Class Indices for Image Recognition

no code implementations • 5 Apr 2023 • Kai Han, Yandong Li, Sagar Vaze, Jie Li, Xuhui Jia

In this paper, we reconsider the recognition problem and task a vision-language model to assign class names to images given only a large and essentially unconstrained vocabulary of categories as prior information.

Language Modelling Object Recognition

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

no code implementations • 5 Apr 2023 • Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su

This paper proposes a method for generating images of customized objects specified by users.

Caption Generation Image Generation +1

Learning to Adapt to Online Streams with Distribution Shifts

no code implementations • 2 Mar 2023 • Chenyan Wu, Yimu Pan, Yandong Li, James Z. Wang

Test-time adaptation (TTA) is a technique used to reduce distribution gaps between the training and testing sets by leveraging unlabeled test data during inference.

Benchmarking Meta-Learning +3

Train-Once-for-All Personalization

no code implementations • CVPR 2023 • Hong-You Chen, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, Li Zhang

We study the problem of how to train a "personalization-friendly" model such that given only the task descriptions, the model can be adapted to different end-users' needs, e.g., for accurately classifying different subsets of objects.

MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose

no code implementations • 25 May 2022 • Chenyan Wu, Yandong Li, Xianfeng Tang, James Wang

Our method works as follows: first, to model the multi-human environment, it processes multi-human 2D poses and builds a novel heterogeneous graph, where nodes from different people and within one person are connected to capture inter-human interactions and draw the body geometry (i.e., skeleton and mesh structure).

Ranked #5 on 3D Multi-Person Pose Estimation on MuPoTS-3D

3D Multi-Person Human Pose Estimation 3D Multi-Person Pose Estimation

Dir-MUSIC Algorithm for DOA Estimation of Partial Discharge Based on Signal Strength represented by Antenna Gain Array Manifold

no code implementations • 19 Apr 2022 • Wencong Xu, Yandong Li, Bingshu Chen, Yue Hu, Jianxu Li, Zijing Zeng

The experimental results show that the PD direction-finding error is 3.39°, which can meet the need for partial discharge DOA estimation using inspection robots in substations.

Rethinking Deep Face Restoration

no code implementations • CVPR 2022 • Yang Zhao, Yu-Chuan Su, Chun-Te Chu, Yandong Li, Marius Renn, Yukun Zhu, Changyou Chen, Xuhui Jia

While existing approaches for face restoration make significant progress in generating high-quality faces, they often fail to preserve facial features and cannot authentically reconstruct the faces.

Face Generation Face Reconstruction

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

1 code implementation • NeurIPS 2021 • Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size.

Instance Segmentation Long-tailed Object Detection +4

MoViNets: Mobile Video Networks for Efficient Video Recognition

3 code implementations • CVPR 2021 • Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.

Ranked #3 on Action Classification on Charades

Action Classification Action Recognition +4

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

1 code implementation • ICCV 2021 • Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e.g., in product images).

Imputation Instance Segmentation +5

Ranking Neural Checkpoints

1 code implementation • CVPR 2021 • Yandong Li, Xuhui Jia, Ruoxin Sang, Yukun Zhu, Bradley Green, Liqiang Wang, Boqing Gong

This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for the transfer learning to a downstream task.

Ranked #6 on Transferability on classification benchmark

Transferability Transfer Learning

Improving Object Detection with Selective Self-supervised Self-training

no code implementations • ECCV 2020 • Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong

They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets.

Image Classification Image Retrieval +4

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model

1 code implementation • CVPR 2020 • Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

The other is that the number of images used for the knowledge distillation should be small; otherwise, it violates our expectation of reducing the dependence on large-scale datasets.

Active Learning Knowledge Distillation

BachGAN: High-Resolution Image Synthesis from Salient Object Layout

1 code implementation • CVPR 2020 • Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu

We propose a new task toward more practical applications of image generation: high-quality image synthesis from salient object layout.

Generative Adversarial Network Hallucination +4

Attacking Lifelong Learning Models with Gradient Reversion

no code implementations • ICLR 2020 • Yunhui Guo, Mingrui Liu, Yandong Li, Liqiang Wang, Tianbao Yang, Tajana Rosing

We evaluate the effectiveness of traditional attack methods such as FGSM and PGD. The results show that A-GEM still possesses strong continual learning ability in the presence of adversarial examples in the memory and simple defense techniques such as label smoothing can further alleviate the adversarial effects.

Continual Learning

AdaFilter: Adaptive Filter Fine-tuning for Deep Transfer Learning

no code implementations • 21 Nov 2019 • Yunhui Guo, Yandong Li, Liqiang Wang, Tajana Rosing

Fine-tuning is a popular transfer learning technique for deep neural networks where a few rounds of training are applied to the parameters of a pre-trained model to adapt them to a new task.

General Classification Image Classification +1

Transferring Robustness for Graph Neural Network Against Poisoning Attacks

1 code implementation • 20 Aug 2019 • Xianfeng Tang, Yandong Li, Yiwei Sun, Huaxiu Yao, Prasenjit Mitra, Suhang Wang

To optimize PA-GNN for a poisoned graph, we design a meta-optimization algorithm that trains PA-GNN to penalize perturbations using clean graphs and their adversarial counterparts, and transfers such ability to improve the robustness of PA-GNN on the poisoned graph.

Ranked #25 on Node Classification on Pubmed

Node Classification Transfer Learning

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

1 code implementation • 1 May 2019 • Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques.

Adversarial Attack

NATTACK: A Strong and Universal Gaussian Black-Box Adversarial Attack

no code implementations • ICLR 2019 • Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

In other words, there is a population of adversarial examples, instead of only one, for any input to a DNN.

Adversarial Attack

Joint Modeling of Dense and Incomplete Trajectories for Citywide Traffic Volume Inference

no code implementations • 25 Feb 2019 • Xianfeng Tang, Boqing Gong, Yanwei Yu, Huaxiu Yao, Yandong Li, Haiyong Xie, Xiaoyu Wang

In this paper, we propose a novel framework for the citywide traffic volume inference using both dense GPS trajectories and incomplete trajectories captured by camera surveillance systems.

Graph Embedding

Depthwise Convolution is All You Need for Learning Multiple Visual Domains

1 code implementation • 3 Feb 2019 • Yunhui Guo, Yandong Li, Rogerio Feris, Liqiang Wang, Tajana Rosing

A model aware of the relationships between different domains can also be trained to work on new domains with fewer resources.

Ranked #2 on Continual Learning on visual domain decathlon (10 tasks)

Continual Learning

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

8 code implementations • 5 Nov 2018 • Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen

In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.

Action Recognition Temporal Action Localization

How Local is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization

no code implementations • ECCV 2018 • Yandong Li, Liqiang Wang, Tianbao Yang, Boqing Gong

The large volume of video content and high viewing frequency demand automatic video summarization algorithms, of which a key property is the capability of modeling diversity.

Point Processes Supervised Video Summarization

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

1 code implementation • ICCV 2017 • Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understanding of the same visual scenes, and even the same set of images (e.g., of COCO).

Language Modelling Multiple-choice +4

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification

no code implementations • 12 Aug 2017 • Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie Zhou, Shilei Wen, Yuanqing Lin

Experimental results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve on existing approaches in large-scale video recognition tasks.

Ranked #163 on Action Classification on Kinetics-400

Action Classification General Classification +2

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

1 code implementation • 14 Jul 2017 • Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie Zhou, Shilei Wen

This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge, which ranked 3rd.

Video Recognition Video Understanding
