Search Results for author: Dahua Lin

Found 243 papers, 144 papers with code

Region Proposal by Guided Anchoring

2 code implementations • CVPR 2019 • Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin

State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios.

Ranked #1 on Region Proposal on COCO test-dev

object-detection Object Detection +1

27,716

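The dense anchoring scheme that the snippet above describes as the common baseline can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `dense_anchors`, the stride, and the scale and ratio values are all assumptions chosen for the example.

```python
from itertools import product
from math import sqrt


def dense_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return (x1, y1, x2, y2) anchors for every cell of a feat_h x feat_w grid.

    Hypothetical helper: anchors are placed uniformly over the spatial domain,
    one per (scale, ratio) pair at each cell centre.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of the current cell in input-image coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # ratio is height / width; the area stays (scale * stride) ** 2.
            side = scale * stride
            w = side / sqrt(ratio)
            h = side * sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = dense_anchors(feat_h=2, feat_w=3, stride=16,
                        scales=(4, 8), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 6 cells * 6 shapes per cell
```

The anchor count grows with grid size times the number of shapes, which is the cost that a sparser, guided scheme aims to reduce.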

Hybrid Task Cascade for Instance Segmentation

5 code implementations • CVPR 2019 • Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.

Ranked #32 on Object Detection on COCO-O

Instance Segmentation object-detection +4

27,716

Libra R-CNN: Towards Balanced Learning for Object Detection

6 code implementations • CVPR 2019 • Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin

In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally consists of three levels: sample level, feature level, and objective level.

Ranked #149 on Object Detection on COCO test-dev

object-detection Object Detection

27,716

Prime Sample Attention in Object Detection

1 code implementation • CVPR 2020 • Yuhang Cao, Kai Chen, Chen Change Loy, Dahua Lin

Our experiments demonstrate that it is often more effective to focus on prime samples than hard samples when training a detector.

Object object-detection +1

27,716

CARAFE: Content-Aware ReAssembly of FEatures

3 code implementations • ICCV 2019 • Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

CARAFE introduces little computational overhead and can be readily integrated into modern network architectures.

Ranked #3 on Feature Upsampling on ImageNet

Feature Upsampling Instance Segmentation +3

27,716

MMDetection: Open MMLab Detection Toolbox and Benchmark

144 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In this paper, we introduce the various features of this toolbox.

Benchmarking Instance Segmentation +2

27,716

Side-Aware Boundary Localization for More Precise Object Detection

3 code implementations • ECCV 2020 • Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin

To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket.

Object object-detection +2

27,716

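The two-step localization described in the snippet above (choose a bucket, then refine inside it) can be sketched for a single box side. This is a simplified illustration under stated assumptions, not the authors' implementation: `localize_side`, the fractional offset convention, and the sample numbers are all hypothetical.

```python
def localize_side(bucket_scores, bucket_offsets, range_start, range_end):
    """Coarse-to-fine estimate of one box side inside [range_start, range_end].

    Assumed convention: bucket_offsets[i] is a signed fraction of the bucket
    width, measured from the centre of bucket i.
    """
    num_buckets = len(bucket_scores)
    bucket_width = (range_end - range_start) / num_buckets
    # Step 1: coarse prediction, pick the most confident bucket.
    best = max(range(num_buckets), key=lambda i: bucket_scores[i])
    # Step 2: fine prediction, shift from that bucket's centre.
    centre = range_start + (best + 0.5) * bucket_width
    return centre + bucket_offsets[best] * bucket_width


x_left = localize_side(
    bucket_scores=[0.1, 0.7, 0.2, 0.0],
    bucket_offsets=[0.0, -0.25, 0.1, 0.0],
    range_start=100.0,
    range_end=140.0,
)
print(x_left)  # 112.5
```

Splitting the problem this way keeps the regression target small: the offset only has to cover one bucket rather than the full range of possible displacements.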

Seesaw Loss for Long-Tailed Instance Segmentation

2 code implementations • CVPR 2021 • Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, Dahua Lin

Instances of head classes dominate a long-tailed dataset and they serve as negative samples of tail categories.

Instance Segmentation Semantic Segmentation

27,716

Feature Pyramid Grids

1 code implementation • 7 Apr 2020 • Kai Chen, Yuhang Cao, Chen Change Loy, Dahua Lin, Christoph Feichtenhofer

Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale.

Neural Architecture Search object-detection +2

27,714

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

4 code implementations • 10 Jul 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai

Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.

Image Animation

8,701

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

1 code implementation • 28 Nov 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai

The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years.

Video Generation

8,701

PSANet: Point-wise Spatial Attention Network for Scene Parsing

4 code implementations • ECCV 2018 • Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia

We notice information flow in convolutional neural networks is restricted inside local neighborhood regions due to the physical design of convolutional filters, which limits the overall understanding of complex scenes.

Ranked #51 on Semantic Segmentation on Cityscapes test

Position Scene Parsing +1

7,374

InternLM2 Technical Report

1 code implementation • 26 Mar 2024 • Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, Fukai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, Jia Yu, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

Ranked #5 on Long-Context Understanding on Ada-LEval (BestAnswer)

4k Long-Context Understanding

5,137

FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection

8 code implementations • 22 Apr 2021 • Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin

In this paper, we study this problem with a practice built on a fully convolutional single-stage detector and propose a general framework FCOS3D.

Ranked #323 on 3D Object Detection on nuScenes

Autonomous Driving Monocular 3D Object Detection +2

4,785

Probabilistic and Geometric Depth: Detecting Objects in Perspective

1 code implementation • 29 Jul 2021 • Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin

As the preliminary depth estimation of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty.

Ranked #10 on 3D Object Detection on KITTI Cars Hard val

Attribute Depth Estimation +2

4,785

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

2 code implementations • 14 Aug 2021 • Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.

Key Information Extraction named-entity-recognition +4

4,059

Temporal Action Detection with Structured Segment Networks

6 code implementations • ICCV 2017 • Yue Zhao, Yuanjun Xiong, Li-Min Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin

Detecting actions in untrimmed videos is an important yet challenging task.

Ranked #6 on Action Recognition on THUMOS’14

Action Detection Action Recognition +1

3,876

Temporal Segment Networks for Action Recognition in Videos

11 code implementations • 8 May 2017 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Ranked #5 on Video Classification on COIN

Action Classification Action Recognition In Videos +3

3,876

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

19 code implementations • 2 Aug 2016 • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc van Gool

The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network.

Ranked #3 on Multimodal Activity Recognition on EV-Action

Action Classification Action Recognition In Videos +2

3,876

Omni-sourced Webly-supervised Learning for Video Recognition

3 code implementations • ECCV 2020 • Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin

Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.

Ranked #5 on Action Recognition on UCF101 (using extra training data)

Action Classification Action Recognition +1

3,876

Revisiting Skeleton-based Action Recognition

4 code implementations • CVPR 2022 • Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai

In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.

Ranked #1 on Action Recognition on NTU RGB+D

Group Activity Recognition Pose Estimation +1

3,876

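The idea of representing a skeleton sequence as a 3D heatmap stack, as mentioned in the snippet above, can be illustrated with a small sketch. It assumes that each joint is rendered as a 2D Gaussian per frame and that the maps are stacked along time; the function names, the `sigma` value, and the nested-list layout `[joint][frame][row][col]` are choices made for this example only, not taken from the paper.

```python
from math import exp


def keypoint_heatmap(x, y, height, width, sigma=1.0):
    """Render one joint at pixel (x, y) as a height x width Gaussian map."""
    return [
        [exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


def heatmap_stack(frames, height, width, sigma=1.0):
    """Stack per-frame joint heatmaps into a [joint][frame][row][col] volume.

    frames: list of frames, each a list of (x, y) joint coordinates.
    """
    num_joints = len(frames[0])
    return [
        [keypoint_heatmap(*frame[j], height, width, sigma) for frame in frames]
        for j in range(num_joints)
    ]


# Two frames, two joints each, on a tiny 4 x 5 grid.
frames = [[(1, 1), (2, 0)], [(1, 2), (3, 3)]]
stack = heatmap_stack(frames, height=4, width=5)
print(len(stack), len(stack[0]), len(stack[0][0]), len(stack[0][0][0]))
```

A dense volume like this can be fed to a 3D convolutional network, whereas a graph sequence requires a graph-specific model.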

Temporal RoI Align for Video Object Recognition

1 code implementation • 8 Sep 2021 • Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

In this work, considering that the features of the same object instance are highly similar across frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current-frame proposals by utilizing feature similarity.

Ranked #1 on Video Instance Segmentation on YouTube-VIS

Instance Segmentation Object +5

3,372

Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

14 code implementations • 5 May 2018 • Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.

Ranked #13 on Contrastive Learning on imagenet-1k

Contrastive Learning General Classification +3

3,227

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

3,139

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

1 code implementation • ICCV 2023 • Yuan Liu, Songyang Zhang, Jiacheng Chen, Zhaohui Yu, Kai Chen, Dahua Lin

There has been significant progress in Masked Image Modeling (MIM).

Semantic Segmentation

3,139

Unsupervised Feature Learning via Non-Parametric Instance Discrimination

4 code implementations • CVPR 2018 • Zhirong Wu, Yuanjun Xiong, Stella X. Yu, Dahua Lin

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.

Ranked #40 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (Top 5 Accuracy metric)

General Classification object-detection +4

3,078

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

1 code implementation • 4 Mar 2023 • Yuan Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.

Self-Supervised Learning

3,078

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

3 code implementations • CVPR 2017 • Xingcheng Zhang, Zhizhong Li, Chen Change Loy, Dahua Lin

A number of studies have shown that increasing the depth or width of convolutional networks is a rewarding approach to improve the performance of image recognition.

Image Classification

2,918

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

24 code implementations • 23 Jan 2018 • Sijie Yan, Yuanjun Xiong, Dahua Lin

Dynamics of human body skeletons convey significant information for human action recognition.

Ranked #2 on Skeleton Based Action Recognition on Varying-view RGB-D Action-Skeleton

3D Human Pose Estimation Action Recognition +3

2,851

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

2,842

MMBench: Is Your Multi-modal Model an All-around Player?

2 code implementations • 12 Jul 2023 • Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin

In response to these challenges, we propose MMBench, a novel multi-modality benchmark.

Ranked #1 on Visual Question Answering on MMBench

Visual Question Answering

2,440

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

1 code implementation • 26 Sep 2023 • Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.

Ranked #9 on Visual Question Answering (VQA) on InfiMM-Eval

Image Comprehension Reading Comprehension +1

1,570

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

1 code implementation • 21 Nov 2023 • Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.

Ranked #1 on visual instruction following on LLaVA-Bench

Descriptive visual instruction following +2

1,570

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

1 code implementation • 29 Jan 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension.

Ranked #16 on Visual Question Answering on MM-Vet

Language Modelling Visual Question Answering

1,570

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

1 code implementation • 22 Feb 2024 • Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang

We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.

Hallucination

1,570

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

2 code implementations • 9 Apr 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.

Ranked #11 on Visual Question Answering on MM-Vet

4k Language Modelling +1

1,570

PYSKL: Towards Good Practices for Skeleton Action Recognition

1 code implementation • 19 May 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

The toolbox supports a wide variety of skeleton action recognition algorithms, including approaches based on GCN and CNN.

Ranked #19 on Skeleton Based Action Recognition on NTU RGB+D 120

Action Recognition Skeleton Based Action Recognition

853

DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition

3 code implementations • 12 Oct 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

Graph convolution networks (GCN) have been widely used in skeleton-based action recognition.

Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D

Action Recognition Skeleton Based Action Recognition

853

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

3 code implementations • 4 Aug 2020 • Hui Zhou, Xinge Zhu, Xiao Song, Yuexin Ma, Zhe Wang, Hongsheng Li, Dahua Lin

A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.

Ranked #11 on LIDAR Semantic Segmentation on nuScenes

3D Semantic Segmentation LIDAR Semantic Segmentation

807

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

2 code implementations • CVPR 2021 • Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin

However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited.

Ranked #2 on 3D Semantic Segmentation on ScribbleKITTI

LIDAR Semantic Segmentation Panoptic Segmentation +3

807

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception

1 code implementation • 12 Sep 2021 • Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Wei Li, Yuexin Ma, Hongsheng Li, Ruigang Yang, Dahua Lin

In this paper, we benchmark our model on these three tasks.

Panoptic Segmentation Segmentation

807

Self-Supervised Scene De-occlusion

2 code implementations • CVPR 2020 • Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, Chen Change Loy

This is achieved via the Partial Completion Networks PCNet-mask (M) and PCNet-content (C), which learn to recover fractions of object masks and contents, respectively, in a self-supervised manner.

Image Manipulation Scene Understanding

770

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Ranked #4 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Generation +1

719

Learning to Cluster Faces on an Affinity Graph

3 code implementations • CVPR 2019 • Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin

Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level.

Clustering Face Recognition +1

698

Learning to Cluster Faces via Confidence and Connectivity Estimation

3 code implementations • CVPR 2020 • Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin

With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters.

Clustering Connectivity Estimation +2

698

A Pursuit of Temporal Accuracy in General Activity Detection

1 code implementation • 8 Mar 2017 • Yuanjun Xiong, Yue Zhao, Li-Min Wang, Dahua Lin, Xiaoou Tang

Detecting activities in untrimmed videos is an important but challenging task.

Ranked #29 on Temporal Action Localization on ActivityNet-1.3

Action Detection Activity Detection +2

641

3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

1 code implementation • 4 Mar 2024 • Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.

3D Generation Text to 3D +1

546

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

1 code implementation • 6 Dec 2023 • Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.

3D Generation

485

Scene as Occupancy

2 code implementations • ICCV 2023 • Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li

Human drivers can easily describe a complex traffic scene using their visual system.

Motion Planning

483

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation

1 code implementation • ECCV 2020 • Xingang Pan, Xiaohang Zhan, Bo Dai, Dahua Lin, Chen Change Loy, Ping Luo

Learning a good image prior is a long-term goal for image restoration and manipulation.

Generative Adversarial Network Image Manipulation +2

473

Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition

4 code implementations • ECCV 2018 • Xiaohang Zhan, Ziwei Liu, Junjie Yan, Dahua Lin, Chen Change Loy

Face recognition has witnessed great progress in recent years, mainly attributed to the high-capacity model designed and the abundant labeled data collected.

Face Recognition

453

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

1 code implementation • 30 Nov 2023 • Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai

Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications.

Neural Rendering

451

Optimizing Video Object Detection via a Scale-Time Lattice

1 code implementation • CVPR 2018 • Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin

High-performance object detection relies on expensive convolutional networks to compute features, often leading to significant challenges in applications, e.g., those that require detecting objects from video streams in real time.

Object object-detection +1

449

OneLLM: One Framework to Align All Modalities with Language

1 code implementation • 6 Dec 2023 • Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

In detail, we first train an image projection module to connect a vision encoder with LLM.

Ranked #73 on Visual Question Answering on MM-Vet

Question Answering Visual Question Answering

441

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

1 code implementation • CVPR 2023 • Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, Dahua Lin, Ziwei Liu

Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases.

Novel View Synthesis Object +1

418

Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction

1 code implementation • 26 Aug 2022 • Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, Dahua Lin

Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically require hours of training for a single scene.

Surface Reconstruction

399

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

1 code implementation • 21 Aug 2023 • Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin

The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models (LLMs) and multimodal large language models (MLLMs).

398

PointLLM: Empowering Large Language Models to Understand Point Clouds

3 code implementations • 31 Aug 2023 • Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.

Ranked #3 on 3D Question Answering (3D-QA) on 3D MM-Vet

3D Object Classification 3D Question Answering (3D-QA) +2

380

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

2 code implementations • 6 Dec 2023 • Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem.

Autonomous Driving

359

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

1 code implementation • ECCV 2020 • Tong Wu, Qingqiu Huang, Ziwei Liu, Yu Wang, Dahua Lin

We present a new loss function called Distribution-Balanced Loss for the multi-label recognition problems that exhibit long-tailed class distributions.

Ranked #7 on Long-tail Learning on VOC-MLT

Binary Classification General Classification +2

349

Monocular 3D Object Detection with Depth from Motion

1 code implementation • 26 Jul 2022 • Tai Wang, Jiangmiao Pang, Dahua Lin

Perceiving 3D objects from monocular inputs is crucial for robotic systems, given its economy compared to multi-sensor settings.

Depth Estimation Monocular 3D Object Detection +2

295

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

1 code implementation • 26 Dec 2023 • Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang

In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions.

Scene Understanding

293

Real or Not Real, that is the Question

2 code implementations • ICLR 2020 • Yuanbo Xiangli, Yubin Deng, Bo Dai, Chen Change Loy, Dahua Lin

While generative adversarial networks (GAN) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles.

286

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation Video Generation

267

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

2 code implementations • CVPR 2022 • Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin

End-to-end scene text spotting has attracted great attention in recent years owing to success in exploiting the intrinsic synergy between scene text detection and recognition.

Ranked #3 on Text Spotting on Inverse-Text

Scene Text Detection Text Detection +1

253

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

1 code implementation • 5 Dec 2023 • Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao

Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation.

3D Generation Reading Comprehension

251

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

1 code implementation • 2 Aug 2016 • Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang

This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016.

General Classification Video Classification

250

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

1 code implementation • NeurIPS 2023 • Dongwei Pan, Long Zhuo, Jingtan Piao, Huiwen Luo, Wei Cheng, Yuxin Wang, Siming Fan, Shengqi Liu, Lei Yang, Bo Dai, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Kwan-Yee Lin

It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees.

2k Image Matting +2

214

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

4 code implementations • CVPR 2020 • Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin

Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment.

Action Recognition Scene Segmentation +1

212

Paper
Code

Detecting Visual Relationships with Deep Relational Networks

1 code implementation • CVPR 2017 • Bo Dai, Yuqi Zhang, Dahua Lin

Relationships among objects play a crucial role in image understanding.

Ranked #3 on Visual Relationship Detection on VRD Phrase Detection

General Classification

200

Paper
Code

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering

1 code implementation • ICCV 2023 • Wei Cheng, Ruixiang Chen, Wanqi Yin, Siming Fan, Keyu Chen, Honglin He, Huiwen Luo, Zhongang Cai, Jingbo Wang, Yang Gao, Zhengming Yu, Zhengyu Lin, Daxuan Ren, Lei Yang, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Bo Dai, Kwan-Yee Lin

Realistic human-centric rendering plays a key role in both computer vision and computer graphics.

Camera Calibration Novel View Synthesis

199

Paper
Code

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

1 code implementation • 9 Feb 2024 • Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

We further explore how to use LEAN to solve math problems and study its performance under multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math.

Data Augmentation GSM8K +3

190

Paper
Code

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

1 code implementation • 19 Mar 2024 • Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai Chen, Feng Zhao

Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents.

Hallucination

187

Paper
Code

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

1 code implementation • 22 Dec 2023 • Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao

We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility.

182

Paper
Code

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

1 code implementation • 8 Jan 2024 • Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences.

3D Generation Text to 3D

173

Paper
Code

SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling

1 code implementation • ICCV 2023 • Zhitao Yang, Zhongang Cai, Haiyi Mei, Shuai Liu, Zhaoxi Chen, Weiye Xiao, Yukun Wei, Zhongfei Qing, Chen Wei, Bo Dai, Wayne Wu, Chen Qian, Dahua Lin, Ziwei Liu, Lei Yang

Synthetic data has emerged as a promising source for 3D human research as it offers low-cost access to large-scale human datasets.

Human Mesh Recovery Neural Rendering

169

Paper
Code

UntrimmedNets for Weakly Supervised Action Recognition and Detection

2 code implementations • CVPR 2017 • Limin Wang, Yuanjun Xiong, Dahua Lin, Luc van Gool

We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet.

Ranked #3 on Action Classification on ActivityNet-1.2

Weakly Supervised Action Localization Weakly-Supervised Action Recognition

163

Paper
Code

Person Search in Videos with One Portrait Through Visual and Temporal Links

2 code implementations • ECCV 2018 • Qingqiu Huang, Wentao Liu, Dahua Lin

In real-world applications, e.g. law enforcement and video retrieval, one often needs to search for a certain person in long videos with just one portrait.

Person Re-Identification Person Search +2

158

Paper
Code

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

1 code implementation • 29 Nov 2023 • Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

Based on the observation, OPERA introduces a penalty term on the model logits during the beam-search decoding to mitigate the over-trust issue, along with a rollback strategy that retrospects the presence of summary tokens in the previously generated tokens, and re-allocates the token selection if necessary.

Hallucination

155

Paper
Code

Vision Transformer with Progressive Sampling

1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture to image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.

Image Classification

147

Paper
Code

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

1 code implementation • 21 Dec 2023 • Zehui Chen, Weihua Du, Wenwei Zhang, Kuikun Liu, Jiangning Liu, Miao Zheng, Jingming Zhuo, Songyang Zhang, Dahua Lin, Kai Chen, Feng Zhao

Based on that, we further introduce T-Eval to evaluate the tool utilization capability step by step.

Instruction Following Retrieval

147

Paper
Code

DSNAS: Direct Neural Architecture Search without Parameter Retraining

1 code implementation • CVPR 2020 • Shoukang Hu, Sirui Xie, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

We argue that given a computer vision task for which a NAS method is expected, this definition can reduce the vaguely-defined NAS evaluation to i) accuracy of this task and ii) the total computation consumed to finally obtain a model with satisfying accuracy.

Ranked #16 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)

Neural Architecture Search

144

Paper
Code

Understanding the wiring evolution in differentiable neural architecture search

1 code implementation • 2 Sep 2020 • Sirui Xie, Shoukang Hu, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

To this end, we pose questions that future differentiable methods for neural wiring discovery need to confront, hoping to evoke a discussion and rethinking on how much bias has been enforced implicitly in existing NAS methods.

Neural Architecture Search

144

Paper
Code

Self-Supervised Learning via Conditional Motion Propagation

1 code implementation • CVPR 2019 • Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy

Instead of explicitly modeling the motion probabilities, we design the pretext task as a conditional motion propagation problem.

Human Parsing Instance Segmentation +2

137

Paper
Code

Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation • NeurIPS 2021 • Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.

Point Cloud Completion

133

Paper
Code

Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation • 24 Nov 2021 • Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

Point Cloud Completion

133

Paper
Code

SPTS: Single-Point Text Spotting

1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single point for each instance.

Ranked #3 on Text Spotting on SCUT-CTW1500

Language Modelling Text Detection +1

126

Paper
Code

SPTS v2: Single-Point Scene Text Spotting

3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Ranked #15 on Text Spotting on ICDAR 2015

Text Detection Text Spotting

126

Paper
Code

When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks

1 code implementation • CVPR 2020 • Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, Dahua Lin

Recent advances in adversarial attacks uncover the intrinsic vulnerability of modern deep neural networks.

Neural Architecture Search

123

Paper
Code

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion

1 code implementation • ICLR 2022 • Zhaoyang Lyu, Zhifeng Kong, Xudong Xu, Liang Pan, Dahua Lin

The RFNet refines the coarse output of the CGNet and further improves the quality of the completed point cloud.

Denoising Point Cloud Completion

123

Paper
Code

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

1 code implementation • 14 Sep 2023 • Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang

Based on the definition, UniHSI comprises a Large Language Model (LLM) Planner to translate language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution.

Language Modelling Large Language Model

117

Paper
Code

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis

1 code implementation • NeurIPS 2021 • Xudong Xu, Xingang Pan, Dahua Lin, Bo Dai

In this paper, we propose Generative Occupancy Fields (GOF), a novel model based on generative radiance fields that can learn compact object surfaces without impeding its training convergence.

3D-Aware Image Synthesis Object

103

Paper
Code

Adversarial Robustness under Long-Tailed Distribution

1 code implementation • CVPR 2021 • Tong Wu, Ziwei Liu, Qingqiu Huang, Yu Wang, Dahua Lin

We then perform a systematic study on existing long-tailed recognition methods in conjunction with the adversarial training framework.

Adversarial Robustness

Paper
Code

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

1 code implementation • 20 Oct 2023 • Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen

In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability.

Instruction Following

Paper
Code

Are We on the Right Way for Evaluating Large Vision-Language Models?

1 code implementation • 29 Mar 2024 • Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao

We evaluate 16 leading LVLMs on MMStar to assess their multi-modal capabilities, and on 7 benchmarks with the proposed metrics to investigate their data leakage and actual multi-modal gain.

World Knowledge

Paper
Code

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

1 code implementation • 27 Feb 2024 • Shuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang

We present SongComposer, an innovative LLM designed for song composition.

Instruction Following Language Modelling +1

Paper
Code

Towards Diverse and Natural Image Descriptions via a Conditional GAN

1 code implementation • ICCV 2017 • Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin

Despite the substantial progress in recent years, image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability.

Image Captioning

Paper
Code

Deep Markov Random Field for Image Modeling

1 code implementation • 7 Sep 2016 • Zhirong Wu, Dahua Lin, Xiaoou Tang

Markov Random Fields (MRFs), a formulation widely used in generative image modeling, have long been plagued by the lack of expressive power.

Paper
Code

Position-Guided Point Cloud Panoptic Segmentation Transformer

1 code implementation • 23 Mar 2023 • Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang

DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception.

Ranked #1 on Panoptic Segmentation on SemanticKITTI

Instance Segmentation Panoptic Segmentation +3

Paper
Code

Scaling Laws of RoPE-based Extrapolation

1 code implementation • 8 Oct 2023 • Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin

The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest.

16k

Paper
Code

DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking

1 code implementation • 29 Mar 2023 • Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang

Recent multi-camera 3D object detectors usually leverage temporal information to construct multi-view stereo that alleviates the ill-posed depth estimation.

3D Object Detection Depth Estimation +3

Paper
Code

Few-Shot Object Detection via Association and DIscrimination

1 code implementation • NeurIPS 2021 • Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin

1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space.

Few-Shot Object Detection Object +3

Paper
Code

DevBench: A Comprehensive Benchmark for Software Development

1 code implementation • 13 Mar 2024 • Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen

Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities.

Code Generation

Paper
Code

InterControl: Generate Human Motion Interactions by Controlling Every Joint

1 code implementation • 27 Nov 2023 • Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai

Furthermore, we demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model (LLM).

Language Modelling Large Language Model +1

Paper
Code

From Trailers to Storylines: An Efficient Way to Learn from Movies

1 code implementation • 14 Jun 2018 • Qingqiu Huang, Yuanjun Xiong, Yu Xiong, Yuqi Zhang, Dahua Lin

Experiments on this dataset showed that the proposed method can substantially reduce the training time while obtaining highly effective features and coherent temporal structures.

Paper
Code

Unifying Identification and Context Learning for Person Recognition

1 code implementation • CVPR 2018 • Qingqiu Huang, Yu Xiong, Dahua Lin

In this work, we aim to move beyond such limitations and propose a new framework to leverage context for person recognition.

Face Recognition Person Recognition

Paper
Code

CLEVA: Chinese Language Models EVAluation Platform

1 code implementation • 9 Aug 2023 • Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, LiWei Wang

With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue.

Paper
Code

SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds

1 code implementation • 6 Apr 2020 • Xinge Zhu, Yuexin Ma, Tai Wang, Yan Xu, Jianping Shi, Dahua Lin

Multi-class 3D object detection aims to localize and classify objects of multiple categories from point clouds.

3D Object Detection object-detection

Paper
Code

Online Hyper-parameter Learning for Auto-Augmentation Strategy

1 code implementation • ICCV 2019 • Chen Lin, Minghao Guo, Chuming Li, Yuan Xin, Wei Wu, Dahua Lin, Wanli Ouyang, Junjie Yan

Data augmentation is critical to the success of modern deep learning techniques.

Data Augmentation

Paper
Code

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

1 code implementation • CVPR 2023 • Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset.

3D Object Detection object-detection

Paper
Code

SpotServe: Serving Generative Large Language Models on Preemptible Instances

1 code implementation • 27 Nov 2023 • Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time.

Graph Matching

Paper
Code

POPQORN: Quantifying Robustness of Recurrent Neural Networks

2 code implementations • 17 May 2019 • Ching-Yun Ko, Zhaoyang Lyu, Tsui-Wei Weng, Luca Daniel, Ngai Wong, Dahua Lin

The vulnerability to adversarial attacks has been a critical issue for deep neural networks.

Paper
Code

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

1 code implementation • 7 Feb 2024 • Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, WangMeng Zuo, Dahua Lin, Yu Qiao, Jing Shao

In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount.

Multiple-choice

Paper
Code

Guided Diffusion Model for Adversarial Purification

2 code implementations • 30 May 2022 • Jinyi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, Hongfei Fu

In this paper, we propose a novel purification approach, referred to as guided diffusion model for purification (GDMP), to help protect classifiers from adversarial attacks.

Denoising

Paper
Code

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

1 code implementation • 9 Apr 2024 • Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen

Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents.

Answer Selection Long-Context Understanding

Paper
Code

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

1 code implementation • 20 Mar 2024 • Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and 2 object detection datasets under the zero-shot recognition setting.

Contrastive Learning Fine-Grained Visual Recognition +3

Paper
Code

Accelerating Diffusion Models via Early Stop of the Diffusion Process

1 code implementation • 25 May 2022 • Zhaoyang Lyu, Xudong Xu, Ceyuan Yang, Dahua Lin, Bo Dai

By modeling the reverse process of gradually diffusing the data distribution into a Gaussian distribution, generating a sample in DDPMs can be regarded as iteratively denoising a randomly sampled Gaussian noise.

Denoising Image Generation

Paper
Code

Multi-Level Logit Distillation

1 code implementation • CVPR 2023 • Ying Jin, Jiaqi Wang, Dahua Lin

Through this framework, the prediction alignment is not only conducted at the instance level, but also at the batch and class level, through which the student model learns instance prediction, input correlation, and category correlation simultaneously.

Knowledge Distillation

Paper
Code

Characterization of Large Language Model Development in the Datacenter

1 code implementation • 12 Mar 2024 • Qinghao Hu, Zhisheng Ye, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang

Large Language Models (LLMs) have presented impressive performance across several transformative tasks.

Language Modelling Large Language Model +1

Paper
Code

Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant

1 code implementation • NeurIPS 2022 • Ying Jin, Jiaqi Wang, Dahua Lin

Semi-Supervised Semantic Segmentation aims at training the segmentation model with limited labeled data and a large amount of unlabeled data.

Segmentation Semi-Supervised Semantic Segmentation

Paper
Code

Motion Guided 3D Pose Estimation from Videos

1 code implementation • ECCV 2020 • Jingbo Wang, Sijie Yan, Yuanjun Xiong, Dahua Lin

We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D pose.

Ranked #19 on 3D Human Pose Estimation on Human3.6M

3D Pose Estimation Monocular 3D Human Pose Estimation

Paper
Code

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

1 code implementation • 6 Feb 2024 • Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.

16k

Paper
Code

LongWanjuan: Towards Systematic Measurement for Long Text Quality

1 code implementation • 21 Feb 2024 • Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin

The quality of training data is crucial for enhancing the long-text capabilities of foundation models.

Language Modelling

Paper
Code

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation

1 code implementation • 29 Nov 2023 • Shuangrui Ding, Rui Qian, Haohang Xu, Dahua Lin, Hongkai Xiong

In this paper, we propose a simple yet effective approach for self-supervised video object segmentation (VOS).

Clustering Object +6

Paper
Code

CriticBench: Evaluating Large Language Models as Critic

1 code implementation • 21 Feb 2024 • Tian Lan, Wenwei Zhang, Chen Xu, Heyan Huang, Dahua Lin, Kai Chen, Xian-Ling Mao

Critique ability is crucial in the scalable oversight and self-improvement of Large Language Models (LLMs).

Paper
Code

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

1 code implementation • CVPR 2022 • Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin

To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation.

Action Recognition Representation Learning +3

Paper
Code

Towards Evaluating and Training Verifiably Robust Neural Networks

1 code implementation • CVPR 2021 • Zhaoyang Lyu, Minghao Guo, Tong Wu, Guodong Xu, Kehuan Zhang, Dahua Lin

Recent works have shown that interval bound propagation (IBP) can be used to train verifiably robust neural networks.

Paper
Code

OCSampler: Compressing Videos to One Clip with Single-step Sampling

1 code implementation • CVPR 2022 • Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, LiMin Wang

Recent works prefer to formulate frame sampling as a sequential decision task by selecting frames one by one according to their importance, while we present a new paradigm of learning instance-specific video condensation policies to select informative frames for representing the entire video only in a single step.

Video Recognition

Paper
Code

Policy Continuation with Hindsight Inverse Dynamics

1 code implementation • NeurIPS 2019 • Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou

This approach learns from Hindsight Inverse Dynamics based on Hindsight Experience Replay, which enables learning in a self-imitated manner and thus training with supervised learning.

Reinforcement Learning (RL)

Paper
Code

Flames: Benchmarking Value Alignment of LLMs in Chinese

1 code implementation • 12 Nov 2023 • Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin

The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values.

Benchmarking Fairness

Paper
Code

A Neural Compositional Paradigm for Image Captioning

1 code implementation • NeurIPS 2018 • Bo Dai, Sanja Fidler, Dahua Lin

Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance.

Image Captioning

Paper
Code

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

1 code implementation • 20 Sep 2022 • Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin

Deep learning models have achieved excellent recognition results on large-scale video benchmarks.

Action Recognition

Paper
Code

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

1 code implementation • 21 Jan 2024 • Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences.

Paper
Code

Fastened CROWN: Tightened Neural Network Robustness Certificates

1 code implementation • 2 Dec 2019 • Zhaoyang Lyu, Ching-Yun Ko, Zhifeng Kong, Ngai Wong, Dahua Lin, Luca Daniel

We draw inspiration from such work and further demonstrate the optimality of deterministic CROWN (Zhang et al. 2018) solutions in a given linear programming problem under mild constraints.

Paper
Code

Static and Dynamic Concepts for Self-supervised Video Representation Learning

1 code implementation • 26 Jul 2022 • Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin

In this paper, we propose a novel learning scheme for self-supervised video representation learning.

Representation Learning Video Understanding

Paper
Code

Recursive Visual Sound Separation Using Minus-Plus Net

1 code implementation • ICCV 2019 • Xudong Xu, Bo Dai, Dahua Lin

Sounds provide rich semantics, complementary to visual data, for many tasks.

Paper
Code

IRLAS: Inverse Reinforcement Learning for Architecture Search

1 code implementation • CVPR 2019 • Minghao Guo, Zhao Zhong, Wei Wu, Dahua Lin, Junjie Yan

Motivated by the fact that human-designed networks are elegant in topology with a fast inference speed, we propose a mirror stimuli function inspired by biological cognition theory to extract the abstract topological knowledge of an expert human-designed network (ResNeXt).

Neural Architecture Search reinforcement-learning +1

Paper
Code

F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods

1 code implementation • 26 Jan 2024 • Yu Sun, Keyu Chen, Shujie Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

However, these evaluation benchmarks are limited to assessing the instruction-following capabilities, overlooking the fundamental abilities that emerge during the pre-training stage.

Instruction Following

Paper
Code

Discover and Learn New Objects from Documentaries

1 code implementation • CVPR 2017 • Kai Chen, Hang Song, Chen Change Loy, Dahua Lin

Despite the remarkable progress in recent years, detecting objects in a new context remains a challenging task.

Object Weakly-supervised Learning

Paper
Code

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

1 code implementation • ICCV 2023 • Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin

In the second stage, for each semantics, we randomly sample slots from the corresponding Gaussian distribution and perform masked feature aggregation within the semantic area to exploit temporal correspondence patterns for instance identification.

Object Object Discovery +1

Paper
Code

An Embarrassingly Simple Approach for Knowledge Distillation

1 code implementation • 5 Dec 2018 • Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model.

Face Recognition Knowledge Distillation +3

Paper
Code

Evolutionary Stochastic Policy Distillation

1 code implementation • 27 Apr 2020 • Hao Sun, Xinyu Pan, Bo Dai, Dahua Lin, Bolei Zhou

Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging reinforcement learning problem due to the sparsity of reward signals.

Paper
Code

Novel Policy Seeking with Constrained Optimization

1 code implementation • 21 May 2020 • Hao Sun, Zhenghao Peng, Bo Dai, Jian Guo, Dahua Lin, Bolei Zhou

In problem-solving, we humans can come up with multiple novel solutions to the same problem.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Peephole: Predicting Network Performance Before Training

1 code implementation • 9 Dec 2017 • Boyang Deng, Junjie Yan, Dahua Lin

The quest for performant networks has been a significant force that drives the advancements of deep learning in recent years.

Paper
Code

Low-Latency Video Semantic Segmentation

no code implementations • CVPR 2018 • Yule Li, Jianping Shi, Dahua Lin

Recent years have seen remarkable progress in semantic segmentation.

Ranked #6 on Video Semantic Segmentation on Cityscapes val

Autonomous Driving Segmentation +3

Paper

Accelerated Training for Massive Classification via Dynamic Class Selection

no code implementations • 5 Jan 2018 • Xingcheng Zhang, Lei Yang, Junjie Yan, Dahua Lin

Massive classification, a classification task defined over a vast number of classes (hundreds of thousands or even millions), has become an essential part of many real-world systems, such as face recognition.

Classification Face Recognition +1

Paper

Learning Sparse Visual Representations with Leaky Capped Norm Regularizers

no code implementations • 8 Nov 2017 • Jianqiao Wangni, Dahua Lin

To the best of our knowledge, this is the first convergence analysis of the 3D recovery problem.

Paper

Be Your Own Prada: Fashion Synthesis with Structural Coherence

no code implementations • ICCV 2017 • Shizhan Zhu, Sanja Fidler, Raquel Urtasun, Dahua Lin, Chen Change Loy

In the second stage, a generative model with a newly proposed compositional mapping layer is used to render the final image with precise regions and textures conditioned on this map.

Fashion Synthesis Semantic Segmentation +1

Paper

Contrastive Learning for Image Captioning

no code implementations • NeurIPS 2017 • Bo Dai, Dahua Lin

Specifically, via two constraints formulated on top of a reference model, the proposed method can encourage distinctiveness, while maintaining the overall quality of the generated captions.

Contrastive Learning Image Captioning

Paper

Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data

no code implementations • 19 Sep 2017 • Ruohui Wang, Dahua Lin

We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes.

Paper
Add Code

Integrating Specialized Classifiers Based on Continuous Time Markov Chain

no code implementations • 7 Sep 2017 • Zhizhong Li, Dahua Lin

Specialized classifiers, namely those dedicated to a subset of classes, are often adopted in real-world recognition systems.

Paper
Add Code

Adjustable Bounded Rectifiers: Towards Deep Binary Representations

no code implementations • 19 Nov 2015 • Zhirong Wu, Dahua Lin, Xiaoou Tang

This suggests that the semantic structure of a neural network may be manifested through a guided binarization process.

Binarization

Paper
Add Code

Generating Multi-Sentence Lingual Descriptions of Indoor Scenes

no code implementations • 28 Feb 2015 • Dahua Lin, Chen Kong, Sanja Fidler, Raquel Urtasun

This paper proposes a novel framework for generating lingual descriptions of indoor scenes.

Sentence Text Generation

Paper
Add Code

Move Forward and Tell: A Progressive Generator of Video Descriptions

no code implementations • ECCV 2018 • Yilei Xiong, Bo Dai, Dahua Lin

We present an efficient framework that can generate a coherent paragraph to describe a given video.

Descriptive Sentence +1

Paper
Add Code

Rethinking the Form of Latent States in Image Captioning

no code implementations • ECCV 2018 • Bo Dai, Deming Ye, Dahua Lin

Taking advantage of this, we visually reveal the internal dynamics in the process of caption generation, as well as the connections between input visual domain and output linguistic domain.

Caption Generation Image Captioning

Paper
Add Code

Pose Guided Human Video Generation

no code implementations • ECCV 2018 • Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi, Dahua Lin

Human pose, on the other hand, can represent motion patterns intrinsically and interpretably, and impose the geometric constraints regardless of appearance.

Generative Adversarial Network motion prediction +1

Paper
Add Code

Generative Adversarial Frontal View to Bird View Synthesis

no code implementations • 1 Aug 2018 • Xinge Zhu, Zhichao Yin, Jianping Shi, Hongsheng Li, Dahua Lin

Due to the large gap and severe deformation between the frontal view and bird view, generating a bird view image from a single frontal view is challenging.

Bird View Synthesis Homography Estimation +1

Paper
Add Code

Probabilistic Ensemble of Collaborative Filters

no code implementations • 26 Jun 2018 • Zhiyu Min, Dahua Lin

Collaborative filtering is an important technique for recommendation.

Collaborative Filtering

Paper
Add Code

Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation

no code implementations • ECCV 2018 • Xinge Zhu, Hui Zhou, Ceyuan Yang, Jianping Shi, Dahua Lin

Due to the expensive and time-consuming annotations (e.g., segmentation) for real-world images, recent works in computer vision resort to synthetic data.

Domain Adaptation Segmentation +1

Paper
Add Code

Improving On-policy Learning with Statistical Reward Accumulation

no code implementations • 7 Sep 2018 • Yubin Deng, Ke Yu, Dahua Lin, Xiaoou Tang, Chen Change Loy

Most methods in deep-RL achieve good results via the maximization of the reward signal provided by the environment, typically in the form of discounted cumulative returns.

Atari Games

Paper
Add Code

Trajectory Convolution for Action Recognition

no code implementations • NeurIPS 2018 • Yue Zhao, Yuanjun Xiong, Dahua Lin

How to leverage the temporal dimension is a key question in video analysis.

Action Recognition Temporal Action Localization

Paper
Add Code

Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

no code implementations • NeurIPS 2013 • Dahua Lin

To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models.

Paper
Add Code

Coupling Nonparametric Mixtures via Latent Dirichlet Processes

no code implementations • NeurIPS 2012 • Dahua Lin, John W. Fisher

Mixture distributions are often used to model complex data.

Paper
Add Code

Learning Globally Optimized Object Detector via Policy Gradient

no code implementations • CVPR 2018 • Yongming Rao, Dahua Lin, Jiwen Lu, Jie Zhou

In this paper, we propose a simple yet effective method to learn a globally optimized detector for object detection, which is a simple modification to the standard cross-entropy gradient inspired by the REINFORCE algorithm.

Object object-detection +1

Paper
Add Code

Recognize Actions by Disentangling Components of Dynamics

no code implementations • CVPR 2018 • Yue Zhao, Yuanjun Xiong, Dahua Lin

Despite the remarkable progress in action recognition over the past several years, existing methods remain limited in efficiency and effectiveness.

Action Recognition Optical Flow Estimation +2

Paper
Add Code

Find and Focus: Retrieve and Localize Video Events with Natural Language Queries

no code implementations • ECCV 2018 • Dian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin

The thriving of video sharing services brings new challenges to video retrieval, e.g., the rapid growth in video duration and content diversity.

Natural Language Queries Retrieval +2

Paper
Add Code

Lifelong Learning via Progressive Distillation and Retrospection

no code implementations • ECCV 2018 • Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, Dahua Lin

Lifelong learning aims at adapting a learned model to new tasks while retaining the knowledge gained earlier.

Knowledge Distillation

Paper
Add Code

Monocular 3D Pose Recovery via Nonconvex Sparsity with Theoretical Analysis

no code implementations • 29 Dec 2018 • Jianqiao Wangni, Dahua Lin, Ji Liu, Kostas Daniilidis, Jianbo Shi

For recovering 3D object poses from 2D images, a prevalent method is to pre-train an over-complete dictionary $\mathcal D=\{B_i\}_i^D$ of 3D basis poses.

Paper
Add Code

Visual Semantic Search: Retrieving Videos via Complex Textual Queries

no code implementations • CVPR 2014 • Dahua Lin, Sanja Fidler, Chen Kong, Raquel Urtasun

In this paper, we tackle the problem of retrieving videos using complex natural language queries.

Autonomous Driving Natural Language Queries +2

Paper
Add Code

What are You Talking About? Text-to-Image Coreference

no code implementations • CVPR 2014 • Chen Kong, Dahua Lin, Mohit Bansal, Raquel Urtasun, Sanja Fidler

In this paper we exploit natural sentential descriptions of RGB-D scenes in order to improve 3D semantic parsing.

coreference-resolution Scene Classification +1

Paper
Add Code

Recognize Complex Events From Static Images by Fusing Deep Channels

no code implementations • CVPR 2015 • Yuanjun Xiong, Kai Zhu, Dahua Lin, Xiaoou Tang

A considerable portion of web images capture events that occur in our personal lives or social activities.

Paper
Add Code

WIDER Face and Pedestrian Challenge 2018: Methods and Results

no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou

This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.

Face Detection Pedestrian Detection +2

Paper
Add Code

Open Compound Domain Adaptation

no code implementations • CVPR 2020 • Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather).

Domain Adaptation Facial Expression Recognition +2

Paper
Add Code

Biased Estimates of Advantages over Path Ensembles

no code implementations • 15 Sep 2019 • Lanxin Lei, Zhizhong Li, Dahua Lin

The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths.

Atari Games Continuous Control +1

Paper
Add Code

A Graph-Based Framework to Bridge Movies and Synopses

no code implementations • ICCV 2019 • Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin

On top of this dataset, we develop a framework to perform matching between movie segments and synopsis paragraphs.

Paper
Add Code

Learning to Synthesize Fashion Textures

no code implementations • 18 Nov 2019 • Wu Shi, Tak-Wai Hui, Ziwei Liu, Dahua Lin, Chen Change Loy

Another important observation is that fashion textures are multi-modal.

Paper
Add Code

Learning a Decision Module by Imitating Driver's Control Behaviors

no code implementations • 30 Nov 2019 • Junning Huang, Sirui Xie, Jiankai Sun, Qiurui Ma, Chunxiao Liu, Jianping Shi, Dahua Lin, Bolei Zhou

In this work, we propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning.

Autonomous Driving Imitation Learning

Paper
Add Code

Regularizing Reasons for Outfit Evaluation with Gradient Penalty

no code implementations • 2 Feb 2020 • Xingxing Zou, Zhizhong Li, Ke Bai, Dahua Lin, Waikeung Wong

In this paper, we build an outfit evaluation system which provides feedback consisting of a judgment with a convincing explanation.

Sentence

Paper
Add Code

Learning Diverse Fashion Collocation by Neural Graph Filtering

no code implementations • 11 Mar 2020 • Xin Liu, Yongbin Sun, Ziwei Liu, Dahua Lin

To facilitate a comprehensive study on diverse fashion collocation, we reorganize Amazon Fashion dataset with carefully designed evaluation protocols.

Recommendation Systems

Paper
Add Code

Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds

no code implementations • 6 Apr 2020 • Tai Wang, Xinge Zhu, Dahua Lin

LiDAR is an important method for autonomous driving systems to sense the environment.

Autonomous Driving

Paper
Add Code

FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

no code implementations • CVPR 2020 • Dian Shao, Yue Zhao, Bo Dai, Dahua Lin

To take action recognition to a new level, we develop FineGym, a new dataset built on top of gymnastic videos.

Action Recognition Action Understanding

Paper
Add Code

Intra- and Inter-Action Understanding via Temporal Action Parsing

no code implementations • CVPR 2020 • Dian Shao, Yue Zhao, Bo Dai, Dahua Lin

Current methods for action recognition primarily rely on deep convolutional networks to derive feature embeddings of visual and motion features.

Action Parsing Action Recognition +1

Paper
Add Code

Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations

no code implementations • ECCV 2020 • Huaiyi Huang, Yuqi Zhang, Qingqiu Huang, Zhengkui Guo, Ziwei Liu, Dahua Lin

Place is an important element in visual understanding.

Paper
Add Code

Learn to Propagate Reliably on Noisy Affinity Graphs

no code implementations • ECCV 2020 • Lei Yang, Qingqiu Huang, Huaiyi Huang, Linning Xu, Dahua Lin

Recent works have shown that exploiting unlabeled data through label propagation can substantially reduce the labeling cost, which has been a critical issue in developing visual recognition models.

Open-Ended Question Answering

Paper
Add Code

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

no code implementations • ECCV 2020 • Hang Zhou, Xudong Xu, Dahua Lin, Xiaogang Wang, Ziwei Liu

Stereophonic audio is an indispensable ingredient to enhance human auditory experience.

Audio Generation

Paper
Add Code

MovieNet: A Holistic Dataset for Movie Understanding

no code implementations • ECCV 2020 • Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin

We believe that such a holistic dataset would promote research on story-based long video understanding and beyond.

Video Understanding

Paper
Add Code

A Unified Framework for Shot Type Classification Based on Subject Centric Lens

no code implementations • ECCV 2020 • Anyi Rao, Jiaze Wang, Linning Xu, Xuekun Jiang, Qingqiu Huang, Bolei Zhou, Dahua Lin

Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated videos that are thriving over the Internet.

General Classification Vocal Bursts Type Prediction

Paper
Add Code

Online Multi-modal Person Search in Videos

no code implementations • ECCV 2020 • Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin

The task of searching certain people in videos has seen increasing potential in real-world applications, such as video organization and editing.

Person Recognition Person Search

Paper
Add Code

Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation

no code implementations • ECCV 2020 • Qingqiu Huang, Lei Yang, Huaiyi Huang, Tong Wu, Dahua Lin

Captioned images are widely available on the web, while the captions often contain the names of the subjects in the images.

Face Model Face Recognition

Paper
Add Code

FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-Based Point Clouds

no code implementations • 20 Nov 2020 • Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin

Recent years have witnessed the rapid progress of perception algorithms on top of LiDAR, a widely adopted sensor for autonomous driving systems.

Autonomous Driving

Paper
Add Code

CARAFE++: Unified Content-Aware ReAssembly of FEatures

no code implementations • 7 Dec 2020 • Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

Feature reassembly, i.e., feature downsampling and upsampling, is a key operation in a number of modern convolutional network architectures, e.g., residual networks and feature pyramids.

Image Inpainting Instance Segmentation +3

Paper
Add Code

Visually Informed Binaural Audio Generation without Binaural Audios

no code implementations • CVPR 2021 • Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin

Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.

Audio Generation

Paper
Add Code

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection

no code implementations • 21 May 2021 • Shijie Fang, Yuhang Cao, Xinjiang Wang, Kai Chen, Dahua Lin, Wayne Zhang

The performance of object detection, to a great extent, depends on the availability of large annotated datasets.

object-detection Object Detection +2

Paper
Add Code

Scene-aware Generative Network for Human Motion Synthesis

no code implementations • CVPR 2021 • Jingbo Wang, Sijie Yan, Bo Dai, Dahua Lin

We revisit human motion synthesis, a task useful in various real-world applications, in this paper.

Motion Synthesis

Paper
Add Code

Transcript to Video: Efficient Clip Sequencing from Texts

no code implementations • 25 Jul 2021 • Yu Xiong, Fabian Caba Heilbron, Dahua Lin

To meet the demands for non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.

Retrieval

Paper
Add Code

Towards Balanced Learning for Instance Recognition

no code implementations • 23 Aug 2021 • Jiangmiao Pang, Kai Chen, Qi Li, Zhihai Xu, Huajun Feng, Jianping Shi, Wanli Ouyang, Dahua Lin

Paper
Add Code

BlockPlanner: City Block Generation With Vectorized Graph Representation

no code implementations • ICCV 2021 • Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, Dahua Lin

City modeling is the foundation for computational urban planning, navigation, and entertainment.

valid

Paper
Add Code

3D Building Reconstruction From Monocular Remote Sensing Images

no code implementations • ICCV 2021 • Weijia Li, Lingxuan Meng, Jinwang Wang, Conghui He, Gui-Song Xia, Dahua Lin

3D building reconstruction from monocular remote sensing imagery is an important research problem and an economic solution to large-scale city modeling, compared with reconstruction from LiDAR data and multi-view imagery.

3D Reconstruction Model Optimization

Paper
Add Code

INTERN: A New Learning Paradigm Towards General Vision

no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.

Paper
Add Code

Learning with Social Influence through Interior Policy Differentiation

no code implementations • 25 Sep 2019 • Hao Sun, Bo Dai, Jiankai Sun, Zhenghao Peng, Guodong Xu, Dahua Lin, Bolei Zhou

In this work we model the social influence into the scheme of reinforcement learning, enabling the agents to learn both from the environment and from their peers.

Reinforcement Learning (RL)

Paper
Add Code
