Search Results for author: Dahua Lin

Found 234 papers, 137 papers with code

3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

1 code implementation 4 Mar 2024 Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.

Text to 3D Texture Synthesis

Data-free Weight Compress and Denoise for Large Language Models

no code implementations 26 Feb 2024 Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin

Significantly, our method requires no additional corpus, while remaining orthogonal to pruning and quantization methods.

Quantization

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

1 code implementation 22 Feb 2024 Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang

We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.

Hallucination

LongWanjuan: Towards Systematic Measurement for Long Text Quality

1 code implementation 21 Feb 2024 Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin

The quality of training data is crucial for enhancing the long-text capabilities of foundation models.

Language Modelling

CriticBench: Evaluating Large Language Models as Critic

1 code implementation 21 Feb 2024 Tian Lan, Wenwei Zhang, Chen Xu, Heyan Huang, Dahua Lin, Kai Chen, Xian-Ling Mao

Critique ability is crucial in the scalable oversight and self-improvement of Large Language Models (LLMs).

Identifying Semantic Induction Heads to Understand In-Context Learning

no code implementations 20 Feb 2024 Jie Ren, Qipeng Guo, Hang Yan, Dongrui Liu, Xipeng Qiu, Dahua Lin

Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness.

In-Context Learning Knowledge Graphs

Code Needs Comments: Enhancing Code LLMs with Comment Augmentation

no code implementations 20 Feb 2024 Demin Song, Honglin Guo, Yunhua Zhou, Shuhao Xing, Yudong Wang, Zifan Song, Wenwei Zhang, Qipeng Guo, Hang Yan, Xipeng Qiu, Dahua Lin

Programming skill is a crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs).

Data Augmentation

Mixed Gaussian Flow for Diverse Trajectory Prediction

no code implementations 19 Feb 2024 Jiahe Chen, Jinkun Cao, Dahua Lin, Kris Kitani, Jiangmiao Pang

However, mapping from a standard Gaussian by a flow-based model hurts the capacity to capture complicated patterns of trajectories, ignoring the under-represented motion intentions in the training data.

Trajectory Prediction

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

no code implementations 17 Feb 2024 Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification.

Computational Efficiency

SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And Reparameterization

no code implementations 13 Feb 2024 Ying Jin, Jiaqi Wang, Dahua Lin

We consider multi-source free domain adaptation, the problem of adapting multiple existing models to a new domain without accessing the source data.

Source-Free Domain Adaptation

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

1 code implementation 9 Feb 2024 Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

We further explore how to use LEAN to solve math problems and study its performance under multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math.

Data Augmentation GSM8K +3

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

1 code implementation 7 Feb 2024 Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, WangMeng Zuo, Dahua Lin, Yu Qiao, Jing Shao

In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount.

Multiple-choice

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

1 code implementation 6 Feb 2024 Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.

F-Eval: Asssessing Fundamental Abilities with Refined Evaluation Methods

1 code implementation 26 Jan 2024 Yu Sun, Keyu Chen, Shujie Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

However, these evaluation benchmarks are limited to assessing the instruction-following capabilities, overlooking the fundamental abilities that emerge during the pre-training stage.

Instruction Following

Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

no code implementations 26 Jan 2024 Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu

Large language models have demonstrated remarkable potential in various tasks; however, there remains a significant scarcity of open-source models and data for specific domains.

Language Modelling Large Language Model

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

1 code implementation 21 Jan 2024 Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences.

SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting

no code implementations 15 Jan 2024 Mingxin Huang, Dezhi Peng, Hongliang Li, Zhenghao Peng, Chongyu Liu, Dahua Lin, Yuliang Liu, Xiang Bai, Lianwen Jin

In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2, which seeks to find a better synergy between text detection and recognition.

Text Detection Text Spotting

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

1 code implementation 8 Jan 2024 Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences.

Text to 3D

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

1 code implementation 26 Dec 2023 Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang

In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions.

Scene Understanding

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

1 code implementation 22 Dec 2023 Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao

We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility.

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image

no code implementations 7 Dec 2023 Tong Wu, Zhibing Li, Shuai Yang, Pan Zhang, Xinggang Pan, Jiaqi Wang, Dahua Lin, Ziwei Liu

Extensive experiments demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.

Semantic Segmentation

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

2 code implementations 6 Dec 2023 Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Kai Yan, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem.

Autonomous Driving

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

1 code implementation 6 Dec 2023 Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

1 code implementation 5 Dec 2023 Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao

Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation.

Image Generation Reading Comprehension

VideoBooth: Diffusion-based Video Generation with Image Prompts

no code implementations 1 Dec 2023 Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts.

Video Generation

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

1 code implementation 30 Nov 2023 Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, LiMin Wang, Dahua Lin, Bo Dai

Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications.

Neural Rendering

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation 29 Nov 2023 Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation Video Generation

Cinematic Behavior Transfer via NeRF-based Differentiable Filming

no code implementations 29 Nov 2023 Xuekun Jiang, Anyi Rao, Jingbo Wang, Dahua Lin, Bo Dai

In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired.

Pose Estimation

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

1 code implementation 29 Nov 2023 Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

Based on this observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue, along with a rollback strategy that retrospects the presence of summary tokens in the previously generated tokens and re-allocates the token selection if necessary.

Hallucination

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

1 code implementation 28 Nov 2023 Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai

The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years.

Video Generation

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

no code implementations 28 Nov 2023 Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu

In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.

SpotServe: Serving Generative Large Language Models on Preemptible Instances

1 code implementation 27 Nov 2023 Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time.

Graph Matching

InterControl: Generate Human Motion Interactions by Controlling Every Joint

1 code implementation 27 Nov 2023 Zhenzhi Wang, Jingbo Wang, Dahua Lin, Bo Dai

We also collect data of joint contact pairs by LLMs to show InterControl's ability in human interaction generation.

Language Modelling Large Language Model

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

1 code implementation 21 Nov 2023 Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.

Descriptive visual instruction following +2

Flames: Benchmarking Value Alignment of Chinese Large Language Models

no code implementations 12 Nov 2023 Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin

To efficiently evaluate new models on the benchmark, we develop a specified scorer capable of scoring LLMs across multiple dimensions, achieving an accuracy of 77.4%.

Benchmarking Fairness

VR-NeRF: High-Fidelity Virtualized Walkable Spaces

no code implementations 5 Nov 2023 Linning Xu, Vasu Agrawal, William Laney, Tony Garcia, Aayush Bansal, Changil Kim, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Aljaž Božič, Dahua Lin, Michael Zollhöfer, Christian Richardt

We present an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields.

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

1 code implementation 20 Oct 2023 Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen

In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability.

Instruction Following

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

no code implementations 12 Oct 2023 Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where each branch in the model complements the others with both structural awareness and textural richness.

Image Generation

Scaling Laws of RoPE-based Extrapolation

1 code implementation 8 Oct 2023 Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin

The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest.

Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

no code implementations 4 Oct 2023 Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin

This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.

MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond

no code implementations ICCV 2023 Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, Bo Dai

While most recent neural rendering works focus on objects and small-scale scenes, developing neural rendering methods for city-scale scenes has great potential in many real-world applications.

Neural Rendering

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations 26 Sep 2023 Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Text-to-Video Generation Video Generation +1

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

1 code implementation 14 Sep 2023 Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang

Based on the definition, UniHSI constitutes a Large Language Model (LLM) Planner to translate language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution.

Language Modelling Large Language Model

PointLLM: Empowering Large Language Models to Understand Point Clouds

3 code implementations 31 Aug 2023 Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.

3D Object Classification 3D Question Answering (3D-QA) +2

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

1 code implementation 21 Aug 2023 Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin

The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models (LLMs) and multimodal large language models (MLLMs).

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

1 code implementation ICCV 2023 Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin

In the second stage, for each semantics, we randomly sample slots from the corresponding Gaussian distribution and perform masked feature aggregation within the semantic area to exploit temporal correspondence patterns for instance identification.

Object Object Discovery +1

CLEVA: Chinese Language Models EVAluation Platform

1 code implementation 9 Aug 2023 Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, LiWei Wang

With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue.

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

4 code implementations 10 Jul 2023 Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai

Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.

Image Animation

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE

no code implementations 5 Jun 2023 Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin

The factor model is a fundamental tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in complicated practical investing situations.

Open-Ended Question Answering Stock Prediction

E2EAI: End-to-End Deep Learning Framework for Active Investing

no code implementations 25 May 2023 Zikai Wei, Bo Dai, Dahua Lin

Active investing aims to construct a portfolio of assets that are believed to be relatively profitable in the markets, with one popular method being to construct a portfolio via factor-based strategies.

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations 12 Apr 2023 Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

V3Det: Vast Vocabulary Visual Detection Dataset

no code implementations ICCV 2023 Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin

2) Hierarchical Category Organization: The vast vocabulary of V3Det is organized by a hierarchical category tree which annotates the inclusion relationship among categories, encouraging the exploration of category relationships in vast and open vocabulary object detection.

Chatbot Object +2

DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking

1 code implementation 29 Mar 2023 Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang

Recent multi-camera 3D object detectors usually leverage temporal information to construct multi-view stereo that alleviates the ill-posed depth estimation.

3D Object Detection Depth Estimation +3

AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation

no code implementations ICCV 2023 Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin

Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice.

Novel View Synthesis Object

Grid-guided Neural Radiance Fields for Large Urban Scenes

no code implementations CVPR 2023 Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin

An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions.

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

1 code implementation CVPR 2023 Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset.

3D Object Detection object-detection

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

1 code implementation 4 Mar 2023 YuAn Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.

Self-Supervised Learning

Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production

no code implementations 30 Jan 2023 Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin, Dahua Lin, Bo Dai

Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots.

Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant

1 code implementation NIPS 2022 Ying Jin, Jiaqi Wang, Dahua Lin

Semi-Supervised Semantic Segmentation aims at training the segmentation model with limited labeled data and a large amount of unlabeled data.

Segmentation Semi-Supervised Semantic Segmentation

SPTS v2: Single-Point Scene Text Spotting

3 code implementations 4 Jan 2023 Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Text Detection Text Spotting

Learning Human Dynamics in Autonomous Driving Scenarios

no code implementations ICCV 2023 Jingbo Wang, Ye Yuan, Zhengyi Luo, Kevin Xie, Dahua Lin, Umar Iqbal, Sanja Fidler, Sameh Khamis

In this work, we propose a holistic framework for learning physically plausible human dynamics from real driving scenarios, narrowing the gap between real and simulated human behavior in safety-critical applications.

Autonomous Driving Human Dynamics

Multi-Level Logit Distillation

1 code implementation CVPR 2023 Ying Jin, Jiaqi Wang, Dahua Lin

Through this framework, the prediction alignment is not only conducted at the instance level, but also at the batch and class level, through which the student model learns instance prediction, input correlation, and category correlation simultaneously.

Knowledge Distillation

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations CVPR 2023 Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Audio-Driven Co-Speech Gesture Video Generation

no code implementations 5 Dec 2022 Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu

Our key insight is that the co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics.

Video Generation

Factor Investing with a Deep Multi-Factor Model

no code implementations 22 Oct 2022 Zikai Wei, Bo Dai, Dahua Lin

Modeling and characterizing multiple factors is perhaps the most important step in achieving excess returns over market benchmarks.

Graph Attention Management +1

Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows

no code implementations 17 Oct 2022 Anyi Rao, Xuekun Jiang, Sichen Wang, Yuwei Guo, Zihao Liu, Bo Dai, Long Pang, Xiaoyu Wu, Dahua Lin, Libiao Jin

The ability to choose an appropriate camera view among multiple cameras plays a vital role in TV show delivery.

Rethinking Trajectory Prediction via "Team Game"

no code implementations 17 Oct 2022 Zikai Wei, Xinge Zhu, Bo Dai, Dahua Lin

To accurately predict trajectories in multi-agent settings, e.g., team games, it is important to effectively model the interactions among agents.

Trajectory Prediction

Force-Aware Interface via Electromyography for Natural VR/AR Interaction

no code implementations 3 Oct 2022 Yunxiang Zhang, Benjamin Liang, Boyuan Chen, Paul Torrens, S. Farokh Atashzar, Dahua Lin, Qi Sun

Closing the gap between real-world physicality and immersive virtual experience requires a closed interaction loop: applying user-exerted physical forces to the virtual environment and generating haptic sensations back to the users.

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

1 code implementation 20 Sep 2022 Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin

Deep learning models have achieved excellent recognition results on large-scale video benchmarks.

Action Recognition

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations 12 Sep 2022 Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction

1 code implementation 26 Aug 2022 Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, Dahua Lin

Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically require hours of training for a single scene.

Surface Reconstruction

OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images

no code implementations CVPR 2023 Weijia Li, Yawen Lai, Linning Xu, Yuanbo Xiangli, Jinhua Yu, Conghui He, Gui-Song Xia, Dahua Lin

More precisely, the OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City.

Instance Segmentation Segmentation +1

Monocular 3D Object Detection with Depth from Motion

1 code implementation 26 Jul 2022 Tai Wang, Jiangmiao Pang, Dahua Lin

Perceiving 3D objects from monocular inputs is crucial for robotic systems, given its economy compared to multi-sensor settings.

Depth Estimation Monocular 3D Object Detection +2

Guided Diffusion Model for Adversarial Purification

2 code implementations 30 May 2022 Jinyi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, Hongfei Fu

In this paper, we propose a novel purification approach, referred to as guided diffusion model for purification (GDMP), to help protect classifiers from adversarial attacks.

Denoising

Accelerating Diffusion Models via Early Stop of the Diffusion Process

1 code implementation 25 May 2022 Zhaoyang Lyu, Xudong Xu, Ceyuan Yang, Dahua Lin, Bo Dai

By modeling the reverse process of gradually diffusing the data distribution into a Gaussian distribution, generating a sample in DDPMs can be regarded as iteratively denoising a randomly sampled Gaussian noise.

Denoising Image Generation

Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis

no code implementations CVPR 2022 Jingbo Wang, Yu Rong, Jingyuan Liu, Sijie Yan, Dahua Lin, Bo Dai

The ability to synthesize long-term human motion sequences in real-world scenes can facilitate numerous applications.

Motion Synthesis

PYSKL: Towards Good Practices for Skeleton Action Recognition

1 code implementation 19 May 2022 Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

The toolbox supports a wide variety of skeleton action recognition algorithms, including approaches based on GCN and CNN.

Action Recognition Skeleton Based Action Recognition

MINI: Mining Implicit Novel Instances for Few-Shot Object Detection

no code implementations 6 May 2022 Yuhang Cao, Jiaqi Wang, Yiqi Lin, Dahua Lin

The offline mining mechanism leverages a self-supervised discriminative model to collaboratively mine implicit novel instances with a trained FSOD network.

Few-Shot Object Detection object-detection

OCSampler: Compressing Videos to One Clip with Single-step Sampling

1 code implementation CVPR 2022 Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, LiMin Wang

Recent works prefer to formulate frame sampling as a sequential decision task by selecting frames one by one according to their importance, while we present a new paradigm of learning instance-specific video condensation policies to select informative frames for representing the entire video only in a single step.

Video Recognition

SPTS: Single-Point Text Spotting

1 code implementation 15 Dec 2021 Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that scene text spotting models can be trained with an extremely low-cost annotation of a single point for each instance.

Language Modelling Text Detection +1

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering

no code implementations 10 Dec 2021 Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin

The wide span of viewing positions within these scenes yields multi-scale renderings with very different levels of detail, which poses great challenges to neural radiance fields and biases them towards compromised results.

Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation NeurIPS 2021 Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.

Point Cloud Completion

Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation 24 Nov 2021 Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.

Point Cloud Completion
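
For readers unfamiliar with the CD baseline that the entry above compares against, the following is a minimal sketch of the plain symmetric Chamfer distance. It is not the paper's DCD. It assumes the squared-Euclidean, mean-over-points variant (other variants use unsquared distances or sums), and the function name `chamfer_distance` is hypothetical.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Each set is a list of equal-length coordinate tuples. This is the plain
    CD baseline (squared distances, averaged per set), not DCD.
    """
    def one_sided(src, dst):
        # For every point in src, take the squared distance to its nearest
        # neighbour in dst, then average over src.
        total = 0.0
        for s in src:
            total += min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
        return total / len(src)

    # Sum both directions so the measure is symmetric in a and b.
    return one_sided(a, b) + one_sided(b, a)


partial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0)]
print(chamfer_distance(partial, target))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for illustration but would normally be replaced by a spatial index or a batched GPU kernel.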

Few-Shot Object Detection via Association and DIscrimination

1 code implementation NeurIPS 2021 Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin

1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space.

Few-Shot Object Detection Object +3

INTERN: A New Learning Paradigm Towards General Vision

no code implementations 16 Nov 2021 Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovation over the past several years, marked by advances in AI technologies, are profoundly reshaping industry and society.

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis

1 code implementation NeurIPS 2021 Xudong Xu, Xingang Pan, Dahua Lin, Bo Dai

In this paper, we propose Generative Occupancy Fields (GOF), a novel model based on generative radiance fields that can learn compact object surfaces without impeding its training convergence.

3D-Aware Image Synthesis Object

Temporal RoI Align for Video Object Recognition

1 code implementation 8 Sep 2021 Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

In this work, considering that the features of the same object instance are highly similar across frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current-frame proposals by utilizing feature similarity.

Instance Segmentation Object +5

Towards Balanced Learning for Instance Recognition

no code implementations 23 Aug 2021 Jiangmiao Pang, Kai Chen, Qi Li, Zhihai Xu, Huajun Feng, Jianping Shi, Wanli Ouyang, Dahua Lin

In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by imbalance during the training process, which generally occurs at three levels: sample level, feature level, and objective level.

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

1 code implementation 14 Aug 2021 Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.

Key Information Extraction named-entity-recognition +4

Vision Transformer with Progressive Sampling

1 code implementation ICCV 2021 Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.

Image Classification

Probabilistic and Geometric Depth: Detecting Objects in Perspective

1 code implementation 29 Jul 2021 Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin

As the preliminary depth estimation of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty.

Attribute Depth Estimation +2

Transcript to Video: Efficient Clip Sequencing from Texts

no code implementations 25 Jul 2021 Yu Xiong, Fabian Caba Heilbron, Dahua Lin

To meet the demands of non-experts, we present Transcript-to-Video, a weakly supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.

Retrieval

Scene-aware Generative Network for Human Motion Synthesis

no code implementations CVPR 2021 Jingbo Wang, Sijie Yan, Bo Dai, Dahua Lin

In this paper, we revisit human motion synthesis, a task useful in various real-world applications.

Motion Synthesis

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection

no code implementations 21 May 2021 Shijie Fang, Yuhang Cao, Xinjiang Wang, Kai Chen, Dahua Lin, Wayne Zhang

The performance of object detection, to a great extent, depends on the availability of large annotated datasets.

object-detection Object Detection +2

Revisiting Skeleton-based Action Recognition

4 code implementations CVPR 2022 Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai

In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.

Action Recognition Group Activity Recognition +2

FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection

8 code implementations 22 Apr 2021 Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin

In this paper, we study this problem with a practice built on a fully convolutional single-stage detector and propose a general framework FCOS3D.

Autonomous Driving Monocular 3D Object Detection +2

Visually Informed Binaural Audio Generation without Binaural Audios

no code implementations CVPR 2021 Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin

Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.

Audio Generation

Adversarial Robustness under Long-Tailed Distribution

1 code implementation CVPR 2021 Tong Wu, Ziwei Liu, Qingqiu Huang, Yu Wang, Dahua Lin

We then perform a systematic study on existing long-tailed recognition methods in conjunction with the adversarial training framework.

Adversarial Robustness

Towards Evaluating and Training Verifiably Robust Neural Networks

1 code implementation CVPR 2021 Zhaoyang Lyu, Minghao Guo, Tong Wu, Guodong Xu, Kehuan Zhang, Dahua Lin

Recent works have shown that interval bound propagation (IBP) can be used to train verifiably robust neural networks.
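
As background for the entry above, here is a minimal sketch of the bound-propagation step at the core of IBP: elementwise lower and upper bounds on an input are pushed through an affine layer and a ReLU. This is the generic textbook rule, not this paper's training or evaluation procedure, and the function names `ibp_linear` and `ibp_relu` are hypothetical.

```python
def ibp_linear(lower, upper, weight, bias):
    """Propagate elementwise bounds through y = W x + b.

    `weight` is a list of rows; `lower` and `upper` bound each input entry.
    """
    out_lower, out_upper = [], []
    for row, b in zip(weight, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                # A non-negative weight maps the lower bound to the lower bound.
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps the roles of the two bounds.
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def ibp_relu(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Input box: x0 in [-1, 1], x1 in [0, 2].
lo, hi = ibp_linear([-1.0, 0.0], [1.0, 2.0], [[1.0, -1.0], [2.0, 0.0]], [0.0, 1.0])
lo, hi = ibp_relu(lo, hi)
print(lo, hi)
```

A robustness certificate follows when, after the last layer, the lower bound of the true-class logit exceeds the upper bound of every other logit over the whole input box.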

3D Building Reconstruction From Monocular Remote Sensing Images

no code implementations ICCV 2021 Weijia Li, Lingxuan Meng, Jinwang Wang, Conghui He, Gui-Song Xia, Dahua Lin

3D building reconstruction from monocular remote sensing imagery is an important research problem and an economical solution to large-scale city modeling, compared with reconstruction from LiDAR data and multi-view imagery.

3D Reconstruction Model Optimization

CARAFE++: Unified Content-Aware ReAssembly of FEatures

no code implementations 7 Dec 2020 Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

Feature reassembly, i.e., feature downsampling and upsampling, is a key operation in a number of modern convolutional network architectures, e.g., residual networks and feature pyramids.

Image Inpainting Instance Segmentation +3

FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-Based Point Clouds

no code implementations 20 Nov 2020 Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin

Recent years have witnessed the rapid progress of perception algorithms on top of LiDAR, a widely adopted sensor for autonomous driving systems.

Autonomous Driving

Understanding the wiring evolution in differentiable neural architecture search

1 code implementation 2 Sep 2020 Sirui Xie, Shoukang Hu, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

To this end, we pose questions that future differentiable methods for neural wiring discovery need to confront, hoping to evoke a discussion and rethinking on how much bias has been enforced implicitly in existing NAS methods.

Neural Architecture Search

Online Multi-modal Person Search in Videos

no code implementations ECCV 2020 Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin

The task of searching for certain people in videos has seen increasing potential in real-world applications, such as video organization and editing.

Person Recognition Person Search

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

3 code implementations 4 Aug 2020 Hui Zhou, Xinge Zhu, Xiao Song, Yuexin Ma, Zhe Wang, Hongsheng Li, Dahua Lin

A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.

3D Semantic Segmentation LIDAR Semantic Segmentation

MovieNet: A Holistic Dataset for Movie Understanding

no code implementations ECCV 2020 Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin

We believe that such a holistic dataset would promote research on story-based long video understanding and beyond.

Video Understanding

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

1 code implementation ECCV 2020 Tong Wu, Qingqiu Huang, Ziwei Liu, Yu Wang, Dahua Lin

We present a new loss function called Distribution-Balanced Loss for the multi-label recognition problems that exhibit long-tailed class distributions.

Binary Classification General Classification +2

Learn to Propagate Reliably on Noisy Affinity Graphs

no code implementations ECCV 2020 Lei Yang, Qingqiu Huang, Huaiyi Huang, Linning Xu, Dahua Lin

Recent works have shown that exploiting unlabeled data through label propagation can substantially reduce the labeling cost, which has been a critical issue in developing visual recognition models.

Open-Ended Question Answering

Intra- and Inter-Action Understanding via Temporal Action Parsing

no code implementations CVPR 2020 Dian Shao, Yue Zhao, Bo Dai, Dahua Lin

Current methods for action recognition primarily rely on deep convolutional networks to derive feature embeddings of visual and motion features.

Action Parsing Action Recognition +1

Evolutionary Stochastic Policy Distillation

1 code implementation 27 Apr 2020 Hao Sun, Xinyu Pan, Bo Dai, Dahua Lin, Bolei Zhou

Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging reinforcement learning problem due to the sparsity of reward signals.

Feature Pyramid Grids

1 code implementation 7 Apr 2020 Kai Chen, Yuhang Cao, Chen Change Loy, Dahua Lin, Christoph Feichtenhofer

Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale.

Neural Architecture Search object-detection +2

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

4 code implementations CVPR 2020 Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin

Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment.

Action Recognition Scene Segmentation +1

Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds

no code implementations 6 Apr 2020 Tai Wang, Xinge Zhu, Dahua Lin

LiDAR is an important method for autonomous driving systems to sense the environment.

Autonomous Driving

Self-Supervised Scene De-occlusion

2 code implementations CVPR 2020 Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, Chen Change Loy

This is achieved via Partial Completion Network (PCNet)-mask (M) and -content (C), which learn to recover fractions of object masks and contents, respectively, in a self-supervised manner.

Image Manipulation Scene Understanding

SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds

1 code implementation 6 Apr 2020 Xinge Zhu, Yuexin Ma, Tai Wang, Yan Xu, Jianping Shi, Dahua Lin

Multi-class 3D object detection aims to localize and classify objects of multiple categories from point clouds.

3D Object Detection object-detection

Learning to Cluster Faces via Confidence and Connectivity Estimation

3 code implementations CVPR 2020 Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin

With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters.

Clustering Connectivity Estimation +2

Omni-sourced Webly-supervised Learning for Video Recognition

3 code implementations ECCV 2020 Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin

Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.

Ranked #5 on Action Recognition on UCF101 (using extra training data)

Action Classification Action Recognition +1

Learning Diverse Fashion Collocation by Neural Graph Filtering

no code implementations 11 Mar 2020 Xin Liu, Yongbin Sun, Ziwei Liu, Dahua Lin

To facilitate a comprehensive study on diverse fashion collocation, we reorganize the Amazon Fashion dataset with carefully designed evaluation protocols.

Recommendation Systems

DSNAS: Direct Neural Architecture Search without Parameter Retraining

1 code implementation CVPR 2020 Shoukang Hu, Sirui Xie, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

We argue that, given a computer vision task for which a NAS method is expected, this definition can reduce the vaguely defined NAS evaluation to i) the accuracy on this task and ii) the total computation consumed to finally obtain a model with satisfactory accuracy.

Neural Architecture Search

Real or Not Real, that is the Question

2 code implementations ICLR 2020 Yuanbo Xiangli, Yubin Deng, Bo Dai, Chen Change Loy, Dahua Lin

While generative adversarial networks (GAN) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles.

Regularizing Reasons for Outfit Evaluation with Gradient Penalty

no code implementations 2 Feb 2020 Xingxing Zou, Zhizhong Li, Ke Bai, Dahua Lin, Waikeung Wong

In this paper, we build an outfit evaluation system which provides feedback consisting of a judgment with a convincing explanation.

Sentence

Side-Aware Boundary Localization for More Precise Object Detection

3 code implementations ECCV 2020 Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin

To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket.

Object object-detection +2

Fastened CROWN: Tightened Neural Network Robustness Certificates

1 code implementation 2 Dec 2019 Zhaoyang Lyu, Ching-Yun Ko, Zhifeng Kong, Ngai Wong, Dahua Lin, Luca Daniel

We draw inspiration from such work and further demonstrate the optimality of deterministic CROWN (Zhang et al. 2018) solutions in a given linear programming problem under mild constraints.

Learning a Decision Module by Imitating Driver's Control Behaviors

no code implementations 30 Nov 2019 Junning Huang, Sirui Xie, Jiankai Sun, Qiurui Ma, Chunxiao Liu, Jianping Shi, Dahua Lin, Bolei Zhou

In this work, we propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning.

Autonomous Driving Imitation Learning

Learning to Synthesize Fashion Textures

no code implementations 18 Nov 2019 Wu Shi, Tak-Wai Hui, Ziwei Liu, Dahua Lin, Chen Change Loy

Another important observation is that fashion textures are multi-modal.

Policy Continuation with Hindsight Inverse Dynamics

1 code implementation NeurIPS 2019 Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou

This approach learns from Hindsight Inverse Dynamics based on Hindsight Experience Replay, which lets learning proceed in a self-imitated manner and allows the policy to be trained with supervised learning.

Reinforcement Learning (RL)

A Graph-Based Framework to Bridge Movies and Synopses

no code implementations ICCV 2019 Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin

On top of this dataset, we develop a framework to perform matching between movie segments and synopsis paragraphs.

Regulatory Focus: Promotion and Prevention Inclinations in Policy Search

no code implementations 25 Sep 2019 Lanxin Lei, Zhizhong Li, Xiaoyang Li, Cong Qiu, Dahua Lin

The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths.

Atari Games Continuous Control +1

Learning with Social Influence through Interior Policy Differentiation

no code implementations 25 Sep 2019 Hao Sun, Bo Dai, Jiankai Sun, Zhenghao Peng, Guodong Xu, Dahua Lin, Bolei Zhou

In this work, we incorporate social influence into the reinforcement learning scheme, enabling the agents to learn both from the environment and from their peers.

Reinforcement Learning (RL)

Biased Estimates of Advantages over Path Ensembles

no code implementations 15 Sep 2019 Lanxin Lei, Zhizhong Li, Dahua Lin

The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths.

Atari Games Continuous Control +1

Open Compound Domain Adaptation

no code implementations CVPR 2020 Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather).

Domain Adaptation Facial Expression Recognition +2

Recursive Visual Sound Separation Using Minus-Plus Net

1 code implementation ICCV 2019 Xudong Xu, Bo Dai, Dahua Lin

Sounds provide rich semantics, complementary to visual data, for many tasks.

POPQORN: Quantifying Robustness of Recurrent Neural Networks

2 code implementations 17 May 2019 Ching-Yun Ko, Zhaoyang Lyu, Tsui-Wei Weng, Luca Daniel, Ngai Wong, Dahua Lin

The vulnerability to adversarial attacks has been a critical issue for deep neural networks.

CARAFE: Content-Aware ReAssembly of FEatures

3 code implementations ICCV 2019 Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

CARAFE introduces little computational overhead and can be readily integrated into modern network architectures.

Feature Upsampling Instance Segmentation +3

Prime Sample Attention in Object Detection

1 code implementation CVPR 2020 Yuhang Cao, Kai Chen, Chen Change Loy, Dahua Lin

Our experiments demonstrate that it is often more effective to focus on prime samples than hard samples when training a detector.

Object object-detection +1

Learning to Cluster Faces on an Affinity Graph

3 code implementations CVPR 2019 Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin

Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level.

Clustering Face Recognition +1

Libra R-CNN: Towards Balanced Learning for Object Detection

6 code implementations CVPR 2019 Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin

In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by imbalance during the training process, which generally occurs at three levels: sample level, feature level, and objective level.

object-detection Object Detection

Self-Supervised Learning via Conditional Motion Propagation

1 code implementation CVPR 2019 Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy

Instead of explicitly modeling the motion probabilities, we design the pretext task as a conditional motion propagation problem.

Human Parsing Instance Segmentation +2

Hybrid Task Cascade for Instance Segmentation

5 code implementations CVPR 2019 Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.

Instance Segmentation object-detection +4

Region Proposal by Guided Anchoring

1 code implementation CVPR 2019 Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin

State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios.

object-detection Object Detection +1

Monocular 3D Pose Recovery via Nonconvex Sparsity with Theoretical Analysis

no code implementations 29 Dec 2018 Jianqiao Wangni, Dahua Lin, Ji Liu, Kostas Daniilidis, Jianbo Shi

For recovering 3D object poses from 2D images, a prevalent method is to pre-train an over-complete dictionary $\mathcal D=\{B_i\}_i^D$ of 3D basis poses.

IRLAS: Inverse Reinforcement Learning for Architecture Search

1 code implementation CVPR 2019 Minghao Guo, Zhao Zhong, Wei Wu, Dahua Lin, Junjie Yan

Motivated by the fact that human-designed networks are elegant in topology with a fast inference speed, we propose a mirror stimuli function inspired by biological cognition theory to extract the abstract topological knowledge of an expert human-design network (ResNeXt).

Neural Architecture Search reinforcement-learning +1

An Embarrassingly Simple Approach for Knowledge Distillation

1 code implementation 5 Dec 2018 Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model.

Face Recognition Knowledge Distillation +3

A Neural Compositional Paradigm for Image Captioning

1 code implementation NeurIPS 2018 Bo Dai, Sanja Fidler, Dahua Lin

Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance.

Image Captioning

Improving On-policy Learning with Statistical Reward Accumulation

no code implementations 7 Sep 2018 Yubin Deng, Ke Yu, Dahua Lin, Xiaoou Tang, Chen Change Loy

Most methods in deep-RL achieve good results via the maximization of the reward signal provided by the environment, typically in the form of discounted cumulative returns.

Atari Games

Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition

4 code implementations ECCV 2018 Xiaohang Zhan, Ziwei Liu, Junjie Yan, Dahua Lin, Chen Change Loy

Face recognition has witnessed great progress in recent years, mainly attributed to the high-capacity model designed and the abundant labeled data collected.

Face Recognition

Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation

no code implementations ECCV 2018 Xinge Zhu, Hui Zhou, Ceyuan Yang, Jianping Shi, Dahua Lin

Due to the expensive and time-consuming annotations (e.g., segmentation) for real-world images, recent works in computer vision resort to synthetic data.

Domain Adaptation Segmentation +1

PSANet: Point-wise Spatial Attention Network for Scene Parsing

4 code implementations ECCV 2018 Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia

We notice information flow in convolutional neural networks is restricted inside local neighborhood regions due to the physical design of convolutional filters, which limits the overall understanding of complex scenes.

Position Scene Parsing +1

Find and Focus: Retrieve and Localize Video Events with Natural Language Queries

no code implementations ECCV 2018 Dian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin

The thriving of video sharing services brings new challenges to video retrieval, e.g., the rapid growth in video duration and content diversity.

Natural Language Queries Retrieval +2

Generative Adversarial Frontal View to Bird View Synthesis

no code implementations 1 Aug 2018 Xinge Zhu, Zhichao Yin, Jianping Shi, Hongsheng Li, Dahua Lin

Due to the large gap and severe deformation between the frontal view and bird view, generating a bird view image from a single frontal view is challenging.

Bird View Synthesis Homography Estimation +1

Pose Guided Human Video Generation

no code implementations ECCV 2018 Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi, Dahua Lin

Human pose, on the other hand, can represent motion patterns intrinsically and interpretably, and impose the geometric constraints regardless of appearance.

Generative Adversarial Network motion prediction +1

Person Search in Videos with One Portrait Through Visual and Temporal Links

2 code implementations ECCV 2018 Qingqiu Huang, Wentao Liu, Dahua Lin

In real-world applications, e.g., law enforcement and video retrieval, one often needs to search for a certain person in long videos with just one portrait.

Person Re-Identification Person Search +2

Rethinking the Form of Latent States in Image Captioning

no code implementations ECCV 2018 Bo Dai, Deming Ye, Dahua Lin

Taking advantage of this, we visually reveal the internal dynamics in the process of caption generation, as well as the connections between input visual domain and output linguistic domain.

Image Captioning

Probabilistic Ensemble of Collaborative Filters

no code implementations 26 Jun 2018 Zhiyu Min, Dahua Lin

Collaborative filtering is an important technique for recommendation.

Collaborative Filtering
