no code implementations • ECCV 2020 • Qingqiu Huang, Lei Yang, Huaiyi Huang, Tong Wu, Dahua Lin
Captioned images are widely available on the web, and the captions often contain the names of the subjects in the images.
no code implementations • 29 Apr 2024 • Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He
In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture.
1 code implementation • 25 Apr 2024 • Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang
Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.
Ranked #6 on Visual Question Answering on MM-Vet
no code implementations • 25 Apr 2024 • Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin
Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions.
no code implementations • 16 Apr 2024 • Xiao Wang, Tianze Chen, Xianjun Yang, Qi Zhang, Xun Zhao, Dahua Lin
The open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
1 code implementation • 9 Apr 2024 • Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen
Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents.
2 code implementations • 9 Apr 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.
Ranked #12 on Visual Question Answering on MM-Vet
no code implementations • 1 Apr 2024 • Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.
1 code implementation • 29 Mar 2024 • Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
We evaluate 16 leading LVLMs on MMStar to assess their multi-modal capabilities, and on 7 benchmarks with the proposed metrics to investigate their data leakage and actual multi-modal gain.
1 code implementation • 26 Mar 2024 • Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).
Ranked #5 on Long-Context Understanding on Ada-LEval (BestAnswer)
1 code implementation • 20 Mar 2024 • Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and 2 object detection datasets under the zero-shot recognition setting.
1 code implementation • 19 Mar 2024 • Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai Chen, Feng Zhao
Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents.
no code implementations • 18 Mar 2024 • Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images.
1 code implementation • 13 Mar 2024 • Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen
Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities.
1 code implementation • 12 Mar 2024 • Qinghao Hu, Zhisheng Ye, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
1 code implementation • 4 Mar 2024 • Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu
Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
1 code implementation • 27 Feb 2024 • Shuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang
We present SongComposer, an innovative LLM designed for song composition.
no code implementations • 26 Feb 2024 • Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin
Significantly, our method requires no additional corpus and remains orthogonal to pruning and quantization methods, so it can be combined with them.
no code implementations • 22 Feb 2024 • Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu
Data plays a fundamental role in the training of Large Language Models (LLMs).
1 code implementation • 22 Feb 2024 • Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang
We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.
1 code implementation • 21 Feb 2024 • Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin
The quality of training data is crucial for enhancing the long-text capabilities of foundation models.
1 code implementation • 21 Feb 2024 • Tian Lan, Wenwei Zhang, Chen Xu, Heyan Huang, Dahua Lin, Kai Chen, Xian-Ling Mao
Critique ability is crucial for the scalable oversight and self-improvement of Large Language Models (LLMs).
no code implementations • 20 Feb 2024 • Jie Ren, Qipeng Guo, Hang Yan, Dongrui Liu, Xipeng Qiu, Dahua Lin
Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness.
no code implementations • 20 Feb 2024 • Demin Song, Honglin Guo, Yunhua Zhou, Shuhao Xing, Yudong Wang, Zifan Song, Wenwei Zhang, Qipeng Guo, Hang Yan, Xipeng Qiu, Dahua Lin
Programming skill is a crucial ability for Large Language Models (LLMs), requiring a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs).
no code implementations • 19 Feb 2024 • Jiahe Chen, Jinkun Cao, Dahua Lin, Kris Kitani, Jiangmiao Pang
However, mapping from a standard Gaussian with a flow-based model limits its capacity to capture complicated trajectory patterns and ignores under-represented motion intentions in the training data.
no code implementations • 17 Feb 2024 • Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu
To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification.
no code implementations • 13 Feb 2024 • Ying Jin, Jiaqi Wang, Dahua Lin
We consider multi-source free domain adaptation, the problem of adapting multiple existing models to a new domain without accessing the source data.
1 code implementation • 9 Feb 2024 • Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin
We further explore how to use LEAN to solve math problems and study its performance in a multi-task learning setting, which shows the possibility of using LEAN as a unified platform for both solving and proving in math.
1 code implementation • 7 Feb 2024 • Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, WangMeng Zuo, Dahua Lin, Yu Qiao, Jing Shao
In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount.
1 code implementation • 6 Feb 2024 • Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang
In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.
no code implementations • 31 Jan 2024 • Chenyu Shi, Xiao Wang, Qiming Ge, Songyang Gao, Xianjun Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Xun Zhao, Dahua Lin
Large language models are meticulously aligned to be both helpful and harmless.
1 code implementation • 29 Jan 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension.
Ranked #17 on Visual Question Answering on MM-Vet
no code implementations • 26 Jan 2024 • Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu
Large language models have demonstrated remarkable potential in various tasks; however, open-source models and data for specific domains remain significantly scarce.
1 code implementation • 26 Jan 2024 • Yu Sun, Keyu Chen, Shujie Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin
However, these evaluation benchmarks are limited to assessing the instruction-following capabilities, overlooking the fundamental abilities that emerge during the pre-training stage.
1 code implementation • 21 Jan 2024 • Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin
This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences.
no code implementations • 15 Jan 2024 • Mingxin Huang, Dezhi Peng, Hongliang Li, Zhenghao Peng, Chongyu Liu, Dahua Lin, Yuliang Liu, Xiang Bai, Lianwen Jin
In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2, which seeks to find a better synergy between text detection and recognition.
1 code implementation • 8 Jan 2024 • Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein
These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences.
1 code implementation • 26 Dec 2023 • Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang
In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions.
1 code implementation • 22 Dec 2023 • Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao
We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility.
1 code implementation • 21 Dec 2023 • Zehui Chen, Weihua Du, Wenwei Zhang, Kuikun Liu, Jiangning Liu, Miao Zheng, Jingming Zhuo, Songyang Zhang, Dahua Lin, Kai Chen, Feng Zhao
Based on that, we further introduce T-Eval to evaluate the tool utilization capability step by step.
no code implementations • 13 Dec 2023 • Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee
We are witnessing significant breakthroughs in the technology for generating 3D objects from text.
no code implementations • 7 Dec 2023 • Tong Wu, Zhibing Li, Shuai Yang, Pan Zhang, Xinggang Pan, Jiaqi Wang, Dahua Lin, Ziwei Liu
Extensive experiments demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.
1 code implementation • 6 Dec 2023 • Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
In detail, we first train an image projection module to connect a vision encoder with LLM.
Ranked #79 on Visual Question Answering on MM-Vet
2 code implementations • 6 Dec 2023 • Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao
With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem.
1 code implementation • 6 Dec 2023 • Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.
1 code implementation • 5 Dec 2023 • Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao
Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation.
no code implementations • 1 Dec 2023 • Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu
In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts.
1 code implementation • 30 Nov 2023 • Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, LiMin Wang, Dahua Lin, Bo Dai
Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications.
1 code implementation • 29 Nov 2023 • Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu
Based on this observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue, along with a rollback strategy that retrospectively checks for summary tokens among the previously generated tokens and re-allocates the token selection if necessary.
no code implementations • 29 Nov 2023 • Xuekun Jiang, Anyi Rao, Jingbo Wang, Dahua Lin, Bo Dai
In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired.
1 code implementation • 29 Nov 2023 • Shuangrui Ding, Rui Qian, Haohang Xu, Dahua Lin, Hongkai Xiong
In this paper, we propose a simple yet effective approach for self-supervised video object segmentation (VOS).
1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.
1 code implementation • 28 Nov 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years.
no code implementations • 28 Nov 2023 • Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu
In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.
1 code implementation • 27 Nov 2023 • Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia
This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPUs at a much lower price than regular instances but may be preempted by the cloud at any time.
1 code implementation • 27 Nov 2023 • Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai
Furthermore, we demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model (LLM).
1 code implementation • 21 Nov 2023 • Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.
Ranked #1 on visual instruction following on LLaVA-Bench
1 code implementation • 12 Nov 2023 • Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin
The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values.
no code implementations • 5 Nov 2023 • Linning Xu, Vasu Agrawal, William Laney, Tony Garcia, Aayush Bansal, Changil Kim, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Aljaž Božič, Dahua Lin, Michael Zollhöfer, Christian Richardt
We present an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields.
no code implementations • 31 Oct 2023 • Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu
The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos.
1 code implementation • 20 Oct 2023 • Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen
In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability.
no code implementations • 12 Oct 2023 • Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where the branches complement each other with both structural awareness and textural richness.
1 code implementation • 8 Oct 2023 • Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin
The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest.
no code implementations • 4 Oct 2023 • Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin
This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.
no code implementations • ICCV 2023 • Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, Bo Dai
While most recent neural rendering works focus on objects and small-scale scenes, neural rendering methods for city-scale scenes have great potential in many real-world applications.
2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu
To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.
Ranked #4 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)
1 code implementation • 26 Sep 2023 • Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.
Ranked #9 on Visual Question Answering (VQA) on InfiMM-Eval
1 code implementation • 14 Sep 2023 • Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang
Based on this definition, UniHSI comprises a Large Language Model (LLM) Planner that translates language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution.
3 code implementations • 31 Aug 2023 • Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin
The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.
Ranked #2 on Generative 3D Object Classification on Objaverse
1 code implementation • 21 Aug 2023 • Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin
The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models (LLMs) and multimodal large language models (MLLMs).
1 code implementation • ICCV 2023 • Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
In the second stage, for each semantics, we randomly sample slots from the corresponding Gaussian distribution and perform masked feature aggregation within the semantic area to exploit temporal correspondence patterns for instance identification.
1 code implementation • 9 Aug 2023 • Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, LiWei Wang
With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue.
no code implementations • 4 Aug 2023 • Wangbo Zhao, Kepan Nan, Songyang Zhang, Kai Chen, Dahua Lin, Yang You
Based on this scheme, we develop a novel RVOS method that exploits weak annotations effectively.
1 code implementation • ICCV 2023 • YuAn Liu, Songyang Zhang, Jiacheng Chen, Zhaohui Yu, Kai Chen, Dahua Lin
There has been significant progress in Masked Image Modeling (MIM).
1 code implementation • ICCV 2023 • Wei Cheng, Ruixiang Chen, Wanqi Yin, Siming Fan, Keyu Chen, Honglin He, Huiwen Luo, Zhongang Cai, Jingbo Wang, Yang Gao, Zhengming Yu, Zhengyu Lin, Daxuan Ren, Lei Yang, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Bo Dai, Kwan-Yee Lin
Realistic human-centric rendering plays a key role in both computer vision and computer graphics.
3 code implementations • 12 Jul 2023 • YuAn Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
In response to these challenges, we propose MMBench, a novel multi-modality benchmark.
Ranked #1 on Visual Question Answering on MMBench
4 code implementations • 10 Jul 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai
Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.
no code implementations • 5 Jun 2023 • Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin
The factor model is a fundamental tool in quantitative investment, and deep learning can make it more flexible and efficient in complicated, practical investing situations.
2 code implementations • ICCV 2023 • Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
Human drivers can easily describe complex traffic scenes using their visual system.
no code implementations • 25 May 2023 • Zikai Wei, Bo Dai, Dahua Lin
Active investing aims to construct a portfolio of assets that are believed to be relatively profitable in the markets, with one popular method being to construct a portfolio via factor-based strategies.
1 code implementation • NeurIPS 2023 • Dongwei Pan, Long Zhuo, Jingtan Piao, Huiwen Luo, Wei Cheng, Yuxin Wang, Siming Fan, Shengqi Liu, Lei Yang, Bo Dai, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Kwan-Yee Lin
It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees.
2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Extensive experiments and ablative analysis also demonstrate that the inductive bias of the network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.
no code implementations • ICCV 2023 • Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
2) Hierarchical Category Organization: The vast vocabulary of V3Det is organized by a hierarchical category tree which annotates the inclusion relationship among categories, encouraging the exploration of category relationships in vast and open vocabulary object detection.
1 code implementation • ICCV 2023 • Zhitao Yang, Zhongang Cai, Haiyi Mei, Shuai Liu, Zhaoxi Chen, Weiye Xiao, Yukun Wei, Zhongfei Qing, Chen Wei, Bo Dai, Wayne Wu, Chen Qian, Dahua Lin, Ziwei Liu, Lei Yang
Synthetic data has emerged as a promising source for 3D human research as it offers low-cost access to large-scale human datasets.
1 code implementation • 29 Mar 2023 • Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang
Recent multi-camera 3D object detectors usually leverage temporal information to construct multi-view stereo that alleviates the ill-posed depth estimation.
no code implementations • CVPR 2023 • Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin
An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions.
no code implementations • ICCV 2023 • Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin
Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice.
1 code implementation • CVPR 2023 • Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset.
1 code implementation • 23 Mar 2023 • Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang
DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception.
Ranked #1 on Panoptic Segmentation on SemanticKITTI
no code implementations • CVPR 2023 • Zhaoyang Lyu, Jinyi Wang, Yuwei An, Ya zhang, Dahua Lin, Bo Dai
In this work, we design a novel sparse latent point diffusion model for mesh generation.
1 code implementation • 4 Mar 2023 • YuAn Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
no code implementations • 30 Jan 2023 • Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin, Dahua Lin, Bo Dai
Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots.
1 code implementation • NeurIPS 2022 • Ying Jin, Jiaqi Wang, Dahua Lin
Semi-Supervised Semantic Segmentation aims at training the segmentation model with limited labeled data and a large amount of unlabeled data.
1 code implementation • CVPR 2023 • Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, Dahua Lin, Ziwei Liu
Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases.
3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.
Ranked #15 on Text Spotting on ICDAR 2015
no code implementations • ICCV 2023 • Jingbo Wang, Ye Yuan, Zhengyi Luo, Kevin Xie, Dahua Lin, Umar Iqbal, Sanja Fidler, Sameh Khamis
In this work, we propose a holistic framework for learning physically plausible human dynamics from real driving scenarios, narrowing the gap between real and simulated human behavior in safety-critical applications.
no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Extensive experiments and ablative analysis also demonstrate that the inductive bias of the network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.
1 code implementation • CVPR 2023 • Ying Jin, Jiaqi Wang, Dahua Lin
Through this framework, the prediction alignment is not only conducted at the instance level, but also at the batch and class level, through which the student model learns instance prediction, input correlation, and category correlation simultaneously.
no code implementations • 5 Dec 2022 • Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu
Our key insight is that the co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics.
no code implementations • 22 Oct 2022 • Zikai Wei, Bo Dai, Dahua Lin
Modeling and characterizing multiple factors is perhaps the most important step in achieving excess returns over market benchmarks.
no code implementations • 17 Oct 2022 • Anyi Rao, Xuekun Jiang, Sichen Wang, Yuwei Guo, Zihao Liu, Bo Dai, Long Pang, Xiaoyu Wu, Dahua Lin, Libiao Jin
The ability to choose an appropriate camera view among multiple cameras plays a vital role in TV show delivery.
no code implementations • 17 Oct 2022 • Zikai Wei, Xinge Zhu, Bo Dai, Dahua Lin
To accurately predict trajectories in multi-agent settings, e.g., team games, it is important to effectively model the interactions among agents.
3 code implementations • 12 Oct 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin
Graph convolution networks (GCN) have been widely used in skeleton-based action recognition.
Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D
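The entries above name graph convolution networks without describing the layer. As a hedged illustration, here is a minimal sketch of one generic graph-convolution layer: Kipf and Welling style, with self-loops and a symmetrically normalised adjacency, applied to a toy skeleton graph. The joint layout, edge list, and function names are assumptions for illustration. This is not the architecture of any paper listed here.

```python
import math

def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph (toy layout)."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own feature
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # bones are undirected
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def graph_conv(features, adj_norm, weight):
    """One layer: ReLU(A_norm @ H @ W), where H is (joints x channels)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]
```

For a three-joint chain 0-1-2, `graph_conv(H, normalized_adjacency(3, [(0, 1), (1, 2)]), W)` mixes each joint's features with those of its bone neighbours. Skeleton-specific models typically refine this by partitioning the adjacency.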
no code implementations • 3 Oct 2022 • Yunxiang Zhang, Benjamin Liang, Boyuan Chen, Paul Torrens, S. Farokh Atashzar, Dahua Lin, Qi Sun
Closing the gap between real-world physicality and immersive virtual experience requires a closed interaction loop: applying user-exerted physical forces to the virtual environment and generating haptic sensations back to the users.
1 code implementation • 20 Sep 2022 • Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin
Deep learning models have achieved excellent recognition results on large-scale video benchmarks.
2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao
As sensor configurations become more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
1 code implementation • 26 Aug 2022 • Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, Dahua Lin
Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically requires hours of training for a single scene.
no code implementations • CVPR 2023 • Weijia Li, Yawen Lai, Linning Xu, Yuanbo Xiangli, Jinhua Yu, Conghui He, Gui-Song Xia, Dahua Lin
More precisely, OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well aligned and collected from 25K geo-locations in New York City.
1 code implementation • 26 Jul 2022 • Tai Wang, Jiangmiao Pang, Dahua Lin
Perceiving 3D objects from monocular inputs is crucial for robotic systems, given its economy compared to multi-sensor settings.
1 code implementation • 26 Jul 2022 • Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
In this paper, we propose a novel learning scheme for self-supervised video representation learning.
2 code implementations • 30 May 2022 • Jinyi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, Hongfei Fu
In this paper, we propose a novel purification approach, referred to as guided diffusion model for purification (GDMP), to help protect classifiers from adversarial attacks.
1 code implementation • 25 May 2022 • Zhaoyang Lyu, Xudong Xu, Ceyuan Yang, Dahua Lin, Bo Dai
By modeling the reverse process of gradually diffusing the data distribution into a Gaussian distribution, generating a sample in DDPMs can be regarded as iteratively denoising a randomly sampled Gaussian noise.
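The iterative denoising described above can be illustrated with the standard DDPM ancestral sampler of Ho et al., using a linear beta schedule and the variance choice sigma_t^2 = beta_t. The following is a minimal sketch under those assumptions. `eps_model` is a hypothetical stand-in for a trained noise-prediction network. The early-stopping scheme of the cited paper is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def ddpm_sample(eps_model, dim, betas, rng):
    """Ancestral DDPM sampling: start from Gaussian noise and denoise step by step.

    eps_model(x, t) is a hypothetical noise predictor returning a list shaped like x.
    """
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product alpha_bar_t

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # sigma_t^2 = beta_t variant
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x
```

Passing a seeded `random.Random` makes the sampler reproducible. In practice `eps_model` would be a neural network and `x` a tensor.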
no code implementations • CVPR 2022 • Jingbo Wang, Yu Rong, Jingyuan Liu, Sijie Yan, Dahua Lin, Bo Dai
The ability to synthesize long-term human motion sequences in real-world scenes can facilitate numerous applications.
1 code implementation • 19 May 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin
The toolbox supports a wide variety of skeleton action recognition algorithms, including approaches based on GCN and CNN.
Ranked #19 on Skeleton Based Action Recognition on NTU RGB+D 120
no code implementations • 6 May 2022 • Yuhang Cao, Jiaqi Wang, Yiqi Lin, Dahua Lin
The offline mining mechanism leverages a self-supervised discriminative model to collaboratively mine implicit novel instances with a trained FSOD network.
1 code implementation • CVPR 2022 • Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin
To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation.
2 code implementations • CVPR 2022 • Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin
End-to-end scene text spotting has attracted great attention in recent years due to its success in exploiting the intrinsic synergy between scene text detection and recognition.
Ranked #3 on Text Spotting on Inverse-Text
1 code implementation • CVPR 2022 • Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, LiMin Wang
Recent works prefer to formulate frame sampling as a sequential decision task, selecting frames one by one according to their importance. In contrast, we present a new paradigm that learns instance-specific video condensation policies to select, in a single step, the informative frames that represent the entire video.
1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
For the first time, we demonstrate that scene text spotting models can be trained with an extremely low-cost annotation: a single point for each instance.
Ranked #3 on Text Spotting on SCUT-CTW1500
no code implementations • 10 Dec 2021 • Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin
The wide span of viewing positions within these scenes yields multi-scale renderings with very different levels of detail, which poses great challenges to neural radiance fields and biases them toward compromised results.
1 code implementation • ICLR 2022 • Zhaoyang Lyu, Zhifeng Kong, Xudong Xu, Liang Pan, Dahua Lin
The RFNet refines the coarse output of the CGNet and further improves the quality of the completed point cloud.
1 code implementation • NeurIPS 2021 • Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin
We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.
1 code implementation • 24 Nov 2021 • Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin
We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.
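The two entries above compare DCD against the Chamfer Distance (CD). For reference, here is a minimal sketch of one common CD variant: squared Euclidean nearest-neighbour distances, averaged over each set and summed over both directions. Other variants use unsquared distances or sums, so treat this exact form as an assumption. DCD itself is not implemented here.

```python
def squared_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer_distance(s1, s2):
    """Symmetric Chamfer Distance (squared distances, averaged per set)."""
    forward = sum(min(squared_dist(p, q) for q in s2) for p in s1) / len(s1)
    backward = sum(min(squared_dist(q, p) for p in s1) for q in s2) / len(s2)
    return forward + backward
```

Each direction only looks at nearest neighbours. As a result, CD can stay low even when the local point density of the two sets differs greatly, which is the kind of blind spot the entries above aim to address.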
1 code implementation • NeurIPS 2021 • Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin
1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space.
no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.
1 code implementation • NeurIPS 2021 • Xudong Xu, Xingang Pan, Dahua Lin, Bo Dai
In this paper, we propose Generative Occupancy Fields (GOF), a novel model based on generative radiance fields that can learn compact object surfaces without impeding its training convergence.
1 code implementation • 12 Sep 2021 • Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Wei Li, Yuexin Ma, Hongsheng Li, Ruigang Yang, Dahua Lin
In this paper, we benchmark our model on these three tasks.
1 code implementation • 8 Sep 2021 • Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng
In this work, observing that the features of the same object instance are highly similar across frames in a video, we propose a novel Temporal RoI Align operator that extracts features from other frames' feature maps for current-frame proposals by utilizing feature similarity.
Ranked #1 on Video Instance Segmentation on YouTube-VIS
no code implementations • 23 Aug 2021 • Jiangmiao Pang, Kai Chen, Qi Li, Zhihai Xu, Huajun Feng, Jianping Shi, Wanli Ouyang, Dahua Lin
In this work, we carefully revisit the standard training practice of detectors and find that detection performance is often limited by imbalance during the training process, which generally exists at three levels: sample level, feature level, and objective level.
1 code implementation • 14 Aug 2021 • Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin
We present MMOCR, an open-source toolbox which provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.
1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin
As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.
1 code implementation • 29 Jul 2021 • Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
As the preliminary depth estimation of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty.
Ranked #10 on 3D Object Detection on KITTI Cars Hard val
no code implementations • 25 Jul 2021 • Yu Xiong, Fabian Caba Heilbron, Dahua Lin
To meet the demands of non-experts, we present Transcript-to-Video, a weakly-supervised framework that uses text as input to automatically create video sequences from an extensive collection of shots.
no code implementations • CVPR 2021 • Jingbo Wang, Sijie Yan, Bo Dai, Dahua Lin
In this paper, we revisit human motion synthesis, a task useful in various real-world applications.
no code implementations • 21 May 2021 • Shijie Fang, Yuhang Cao, Xinjiang Wang, Kai Chen, Dahua Lin, Wayne Zhang
The performance of object detection, to a great extent, depends on the availability of large annotated datasets.
4 code implementations • CVPR 2022 • Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai
In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.
Ranked #1 on Action Recognition on NTU RGB+D
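To make the "3D heatmap stack" in the PoseC3D entry concrete, here is a minimal sketch. It assumes each joint in each frame is rendered as a 2D Gaussian scaled by its keypoint confidence, and that the maps are stacked into a joints x frames x height x width volume. The keypoint layout, `sigma`, and function names are illustrative assumptions, not the paper's exact preprocessing.

```python
import math

def joint_heatmap(h, w, cx, cy, sigma=1.0, score=1.0):
    """A 2D Gaussian heatmap centred on one joint, scaled by its confidence."""
    return [[score * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]

def heatmap_volume(keypoints, h, w, sigma=1.0):
    """Stack per-frame joint heatmaps into a K x T x H x W nested list.

    keypoints[t][k] = (x, y, score) for joint k in frame t (hypothetical layout).
    """
    num_frames = len(keypoints)
    num_joints = len(keypoints[0])
    return [[joint_heatmap(h, w, *keypoints[t][k][:2], sigma=sigma,
                           score=keypoints[t][k][2])
             for t in range(num_frames)]
            for k in range(num_joints)]
```

A 3D CNN can then treat the joint axis K as input channels and convolve over time and space.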
8 code implementations • 22 Apr 2021 • Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
In this paper, we study this problem with a practice built on a fully convolutional single-stage detector and propose a general framework, FCOS3D.
Ranked #323 on 3D Object Detection on nuScenes
no code implementations • CVPR 2021 • Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin
Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.
1 code implementation • CVPR 2021 • Tong Wu, Ziwei Liu, Qingqiu Huang, Yu Wang, Dahua Lin
We then perform a systematic study on existing long-tailed recognition methods in conjunction with the adversarial training framework.
1 code implementation • CVPR 2021 • Zhaoyang Lyu, Minghao Guo, Tong Wu, Guodong Xu, Kehuan Zhang, Dahua Lin
Recent works have shown that interval bound propagation (IBP) can be used to train verifiably robust neural networks.
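Interval bound propagation (IBP), mentioned above, pushes an input box through the network layer by layer. Here is a minimal sketch for an affine layer and a ReLU, using the standard centre/radius form. The layer sizes and values are toy assumptions. The refinements in the cited paper are not shown.

```python
def ibp_linear(lower, upper, weight, bias):
    """Propagate the box [lower, upper] through y = W x + b (W as nested lists)."""
    center = [(l + u) / 2 for l, u in zip(lower, upper)]
    radius = [(u - l) / 2 for l, u in zip(lower, upper)]
    new_lower, new_upper = [], []
    for row, b in zip(weight, bias):
        c = sum(w * m for w, m in zip(row, center)) + b  # W @ center + b
        r = sum(abs(w) * rad for w, rad in zip(row, radius))  # |W| @ radius
        new_lower.append(c - r)
        new_upper.append(c + r)
    return new_lower, new_upper

def ibp_relu(lower, upper):
    """ReLU is monotone, so it maps interval endpoints to endpoints."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]
```

For an input box of [-1, 1] x [-1, 1], W = [[1, -2]], and b = [0.5], the output interval is [-2.5, 3.5], and the ReLU output interval is [0.0, 3.5]. Training with IBP typically penalises the worst-case logits computed from such bounds.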
no code implementations • ICCV 2021 • Weijia Li, Lingxuan Meng, Jinwang Wang, Conghui He, Gui-Song Xia, Dahua Lin
3D building reconstruction from monocular remote sensing imagery is an important research problem and an economic solution to large-scale city modeling, compared with reconstruction from LiDAR data and multi-view imagery.
no code implementations • ICCV 2021 • Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, Dahua Lin
City modeling is the foundation for computational urban planning, navigation, and entertainment.
no code implementations • 7 Dec 2020 • Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin
Feature reassembly, i.e., feature downsampling and upsampling, is a key operation in a number of modern convolutional network architectures, e.g., residual networks and feature pyramids.
no code implementations • 20 Nov 2020 • Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin
Recent years have witnessed the rapid progress of perception algorithms on top of LiDAR, a widely adopted sensor for autonomous driving systems.
2 code implementations • CVPR 2021 • Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin
However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited.
Ranked #2 on 3D Semantic Segmentation on ScribbleKITTI
1 code implementation • 2 Sep 2020 • Sirui Xie, Shoukang Hu, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin
To this end, we pose questions that future differentiable methods for neural wiring discovery need to confront, hoping to evoke discussion and rethinking of how much bias has been implicitly enforced in existing NAS methods.
2 code implementations • CVPR 2021 • Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, Dahua Lin
Instances of head classes dominate a long-tailed dataset, and they serve as negative samples for tail categories.
no code implementations • ECCV 2020 • Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin
The task of searching for certain people in videos has seen increasing potential in real-world applications, such as video organization and editing.
no code implementations • ECCV 2020 • Anyi Rao, Jiaze Wang, Linning Xu, Xuekun Jiang, Qingqiu Huang, Bolei Zhou, Dahua Lin
Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated videos that are thriving over the Internet.
3 code implementations • 4 Aug 2020 • Hui Zhou, Xinge Zhu, Xiao Song, Yuexin Ma, Zhe Wang, Hongsheng Li, Dahua Lin
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
Ranked #11 on LIDAR Semantic Segmentation on nuScenes
no code implementations • ECCV 2020 • Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin
We believe that such a holistic dataset would promote research on story-based long video understanding and beyond.
no code implementations • ECCV 2020 • Hang Zhou, Xudong Xu, Dahua Lin, Xiaogang Wang, Ziwei Liu
Stereophonic audio is an indispensable ingredient to enhance human auditory experience.
1 code implementation • ECCV 2020 • Tong Wu, Qingqiu Huang, Ziwei Liu, Yu Wang, Dahua Lin
We present a new loss function called Distribution-Balanced Loss for the multi-label recognition problems that exhibit long-tailed class distributions.
Ranked #7 on Long-tail Learning on VOC-MLT
no code implementations • ECCV 2020 • Lei Yang, Qingqiu Huang, Huaiyi Huang, Linning Xu, Dahua Lin
Recent works have shown that exploiting unlabeled data through label propagation can substantially reduce the labeling cost, which has been a critical issue in developing visual recognition models.
no code implementations • ECCV 2020 • Huaiyi Huang, Yuqi Zhang, Qingqiu Huang, Zhengkui Guo, Ziwei Liu, Dahua Lin
Place is an important element in visual understanding.
1 code implementation • 21 May 2020 • Hao Sun, Zhenghao Peng, Bo Dai, Jian Guo, Dahua Lin, Bolei Zhou
In problem-solving, we humans can come up with multiple novel solutions to the same problem.
no code implementations • CVPR 2020 • Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
Current methods for action recognition primarily rely on deep convolutional networks to derive feature embeddings of visual and motion features.
1 code implementation • ECCV 2020 • Jingbo Wang, Sijie Yan, Yuanjun Xiong, Dahua Lin
We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D poses.
Ranked #19 on 3D Human Pose Estimation on Human3.6M
1 code implementation • 27 Apr 2020 • Hao Sun, Xinyu Pan, Bo Dai, Dahua Lin, Bolei Zhou
Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging reinforcement learning problem due to the sparsity of reward signals.
no code implementations • CVPR 2020 • Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
To take action recognition to a new level, we develop FineGym, a new dataset built on top of gymnastic videos.
1 code implementation • 7 Apr 2020 • Kai Chen, Yuhang Cao, Chen Change Loy, Dahua Lin, Christoph Feichtenhofer
Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale.
no code implementations • 6 Apr 2020 • Tai Wang, Xinge Zhu, Dahua Lin
LiDAR is an important method for autonomous driving systems to sense the environment.
4 code implementations • CVPR 2020 • Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin
Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment.
1 code implementation • 6 Apr 2020 • Xinge Zhu, Yuexin Ma, Tai Wang, Yan Xu, Jianping Shi, Dahua Lin
Multi-class 3D object detection aims to localize and classify objects of multiple categories from point clouds.
2 code implementations • CVPR 2020 • Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, Chen Change Loy
This is achieved via two Partial Completion Networks, PCNet-mask (PCNet-M) and PCNet-content (PCNet-C), which learn to recover fractions of object masks and contents, respectively, in a self-supervised manner.
3 code implementations • CVPR 2020 • Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin
With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters.
1 code implementation • ECCV 2020 • Xingang Pan, Xiaohang Zhan, Bo Dai, Dahua Lin, Chen Change Loy, Ping Luo
Learning a good image prior is a long-term goal for image restoration and manipulation.
3 code implementations • ECCV 2020 • Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin
Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.
Ranked #5 on Action Recognition on UCF101 (using extra training data)
no code implementations • 11 Mar 2020 • Xin Liu, Yongbin Sun, Ziwei Liu, Dahua Lin
To facilitate a comprehensive study on diverse fashion collocation, we reorganize Amazon Fashion dataset with carefully designed evaluation protocols.
1 code implementation • CVPR 2020 • Shoukang Hu, Sirui Xie, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin
We argue that, given a computer vision task for which a NAS method is expected, this definition can reduce the vaguely defined NAS evaluation to i) the accuracy on this task and ii) the total computation consumed to obtain a model with satisfactory accuracy.
Ranked #16 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)
2 code implementations • ICLR 2020 • Yuanbo Xiangli, Yubin Deng, Bo Dai, Chen Change Loy, Dahua Lin
While generative adversarial networks (GAN) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles.
no code implementations • 2 Feb 2020 • Xingxing Zou, Zhizhong Li, Ke Bai, Dahua Lin, Waikeung Wong
In this paper, we build an outfit evaluation system that provides feedback consisting of a judgment with a convincing explanation.
3 code implementations • ECCV 2020 • Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin
To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket.
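The two-step localization described above (coarse bucket prediction, then a fine position inside the bucket) can be illustrated with a simple encode/decode pair for one box boundary. This is a minimal sketch under assumed conventions: equal-width buckets over a known search range, and an offset measured from the bucket centre and normalised by bucket width. The function names and normalisation are hypothetical and may differ from the paper's exact formulation.

```python
def encode_bucket(target, low, high, num_buckets):
    """Map a boundary coordinate to (bucket index, offset from bucket centre).

    The offset is normalised by bucket width (assumed convention).
    """
    width = (high - low) / num_buckets
    idx = int((target - low) // width)
    idx = max(0, min(idx, num_buckets - 1))  # clamp to the valid bucket range
    center = low + (idx + 0.5) * width
    return idx, (target - center) / width

def decode_bucket(idx, offset, low, high, num_buckets):
    """Invert encode_bucket: coarse bucket centre plus fine offset."""
    width = (high - low) / num_buckets
    return low + (idx + 0.5) * width + offset * width
```

Suppose the range is [0, 10] and there are 5 buckets. A boundary at 3.7 then falls into bucket 1 (centre 3.0) with a normalised offset of about 0.35. A detector would classify the bucket and regress only the small residual offset.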
1 code implementation • 2 Dec 2019 • Zhaoyang Lyu, Ching-Yun Ko, Zhifeng Kong, Ngai Wong, Dahua Lin, Luca Daniel
We draw inspiration from such work and further demonstrate the optimality of deterministic CROWN (Zhang et al. 2018) solutions in a given linear programming problem under mild constraints.
no code implementations • 30 Nov 2019 • Junning Huang, Sirui Xie, Jiankai Sun, Qiurui Ma, Chunxiao Liu, Jianping Shi, Dahua Lin, Bolei Zhou
In this work, we propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning.
1 code implementation • CVPR 2020 • Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, Dahua Lin
Recent advances in adversarial attacks uncover the intrinsic vulnerability of modern deep neural networks.
no code implementations • 18 Nov 2019 • Wu Shi, Tak-Wai Hui, Ziwei Liu, Dahua Lin, Chen Change Loy
Another important observation is that fashion textures are multi-modal.
1 code implementation • NeurIPS 2019 • Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou
This approach learns from Hindsight Inverse Dynamics based on Hindsight Experience Replay, which makes the learning process self-imitating so that it can be trained with supervised learning.
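The entry above builds on Hindsight Experience Replay (HER). Here is a minimal sketch of HER's "final" relabelling strategy: every transition in an episode is relabelled with the goal actually achieved at the end, and the sparse reward is recomputed. The tuple layout and `reward_fn` are hypothetical. Hindsight Inverse Dynamics itself is not shown.

```python
def her_relabel(episode, reward_fn):
    """Relabel an episode with the goal it actually achieved ("final" strategy).

    episode: list of (state, action, achieved_goal, desired_goal) tuples, where
    achieved_goal is the goal reached after taking the action (hypothetical layout).
    reward_fn(achieved, goal): sparse reward, e.g. 0 if reached else -1.
    """
    final_goal = episode[-1][2]
    relabeled = []
    for state, action, achieved, _ in episode:
        relabeled.append((state, action, final_goal, reward_fn(achieved, final_goal)))
    return relabeled
```

Relabelled transitions turn a failed episode into a successful demonstration for a different goal. This is what allows supervised, self-imitation-style training signals under sparse rewards.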
no code implementations • ICCV 2019 • Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin
On top of this dataset, we develop a framework to perform matching between movie segments and synopsis paragraphs.
no code implementations • 25 Sep 2019 • Hao Sun, Bo Dai, Jiankai Sun, Zhenghao Peng, Guodong Xu, Dahua Lin, Bolei Zhou
In this work, we incorporate social influence into reinforcement learning, enabling agents to learn both from the environment and from their peers.
no code implementations • 25 Sep 2019 • Lanxin Lei, Zhizhong Li, Xiaoyang Li, Cong Qiu, Dahua Lin
The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths.
no code implementations • 15 Sep 2019 • Lanxin Lei, Zhizhong Li, Dahua Lin
The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths.
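As background for the two advantage-estimation entries above, here is a minimal sketch of the widely used Generalized Advantage Estimation (GAE) of Schulman et al. It accumulates discounted TD residuals backwards over one trajectory. This is the standard baseline estimator, not the method proposed in those papers. The trajectory format is an assumption.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    values has len(rewards) + 1 entries; the last is the bootstrap value V(s_T).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` this reduces to one-step TD residuals (low variance, more bias). With `lam=1` it becomes the discounted return minus the value baseline (high variance, less bias).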
no code implementations • CVPR 2020 • Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong
A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather).
1 code implementation • ICCV 2019 • Xudong Xu, Bo Dai, Dahua Lin
Sounds provide rich semantics, complementary to visual data, for many tasks.
144 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin
In this paper, we introduce the various features of this toolbox.
1 code implementation • ICCV 2019 • Chen Lin, Minghao Guo, Chuming Li, Yuan Xin, Wei Wu, Dahua Lin, Wanli Ouyang, Junjie Yan
Data augmentation is critical to the success of modern deep learning techniques.
2 code implementations • 17 May 2019 • Ching-Yun Ko, Zhaoyang Lyu, Tsui-Wei Weng, Luca Daniel, Ngai Wong, Dahua Lin
The vulnerability to adversarial attacks has been a critical issue for deep neural networks.
3 code implementations • ICCV 2019 • Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin
CARAFE introduces little computational overhead and can be readily integrated into modern network architectures.
Ranked #3 on Feature Upsampling on ImageNet
1 code implementation • CVPR 2020 • Yuhang Cao, Kai Chen, Chen Change Loy, Dahua Lin
Our experiments demonstrate that it is often more effective to focus on prime samples than hard samples when training a detector.
6 code implementations • CVPR 2019 • Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin
In this work, we carefully revisit the standard training practice of detectors and find that detection performance is often limited by imbalance during the training process, which generally exists at three levels: sample level, feature level, and objective level.
Ranked #149 on Object Detection on COCO test-dev
3 code implementations • CVPR 2019 • Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin
Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level.
1 code implementation • CVPR 2019 • Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy
Instead of explicitly modeling the motion probabilities, we design the pretext task as a conditional motion propagation problem.
no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.
5 code implementations • CVPR 2019 • Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin
In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.
Ranked #32 on Object Detection on COCO-O
2 code implementations • CVPR 2019 • Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin
State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios.
Ranked #1 on Region Proposal on COCO test-dev
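The dense anchoring scheme that the entry above starts from can be sketched as follows. The same set of scales and aspect ratios is placed at every feature-map cell, uniformly over the image. The stride, scales, ratios, and the ratio = h / w convention are illustrative assumptions. The guided (learned) anchoring of the paper is not shown.

```python
import math

def dense_anchors(feat_h, feat_w, stride, scales, ratios):
    """Place len(scales) * len(ratios) anchors at every feature-map cell.

    Returns (x1, y1, x2, y2) boxes in image coordinates; ratio = h / w, and
    each anchor keeps area scale**2.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell centre
            for s in scales:
                for r in ratios:
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

A 2 x 3 feature map with 2 scales and 3 ratios already yields 36 anchors, and most of them cover background. This cost of uniform placement is what motivates predicting anchor locations and shapes instead.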
no code implementations • 29 Dec 2018 • Jianqiao Wangni, Dahua Lin, Ji Liu, Kostas Daniilidis, Jianbo Shi
For recovering 3D object poses from 2D images, a prevalent method is to pre-train an over-complete dictionary $\mathcal{D}=\{B_i\}_{i=1}^{D}$ of 3D basis poses.
1 code implementation • CVPR 2019 • Minghao Guo, Zhao Zhong, Wei Wu, Dahua Lin, Junjie Yan
Motivated by the fact that human-designed networks are elegant in topology with a fast inference speed, we propose a mirror stimuli function inspired by biological cognition theory to extract the abstract topological knowledge of an expert human-designed network (ResNeXt).
1 code implementation • 5 Dec 2018 • Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang
Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model.
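The teacher-to-student transfer described above is often illustrated with the classic soft-target distillation loss of Hinton et al. That loss is the temperature-softened KL divergence from the teacher's distribution to the student's, scaled by T^2. The following is a minimal sketch of that generic loss, not the specific approach of the cited paper. The temperature value is an assumption.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target distillation loss: T^2 * KL(p_teacher || p_student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge. In practice it is usually combined with the ordinary cross-entropy on ground-truth labels.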
no code implementations • NeurIPS 2018 • Yue Zhao, Yuanjun Xiong, Dahua Lin
How to leverage the temporal dimension is a key question in video analysis.
1 code implementation • NeurIPS 2018 • Bo Dai, Sanja Fidler, Dahua Lin
Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance.