1 code implementation • 29 Nov 2024 • Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang
Fine-tuning foundation models often compromises their robustness to distribution shifts.
no code implementations • 26 Nov 2024 • Minbin Huang, Runhui Huang, Han Shi, Yimeng Chen, Chuanyang Zheng, Xiangguo Sun, Xin Jiang, Zhenguo Li, Hong Cheng
The development of Multi-modal Large Language Models (MLLMs) enhances Large Language Models (LLMs) with the ability to perceive data formats beyond text, significantly advancing a range of downstream applications, such as visual question answering and image captioning.
no code implementations • 21 Nov 2024 • Ruiyuan Gao, Kai Chen, Bo Xiao, Lanqing Hong, Zhenguo Li, Qiang Xu
The rapid advancement of diffusion models has greatly improved video synthesis, especially in controllable video generation, which is essential for applications like autonomous driving.
no code implementations • 22 Oct 2024 • Qintong Li, Jiahui Gao, Sheng Wang, Renjie Pi, Xueliang Zhao, Chuan Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
In this paper, we present a novel approach, ReverseGen, designed to automatically generate effective training samples that expose the weaknesses of LLMs.
1 code implementation • 18 Oct 2024 • Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong
Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks.
no code implementations • 17 Oct 2024 • Guhao Feng, Kai Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Zhenguo Li, LiWei Wang
Despite the remarkable success of Transformer-based Large Language Models (LLMs) across various domains, understanding and enhancing their mathematical capabilities remains a significant challenge.
2 code implementations • 7 Oct 2024 • Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
The attention mechanism is a fundamental component of the Transformer model, contributing to interactions among distinct tokens, in contrast to earlier feed-forward neural networks.
1 code implementation • 2 Oct 2024 • Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.
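For intuition, here is a minimal toy sketch of speculative, Jacobi-style parallel decoding with probabilistic acceptance (the toy model and all names are hypothetical illustrations, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8

def toy_next_token_probs(prefix):
    """Stand-in for an autoregressive model's next-token distribution
    (a hypothetical toy, not the paper's image tokenizer/model)."""
    h = sum((i + 1) * t for i, t in enumerate(prefix))
    logits = np.sin(np.arange(VOCAB) * 0.7 + h)
    p = np.exp(logits)
    return p / p.sum()

def sjd_step(prefix, draft):
    """One speculative-Jacobi-style iteration: drafted tokens are verified
    left to right with a speculative acceptance test, and the first
    rejected position is resampled from the residual distribution."""
    out = []
    for tok in draft:
        p = toy_next_token_probs(prefix + out)   # target model probability
        q = 1.0 / VOCAB                          # draft probability (uniform here)
        if rng.random() < min(1.0, p[tok] / q):  # accept the drafted token
            out.append(int(tok))
        else:
            resid = np.maximum(p - q, 0.0)
            resid /= resid.sum()                 # residual distribution
            out.append(int(rng.choice(VOCAB, p=resid)))
            break                                # later positions get re-drafted
    return out

prefix = [0]
while len(prefix) < 12:
    draft = rng.integers(0, VOCAB, size=4)       # SJD would reuse Jacobi iterates here
    prefix += sjd_step(prefix, draft)
print(prefix)
```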
no code implementations • 26 Sep 2024 • Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-yan Yeung, Xiao Chen, Zhenguo Li, Wei zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models.
no code implementations • 17 Sep 2024 • Jiahui Gao, Renjie Pi, Tianyang Han, Han Wu, Lanqing Hong, Lingpeng Kong, Xin Jiang, Zhenguo Li
The deployment of multimodal large language models (MLLMs) has demonstrated remarkable success in engaging in conversations involving visual inputs, thanks to the superior power of large language models (LLMs).
1 code implementation • 19 Jul 2024 • Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu
In this work, we conduct the first systematic study on compositional text-to-video generation.
no code implementations • 8 Jul 2024 • Zhenyu Wang, Aoxue Li, Zhenguo Li, Xihui Liu
For a complex problem, the MLLM agent decomposes it into simpler sub-problems and constructs a tree structure to systematically plan the procedure of generation, editing, and self-correction with step-by-step verification.
1 code implementation • 20 Jun 2024 • Zhihui Xie, Jiahui Gao, Lei LI, Zhenguo Li, Qi Liu, Lingpeng Kong
In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process.
1 code implementation • 11 Jun 2024 • Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia
On widely recognized benchmarks, Q-LLM improved by 7.17% compared to the current state-of-the-art on LLaMA3, and by 3.26% on Mistral on the $\infty$-bench.
1 code implementation • 6 Jun 2024 • Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, Chongxuan Li
In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time-dependent scalar in an analytic form.
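As a schematic rendering of the stated factorization (notation assumed, not the paper's exact symbols):

```latex
% Schematic form of the stated factorization (assumed notation):
% the concrete score of an absorbing discrete diffusion splits into
% a clean-data conditional times an analytic time-dependent scalar.
\[
  s_\theta(x_t)_{y} \;=\; c(t)\, p\!\left(x_0 = y \mid x_t\right),
  \qquad c(t) \text{ analytic in } t .
\]
```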
no code implementations • 24 May 2024 • Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu
Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving.
no code implementations • 24 May 2024 • Mingyang Yi, Aoxue Li, Yi Xin, Zhenguo Li
We conclude that in the earlier generation stage, the image is mostly decided by the special token [EOS] in the text prompt, and the information in the text prompt is already conveyed in this stage.
no code implementations • 24 May 2024 • Aoxue Li, Mingyang Yi, Zhenguo Li
This is followed by a fusion process that carefully integrates the source intermediate (hidden) states (obtained by inversion) with those of the target image.
1 code implementation • 23 May 2024 • Haiming Wang, Huajian Xin, Zhengying Liu, Wenda Li, Yinya Huang, Jianqiao Lu, Zhicheng Yang, Jing Tang, Jian Yin, Zhenguo Li, Xiaodan Liang
This approach allows the theorem to be tackled incrementally by outlining the overall theorem at the first level and then solving the intermediate conjectures at deeper levels.
no code implementations • 23 May 2024 • Ruiyuan Gao, Kai Chen, Zhihao LI, Lanqing Hong, Zhenguo Li, Qiang Xu
While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs.
2 code implementations • 23 May 2024 • Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization.
1 code implementation • 23 May 2024 • Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
In addition, to further improve training efficiency for high-resolution image generation with DiM, we investigate a "weak-to-strong" training strategy that pretrains DiM on low-resolution images ($256\times 256$) and then fine-tunes it on high-resolution images ($512\times 512$).
no code implementations • 5 May 2024 • Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu, Zhenguo Li, Xiaodan Liang
We conduct extensive experiments to investigate whether current LMs can generate theorems for the library and thereby benefit downstream theorem proving.
no code implementations • 1 May 2024 • Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok
As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge.
no code implementations • 16 Apr 2024 • Kai Chen, Yanze Li, Wenhua Zhang, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-yan Yeung, Huchuan Lu, Xu Jia
Moreover, with our CODA-LM, we build CODA-VLM, a new driving LVLM surpassing all open-sourced counterparts on CODA-LM.
no code implementations • CVPR 2024 • Lewei Yao, Renjie Pi, Jianhua Han, Xiaodan Liang, Hang Xu, Wei zhang, Zhenguo Li, Dan Xu
This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.
Ranked #2 on Object Detection on ODinW Full-Shot 13 Tasks
no code implementations • 25 Mar 2024 • Tianqi Wang, Enze Xie, Ruihang Chu, Zhenguo Li, Ping Luo
We utilize the challenging driving scenarios from the CARLA leaderboard 2.0, which involve high-speed driving and lane-changing, and propose a rule-based expert policy to control the vehicle and generate ground truth labels for its reasoning process across different driving aspects and the final decisions.
no code implementations • CVPR 2024 • Yingji Zhong, Lanqing Hong, Zhenguo Li, Dan Xu
While existing works mainly consider ray-level consistency to construct 2D learning regularization based on rendered color, depth, or semantics on image planes, in this paper we propose a novel approach that models 3D spatial field consistency to improve NeRF's performance with sparse inputs.
no code implementations • CVPR 2024 • Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-yan Yeung, Qiang Xu, Kai Zhang
Furthermore, image syntheses from DetDiffusion can effectively augment training data, significantly enhancing downstream detection performance.
1 code implementation • 20 Mar 2024 • Tianwei Xiong, Enze Xie, Yue Wu, Zhenguo Li, Xihui Liu
We further propose a comprehensive benchmark, named ImageNet Concept Editing Benchmark (ICEB), for evaluating massive concept editing for T2I models with two subtasks, free-form prompts, massive concept categories, and extensive evaluation metrics.
no code implementations • 14 Mar 2024 • Zhao Wang, Aoxue Li, Zhenguo Li, Qi Dou
Given this zoo, we adopt 7 target datasets from 5 diverse domains as the downstream target tasks for evaluation.
no code implementations • 14 Mar 2024 • Zhao Wang, Aoxue Li, Fengwei Zhou, Zhenguo Li, Qi Dou
Without using knowledge distillation, ensemble models, or extra training data during detector training, our proposed MIC outperforms previous SOTA methods trained with these complex techniques on LVIS.
no code implementations • 14 Mar 2024 • Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James T. Kwok, Yu Zhang
To construct robust MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protecting approach that exploits the inherent safety awareness of MLLMs, and generates safer responses via adaptively transforming unsafe images into texts to activate the intrinsic safety mechanism of pre-aligned LLMs in MLLMs.
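A minimal control-flow sketch of such a training-free protection, assuming hypothetical `mllm`/`llm` interfaces (not a real API):

```python
def ecso_respond(mllm, llm, image, query):
    """Sketch of the adaptive image-to-text protection described above.
    `mllm.answer`, `mllm.is_harmful`, `mllm.caption`, and `llm.answer`
    are hypothetical interfaces, not the paper's released code."""
    draft = mllm.answer(image, query)
    if not mllm.is_harmful(draft):        # self-assessment of the draft reply
        return draft
    caption = mllm.caption(image, query)  # transform the unsafe image into text
    # re-answer with text only, so the pre-aligned LLM's safety kicks in
    return llm.answer(f"Context: {caption}\nQuestion: {query}")
```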
2 code implementations • 7 Mar 2024 • Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li
In this paper, we introduce PixArt-$\Sigma$, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution.
1 code implementation • CVPR 2024 • Shuchen Xue, Zhaoqiang Liu, Fei Chen, Shifeng Zhang, Tianyang Hu, Enze Xie, Zhenguo Li
While this is a significant development, most sampling methods still employ uniform time steps, which is not optimal when using a small number of steps.
no code implementations • 23 Feb 2024 • Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu, Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi
Surprisingly, the improvement persists when we increase the number of sampling steps and can even surpass the best result from EDM-2 (1.58) with only 39 NFEs (1.57).
no code implementations • 21 Feb 2024 • Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu
Previous works try to explain this from the perspective of expressive power and capability, showing that standard transformers are capable of performing some algorithms.
1 code implementation • 14 Feb 2024 • Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang
Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving.
1 code implementation • 12 Feb 2024 • Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong
Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.
no code implementations • 8 Feb 2024 • Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok
It also obtains new state-of-the-art self-supervised learning results on detection and segmentation.
no code implementations • 28 Jan 2024 • Zhenyu Wang, Enze Xie, Aoxue Li, Zhongdao Wang, Xihui Liu, Zhenguo Li
Given a complex text prompt containing multiple concepts including objects, attributes, and relationships, the LLM agent initially decomposes it, which entails the extraction of individual objects, their associated attributes, and the prediction of a coherent scene layout.
no code implementations • 18 Jan 2024 • Zhao Wang, Aoxue Li, Lingting Zhu, Yong Guo, Qi Dou, Zhenguo Li
Customized text-to-video generation aims to generate high-quality videos guided by text prompts and subject references.
1 code implementation • 10 Jan 2024 • Junsong Chen, Yue Wu, Simian Luo, Enze Xie, Sayak Paul, Ping Luo, Hang Zhao, Zhenguo Li
As a state-of-the-art, open-source image generation model, PIXART-$\delta$ offers a promising alternative to the Stable Diffusion family of models, contributing significantly to text-to-image synthesis.
no code implementations • CVPR 2024 • Feng Xue, Zi He, Yuan Zhang, Chuanlong Xie, Zhenguo Li, Falong Tan
In this work we present a novel perspective on detecting out-of-distribution (OOD) samples and propose an algorithm for sample-aware model selection to enhance the effectiveness of OOD detection.
no code implementations • 26 Dec 2023 • Kaichen Zhou, Lanqing Hong, Enze Xie, Yongxin Yang, Zhenguo Li, Wei zhang
Although significant progress has been made in the field of 2D-based interactive editing, fine-grained 3D-based interactive editing remains relatively unexplored.
1 code implementation • 18 Dec 2023 • Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, YuFei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong
We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships.
1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.
no code implementations • 12 Dec 2023 • Shentong Mo, Enze Xie, Yue Wu, Junsong Chen, Matthias Nießner, Zhenguo Li
Motivated by the inherent redundancy of 3D compared to 2D, we propose FastDiT-3D, a novel masked diffusion transformer tailored for efficient 3D point cloud generation, which greatly reduces training costs.
no code implementations • 5 Dec 2023 • Yao Teng, Enze Xie, Yue Wu, Haoyu Han, Zhenguo Li, Xihui Liu
In this paper, we propose a new diffusion-based method for interactive point-based video manipulation, called Drag-A-Video.
no code implementations • 24 Nov 2023 • Yuyang Zhao, Zhiwen Yan, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee
We introduce Animate124 (Animate-one-image-to-4D), the first work to animate a single in-the-wild image into 3D video through textual motion descriptions, an underexplored problem with significant applications.
no code implementations • 16 Oct 2023 • Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.
no code implementations • 10 Oct 2023 • Kaican Li, Yifan Zhang, Lanqing Hong, Zhenguo Li, Nevin L. Zhang
This indicates that while pre-trained representations may help improve downstream in-distribution performance, they could have minimal or even adverse effects on generalization in certain OOD scenarios of the downstream task if not used properly.
no code implementations • 9 Oct 2023 • Zhili Liu, Kai Chen, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James Kwok
Subsequently, the model is optimized to identify and disentangle this information, which is then adopted as negative prompts during generation.
no code implementations • 4 Oct 2023 • Ruiyuan Gao, Kai Chen, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-yan Yeung, Qiang Xu
Recent advancements in diffusion models have significantly enhanced the data synthesis with 2D control.
no code implementations • 3 Oct 2023 • Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok
Recently, various merging methods have been proposed to build a multi-task model from task-specific finetuned models without retraining.
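For context, the simplest such merging baseline is plain parameter averaging of the finetuned checkpoints; a minimal sketch (illustrating the setting, not the paper's proposed method):

```python
import torch

def average_merge(state_dicts):
    """Uniformly average task-specific finetuned weights into a single
    multi-task model -- the simplest merging baseline, shown only to
    illustrate the setting, not the method proposed in the paper."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# toy usage with two fake "checkpoints"
a = {"w": torch.tensor([1.0, 2.0])}
b = {"w": torch.tensor([3.0, 4.0])}
print(average_merge([a, b]))  # {'w': tensor([2., 3.])}
```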
no code implementations • 2 Oct 2023 • Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee. K. Wong, Zhenguo Li, Hengshuang Zhao
Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos.
1 code implementation • 1 Oct 2023 • Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang
Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%.
Ranked #1 on Automated Theorem Proving on miniF2F-test (Pass@100 metric)
3 code implementations • 30 Sep 2023 • Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li
We hope PIXART-$\alpha$ will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.
1 code implementation • 27 Sep 2023 • Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li
In addition, we introduce Conjecture Correction, an error feedback mechanism designed to interact with the prover to refine formal proof conjectures using prover error messages.
Ranked #1 on Automated Theorem Proving on miniF2F-test (Pass@100 metric)
1 code implementation • 21 Sep 2023 • Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu
Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%.
Ranked #57 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • NeurIPS 2023 • Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
Based on our analysis, we propose SA-Solver, which is an improved efficient stochastic Adams method for solving diffusion SDE to generate data with high quality.
Ranked #20 on Image Generation on ImageNet 512x512
1 code implementation • ICCV 2023 • Yutao Hu, Qixiong Wang, Wenqi Shao, Enze Xie, Zhenguo Li, Jungong Han, Ping Luo
In this paper, we address this issue from two perspectives.
no code implementations • 15 Aug 2023 • Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok
Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification.
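One plausible form of backward verification, sketched with a stub LLM callable (the masking template is an assumption based on the abstract, not the paper's exact prompt):

```python
import re

def backward_verify(llm, question, candidate_answer):
    """Backward check: mask a number in the question and ask whether the
    model recovers it given the candidate answer. `llm` is a hypothetical
    text-to-text callable; the masking scheme is an assumption."""
    numbers = re.findall(r"\d+(?:\.\d+)?", question)
    if not numbers:
        return True                      # nothing to mask; skip the check
    masked = question.replace(numbers[0], "x", 1)
    prompt = (f"{masked}\nIf the answer to the question is "
              f"{candidate_answer}, what is the value of x?")
    return numbers[0] in llm(prompt)
```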
3 code implementations • ICCV 2023 • Haiyang Wang, Hao Tang, Shaoshuai Shi, Aoxue Li, Zhenguo Li, Bernt Schiele, LiWei Wang
Jointly processing information from multiple sensors is crucial to achieving accurate and robust perception for reliable autonomous driving systems.
Ranked #8 on 3D Object Detection on nuScenes
no code implementations • 13 Jul 2023 • Nevin L. Zhang, Kaican Li, Han Gao, Weiyan Xie, Zhi Lin, Zhenguo Li, Luning Wang, Yongxiang Huang
Domain generalization (DG) is about learning models that generalize well to new domains that are related to, but different from, the training domain(s).
1 code implementation • NeurIPS 2023 • Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu
Despite the stunning ability to generate high-quality images by recent text-to-image models, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene.
no code implementations • 5 Jul 2023 • Jingwei Zhang, Han Shi, Jincheng Yu, Enze Xie, Zhenguo Li
Generative models can be categorized into two types: explicit generative models, which define explicit density forms and allow exact likelihood inference, such as score-based diffusion models (SDMs) and normalizing flows; and implicit generative models, which directly learn a transformation from the prior to the data distribution, such as generative adversarial nets (GANs).
no code implementations • 4 Jul 2023 • Weijian Luo, Hao Jiang, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Zhihua Zhang
In image generation experiments, the proposed DCD is capable of training an energy-based model to generate CelebA $32\times 32$ images, with quality comparable to existing EBMs.
1 code implementation • NeurIPS 2023 • Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li
Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images.
Ranked #1 on Point Cloud Generation on ShapeNet Car
no code implementations • 7 Jun 2023 • Kai Chen, Enze Xie, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, Dit-yan Yeung
Diffusion models have attracted significant attention due to the remarkable ability to create content and generate data for tasks like image classification.
no code implementations • 5 Jun 2023 • Yimeng Chen, Tianyang Hu, Fengwei Zhou, Zhenguo Li, ZhiMing Ma
The proliferation of pretrained models, as a result of advancements in pretraining techniques, has led to the emergence of a vast zoo of publicly available models.
1 code implementation • NeurIPS 2023 • Weijian Luo, Tianyang Hu, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhihua Zhang
To demonstrate the effectiveness and universality of Diff-Instruct, we consider two scenarios: distilling pre-trained diffusion models and refining existing GAN models.
no code implementations • 24 May 2023 • Mingyang Yi, Jiacheng Sun, Zhenguo Li
To understand this contradiction, we empirically verify the difference between the sufficiently trained diffusion model and its empirical optimum.
1 code implementation • 18 May 2023 • Shoukang Hu, Kaichen Zhou, Kaiyu Li, Longhui Yu, Lanqing Hong, Tianyang Hu, Zhenguo Li, Gim Hee Lee, Ziwei Liu
In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels.
no code implementations • 15 May 2023 • Yuyang Zhao, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee
The text-driven image and video diffusion models have achieved unprecedented success in generating realistic and diverse content.
1 code implementation • 9 May 2023 • Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi
In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training a model from scratch or collecting additional data.
1 code implementation • 19 Apr 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo
These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.
1 code implementation • 19 Apr 2023 • Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, Yu Li
The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability.
Ranked #4 on Math Word Problem Solving on SVAMP
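For reference, self-consistency as mentioned in the entry above can be sketched in a few lines (the sampling callable is a hypothetical stand-in for an LLM decoder):

```python
from collections import Counter

def self_consistency(sample_chain, question, k=10):
    """Majority vote over k sampled chain-of-thought completions.
    `sample_chain` is a hypothetical stochastic callable returning a
    (reasoning, final_answer) pair for a question."""
    answers = [sample_chain(question)[1] for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```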
1 code implementation • ICCV 2023 • Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains.
no code implementations • CVPR 2023 • Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Hang Xu
This paper presents DetCLIPv2, an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD).
Ranked #5 on Object Detection on ODinW Full-Shot 13 Tasks
no code implementations • 3 Apr 2023 • Tianqi Wang, Sukmin Kim, Wenxuan Ji, Enze Xie, Chongjian Ge, Junsong Chen, Zhenguo Li, Ping Luo
In addition, we propose a new task, end-to-end motion and accident prediction, which can be used to directly evaluate the accident prediction ability for different autonomous driving algorithms.
no code implementations • 1 Apr 2023 • Rui Sun, Fengwei Zhou, Zhenhua Dong, Chuanlong Xie, Lanqing Hong, Jiawei Li, Rui Zhang, Zhen Li, Zhenguo Li
By adjusting the perturbation strength in the direction of the paths, our proposed augmentation is controllable and auditable.
1 code implementation • CVPR 2023 • Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung
Specifically, our MixedAE outperforms MAE by +0.3% accuracy, +1.7 mIoU and +0.9 AP on ImageNet-1K, ADE20K and COCO respectively with a standard ViT-Base.
1 code implementation • ICCV 2023 • Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.
Ranked #3 on Monocular Depth Estimation on SUN-RGBD
no code implementations • CVPR 2023 • Hao Yang, Lanqing Hong, Aoxue Li, Tianyang Hu, Zhenguo Li, Gim Hee Lee, LiWei Wang
In this work, we first investigate the effects of synthetic data in synthetic-to-real novel view synthesis and surprisingly observe that models trained with synthetic data tend to produce sharper but less accurate volume densities.
1 code implementation • 22 Feb 2023 • Yikai Wang, Jianan Wang, Guansong Lu, Hang Xu, Zhenguo Li, Wei zhang, Yanwei Fu
In the image manipulation phase, SeMani adopts a generative model to synthesize new images conditioned on the entity-irrelevant regions and target text descriptions.
no code implementations • ICCV 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo
These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.
1 code implementation • 24 Dec 2022 • Feng Xue, Zi He, Chuanlong Xie, Falong Tan, Zhenguo Li
This advance raises a natural question: Can we leverage the diversity of multiple pre-trained models to improve the performance of post hoc detection methods?
no code implementations • 31 Oct 2022 • Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, Xuming He
In this work, we focus on learning a VLP model with sequential chunks of image-text pair data.
no code implementations • 17 Oct 2022 • Longhui Yu, Yifan Zhang, Lanqing Hong, Fei Chen, Zhenguo Li
Specifically, DucTeacher consists of two curriculums, i.e., (1) domain evolving curriculum seeks to learn from the data progressively to handle data distribution discrepancy by estimating the similarity between domains, and (2) distribution matching curriculum seeks to estimate the class distribution for each unlabeled domain to handle class distribution shifts.
no code implementations • 17 Oct 2022 • Qishi Dong, Awais Muhammad, Fengwei Zhou, Chuanlong Xie, Tianyang Hu, Yongxin Yang, Sung-Ho Bae, Zhenguo Li
We evaluate our paradigm on a diverse model zoo consisting of 35 models for various OoD tasks and demonstrate: (i) model ranking is better correlated with fine-tuning ranking than previous methods and up to 9859x faster than brute-force fine-tuning; (ii) OoD generalization after model ensemble with feature selection outperforms the state-of-the-art methods, and the accuracy on the most challenging task, DomainNet, is improved from 46.5% to 50.6%.
1 code implementation • 9 Oct 2022 • Haiyang Wang, Lihe Ding, Shaocong Dong, Shaoshuai Shi, Aoxue Li, Jianan Li, Zhenguo Li, LiWei Wang
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Ranked #1 on 3D Object Detection on SUN-RGBD
no code implementations • 20 Sep 2022 • Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Chunjing Xu, Hang Xu
We further design a concept dictionary (with descriptions) from various online sources and detection datasets to provide prior knowledge for each concept.
1 code implementation • 14 Sep 2022 • Kaichen Zhou, Lanqing Hong, Changhao Chen, Hang Xu, Chaoqiang Ye, Qingyong Hu, Zhenguo Li
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
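For reference, this 2D photometric relation is commonly instantiated as an SSIM/L1 mixture between a frame $I_t$ and its reconstruction $\hat{I}_t$ warped from an adjacent frame (a standard formulation in the literature, not necessarily this paper's exact loss):

```latex
% Common photometric reconstruction loss (standard formulation,
% not necessarily the exact one used in this paper):
\[
  \mathcal{L}_{\mathrm{ph}}
  = \alpha \,\frac{1 - \mathrm{SSIM}(I_t, \hat{I}_t)}{2}
  + (1 - \alpha)\,\lVert I_t - \hat{I}_t \rVert_1,
  \qquad \alpha \approx 0.85 .
\]
```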
no code implementations • 14 Jul 2022 • Zhou Liu, YuJun Li, Zhengying Liu, Lin Li, Zhenguo Li
We define the normalized form of trigonometric identities, design a set of rules for the proof, and put forward a method that can generate a theoretically infinite number of trigonometric identities.
no code implementations • 14 Jul 2022 • Mingyang Yi, Ruoyu Wang, Jiachen Sun, Zhenguo Li, Zhi-Ming Ma
The correlation shift is caused by spurious attributes that correlate with the class label, as the correlation between them may differ between training and test data.
no code implementations • CVPR 2022 • Ning Kang, Shanzhao Qiu, Shifeng Zhang, Zhenguo Li, Shutao Xia
Generative-model-based lossless image compression algorithms have seen great success in improving compression ratios.
1 code implementation • 8 Jun 2022 • Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Zhenguo Li, Ping Luo
In this paper, we propose CO^3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representation for outdoor-scene point clouds in an unsupervised manner.
no code implementations • 26 May 2022 • Zhili Liu, Jianhua Han, Lanqing Hong, Hang Xu, Kai Chen, Chunjing Xu, Zhenguo Li
On the other hand, for existing SSL methods, it is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
2 code implementations • 25 May 2022 • Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong
In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.
1 code implementation • CVPR 2022 • Minbin Huang, Zhijian Huang, Changlin Li, Xin Chen, Hang Xu, Zhenguo Li, Xiaodan Liang
It is able to find the top 0.16% and 0.29% architectures on average on two search spaces under a budget of only 50 models.
1 code implementation • CVPR 2022 • Jianan Wang, Guansong Lu, Hang Xu, Zhenguo Li, Chunjing Xu, Yanwei Fu
Existing text-guided image manipulation methods aim to modify the appearance of the image or to edit a few objects in a virtual or simple scenario, which is far from practical application.
1 code implementation • ICLR 2022 • Shoukang Hu, Ruochen Wang, Lanqing Hong, Zhenguo Li, Cho-Jui Hsieh, Jiashi Feng
Efficient performance estimation of architectures drawn from large search spaces is essential to Neural Architecture Search.
2 code implementations • ICLR 2022 • Xingchen Wan, Binxin Ru, Pedro M. Esperança, Zhenguo Li
Searching for the architecture cells is a dominant paradigm in NAS.
no code implementations • 15 Mar 2022 • Kaican Li, Kai Chen, Haoyu Wang, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei zhang, Chunjing Xu, Dit-yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu
One main reason that impedes the development of truly reliable self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases.
no code implementations • ICLR 2022 • Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok
Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both the vision and language fields.
1 code implementation • ICLR 2022 • Liyuan Wang, Xingxing Zhang, Kuo Yang, Longhui Yu, Chongxuan Li, Lanqing Hong, Shifeng Zhang, Zhenguo Li, Yi Zhong, Jun Zhu
In this work, we propose memory replay with data compression (MRDC) to reduce the storage cost of old training samples and thus increase their amount that can be stored in the memory buffer.
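A minimal sketch of the underlying storage trade-off, using JPEG via Pillow (the quality value is illustrative, not the paper's setting):

```python
import io

import numpy as np
from PIL import Image

def compress_for_replay(img_array, quality=50):
    """JPEG-encode a sample before storing it in the replay buffer:
    lower quality means fewer bytes, so more old samples fit in a
    memory buffer of fixed size."""
    buf = io.BytesIO()
    Image.fromarray(img_array).save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
blob = compress_for_replay(img)
print(f"{len(blob)} bytes compressed vs {img.nbytes} bytes raw")
```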
1 code implementation • 27 Jan 2022 • Weijun Hong, Guilin Li, Weinan Zhang, Ruiming Tang, Yunhe Wang, Zhenguo Li, Yong Yu
Neural architecture search (NAS) has shown encouraging results in automating the architecture design.
no code implementations • CVPR 2022 • Aoxue Li, Peng Yuan, Zhenguo Li
Semi-Supervised object detection (SSOD) aims to improve the generalization ability of object detectors with large-scale unlabeled images.
no code implementations • CVPR 2022 • Sarah Parisot, Pedro M. Esperanca, Steven McDonagh, Tamas J. Madarasz, Yongxin Yang, Zhenguo Li
In this work, we introduce a novel strategy for long-tail recognition that addresses the tail classes' few-shot problem via training-free knowledge transfer.
no code implementations • 10 Dec 2021 • Qi Sun, Hexin Dong, Zewei Chen, Jiacheng Sun, Zhenguo Li, Bin Dong
Gradient-based methods for the distributed training of residual networks (ResNets) typically require a forward pass of the input data, followed by back-propagating the error gradient to update model parameters, which becomes time-consuming as the network goes deeper.
no code implementations • 7 Dec 2021 • Tianyang Hu, Jun Wang, Wenjia Wang, Zhenguo Li
Compared to cross-entropy, the square loss has comparable generalization error but noticeable advantages in robustness and model calibration.
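A minimal sketch of one common way to apply the square loss to classification, regressing logits onto one-hot targets (the paper's exact rescaling may differ):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)              # toy batch of 4, 10 classes
targets = torch.randint(0, 10, (4,))

ce = F.cross_entropy(logits, targets)    # standard cross-entropy
one_hot = F.one_hot(targets, num_classes=10).float()
sq = ((logits - one_hot) ** 2).mean()    # square loss on one-hot targets
print(ce.item(), sq.item())
```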
1 code implementation • NeurIPS 2021 • Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xing Zhang, Ruiming Tang, Yong Yu, Zhenguo Li
Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency.
no code implementations • NeurIPS 2021 • Muhammad Awais, Fengwei Zhou, Chuanlong Xie, Jiawei Li, Sung-Ho Bae, Zhenguo Li
First, we theoretically show the transferability of robustness from an adversarially trained teacher model to a student model with the help of mixup augmentation.
1 code implementation • ICLR 2022 • Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu
In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.
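The token-wise maximum similarity can be sketched as follows, with random features standing in for encoder outputs (a sketch of the late-interaction mechanism, not the released implementation):

```python
import torch
import torch.nn.functional as F

def late_interaction_score(img_tokens, txt_tokens):
    """Image-to-text similarity via token-wise maximum: each visual token
    is matched to its most similar text token and the maxima are averaged
    (the text-to-image direction is symmetric)."""
    img = F.normalize(img_tokens, dim=-1)
    txt = F.normalize(txt_tokens, dim=-1)
    sim = img @ txt.T                     # [n_img_tokens, n_txt_tokens]
    return sim.max(dim=1).values.mean()

# random features stand in for encoder outputs
print(late_interaction_score(torch.randn(49, 512), torch.randn(16, 512)))
```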
1 code implementation • 5 Nov 2021 • Chenxu Zhu, Bo Chen, Weinan Zhang, Jincai Lai, Ruiming Tang, Xiuqiang He, Zhenguo Li, Yong Yu
To address these three issues mentioned above, we propose Automatic Interaction Machine (AIM) with three core components, namely, Feature Interaction Search (FIS), Interaction Function Search (IFS) and Embedding Dimension Search (EDS), to select significant feature interactions, appropriate interaction functions and necessary embedding dimensions automatically in a unified framework.
no code implementations • NeurIPS 2021 • Chen Zhang, Shifeng Zhang, Fabio Maria Carlucci, Zhenguo Li
To eliminate the requirement of saving separate models for different target datasets, we propose a novel setting that starts from a pretrained deep generative model and compresses the data batches while adapting the model with a dynamical system for only one epoch.
no code implementations • NeurIPS 2021 • Shifeng Zhang, Ning Kang, Tom Ryder, Zhenguo Li
In this paper, we discuss lossless compression using normalizing flows which have demonstrated a great capacity for achieving high compression ratios.
Ranked #1 on Image Compression on ImageNet32
no code implementations • ICLR 2022 • Yao Zhu, Jiacheng Sun, Zhenguo Li
Adversarial transferability enables attackers to generate adversarial examples from the source model to attack the target model, which has raised security concerns about the deployment of DNNs in practice.
no code implementations • ICLR 2022 • Xiaojiang Yang, Yi Wang, Jiacheng Sun, Xing Zhang, Shifeng Zhang, Zhenguo Li, Junchi Yan
Nonlinear ICA is a fundamental problem in machine learning, aiming to identify the underlying independent components (sources) from data which is assumed to be a nonlinear function (mixing function) of these sources.
no code implementations • NeurIPS Workshop ImageNet_PPF 2021 • Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng
Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.
no code implementations • NeurIPS Workshop DLDE 2021 • Qi Sun, Hexin Dong, Zewei Chen, Weizhen Dian, Jiacheng Sun, Yitong Sun, Zhenguo Li, Bin Dong
The backpropagation algorithm is indispensable for training modern residual networks (ResNets) and usually tends to be time-consuming due to its inherent algorithmic lockings.
no code implementations • 13 Sep 2021 • Kaichen Zhou, Lanqing Hong, Shoukang Hu, Fengwei Zhou, Binxin Ru, Jiashi Feng, Zhenguo Li
In view of these issues, we propose DHA, which achieves joint optimization of Data augmentation policy, Hyper-parameter and Architecture.
1 code implementation • ICCV 2021 • Haoyue Bai, Fengwei Zhou, Lanqing Hong, Nanyang Ye, S. -H. Gary Chan, Zhenguo Li
In this work, we propose robust Neural Architecture Search for OoD generalization (NAS-OoD), which optimizes the architecture with respect to its performance on generated OoD data by gradient descent.
Ranked #1 on Domain Generalization on NICO Vehicle
no code implementations • ICCV 2021 • Muhammad Awais, Fengwei Zhou, Hang Xu, Lanqing Hong, Ping Luo, Sung-Ho Bae, Zhenguo Li
Extensive studies on Unsupervised Domain Adaptation (UDA) have shown great success in practice by learning transferable representations across a labeled source domain and an unlabeled target domain with deep models.
1 code implementation • ICCV 2021 • Kai Chen, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung
By pre-training on SODA10M, a large-scale autonomous driving dataset, MultiSiam exceeds the ImageNet pre-trained MoCo-v2, demonstrating the potential of domain-specific pre-training.
no code implementations • ICCV 2021 • Yao Zhu, Jiacheng Ma, Jiacheng Sun, Zewei Chen, Rongxin Jiang, Zhenguo Li
We find that adversarial training contributes to obtaining an energy function that is flat and has low energy around the real data, which is the key for generative capability.
no code implementations • ICCV 2021 • Lewei Yao, Renjie Pi, Hang Xu, Wei zhang, Zhenguo Li, Tong Zhang
In this paper, we investigate the knowledge distillation (KD) strategy for object detection and propose an effective framework applicable to both homogeneous and heterogeneous student-teacher pairs.
no code implementations • ICCV 2021 • Hang Xu, Ning Kang, Gengwei Zhang, Chuanlong Xie, Xiaodan Liang, Zhenguo Li
Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.
no code implementations • 15 Jul 2021 • Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li
Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.
Ranked #10 on Semantic Textual Similarity on MRPC
1 code implementation • 21 Jun 2021 • Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu
To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
no code implementations • 21 Jun 2021 • Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu
Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning on different downstream tasks (i.e., detection, semantic/instance segmentation) in the autonomous driving domain.
1 code implementation • CVPR 2021 • Nanyang Ye, Jingxuan Tang, Huayu Deng, Xiao-Yun Zhou, Qianxiao Li, Zhenguo Li, Guang-Zhong Yang, Zhanxing Zhu
To the best of our knowledge, this is one of the first works to adopt a differentiable environment-splitting method that enables stable predictions across environments without environment index information, achieving state-of-the-art performance on datasets with strong spurious correlations, such as Colored MNIST.
no code implementations • CVPR 2021 • Aoxue Li, Zhenguo Li
To this end, we propose a simple yet effective Transformation Invariant Principle (TIP) that can be flexibly applied to various meta-learning models for boosting the detection performance on novel class objects.
1 code implementation • 15 Jun 2021 • Han-Jia Ye, Da-Wei Zhou, Lanqing Hong, Zhenguo Li, Xiu-Shen Wei, De-Chuan Zhan
To this end, we propose Learning to Decompose Network (LeadNet) to contextualize the meta-learned "support-to-target" strategy, leveraging the context of instances with one or mixed latent attributes in a support set.
no code implementations • NeurIPS 2021 • Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, LiWei Wang
We also introduce a new concept, the expansion function, which characterizes to what extent the variance is amplified in the test domains over the training domains, and therefore gives a quantitative meaning to invariant features.
no code implementations • CVPR 2021 • Lewei Yao, Renjie Pi, Hang Xu, Wei zhang, Zhenguo Li, Tong Zhang
For student morphism, a weight inheritance strategy is adopted, allowing the student to flexibly update its architecture while fully utilizing the predecessor's weights, which considerably accelerates the search; to facilitate dynamic distillation, an elastic teacher pool is trained via an integrated progressive shrinking strategy, from which teacher detectors can be sampled without additional cost in subsequent searches.
2 code implementations • CVPR 2021 • Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li
While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.
no code implementations • 13 May 2021 • Wenqi Shao, Hang Yu, Zhaoyang Zhang, Hang Xu, Zhenguo Li, Ping Luo
To address this problem, we develop a probability-based pruning algorithm, called batch whitening channel pruning (BWCP), which can stochastically discard unimportant channels by modeling the probability of a channel being activated.
no code implementations • ICLR 2022 • Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng
Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.
4 code implementations • CVPR 2021 • Shifeng Zhang, Chen Zhang, Ning Kang, Zhenguo Li
We also propose a lossless compression algorithm based on iVPF.
no code implementations • 8 Mar 2021 • Jian Ding, Enze Xie, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia
Unsupervised pre-training aims at learning transferable features that are beneficial for downstream tasks.
1 code implementation • 25 Feb 2021 • Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok
A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.
1 code implementation • ICLR 2021 • Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li
For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges.
2 code implementations • ICCV 2021 • Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo
Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.
no code implementations • 21 Jan 2021 • Haotian Ye, Chuanlong Xie, Yue Liu, Zhenguo Li
One of the definitions of OOD accuracy is worst-domain accuracy.