no code implementations • 16 Jan 2025 • Yiming Liang, Tianyu Zheng, Xinrun Du, Ge Zhang, Xingwei Qu, Xiang Yue, Chujie Zheng, Jiaheng Liu, Lei Ma, Wenhu Chen, Guoyin Wang, Zhaoxiang Zhang, Wenhao Huang, Jiajun Zhang
Instruction tuning enhances large language models (LLMs) to follow human instructions across diverse tasks, relying on high-quality datasets to guide behavior.
1 code implementation • 16 Jan 2025 • Hongbo Zhao, Fei Zhu, Bolin Ni, Feng Zhu, Gaofeng Meng, Zhaoxiang Zhang
(ii) For remaining knowledge, the impact brought by the forgetting procedure should be minimal.
no code implementations • 14 Jan 2025 • Yuxue Yang, Lue Fan, Zuzen Lin, Feng Wang, Zhaoxiang Zhang
In this paper, we introduce LayerAnimate, a novel architectural approach that enhances fine-grained control over individual animation layers within a video diffusion model, allowing users to independently manipulate foreground and background elements in distinct layers.
no code implementations • 24 Dec 2024 • Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang
World model-based searching and planning are widely recognized as a promising path toward human-level physical intelligence.
no code implementations • 4 Dec 2024 • Lue Fan, Hao Zhang, Qitai Wang, Hongsheng Li, Zhaoxiang Zhang
We propose FreeSim, a camera simulation method for autonomous driving.
no code implementations • 27 Nov 2024 • Yixuan Zhang, Hui Yang, Chuanchen Luo, Junran Peng, Yuxi Wang, Zhaoxiang Zhang
Generating realistic 3D human-object interactions (HOIs) from text descriptions is an active research topic with potential applications in virtual and augmented reality, robotics, and animation.
1 code implementation • 27 Nov 2024 • Chenyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Qifeng Chen, Zhaoxiang Zhang
To this end, this work presents a simple and effective framework, SimCMF, to study an important problem: cross-modal fine-tuning from vision foundation models trained on natural RGB images to other imaging modalities of different physical properties (e.g., polarization).
no code implementations • 25 Nov 2024 • Xiangyu Zhu, Chang Yu, Jiankuo Zhao, Zhaoxiang Zhang, Stan Z. Li, Zhen Lei
By injecting graphics probes into neural networks, and analyzing their behavior in reconstructing images, we find that DNNs initially encode images as 2D representations in low-level layers, and finally construct 3D representations in high-level layers.
no code implementations • 7 Nov 2024 • Siming Huang, Tianhao Cheng, J. K. Liu, Jiaran Hao, Liuyihan Song, Yang Xu, J. Yang, J. H. Liu, Chenchen Zhang, Linzheng Chai, Ruifeng Yuan, Zhaoxiang Zhang, Jie Fu, Qian Liu, Ge Zhang, Zili Wang, Yuan Qi, Yinghui Xu, Wei Chu
To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
1 code implementation • 3 Nov 2024 • Yiwei Zhang, Jin Gao, Fudong Ge, Guan Luo, Bing Li, Zhaoxiang Zhang, Haibin Ling, Weiming Hu
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car to make the results coherent and realistic.
1 code implementation • 1 Nov 2024 • Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, Zhaoxiang Zhang
Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis.
no code implementations • 30 Oct 2024 • Hongbo Zhao, Lue Fan, Yuntao Chen, Haochen Wang, Yuran Yang, Xiaojuan Jin, Yixin Zhang, Gaofeng Meng, Zhaoxiang Zhang
By publishing and maintaining the dataset, we provide a high-quality benchmark for satellite-based map construction and downstream tasks like autonomous driving.
no code implementations • 23 Oct 2024 • Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
Moreover, we propose two new challenging benchmarks tailored to driving scenes, which are novel camera synthesis and novel trajectory synthesis, emphasizing the freedom of viewpoints.
1 code implementation • 17 Oct 2024 • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu
In our work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general reasoning benchmarks in three domains (i.e., math, coding, and commonsense reasoning).
1 code implementation • 15 Oct 2024 • Pei Wang, Yanan Wu, Zekun Wang, Jiaheng Liu, Xiaoshuai Song, Zhongyuan Peng, Ken Deng, Chenchen Zhang, Jiakai Wang, Junran Peng, Ge Zhang, Hangyu Guo, Zhaoxiang Zhang, Wenbo Su, Bo Zheng
Besides, all evaluation metrics of our MTU-Bench are based on the prediction results and the ground truth without using any GPT or human evaluation metrics.
no code implementations • 14 Oct 2024 • Yuqi Wang, Ke Cheng, JiaWei He, Qitai Wang, Hengchen Dai, Yuntao Chen, Fei Xia, Zhaoxiang Zhang
Driving world models have gained increasing attention due to their ability to model complex physical dynamics.
no code implementations • 12 Oct 2024 • Haochen Wang, Anlin Zheng, Yucheng Zhao, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Zhaoxiang Zhang
This paper introduces reconstructive visual instruction tuning (ROSS), a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals.
1 code implementation • 26 Sep 2024 • Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang
In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner.
1 code implementation • 24 Sep 2024 • Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen
Therefore, we introduce the Hierarchical Long Text Generation Benchmark (HelloBench), a comprehensive, in-the-wild, and open-ended benchmark to evaluate LLMs' performance in generating long text.
no code implementations • 23 Sep 2024 • Yizhi Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin
Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities.
1 code implementation • 12 Sep 2024 • Chenyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Ziwei Liu, Qifeng Chen, Zhaoxiang Zhang
To this end, this work presents a simple and effective framework, SimMAT, to study an open problem: the transferability from vision foundation models trained on natural RGB images to other image modalities of different physical properties (e.g., polarization).
1 code implementation • 29 Aug 2024 • Zengjie Song, Jiangshe Zhang, Yuxi Wang, Junsong Fan, Zhaoxiang Zhang
To address this issue, we propose a novel audio-visual learning framework which is instantiated with two individual learning schemes: self-supervised predictive learning (SSPL) and semantic-aware contrastive learning (SACL).
no code implementations • 24 Jul 2024 • Shougao Zhang, Mengqi Zhou, Yuxi Wang, Chuanchen Luo, Rongyu Wang, Yiwei Li, Zhaoxiang Zhang, Junran Peng
With the surge of embodied intelligence, recent years have witnessed an increasing presence of physical agents in urban areas, such as autonomous vehicles and delivery robots.
1 code implementation • 18 Jul 2024 • Guowen Zhang, Junsong Fan, Liyi Chen, Zhaoxiang Zhang, Zhen Lei, Lei Zhang
However, the annotation of large-scale 3D datasets requires significant human effort.
no code implementations • 18 Jul 2024 • Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang
The scarcity of large-scale 3D-text paired data poses a great challenge to open-vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open-vocabulary capabilities to 3D models through knowledge distillation.
1 code implementation • 16 Jul 2024 • Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
With a dataset size 40 times larger than the NYUv2 dataset, it facilitates future scalable research in indoor scene analysis.
1 code implementation • 15 Jun 2024 • Guowen Zhang, Lue Fan, Chenhang He, Zhen Lei, Zhaoxiang Zhang, Lei Zhang
Inspired by recent advances in state space models (SSMs), we present a Voxel SSM, termed Voxel Mamba, which employs a group-free strategy to serialize the whole space of voxels into a single sequence.
1 code implementation • 12 Jun 2024 • Yingyan Li, Lue Fan, JiaWei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan
Specifically, our framework LAW uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame.
no code implementations • 11 Jun 2024 • Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang
Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details.
1 code implementation • 3 Jun 2024 • Xiao Chen, Xudong Jiang, Yunkang Tao, Zhen Lei, Qing Li, Chenyang Lei, Zhaoxiang Zhang
However, incorporating the raw user guidance naively into the existing reflection removal network does not result in performance gains.
no code implementations • 17 May 2024 • Junhong Zou, Xiangyu Zhu, Zhaoxiang Zhang, Zhen Lei
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes, which is crucial for interpretable visual comprehension and reasoning.
no code implementations • 9 May 2024 • Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng
The effect of the design of each component is still unclear.
no code implementations • 9 May 2024 • Xulu Zhang, Xiao-Yong Wei, WengYu Zhang, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li
This paper offers a comprehensive survey of PCS, with a particular focus on the diffusion models.
no code implementations • 8 May 2024 • Zhaoxiang Zhang, Hanqiu Deng, Jinan Bao, Xingyu Li
Image anomaly detection has been a challenging task in the computer vision field.
1 code implementation • 6 May 2024 • Zheng Zhu, XiaoFeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.
no code implementations • 22 Apr 2024 • Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Man Zhang, Qing Li, XuCheng Yin, Zhaoxiang Zhang, Junran Peng
Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance.
no code implementations • CVPR 2024 • Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei
Existing depth sensors are imperfect and may provide inaccurate depth values in challenging scenarios, such as in the presence of transparent or reflective objects.
1 code implementation • 1 Apr 2024 • Yang Liu, He Guan, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng, Zhaoxiang Zhang
The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS).
no code implementations • CVPR 2024 • Bolin Ni, Hongbo Zhao, Chenghao Zhang, Ke Hu, Gaofeng Meng, Zhaoxiang Zhang, Shiming Xiang
Existing methods commonly utilize the one-hot labels and randomly initialize the classifier head.
no code implementations • 23 Mar 2024 • Mengqi Zhou, Yuxi Wang, Jun Hou, Shougao Zhang, Yiwei Li, Chuanchen Luo, Junran Peng, Zhaoxiang Zhang
Extensive experiments demonstrated the capability of our method in controllable large-scale scene generation, including nature scenes and unbounded cities, as well as scene editing such as asset placement and season translation.
1 code implementation • 22 Mar 2024 • Xulu Zhang, WengYu Zhang, Xiao-Yong Wei, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li
The primary challenge in conducting active learning on generative models lies in the open-ended nature of querying, which differs from the closed form of querying in discriminative models that typically target a single concept.
3 code implementations • CVPR 2024 • Hongbo Zhao, Bolin Ni, Haochen Wang, Junsong Fan, Fei Zhu, Yuxi Wang, Yuntao Chen, Gaofeng Meng, Zhaoxiang Zhang
(i) For unwanted knowledge, efficient and effective deleting is crucial.
no code implementations • 4 Mar 2024 • Fei Zhu, Shijie Ma, Zhen Cheng, Xu-Yao Zhang, Zhaoxiang Zhang, Cheng-Lin Liu
This paper aims to provide a comprehensive introduction to the emerging open-world machine learning paradigm, to help researchers build more powerful AI systems in their respective fields, and to promote the development of artificial general intelligence.
1 code implementation • CVPR 2024 • Hongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang
Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation.
1 code implementation • 8 Feb 2024 • Zhiyuan Ma, Xiangyu Zhu, GuoJun Qi, Chen Qian, Zhaoxiang Zhang, Zhen Lei
We suspect this is due to a shortage of paired audio-4D data, which is crucial for the Transformer to effectively perform as a denoiser within the Diffusion framework.
1 code implementation • 31 Jan 2024 • Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis, benefiting from its high-quality rendering results and real-time rendering speed.
1 code implementation • 29 Jan 2024 • Yuxue Yang, Lue Fan, Zhaoxiang Zhang
Thus, MixSup leverages massive coarse cluster-level labels to learn semantics and a few expensive box-level labels to learn accurate poses and shapes.
no code implementations • 12 Jan 2024 • Chang Yu, Junran Peng, Xiangyu Zhu, Zhaoxiang Zhang, Qi Tian, Zhen Lei
The text-to-image synthesis by diffusion models has recently shown remarkable performance in generating high-quality images.
no code implementations • 7 Jan 2024 • Genghao Zhang, Yuxi Wang, Chuanchen Luo, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng
Indoor scene generation has attracted significant attention recently as it is crucial for applications of gaming, virtual reality, and interior design.
no code implementations • CVPR 2024 • Jiaqi Liao, Chuanchen Luo, Yinuo Du, Yuxi Wang, XuCheng Yin, Man Zhang, Zhaoxiang Zhang, Junran Peng
Empirically we find that the prediction failure in dance and martial arts is mainly characterized by the misalignment of hand-wrist and foot-ankle.
1 code implementation • CVPR 2024 • Fei Zhu, Zhen Cheng, Xu-Yao Zhang, Cheng-Lin Liu, Zhaoxiang Zhang
Concretely we identify the failure of simply integrating learning objectives of misclassification and OOD detection and show the potential of sequence learning.
no code implementations • 28 Dec 2023 • Jipeng Jin, Zhaoxiang Zhang, Zhiheng Li, Xiaofeng Gao, Xiongwen Yang, Lei Xiao, Jie Jiang
Considering recency effect in memories, we propose a forgetting model based on Ebbinghaus Forgetting Curve to cope with negative feedback.
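The excerpt does not give the paper's exact formulation, so the following is only a minimal sketch of the commonly cited Ebbinghaus retention form R = exp(-t / S), where t is the elapsed time and S is a memory-strength parameter. The function names, the `strength` parameter, and the idea of summing decayed negative-feedback events are illustrative assumptions, not the authors' model.

```python
import math
from typing import Iterable


def retention(elapsed: float, strength: float) -> float:
    """Ebbinghaus-style retention R = exp(-elapsed / strength).

    `strength` is a hypothetical memory-strength parameter: larger values
    mean slower forgetting.
    """
    if elapsed < 0 or strength <= 0:
        raise ValueError("elapsed must be >= 0 and strength must be > 0")
    return math.exp(-elapsed / strength)


def decayed_negative_score(event_times: Iterable[float], now: float,
                           strength: float) -> float:
    """Sum negative-feedback events, each weighted by how well it is retained.

    Recent events contribute close to 1; old events fade toward 0.
    This aggregation is an assumption made for illustration.
    """
    return sum(retention(now - t, strength) for t in event_times)
```

Under this sketch, a negative signal recorded just now has weight 1, and one recorded `strength` time units ago has weight exp(-1), so older negative feedback influences the score progressively less.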
1 code implementation • 21 Dec 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang
To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.
1 code implementation • 13 Dec 2023 • Xulu Zhang, Xiao-Yong Wei, Jinlin Wu, Tianyi Zhang, Zhaoxiang Zhang, Zhen Lei, Qing Li
It stems from the fact that during inversion, the irrelevant semantics in the user images are also encoded, forcing the inverted concepts to occupy locations far from the core distribution in the embedding space.
1 code implementation • 7 Dec 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen
With these region-specific narratives (partial observations) and a holistic narrative (global observation) for an image, a large language model (LLM) performs the relationship reasoning to synthesize an accurate and comprehensive scene graph.
1 code implementation • CVPR 2024 • Yuqi Wang, JiaWei He, Lue Fan, Hongxin Li, Yuntao Chen, Zhaoxiang Zhang
In autonomous driving, predicting future events in advance and evaluating the foreseeable risks empowers autonomous vehicles to better plan their actions, enhancing safety and efficiency on the road.
1 code implementation • 18 Nov 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen
For the more challenging settings of relation-involved open vocabulary SGG, the proposed approach integrates relation-aware pretraining utilizing image-caption data and retains visual-concept alignment through knowledge distillation.
no code implementations • 11 Nov 2023 • Zongzhao Li, Xiangyu Zhu, Xi Zhang, Zhaoxiang Zhang, Zhen Lei
Specifically, our model contains two key components: the Commonsense-based Contrastive Learning and the Graph Relation Network.
2 code implementations • 1 Oct 2023 • Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Stephen W. Huang, Jie Fu, Junran Peng
The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters.
no code implementations • ICCV 2023 • Yuxi Wang, Jian Liang, Jun Xiao, Shuqi Mei, Yuran Yang, Zhaoxiang Zhang
One-shot domain adaptation methods attempt to overcome these challenges by transferring the pre-trained source model to the target domain using only a single target sample.
1 code implementation • NeurIPS 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tong Wang, Zhaoxiang Zhang
As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident.
1 code implementation • 30 Aug 2023 • Hanqiu Deng, Zhaoxiang Zhang, Jinan Bao, Xingyu Li
On top of the proposed AnoCLIP, we further introduce a test-time adaptation (TTA) mechanism to refine visual anomaly localization results, where we optimize a lightweight adapter in the visual encoder using AnoCLIP's pseudo-labels and noise-corrupted tokens.
2 code implementations • 7 Aug 2023 • Lue Fan, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
Consequently, we develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy.
no code implementations • 2 Aug 2023 • Jingfan Chen, Yuxi Wang, Pengfei Wang, Xiao Chen, Zhaoxiang Zhang, Zhen Lei, Qing Li
The Class Incremental Semantic Segmentation (CISS) extends the traditional segmentation task by incrementally learning newly added classes.
1 code implementation • NeurIPS 2023 • Yang Liu, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
Radar is ubiquitous in autonomous driving systems due to its low cost and good adaptability to bad weather.
1 code implementation • ICCV 2023 • Xiaojun Tang, Junsong Fan, Chuanchen Luo, Zhaoxiang Zhang, Man Zhang, Zongyuan Yang
Considering this phenomenon, we propose Discriminability-Driven Graph Network (DDG-Net), which explicitly models ambiguous snippets and discriminative snippets with well-designed connections, preventing the transmission of ambiguous information and enhancing the discriminability of snippet-level representations.
Weakly-supervised Temporal Action Localization
1 code implementation • 20 Jun 2023 • Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li
However, there is a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain.
1 code implementation • 19 Jun 2023 • Zengjie Song, Zhaoxiang Zhang
The framework of visually-guided sound source separation generally consists of three parts: visual feature extraction, multimodal feature fusion, and sound signal processing.
1 code implementation • CVPR 2024 • Yuqi Wang, Yuntao Chen, Xingyu Liao, Lue Fan, Zhaoxiang Zhang
In this work, we address this limitation by studying camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding.
no code implementations • 8 Jun 2023 • JiaWei He, Lue Fan, Yuqi Wang, Yuntao Chen, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang
In this paper, we rethink the data association in 2D MOT and utilize the 3D object representation to separate each object in the feature space.
no code implementations • 8 Jun 2023 • JiaWei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
We devise the DoubleClustering algorithm to obtain object clusters from reconstructed scene-level points, and further enhance the model's detection capabilities by developing three stages of generalization: progressing from complete to partial, static to dynamic, and close to distant.
1 code implementation • 4 Jun 2023 • Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang
A common practice is to select the highly confident predictions as the pseudo-ground-truths for each pixel, but it leads to a problem that most pixels may be left unused due to their unreliability.
1 code implementation • 25 May 2023 • Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai
These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions.
1 code implementation • 23 May 2023 • Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang
To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model in learning similar cross-domain features.
no code implementations • 22 May 2023 • Jinglin Zhan, Tiejun Liu, RenGang Li, Jingwei Zhang, Zhaoxiang Zhang, Yuntao Chen
Data and models are undoubtedly the two supporting pillars of LiDAR object detection.
1 code implementation • 24 Apr 2023 • Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang
In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture.
2 code implementations • ICCV 2023 • Lue Fan, Yuxue Yang, Yiming Mao, Feng Wang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang
Drawing inspiration from this, we propose a high-performance offline detector in a track-centric perspective instead of the conventional object-centric perspective.
1 code implementation • CVPR 2023 • Haochen Wang, Kaiyou Song, Junsong Fan, Yuxi Wang, Jin Xie, Zhaoxiang Zhang
We observe that the reconstruction loss can naturally be the metric of the difficulty of the pre-training task.
1 code implementation • CVPR 2023 • JiaWei He, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang
We explore long-term temporal visual correspondence-based optimization for 3D video object detection in this work.
1 code implementation • 27 Mar 2023 • JiaWei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang
Data association is at the core of many computer vision tasks, e.g., multiple object tracking, image matching, and point cloud registration.
no code implementations • CVPR 2023 • Chang Yu, Xiangyu Zhu, Xiaomei Zhang, Zhaoxiang Zhang, Zhen Lei
The function of constructing the hierarchy of objects is important to the visual process of the human brain.
1 code implementation • CVPR 2023 • Pengfei Wang, Zhaoxiang Zhang, Zhen Lei, Lei Zhang
In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM), to meet the two conditions for improving model generalization capability.
no code implementations • 16 Mar 2023 • Wenjian Wang, Lijuan Duan, Yuxi Wang, Junsong Fan, Zhi Gong, Zhaoxiang Zhang
Research into Cross-Domain Few-Shot (CDFS) has emerged to address this issue, forming a more challenging and realistic setting.
1 code implementation • CVPR 2023 • Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, Qifeng Chen
Prior work usually requires specific guidance such as the flickering frequency, manual annotations, or extra consistent videos to remove the flicker.
1 code implementation • ICCV 2023 • Lin Zhang, Xin Li, Dongliang He, Errui Ding, Zhaoxiang Zhang
To this end, we construct a large-scale, multi-reference super-resolution dataset, named LMR.
no code implementations • CVPR 2023 • Qu Tang, Xiangyu Zhu, Zhen Lei, Zhaoxiang Zhang
The ability to discover abstract physical concepts and understand how they work in the world through observing lies at the core of human intelligence.
no code implementations • 16 Feb 2023 • Xiao Chen, Wenqi Fan, Jingfan Chen, Haochen Liu, Zitao Liu, Zhaoxiang Zhang, Qing Li
Pairwise learning strategies are prevalent for optimizing recommendation models on implicit feedback data; they usually learn user preference by discriminating between positive items (i.e., clicked by a user) and negative items (i.e., obtained by negative sampling).
1 code implementation • CVPR 2023 • Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
The transformation of features from 2D perspective space to 3D space is essential to multi-view 3D object detection.
2 code implementations • 5 Jan 2023 • Lue Fan, Yuxue Yang, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
To enable efficient long-range detection, we first propose a fully sparse object detector termed FSD.
no code implementations • CVPR 2023 • Cong Pan, Yonghao He, Junran Peng, Qian Zhang, Wei Sui, Zhaoxiang Zhang
Moreover, we find that the image feature maps' resolution in the cross-attention module has a limited effect on the final performance.
Ranked #6 on Bird's-Eye View Semantic Segmentation on nuScenes
1 code implementation • ICCV 2023 • Liyi Chen, Chenyang Lei, Ruihuang Li, Shuai Li, Zhaoxiang Zhang, Lei Zhang
Without introducing any external supervision and human priors, the proposed FPR effectively suppresses wrong activations from the background objects.
Weakly-Supervised Semantic Segmentation
no code implementations • ICCV 2023 • Jingtao Wang, Zengjie Song, Yuxi Wang, Jun Xiao, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang
Surrogate gradient (SG) is one of the most effective approaches for training spiking neural networks (SNNs).
no code implementations • 30 Nov 2022 • Jianjin Xu, Zhaoxiang Zhang, Xiaolin Hu
Second, we train image-to-image translation networks on the synthesized datasets, enabling semantic-conditional image synthesis without human annotations.
2 code implementations • CVPR 2023 • Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai
The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.
Ranked #5 on 3D Object Detection on Rope3D
no code implementations • 8 Nov 2022 • Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhaoxiang Zhang
While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well-reconstructed SR image should enable better SR reconstruction for similar LR images when it is used as a reference.
1 code implementation • 25 Oct 2022 • Junsong Fan, Zhaoxiang Zhang, Tieniu Tan
In this paper, we propose a new approach to applying point-level annotations for weakly-supervised panoptic segmentation.
1 code implementation • 10 Oct 2022 • Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
In this paper, we propose 4D unsupervised object discovery, jointly discovering objects from 4D data -- 3D point clouds and 2D RGB images with temporal information.
no code implementations • 20 Aug 2022 • Hongxin Li, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang
To address this limitation, we present MemoNav, a novel memory mechanism for image-goal navigation, which retains the agent's informative short-term memory and long-term memory to improve navigation performance on multi-goal tasks.
no code implementations • 28 Jul 2022 • Xing Nie, Bolin Ni, Jianlong Chang, Gaomeng Meng, Chunlei Huo, Zhaoxiang Zhang, Shiming Xiang, Qi Tian, Chunhong Pan
To this end, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
4 code implementations • 20 Jul 2022 • Lue Fan, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
To enable efficient long-range LiDAR-based object detection, we build a fully sparse 3D object detector (FSD).
1 code implementation • 20 Jul 2022 • Yingyan Li, Yuntao Chen, JiaWei He, Zhaoxiang Zhang
So these methods only use a small number of projection constraints and produce insufficient depth candidates, leading to inaccurate depth estimation.
1 code implementation • CVPR 2022 • Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, Errui Ding, Javen Qinfeng Shi, Zhaoxiang Zhang, Jingdong Wang
Specifically, we generate support samples from actual samples and their neighbouring clusters in the embedding space through a progressive linear interpolation (PLI) strategy.
1 code implementation • CVPR 2022 • Zengjie Song, Yuxi Wang, Junsong Fan, Tieniu Tan, Zhaoxiang Zhang
Sound source localization in visual scenes aims to localize objects emitting the sound in a given image.
2 code implementations • CVPR 2022 • Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang Zhang, Qian Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu
In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.
Ranked #7 on Real-time Instance Segmentation on MSCOCO
no code implementations • CVPR 2022 • Chang Yu, Xiangyu Zhu, Xiaomei Zhang, Zidu Wang, Zhaoxiang Zhang, Zhen Lei
Capsule networks are designed to present the objects by a set of parts and their relationships, which provide an insight into the procedure of visual perception.
1 code implementation • CVPR 2022 • Qing Chang, Junran Peng, Lingxie Xie, Jiajun Sun, Haoran Yin, Qi Tian, Zhaoxiang Zhang
However, due to high training costs and unawareness of downstream usage, most self-supervised learning methods lack the capability to accommodate the diversity of downstream scenarios, as there are various data domains, different vision tasks, and latency constraints on models.
2 code implementations • CVPR 2022 • Renjie Zou, Chunfeng Song, Zhaoxiang Zhang
Inspired by recent progress in Vision Transformer (ViT) and Swin Transformer, we found that combining the local-aware attention mechanism with global-related feature learning could meet the expectation in image compression.
Ranked #1 on Image Compression on kodak
no code implementations • 14 Jan 2022 • Yuqi Wang, Xu-Yao Zhang, Cheng-Lin Liu, Zhaoxiang Zhang
Moreover, through experiments we show that discrete language representation has several advantages compared with continuous feature representation, from the aspects of interpretability, generalization, and robustness.
no code implementations • CVPR 2022 • Wenjian Wang, Lijuan Duan, Yuxi Wang, Qing En, Junsong Fan, Zhaoxiang Zhang
To remedy this problem, we propose an interesting and challenging cross-domain few-shot semantic segmentation task, where the training and test tasks are performed on different domains.
1 code implementation • CVPR 2022 • Chenghao Zhang, Kun Tian, Bin Fan, Gaofeng Meng, Zhaoxiang Zhang, Chunhong Pan
Deep stereo models have achieved state-of-the-art performance on driving scenes, but they suffer from severe performance degradation when tested on unseen scenes.
no code implementations • CVPR 2022 • Jing Li, Junsong Fan, Zhaoxiang Zhang
Existing methods usually generate pseudo labels from class activation maps (CAMs) and then train a segmentation model.
2 code implementations • CVPR 2022 • Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Ranked #3 on 3D Object Detection on waymo cyclist
1 code implementation • 26 Nov 2021 • Qitai Wang, Yuntao Chen, Ziqi Pang, Naiyan Wang, Zhaoxiang Zhang
We employ a simple Kalman filter for trajectory prediction and preserve the tracklet by prediction when the target is not visible.
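As a rough illustration of the idea in the excerpt above (predict every frame, and keep the tracklet alive on the prediction alone when no detection arrives), here is a minimal sketch. It assumes a 1D constant-velocity state, which is a simplification chosen for brevity; the class name, noise values, and initial covariance are hypothetical and are not taken from the paper.

```python
class ConstantVelocityKalman1D:
    """Minimal 1D constant-velocity Kalman filter (illustrative sketch only)."""

    def __init__(self, position, dt=1.0, process_var=0.01, meas_var=1.0):
        self.p = position                      # estimated position
        self.v = 0.0                           # estimated velocity
        self.P = [[10.0, 0.0], [0.0, 10.0]]    # state covariance (assumed prior)
        self.dt = dt
        self.q = process_var                   # assumed process noise (diagonal)
        self.r = meas_var                      # assumed measurement noise

    def predict(self):
        # x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        dt = self.dt
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]

    def update(self, z):
        # Position-only measurement, H = [1, 0]
        (a, b), (c, d) = self.P
        s = a + self.r                         # innovation variance
        k0, k1 = a / s, c / s                  # Kalman gain
        y = z - self.p                         # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]

    def step(self, detection):
        """Advance one frame; pass None when the target is not visible."""
        self.predict()
        if detection is not None:
            self.update(detection)
        return self.p


# Four visible frames followed by two frames without a detection.
track = ConstantVelocityKalman1D(position=0.0)
for z in [0.0, 1.0, 2.0, 3.0, None, None]:
    track.step(z)
```

During the frames with `None`, the position keeps advancing at the estimated velocity while the position variance grows, which is what lets a tracker hold on to the tracklet until the target reappears.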
no code implementations • ICLR 2022 • Qu Tang, Xiangyu Zhu, Zhen Lei, Zhaoxiang Zhang
In this paper, we work on object dynamics and propose Object Dynamics Distillation Network (ODDN), a framework that distills explicit object dynamics (e.g., velocity) from sequential static representations.
1 code implementation • 22 Jun 2021 • Yuxi Wang, Jian Liang, Zhaoxiang Zhang
Source-free domain adaptation has developed rapidly in recent years. In this setting, a well-trained source model, rather than the source data, is adapted to the target domain, which can alleviate privacy concerns and protect intellectual property.
1 code implementation • CVPR 2021 • Xingyuan Bu, Junran Peng, Junjie Yan, Tieniu Tan, Zhaoxiang Zhang
Transfer learning with pre-training on large-scale datasets has played an increasingly significant role in computer vision and natural language processing recently.
1 code implementation • 28 Apr 2021 • Manyu Zhu, Dongliang He, Xin Li, Chao Li, Fu Li, Xiao Liu, Errui Ding, Zhaoxiang Zhang
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
Ranked #4 on Image Inpainting on CelebA-HQ
1 code implementation • CVPR 2021 • Zikai Zhang, Bineng Zhong, Shengping Zhang, Zhenjun Tang, Xin Liu, Zhaoxiang Zhang
A practical long-term tracker typically contains three key properties, i.e., an efficient model design, an effective global re-detection strategy, and a robust distractor awareness mechanism.
1 code implementation • CVPR 2021 • Gang Zhang, Xin Lu, Jingru Tan, Jianmin Li, Zhaoxiang Zhang, Quanquan Li, Xiaolin Hu
In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner.
2 code implementations • CVPR 2021 • Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, Xiaolin Hu
Tremendous efforts have been made on instance segmentation but the mask quality is still not satisfactory.
2 code implementations • CVPR 2021 • Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang
Our motivation is that regressing keypoint positions accurately requires learning representations that focus on the keypoint regions.
1 code implementation • CVPR 2021 • JiaWei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang
Then the association problem turns into a general graph matching between the tracklet graph and the detection graph.
1 code implementation • 18 Mar 2021 • Lue Fan, Xuan Xiong, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
The most notable difference with previous works is that our method is purely based on the range view representation.
1 code implementation • ICCV 2021 • Lue Fan, Xuan Xiong, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
We first analyze the existing range-view-based methods and find two issues overlooked by previous works: 1) the scale variation between nearby and far away objects; 2) the inconsistency between the 2D range image coordinates used in feature extraction and the 3D Cartesian coordinates used in output.
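The two issues named above can be made concrete with a small sketch. It assumes the common spherical parameterization of a range image (range, azimuth, inclination); the function names and the angular resolution are hypothetical, and the actual sensor model in the paper may differ.

```python
import math


def range_pixel_to_xyz(r, azimuth, inclination):
    """Map one range-image pixel (spherical coordinates) to 3D Cartesian."""
    x = r * math.cos(inclination) * math.cos(azimuth)
    y = r * math.cos(inclination) * math.sin(azimuth)
    z = r * math.sin(inclination)
    return x, y, z


def pixel_footprint(r, angular_resolution):
    """Approximate arc length (metres) covered by one pixel at range r."""
    return r * angular_resolution


# Scale variation: the same pixel spans 10x more space at 50 m than at 5 m,
# so an object of fixed size covers far fewer pixels when it is far away.
res = math.radians(0.2)  # assumed horizontal angular resolution
near = pixel_footprint(5.0, res)
far = pixel_footprint(50.0, res)
```

Neighbouring pixels in the 2D image can therefore be close or far apart in 3D depending on range, which is the coordinate inconsistency the excerpt refers to.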
no code implementations • ICCV 2021 • Yuxi Wang, Junran Peng, Zhaoxiang Zhang
Unsupervised domain adaptation for semantic segmentation aims to assign pixel-level labels to the unlabeled target domain by transferring knowledge from the labeled source domain.
no code implementations • ICCV 2021 • Yan Huang, Qiang Wu, Jingsong Xu, Yi Zhong, Zhaoxiang Zhang
This work argues that these approaches in fact are not aware of clothing status (i.e., change or no-change) of a pedestrian.
1 code implementation • 9 Dec 2020 • Xueyi Li, Tianfei Zhou, Jianwu Li, Yi Zhou, Zhaoxiang Zhang
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths, which can be used for training more accurate segmentation models.
Ranked #38 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)
no code implementations • CVPR 2021 • Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu
We further identify another major issue, seldom noticed by the community, that the long-tailed and open-ended (sub-)category distribution should be accommodated.
no code implementations • 16 Nov 2020 • Zhen Yang, Chi Zhang, Huiming Guo, Zhaoxiang Zhang
In this paper, we propose a manual-label-free 3D detection algorithm that leverages the CARLA simulator to generate a large amount of self-labeled training samples and introduces a novel Domain Adaptive VoxelNet (DA-VoxelNet) that can cross the distribution gap from the synthetic data to the real scenario.
no code implementations • International Joint Conference on Artificial Intelligence 2018 • Yue Xu, Fei Yin, Zhaoxiang Zhang, Cheng-Lin Liu
Layout analysis is a fundamental process in document image analysis and understanding.
no code implementations • CVPR 2016 • Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, Longin Jan Latecki
We name the proposed 3D shape search engine, which combines GPU acceleration and Inverted File Twice, as GIFT.