no code implementations • 6 Jan 2025 • Jiexi Zhong, Zhiheng Li, Yubo Cui, Zheng Fang
Semantic segmentation of LiDAR points has significant value for autonomous driving and mobile robot systems.
no code implementations • 16 Dec 2024 • Hao Li, Shamit Lal, Zhiheng Li, Yusheng Xie, Ying Wang, Yang Zou, Orchid Majumder, R. Manmatha, Zhuowen Tu, Stefano Ermon, Stefano Soatto, Ashwin Swaminathan
We empirically study the scaling properties of various Diffusion Transformers (DiTs) for text-to-image generation by performing extensive and rigorous ablations, including training scaled DiTs ranging from 0.3B up to 8B parameters on datasets up to 600M images.
no code implementations • 11 Dec 2024 • Yubo Cui, Zhiheng Li, Jiaqiang Wang, Zheng Fang
Nowadays, conventional methods usually project the image-based vision features to 3D space and learn the geometric information through the attention mechanism, enabling the 3D semantic occupancy prediction.
no code implementations • 21 Sep 2024 • Yuxuan Zhu, Shiyi Wang, Wenqing Zhong, Nianchen Shen, Yunqi Li, Siqi Wang, Zhiheng Li, Cathy Wu, Zhengbing He, Li Li
We further analyze the potential limitations and challenges that LLMs may encounter in promoting the development of AD technology.
no code implementations • 6 Aug 2024 • Honghao Liao, Zhiheng Li, Ziyu Meng, Ran Song, Yibin Li, Wei Zhang
The expanding applications of legged robots require their mastery of versatile motion skills.
1 code implementation • 25 Jul 2024 • Zhiheng Li, Yubo Cui, Jiexi Zhong, Zheng Fang
Specifically, we utilize a short-term memory to convey historical features, which can be regarded as a spatial prior of moving objects and adopted to enhance current inference by temporal fusion.
no code implementations • 17 Jul 2024 • Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li
By establishing physical causality from actions (cause) to trajectories (effect) through the kinematic model, KiGRAS eliminates massive redundant trajectories.
no code implementations • 2 Jul 2024 • Shuo Li, Yubo Cui, Zhiheng Li, Zheng Fang
Specifically, by estimating the flow for each point in the target, our method can capture the local motion details of the target, thereby improving the tracking performance.
no code implementations • 28 Jun 2024 • Jiaheng Zhuang, Guoan Wang, Siyu Zhang, Xiyang Wang, Hangning Zhou, Ziyao Xu, Chi Zhang, Zhiheng Li
3D multi-object tracking and trajectory prediction are two crucial modules in autonomous driving systems.
no code implementations • CVPR 2024 • Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng
More specifically, MARS is collected with a fleet of autonomous vehicles driving within a certain geographical area.
no code implementations • 5 Jun 2024 • Bo Xia, Yilun Kong, Yongzhe Chang, Bo Yuan, Zhiheng Li, Xueqian Wang, Bin Liang
Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions, thereby deviating from the Markov assumption.
1 code implementation • 22 May 2024 • Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang
We witness the rise of larger and higher-quality instruction datasets, as well as the involvement of larger-sized LLMs.
no code implementations • 17 May 2024 • Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li
In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques.
no code implementations • 8 Apr 2024 • Zhiqi Huang, Huixin Xiong, Haoyu Wang, Longguang Wang, Zhiheng Li
Then, the object images are employed as additional prompts to facilitate the diffusion model to better understand the relationship between foreground and background regions during image generation.
no code implementations • CVPR 2024 • Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, Siqi Deng
In this work, we introduce Fair Retrieval Augmented Generation (FairRAG), a novel framework that conditions pre-trained generative models on reference images retrieved from an external image database to improve fairness in human generation.
no code implementations • 29 Mar 2024 • Pengzhi Li, Baijuan Li, Zhiheng Li
Recently, the development of large-scale models has paved the way for various interdisciplinary research, including architecture.
2 code implementations • CVPR 2024 • Zeliang Zhang, Mingqian Feng, Zhiheng Li, Chenliang Xu
Discovering biased subgroups is the key to understanding models' failure modes and further improving models' robustness.
1 code implementation • 16 Mar 2024 • Zhiheng Li, Muheng Li, Jixuan Fan, Lei Chen, Yansong Tang, Jiwen Lu, Jie Zhou
The appearance embedding models the characteristics of low-resolution inputs to deal with photometric variations at different scales, and the pixel-based deformation field learns RGB differences which result from the deviations between the real-world and simulated degradations at arbitrary coordinates.
1 code implementation • 26 Feb 2024 • Yu Lin, Zhiheng Li, Yubo Cui, Zheng Fang
Most existing methods perform tracking between two consecutive frames while ignoring the motion patterns of the target over a series of frames, which would cause performance degradation in the scenes with sparse points.
no code implementations • 28 Dec 2023 • Jipeng Jin, Zhaoxiang Zhang, Zhiheng Li, Xiaofeng Gao, Xiongwen Yang, Lei Xiao, Jie Jiang
Considering the recency effect in memory, we propose a forgetting model based on the Ebbinghaus forgetting curve to cope with negative feedback.
2 code implementations • 27 Nov 2023 • Huanjin Yao, Wenhao Wu, Zhiheng Li
In this paper, we present Side4Video, a novel Spatial-Temporal Side Network for memory-efficient fine-tuning of large image models for video understanding.
Ranked #3 on Action Recognition on Something-Something V1
1 code implementation • 14 Nov 2023 • GuanYu Lin, Chen Gao, Yu Zheng, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li, Meng Wang
Recently proposed cross-domain sequential recommendation models such as PiNet and DASL share a common drawback: they rely heavily on overlapped users in different domains, which limits their usage in practical recommender systems.
1 code implementation • 14 Nov 2023 • GuanYu Lin, Chen Gao, Yu Zheng, Yinfeng Li, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li
In this paper, we propose a meta-learning method to annotate the unlabeled data from loss and gradient perspectives, which considers the noises in both positive and negative instances.
no code implementations • 13 Oct 2023 • Lu Li, Yuxin Pan, RuoBing Chen, Jie Liu, Zilin Wang, Yu Liu, Zhiheng Li
Considering that obtaining expert demonstrations can be costly, the focus of current IRL techniques is on learning a better-than-demonstrator policy using a reward function derived from sub-optimal demonstrations.
1 code implementation • ICCV 2023 • Zhiheng Li, Wenjia Geng, Muheng Li, Lei Chen, Yansong Tang, Jiwen Lu, Jie Zhou
By this means, our model explores all sorts of reliable sub-relations within an action sequence in the condensed action space.
2 code implementations • 12 Sep 2023 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng
More information on the tasks, challenges, and leaderboards is available on https://www.soccer-net.org.
1 code implementation • 23 Aug 2023 • Zhiheng Li, Yu Lin, Yubo Cui, Shuo Li, Zheng Fang
3D single object tracking with LiDAR points is an important task in the computer vision field.
no code implementations • 30 Jun 2023 • Yubo Cui, Zhiheng Li, Zheng Fang
Previous methods usually take the last two frames as input, use the predicted box to obtain the template point cloud in the previous frame and the search-area point cloud in the current frame, and then apply similarity-based or motion-based methods to predict the current box.
1 code implementation • 15 Jun 2023 • Yiming Li, Sihang Li, Xinhao Liu, Moonjun Gong, Kenan Li, Nuo Chen, Zijun Wang, Zhiheng Li, Tao Jiang, Fisher Yu, Yue Wang, Hang Zhao, Zhiding Yu, Chen Feng
Monocular scene understanding is a foundational component of autonomous systems.
3D Semantic Scene Completion
3D Semantic Scene Completion from a single 2D image
1 code implementation • 1 Jun 2023 • Lu Li, Jiafei Lyu, Guozheng Ma, Zilin Wang, Zhenjie Yang, Xiu Li, Zhiheng Li
Though normalization techniques have demonstrated huge success in supervised and unsupervised learning, their applications in visual RL are still scarce.
no code implementations • 30 May 2023 • Pengzhi Li, Qinxuan Huang, Yikang Ding, Zhiheng Li
During the diffusion process, an iterative guidance strategy is used to generate a final image that aligns with the textual description.
1 code implementation • 11 May 2023 • Zhiheng Li, Yubo Cui, Yu Lin, Zheng Fang
To overcome the limitations of geometry matching, we propose a Multi-modal Multi-level Fusion Tracker (MMF-Track), which exploits the image texture and the geometric characteristics of point clouds to track 3D targets.
1 code implementation • 8 Feb 2023 • GuanYu Lin, Chen Gao, Yu Zheng, Jianxin Chang, Yanan Niu, Yang Song, Zhiheng Li, Depeng Jin, Yong Li
In this paper, we propose Dual-interest Factorization-heads Attention for Sequential Recommendation (DFAR), consisting of a feedback-aware encoding layer, a dual-interest disentangling layer, and a prediction layer.
no code implementations • 12 Dec 2022 • Raghav Mehta, Vítor Albiero, Li Chen, Ivan Evtimov, Tamar Glaser, Zhiheng Li, Tal Hassner
With experiments on a wide range of pre-trained models and pre-training datasets, we show that the capacity of the pre-training model and the size of the pre-training dataset matter.
1 code implementation • CVPR 2023 • Zhiheng Li, Ivan Evtimov, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian Canton Ferrer, Chenliang Xu, Mark Ibrahim
Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i.e., where mitigating one shortcut amplifies reliance on others.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
1 code implementation • 7 Dec 2022 • Feng Yan, Zhiheng Li, Weixin Luo, Zequn Jie, Fan Liang, Xiaolin Wei, Lin Ma
This is a brief technical report on our proposed method for the Multiple-Object Tracking (MOT) Challenge in Complex Environments.
Ranked #8 on Multi-Object Tracking on DanceTrack (using extra training data)
7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.
1 code implementation • 2 Oct 2022 • Yubo Cui, Jiayao Shan, Zuoxu Gu, Zhiheng Li, Zheng Fang
Meanwhile, the encoder applies the attention on multi-scale features to compensate for the lack of information caused by the sparsity of point cloud and the single scale of features.
no code implementations • 20 Sep 2022 • Dihe Huang, Ying Chen, Yikang Ding, Jinli Liao, Jianlin Liu, Kai Wu, Qiang Nie, Yong Liu, Chengjie Wang, Zhiheng Li
In MDRNet, the Spatial-aware Dimensionality Reduction (SDR) is designed to dynamically focus on the valuable parts of the object during voxel-to-BEV feature transformation.
1 code implementation • 18 Sep 2022 • GuanYu Lin, Chen Gao, Yinfeng Li, Yu Zheng, Zhiheng Li, Depeng Jin, Dong Li, Jianye Hao, Yong Li
Such user-centric recommendation will make it impossible for the provider to expose their new items, failing to consider the accordant interactions between user and item dimensions.
1 code implementation • 20 Jul 2022 • Zhiheng Li, Anthony Hoogs, Chenliang Xu
By training in an alternate manner, the discoverer tries to find multiple unknown biases of the classifier without any annotations of biases, and the classifier aims at unlearning the biases identified by the discoverer.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
1 code implementation • 21 Jun 2022 • Yikang Ding, Zhenyang Li, Dihe Huang, Zhiheng Li, Kai Zhang
Learning-based multi-view stereo (MVS) methods have made impressive progress and surpassed traditional methods in recent years.
1 code implementation • CVPR 2022 • Zhiheng Li, Martin Renqiang Min, Kai Li, Chenliang Xu
Based on the identified latent directions of attributes, we propose Compositional Attribute Adjustment to adjust the latent code, resulting in better compositionality of image synthesis.
1 code implementation • 3 Oct 2021 • Laila Rasmy, Jie Zhu, Zhiheng Li, Xin Hao, Hong Thoai Tran, Yujia Zhou, Firat Tiryaki, Yang Xiang, Hua Xu, Degui Zhi
As a result, deep learning models developed for sequence modeling, such as recurrent neural networks (RNNs), are a common architecture for EHR-based clinical event prediction models.
1 code implementation • ICCV 2021 • Zhiheng Li, Chenliang Xu
To help human experts better find the AI algorithms' biases, we study a new problem in this work -- for a classifier that predicts a target attribute of the input image, discover its unknown biased attribute.
2 code implementations • CVPR 2021 • Tianjiao Li, Jun Liu, Wei Zhang, Yun Ni, Wenqian Wang, Zhiheng Li
Human behavior understanding with unmanned aerial vehicles (UAVs) is of great significance for a wide range of applications, which simultaneously brings an urgent demand of large, challenging, and comprehensive benchmarks for the development and evaluation of UAV-based models.
1 code implementation • 1 Aug 2020 • Jing Shi, Zhiheng Li, Haitian Zheng, Yihang Xu, Tianyou Xiao, Weitao Tan, Xiaoning Guo, Sizhe Li, Bin Yang, Zhexin Xu, Ruitao Lin, Zhongkai Shangguan, Yue Zhao, Jingwen Wang, Rohan Sharma, Surya Iyer, Ajinkya Deshmukh, Raunak Mahalik, Srishti Singh, Jayant G Rohra, Yi-Peng Zhang, Tongyu Yang, Xuan Wen, Ethan Fahnestock, Bryce Ikeda, Ian Lawson, Alan Finkelstein, Kehao Guo, Richard Magnotti, Andrew Sexton, Jeet Ketan Thaker, Yiyang Su, Chenliang Xu
This technical report summarizes and compiles submissions from the Actor-Action video classification challenge, held as a final project in the CSC 249/449 Machine Vision course (Spring 2020) at the University of Rochester.
2 code implementations • 24 Jun 2020 • Zhiheng Li, Geemi P. Wellawatte, Maghesree Chakraborty, Heta A. Gandhi, Chenliang Xu, Andrew D. White
The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation.
1 code implementation • 5 Jun 2020 • Ming Zhang, Yawei Wang, Xiaoteng Ma, Li Xia, Jun Yang, Zhiheng Li, Xiu Li
The generative adversarial imitation learning (GAIL) has provided an adversarial learning framework for imitating expert policy from demonstrations in high-dimensional continuous tasks.
no code implementations • CVPR 2020 • Jie Chen, Zhiheng Li, Jiebo Luo, Chenliang Xu
Instead of blindly trusting quality-inconsistent PAs, WS^2 employs a learning-based selection to select effective PAs and a novel region integrity criterion as a stopping condition for weakly-supervised training.
no code implementations • CVPR 2020 • Zhiheng Li, Wenxuan Bao, Jiayang Zheng, Chenliang Xu
The perceptual-based grouping process produces a hierarchical and compositional image representation that helps both human and machine vision systems recognize heterogeneous visual concepts.
no code implementations • 6 Oct 2019 • Zhiheng Li, Xinyue Xing, Bingzhang Lu, Zhixiang Li
ICU readmission is associated with longer hospitalization, mortality and adverse outcomes.
no code implementations • 20 Jun 2019 • Guan Wang, Jianming Hu, Zhiheng Li, Li Li
In this paper, we study how to learn an appropriate lane changing strategy for autonomous vehicles by using deep reinforcement learning.
1 code implementation • ECCV 2018 • Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu
In this paper, we consider the following task: given an arbitrary audio speech and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech.