no code implementations • 15 Nov 2024 • Xiaodong Chen, Yuxuan Hu, Jing Zhang, Xiaokang Zhang, Cuiping Li, Hong Chen
Despite this, post-training after pruning is crucial for performance recovery and can be resource-intensive.
no code implementations • 1 Oct 2024 • Mengxue Qu, Xiaodong Chen, Wu Liu, Alicia Li, Yao Zhao
Video Temporal Grounding (VTG) aims to ground specific segments within an untrimmed video corresponding to the given natural language query.
no code implementations • 23 Jul 2024 • Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei
With the proposed MINIONS, we conduct experiments on multi-modal motion capture and explore the possibilities of consumer-affordable motion capture using a monocular camera and very few IMUs.
no code implementations • 28 Mar 2024 • Xiaodong Chen, Yuxuan Hu, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen
This paper introduces LLM-Streamline, a pioneering work on layer pruning for large language models (LLMs).
no code implementations • 5 Feb 2024 • Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao
In practice, users often desire the ability to control object motion and camera movement independently for customized video creation.
no code implementations • 18 Jan 2024 • Siyu Ren, Junhui Hou, Xiaodong Chen, Hongkai Xiong, Wenping Wang
We then transfer the discrepancy between two 3D geometric models as the discrepancy between their DDFs defined on an identical domain, naturally establishing model correspondence.
1 code implementation • 11 Oct 2023 • Shiyuan Yang, Xiaodong Chen, Jing Liao
Recently, text-to-image denoising diffusion probabilistic models (DDPMs) have demonstrated impressive image generation capabilities and have also been successfully applied to image inpainting.
1 code implementation • 31 Aug 2023 • Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen
Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension critical to model size and efficiency.
no code implementations • 25 Jun 2023 • Zhoufutu Wen, Xinyu Zhao, Zhipeng Jin, Yi Yang, Wei Jia, Xiaodong Chen, Shuanglong Li, Lin Liu
The core of DIA is a query-image matching module performing ad image retrieval and relevance modeling.
1 code implementation • ICCV 2023 • Siyu Ren, Junhui Hou, Xiaodong Chen, Ying He, Wenping Wang
We present a learning-based method, namely GeoUDF, to tackle the long-standing and challenging problem of reconstructing a discrete surface from a sparse point cloud. To be specific, we propose a geometry-guided learning method for UDF and its gradient estimation that explicitly formulates the unsigned distance of a query point as the learnable affine averaging of its distances to the tangent planes of neighboring points on the surface.
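The formulation in the GeoUDF summary above can be illustrated with a minimal sketch: the unsigned distance at a query point is approximated as a weighted average, with weights summing to one, of its distances to the tangent planes at neighboring surface points. In the paper the weights are learned. Here the function `affine_weights` is a hypothetical stand-in (a softmax over negative squared distances, which gives non-negative weights, a special case of affine weights), and all function names and the `temperature` parameter are assumptions made for illustration, not the authors' implementation.

```python
import math


def tangent_plane_distance(q, p, n):
    """Unsigned distance from query q to the plane through p with unit normal n."""
    return abs(sum(ni * (qi - pi) for qi, pi, ni in zip(q, p, n)))


def affine_weights(q, points, temperature=1.0):
    """Hypothetical stand-in for learned weights.

    Softmax over negative squared distances, so closer neighbors get
    larger weights and the weights sum to one.
    """
    scores = [
        -sum((qi - pi) ** 2 for qi, pi in zip(q, p)) / temperature
        for p in points
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def approx_udf(q, points, normals, temperature=1.0):
    """Weighted average of tangent-plane distances from q to its neighbors."""
    weights = affine_weights(q, points, temperature)
    return sum(
        w * tangent_plane_distance(q, p, n)
        for w, p, n in zip(weights, points, normals)
    )


# Neighbors sampled from the plane z = 0, all with normal (0, 0, 1).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
udf_value = approx_udf((0.2, 0.3, 0.5), points, normals)
```

When every neighbor lies on the same plane, each tangent-plane distance equals the query's height above that plane, so the weighted average returns that height whatever the weights are. On curved surfaces the choice of weights matters, which is the part the paper learns.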
no code implementations • 1 Sep 2022 • Xiaodong Chen, Wu Liu, Xinchen Liu, Yongdong Zhang, Jungong Han, Tao Mei
In DestFormer, the spatial and temporal dimensions of the 4D point cloud videos are decoupled to achieve efficient self-attention for learning both long-term and short-term features.
1 code implementation • 12 Jul 2022 • Siyu Ren, Yiming Zeng, Junhui Hou, Xiaodong Chen
Motivated by the intuition that the critical step of localizing a 2D image in the corresponding 3D point cloud is establishing 2D-3D correspondence between them, we propose the first feature-based dense correspondence framework for addressing the image-to-point cloud registration problem, dubbed CorrI2P, which consists of three modules, i.e., feature embedding, symmetric overlapping region detection, and pose estimation through the established correspondence.
Ranked #1 on Image to Point Cloud Registration on KITTI
no code implementations • 9 Mar 2022 • Xiaodong Chen, Xinchen Liu, Wu Liu, Kun Liu, Dong Wu, Yongdong Zhang, Tao Mei
Therefore, researchers start to focus on a new task, Part-level Action Parsing (PAP), which aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.
no code implementations • 7 Oct 2021 • Xiaodong Chen, Xinchen Liu, Kun Liu, Wu Liu, Tao Mei
This technical report introduces our 2nd place solution to the Kinetics-TPS Track on Part-level Action Parsing in the ICCV DeeperAction Workshop 2021.
1 code implementation • ICCV 2021 • Xiaodong Chen, Xinchen Liu, Wu Liu, Xiao-Ping Zhang, Yongdong Zhang, Tao Mei
In this paper, we propose a post-hoc method, named Attribute-guided Metric Distillation (AMD), to explain existing ReID models.
Ranked #27 on Person Re-Identification on Market-1501
1 code implementation • 3 Dec 2020 • Vijay Janapa Reddi, David Kanter, Peter Mattson, Jared Duke, Thai Nguyen, Ramesh Chukka, Ken Shiring, Koan-Sin Tan, Mark Charlebois, William Chou, Mostafa El-Khamy, Jungwook Hong, Tom St. John, Cindy Trinh, Michael Buch, Mark Mazumder, Relia Markovic, Thomas Atta, Fatih Cakir, Masoud Charkhabi, Xiaodong Chen, Cheng-Ming Chiang, Dave Dexter, Terry Heo, Gunther Schmuelling, Maryam Shabani, Dylan Zika
This paper presents the first industry-standard open-source machine learning (ML) benchmark to allow performance and accuracy evaluation of mobile devices with different AI chips and software stacks.