Search Results for author: Li Yi

Found 77 papers, 41 papers with code

3D-Assisted Image Feature Synthesis for Novel Views of an Object

no code implementations • 26 Nov 2014 • Hao Su, Fan Wang, Li Yi, Leonidas Guibas

In this paper, given a single input image of an object, we synthesize new features for other views of the same object.

Learning Discriminative Representations for Semantic Cross Media Retrieval

no code implementations • 18 Nov 2015 • Jiang Aiwen, Li Hanxi, Li Yi, Wang Mingwen

As a result, an efficient linear semantic down mapping is jointly learned for multimodal data, leading to a common space where they can be compared.

Representation Learning Retrieval

ShapeNet: An Information-Rich 3D Model Repository

14 code implementations • 9 Dec 2015 • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu

We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects.

Data Visualization

65,339

Dialogue Session Segmentation by Embedding-Enhanced TextTiling

no code implementations • 13 Oct 2016 • Yiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming Zhang

In human-computer conversation systems, the context of a user-issued utterance is particularly important because it provides useful background information about the conversation.

Word Embeddings

SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation

no code implementations • CVPR 2017 • Li Yi, Hao Su, Xingwen Guo, Leonidas Guibas

To enable the prediction of vertex functions on them by convolutional neural networks, we resort to a spectral CNN method that enables weight sharing by parameterizing kernels in the spectral domain spanned by graph Laplacian eigenbases.

Ranked #54 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

no code implementations • CVPR 2018 • Cewu Lu, Hao Su, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas Guibas

Important high-level vision tasks such as human-object interaction, image captioning and robotic manipulation require rich semantic descriptions of objects at part level.

Human-Object Interaction Detection Image Captioning +1

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

64 code implementations • NeurIPS 2017 • Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas

By exploiting metric space distances, our network is able to learn local features with increasing contextual scales.

Ranked #2 on Semantic Segmentation on Toronto-3D L002

3D Part Segmentation +4

3,170

Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas

We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database.

3D Part Segmentation 3D Reconstruction +1

1,989

Deep Part Induction from Articulated Object Pairs

1 code implementation • 19 Sep 2018 • Li Yi, Haibin Huang, Difan Liu, Evangelos Kalogerakis, Hao Su, Leonidas Guibas

In this paper, we explore how the observation of different articulation states provides evidence for part structure and motion of 3D objects.

Object

Supervised Fitting of Geometric Primitives to 3D Point Clouds

2 code implementations • CVPR 2019 • Lingxiao Li, Minhyuk Sung, Anastasia Dubrovina, Li Yi, Leonidas Guibas

Fitting geometric primitives to 3D point cloud data bridges a gap between low-level digitized 3D data and high-level structural information on the underlying 3D shapes.

Shape Representation Of 3D Point Clouds

163

TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes

1 code implementation • CVPR 2019 • Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Nießner, Leonidas Guibas

We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps).

Ranked #21 on Semantic Segmentation on ScanNet

3D Semantic Segmentation

PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding

5 code implementations • CVPR 2019 • Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, Hao Su

We present PartNet: a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information.

Ranked #3 on 3D Instance Segmentation on PartNet

3D Instance Segmentation 3D Semantic Segmentation +2

1,349

GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud

1 code implementation • CVPR 2019 • Li Yi, Wang Zhao, He Wang, Minhyuk Sung, Leonidas Guibas

We introduce a novel 3D object proposal approach named Generative Shape Proposal Network (GSPN) for instance segmentation in point cloud data.

Ranked #27 on 3D Object Detection on ScanNetV2

3D Instance Segmentation 3D Object Detection +4

GeoNet: Deep Geodesic Networks for Point Cloud Analysis

no code implementations • CVPR 2019 • Tong He, Haibin Huang, Li Yi, Yuqian Zhou, Chi-Hao Wu, Jue Wang, Stefano Soatto

Surface-based geodesic topology provides strong cues for object semantic analysis and geometric modeling.

General Classification

AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss

no code implementations • CVPR 2020 • Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Li Yi, Leonidas Guibas, Hao Zhang

While the part prior network can be trained with noisy and inconsistently segmented shapes, the final output of AdaCoSeg is a consistent part labeling for the input set, with each shape segmented into up to (a user-specified) K parts.

Instance Segmentation Segmentation +1

StructureNet: Hierarchical Graph Networks for 3D Shape Generation

2 code implementations • 1 Aug 2019 • Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas J. Guibas

We introduce StructureNet, a hierarchical graph network which (i) can directly encode shapes represented as such n-ary graphs; (ii) can be robustly trained on large and complex shape families; and (iii) can be used to generate a great diversity of realistic structured shape geometries.

3D Shape Generation

253

StructEdit: Learning Structural Shape Variations

1 code implementation • CVPR 2020 • Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas J. Guibas

Learning to encode differences in the geometry and (topological) structure of the shapes of ordinary objects is key to generating semantically plausible variations of a given shape, transferring edits from one shape to another, and many other applications in 3D content creation.

Category-Level Articulated Object Pose Estimation

2 code implementations • CVPR 2020 • Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song

We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space.

Object Pose Estimation

109

SAPIEN: A SimulAted Part-based Interactive ENvironment

1 code implementation • CVPR 2020 • Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, Hao Su

To achieve this task, a simulated environment with physically realistic simulation, sufficient articulated objects, and transferability to the real robot is indispensable.

Attribute

315

Curriculum DeepSDF

1 code implementation • ECCV 2020 • Yueqi Duan, Haidong Zhu, He Wang, Li Yi, Ram Nevatia, Leonidas J. Guibas

When learning to sketch, beginners start with simple and flexible shapes, and then gradually strive for more complex and accurate ones in the subsequent training sessions.

3D Shape Representation Representation Learning

Rethinking Sampling in 3D Point Cloud Generative Adversarial Networks

no code implementations • 12 Jun 2020 • He Wang, Zetian Jiang, Li Yi, Kaichun Mo, Hao Su, Leonidas J. Guibas

We further study how different evaluation metrics weigh the sampling pattern against the geometry and propose several perceptual metrics forming a sampling spectrum of metrics.

Clustering valid

Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds

no code implementations • CVPR 2021 • Li Yi, Boqing Gong, Thomas Funkhouser

We study an unsupervised domain adaptation problem for the semantic labeling of 3D point clouds, with a particular focus on domain discrepancies induced by different LiDAR sensors.

Semantic Segmentation Unsupervised Domain Adaptation

Compositionally Generalizable 3D Structure Prediction

1 code implementation • 4 Dec 2020 • Songfang Han, Jiayuan Gu, Kaichun Mo, Li Yi, Siyu Hu, Xuejin Chen, Hao Su

However, there remains a much more difficult and under-explored issue on how to generalize the learned skills over unseen object categories that have very different shape geometry distributions.

3D Shape Reconstruction Object +1

Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments

1 code implementation • CVPR 2021 • Siyan Dong, Qingnan Fan, He Wang, Ji Shi, Li Yi, Thomas Funkhouser, Baoquan Chen, Leonidas Guibas

Localizing the camera in a known indoor environment is a key building block for scene mapping, robot navigation, AR, etc.

Camera Relocalization Robot Navigation +1

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

no code implementations • 24 Dec 2020 • Yunze Liu, Li Yi, Shanghang Zhang, Qingnan Fan, Thomas Funkhouser, Hao Dong

Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks.

Contrastive Learning Representation Learning +1

Leveraging SE(3) Equivariance for Self-supervised Category-Level Object Pose Estimation from Point Clouds

no code implementations • NeurIPS 2021 • Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

To reduce the huge amount of pose annotations needed for category-level learning, we propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.

Object Pose Estimation +1

Contrastive Multimodal Fusion with TupleInfoNCE

1 code implementation • ICCV 2021 • Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, Li Yi

Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences.

Ranked #70 on Semantic Segmentation on NYU Depth v2

Contrastive Learning Representation Learning +1

Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

no code implementations • NeurIPS 2021 • Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.

Object Pose Estimation +1

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

no code implementations • NeurIPS 2021 • Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.

Instance Segmentation Object +2

HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction

1 code implementation • CVPR 2022 • Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi

We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze the research of category-level human-object interaction.

Action Segmentation Benchmarking +6

On Learning Contrastive Representations for Learning with Noisy Labels

1 code implementation • CVPR 2022 • Li Yi, Sheng Liu, Qi She, A. Ian McLeod, Boyu Wang

To address this issue, we focus on learning robust contrastive representations of data on which it is hard for the classifier to memorize the label noise under the CE loss.

Learning with noisy labels Memorization +1

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

1 code implementation • CVPR 2022 • Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi

To solve the above issues, we propose AutoGPart, a generic method enabling training generalizable 3D part segmentation networks with the task prior considered.

3D Part Segmentation Domain Generalization +1

CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance

no code implementations • CVPR 2022 • Tianchen Zhao, Niansong Zhang, Xuefei Ning, He Wang, Li Yi, Yu Wang

We propose CodedVTR (Codebook-based Voxel TRansformer), which improves data efficiency and generalization ability for 3D sparse voxel transformers.

3D Semantic Segmentation

Multi-Robot Active Mapping via Neural Bipartite Graph Matching

no code implementations • CVPR 2022 • Kai Ye, Siyan Dong, Qingnan Fan, He Wang, Li Yi, Fei Xia, Jue Wang, Baoquan Chen

Previous approaches either choose the frontier as the goal position via a myopic solution that hinders time efficiency, or maximize the long-term value via reinforcement learning to directly regress the goal position, but do not guarantee complete map construction.

Graph Matching Position +2

Rotationally Equivariant 3D Object Detection

no code implementations • CVPR 2022 • Hong-Xing Yu, Jiajun Wu, Li Yi

To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information.

3D Object Detection Autonomous Driving +2

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

no code implementations • CVPR 2022 • Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix.

APES: Articulated Part Extraction from Sprite Sheets

1 code implementation • CVPR 2022 • Zhan Xu, Matthew Fisher, Yang Zhou, Deepali Aneja, Rushikesh Dudhat, Li Yi, Evangelos Kalogerakis

Rigged puppets are one of the most prevalent representations to create 2D character animations.

BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios

no code implementations • 21 Jun 2022 • Jiafei Duan, Samson Yu, Nicholas Tan, Li Yi, Cheston Tan

Humans with an average level of social cognition can infer the beliefs of others based solely on the nonverbal communication signals (e.g., gaze, gesture, pose, and contextual information) exhibited during social interactions.

Object

Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding

1 code implementation • 30 Jul 2022 • Hao Wen, Yunze Liu, Jingwei Huang, Bo Duan, Li Yi

This paper proposes a 4D backbone for long-term point cloud video understanding.

point cloud video understanding Video Understanding

AccoMontage2: A Complete Harmonization and Accompaniment Arrangement System

1 code implementation • 1 Sep 2022 • Li Yi, Haochen Hu, Jingwei Zhao, Gus Xia

We propose AccoMontage2, a system capable of doing full-length song harmonization and accompaniment arrangement based on a lead melody.

Retrieval Template Matching

105

Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild

no code implementations • 24 Sep 2022 • Jiayi Chen, Mi Yan, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijia Weng, Li Yi, Shuran Song, He Wang

We propose, for the first time, a point-cloud-based hand joint tracking network, HandTrackNet, to estimate the inter-frame hand joint motion.

hand-object pose Object +2

Enhancing Generalizable 6D Pose Tracking of an In-Hand Object with Tactile Sensing

1 code implementation • 8 Oct 2022 • Yun Liu, Xiaomeng Xu, Weihang Chen, Haocheng Yuan, He Wang, Jing Xu, Rui Chen, Li Yi

When manipulating an object to accomplish complex tasks, humans rely on both vision and touch to keep track of the object's 6D pose.

hand-object pose Object +1

MoRig: Motion-aware rigging of character meshes from point clouds

1 code implementation • 17 Oct 2022 • Zhan Xu, Yang Zhou, Li Yi, Evangelos Kalogerakis

We present MoRig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters.

GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

1 code implementation • CVPR 2023 • Haoran Geng, Helin Xu, Chengyang Zhao, Chao Xu, Li Yi, Siyuan Huang, He Wang

Based on GAPartNet, we investigate three cross-category tasks: part segmentation, part pose estimation, and part-based object manipulation.

3D Instance Segmentation Domain Generalization +3

Language-Assisted 3D Feature Learning for Semantic Scene Understanding

1 code implementation • 25 Nov 2022 • Junbo Zhang, Guofan Fan, Guanghan Wang, Zhengyuan Su, Kaisheng Ma, Li Yi

To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions.

Descriptive Instance Segmentation +5

Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning

no code implementations • CVPR 2023 • Zhuoyang Zhang, Yuhao Dong, Yunze Liu, Li Yi

Recent work on 4D point cloud sequences has attracted a lot of attention.

Knowledge Distillation Representation Learning

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

3 code implementations • 16 Dec 2022 • Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, Kaisheng Ma

The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages.

Ranked #5 on Few-Shot 3D Point Cloud Classification on ModelNet40 10-way (10-shot) (using extra training data)

Few-Shot 3D Point Cloud Classification Knowledge Distillation +1

108

LeaF: Learning Frames for 4D Point Cloud Sequence Understanding

no code implementations • ICCV 2023 • Yunze Liu, Junyu Chen, Zekai Zhang, Jingwei Huang, Li Yi

With such frames, we can factorize geometry and motion to facilitate a feature-space geometric reconstruction for more effective 4D learning.

Descriptive

When Source-Free Domain Adaptation Meets Learning with Noisy Labels

no code implementations • 31 Jan 2023 • Li Yi, Gezheng Xu, Pengcheng Xu, Jiaqi Li, Ruizhi Pu, Charles Ling, A. Ian McLeod, Boyu Wang

We also prove that such a difference makes existing LLN methods that rely on their distribution assumptions unable to address the label noise in SFDA.

Learning with noisy labels Source-Free Domain Adaptation

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3 code implementations • 5 Feb 2023 • Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi

This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.

Ranked #1 on Zero-Shot Transfer 3D Point Cloud Classification on ModelNet10 (using extra training data)

3D Point Cloud Linear Classification Few-Shot 3D Point Cloud Classification +2

108

Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance

1 code implementation • 28 Feb 2023 • Xueyi Liu, Ji Zhang, Ruizhen Hu, Haibin Huang, He Wang, Li Yi

Category-level articulated object pose estimation aims to estimate a hierarchy of articulation-aware object poses of an unseen articulated object from a known category.

Disentanglement Object +1

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

1 code implementation • CVPR 2023 • Yinzhen Xu, Weikang Wan, Jialiang Zhang, Haoran Liu, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, Tengyu Liu, Li Yi, He Wang

Trained on our synthesized large-scale dexterous grasp dataset, this model enables us to sample diverse and high-quality dexterous grasp poses for the object point cloud. For the second stage, we propose to replace the motion planning used in parallel gripper grasping with a goal-conditioned grasp policy, due to the complexity involved in dexterous grasping execution.

Motion Planning

HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

1 code implementation • CVPR 2023 • Shixiang Tang, Cheng Chen, Qingsong Xie, Meilin Chen, Yizhou Wang, Yuanzheng Ci, Lei Bai, Feng Zhu, Haiyang Yang, Li Yi, Rui Zhao, Wanli Ouyang

Specifically, we propose HumanBench, based on existing datasets, to comprehensively evaluate, on common ground, the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.

Ranked #1 on Pedestrian Attribute Recognition on PA-100K (using extra training data)

Attribute Autonomous Driving +5

204

Controllable Ancient Chinese Lyrics Generation Based on Phrase Prototype Retrieving

no code implementations • 20 Mar 2023 • Li Yi

The phrase connector then selects a series of phrases from the phrase pool that minimizes a multi-term loss function that considers rhyme, song structure, and fluency.

CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis

no code implementations • CVPR 2023 • Juntian Zheng, Qingyuan Zheng, Lixing Fang, Yun Liu, Li Yi

In this work, we focus on a novel task of category-level functional hand-object manipulation synthesis covering both rigid and articulated object categories.

Object

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

1 code implementation • CVPR 2023 • Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han

High-resolution images enable neural networks to learn richer visual representations.

2D Semantic Segmentation Instance Segmentation +4

Semi-Weakly Supervised Object Kinematic Motion Prediction

no code implementations • CVPR 2023 • Gengxin Liu, Qian Sun, Haibin Huang, Chongyang Ma, Yulan Guo, Li Yi, Hui Huang, Ruizhen Hu

First, although 3D datasets with fully annotated motion labels are limited, there are existing datasets and methods for object part semantic segmentation at large scale.

motion prediction Object +3

JacobiNeRF: NeRF Shaping with Mutual Information Gradients

1 code implementation • CVPR 2023 • Xiaomeng Xu, Yanchao Yang, Kaichun Mo, Boxiao Pan, Li Yi, Leonidas Guibas

We propose a method that trains a neural radiance field (NeRF) to encode not only the appearance of the scene but also semantic correlations between scene points, regions, or entities -- aiming to capture their mutual co-variation patterns.

Instance Segmentation Semantic Segmentation

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

no code implementations • ICCV 2023 • Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang

We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information under a table-top setting, namely UniDexGrasp++.

Object

Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation

1 code implementation • ICCV 2023 • Xueyi Liu, Bin Wang, He Wang, Li Yi

By observing an articulated object dataset containing only a few examples, we wish to learn a model that can generate diverse meshes with high visual fidelity and physical validity.

Philosophy

3D Implicit Transporter for Temporally Consistent Keypoint Discovery

1 code implementation • ICCV 2023 • Chengliang Zhong, Yuhang Zheng, Yupeng Zheng, Hao Zhao, Li Yi, Xiaodong Mu, Ling Wang, Pengfei Li, Guyue Zhou, Chao Yang, Xinliang Zhang, Jian Zhao

To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information.

TransTouch: Learning Transparent Objects Depth Sensing Through Sparse Touches

no code implementations • 18 Sep 2023 • Liuyu Bian, Pengyang Shi, Weihang Chen, Jing Xu, Li Yi, Rui Chen

By approximating and optimizing the utility function, we can optimize the probing locations given a fixed touching budget to better improve the network's performance on real objects.

Transparent objects

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.

Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

multimodal generation Visual Question Answering +2

300

NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding

no code implementations • 12 Oct 2023 • Yuhao Dong, Zhuoyang Zhang, Yunze Liu, Li Yi

We integrate NSM4D with state-of-the-art 4D perception backbones, demonstrating significant improvements on various online perception benchmarks in indoor and outdoor settings.

Action Segmentation Autonomous Driving +1

Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning

no code implementations • 29 Nov 2023 • Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao

In this study, we are interested in imbuing robots with the capability of physically-grounded task planning.

Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence

no code implementations • 13 Dec 2023 • Zifan Wang, Zhuorui Ye, Haoran Wu, Junyu Chen, Li Yi

To tackle this challenging problem, we properly model the synergetic relationship between future forecasting and semantic scene completion through a novel network named SCSFNet.

Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting

1 code implementation • 14 Dec 2023 • Yunze Liu, Changxi Chen, Li Yi

To support this task, we construct two datasets named HHI and CoChair and propose a unified method.

228

GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation

no code implementations • 1 Jan 2024 • Zifan Wang, Junyu Chen, Ziqing Chen, Pengwei Xie, Rui Chen, Li Yi

We further introduce a distillation-friendly demonstration generation method that automatically generates a million high-quality demonstrations suitable for learning.

Grasp Generation Imitation Learning

TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding

no code implementations • 16 Jan 2024 • Yun Liu, Haolin Yang, Xu Si, Ling Liu, Zipeng Li, Yuxiang Zhang, Yebin Liu, Li Yi

Humans commonly work with multiple objects in daily life and can intuitively transfer manipulation skills to novel objects by understanding object functional regularities.

Action Recognition Benchmarking +2

CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding

no code implementations • 17 Jan 2024 • Yunze Liu, Changxi Chen, Zifan Wang, Li Yi

This paper introduces a novel approach named CrossVideo, which aims to enhance self-supervised cross-modal contrastive learning in the field of point cloud video understanding.

Contrastive Learning point cloud video understanding +2

Full-Body Motion Reconstruction with Sparse Sensing from Graph Perspective

1 code implementation • 22 Jan 2024 • Feiyu Yao, Zongkai Wu, Li Yi

In this paper, we use a well-designed Body Pose Graph (BPG) to represent the human body and translate the challenge into the problem of predicting missing nodes in a graph.

LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation

no code implementations • 19 Feb 2024 • Keyang Xuan, Li Yi, Fan Yang, Ruochen Wu, Yi R. Fung, Heng Ji

In this paper, we first investigate the potential of LVLM on multimodal misinformation detection.

Language Modelling LEMMA +1

GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion

1 code implementation • 22 Feb 2024 • Xueyi Liu, Li Yi

We tackle those challenges through a novel approach, GeneOH Diffusion, incorporating two key designs: an innovative contact-centric HOI representation named GeneOH and a new domain-generalizable denoising scheme.

Denoising

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

3 code implementations • 27 Feb 2024 • Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.

Ranked #1 on 3D Question Answering (3D-QA) on 3D MM-Vet

3D Point Cloud Linear Classification 3D Question Answering (3D-QA) +8

108

PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation

no code implementations • 1 Apr 2024 • Yunze Liu, Changxi Chen, Chenjing Ding, Li Yi

Humanoid Reaction Synthesis is pivotal for creating highly interactive and empathetic robots that can seamlessly integrate into human environments, enhancing the way we live, work, and communicate.

GenN2N: Generative NeRF2NeRF Translation

no code implementations • 3 Apr 2024 • Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, Li Yi

We present GenN2N, a unified NeRF-to-NeRF translation framework for various NeRF translation tasks such as text-driven NeRF editing, colorization, super-resolution, inpainting, etc.

Colorization Contrastive Learning +2

QuasiSim: Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer

1 code implementation • 11 Apr 2024 • Xueyi Liu, Kangbo Lyu, Jieqiong Zhang, Tao Du, Li Yi

We explore the dexterous manipulation transfer problem by designing simulators.
