no code implementations • 27 Jan 2025 • Delin Qu, Haoming Song, Qizhi Chen, Yuanqi Yao, Xinyi Ye, Yan Ding, Zhigang Wang, Jiayuan Gu, Bin Zhao, Dong Wang, Xuelong Li
Specifically, we introduce Ego3D Position Encoding to inject 3D information into the input observations of the visual-language-action model, and propose Adaptive Action Grids to represent spatial robot movement actions with adaptive discretized action grids, facilitating learning generalizable and transferable spatial action knowledge for cross-robot control.
1 code implementation • 24 Oct 2024 • Hansheng Chen, Bokui Shen, Yulin Liu, Ruoxi Shi, Linqi Zhou, Connor Z. Lin, Jiayuan Gu, Hao Su, Gordon Wetzstein, Leonidas Guibas
Multi-view image diffusion models have significantly advanced open-domain 3D object generation.
1 code implementation • 25 Jun 2024 • Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su
The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM).
1 code implementation • 9 May 2024 • Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao
We then employ these approaches to create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups.
1 code implementation • 18 Mar 2024 • Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas
Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity.
1 code implementation • 5 Dec 2023 • Yuchen Zhou, Jiayuan Gu, Xuanlin Li, Minghua Liu, Yunhao Fang, Hao Su
Open-world 3D part segmentation is pivotal in diverse applications such as robotics and AR/VR.
no code implementations • CVPR 2024 • Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su
Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts.
no code implementations • 3 Nov 2023 • Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo Xu, Priya Sundaresan, Peng Xu, Hao Su, Karol Hausman, Chelsea Finn, Quan Vuong, Ted Xiao
Generalization remains one of the most important desiderata for robust robot learning systems.
1 code implementation • 9 Feb 2023 • Jiayuan Gu, Fanbo Xiang, Xuanlin Li, Zhan Ling, Xiqiang Liu, Tongzhou Mu, Yihe Tang, Stone Tao, Xinyue Wei, Yunchao Yao, Xiaodi Yuan, Pengwei Xie, Zhiao Huang, Rui Chen, Hao Su
Generalizable manipulation skills, which can be composed to tackle long-horizon and complex daily chores, are one of the cornerstones of Embodied AI.
1 code implementation • ACL 2022 • Aliva Das, Xinya Du, Barry Wang, Kejian Shi, Jiayuan Gu, Thomas Porter, Claire Cardie
Document-level information extraction (IE) tasks have recently begun to be revisited in earnest using the end-to-end neural network techniques that have been successful on their sentence-level IE counterparts.
1 code implementation • 6 Sep 2022 • Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik
To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks.
no code implementations • ECCV 2020 • Wei-Chiu Ma, Shenlong Wang, Jiayuan Gu, Sivabalan Manivasagam, Antonio Torralba, Raquel Urtasun
Specifically, at each iteration, the neural network takes the feedback as input and outputs an update on the current estimation.
1 code implementation • 4 Dec 2020 • Songfang Han, Jiayuan Gu, Kaichun Mo, Li Yi, Siyu Hu, Xuejin Chen, Hao Su
However, there remains a much more difficult and under-explored issue of how to generalize the learned skills over unseen object categories that have very different shape geometry distributions.
no code implementations • NeurIPS 2020 • Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su
Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.
no code implementations • NeurIPS 2020 • Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su
We study how to learn a policy with compositional generalizability.
1 code implementation • 26 Oct 2020 • Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su
Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.
no code implementations • ECCV 2020 • Jiayuan Gu, Wei-Chiu Ma, Sivabalan Manivasagam, Wenyuan Zeng, ZiHao Wang, Yuwen Xiong, Hao Su, Raquel Urtasun
3D shape completion for real data is important but challenging, since partial point clouds acquired by real-world sensors are usually sparse, noisy and unaligned.
no code implementations • 30 Sep 2019 • Maximilian Jaritz, Jiayuan Gu, Hao Su
Fusion of 2D images and 3D point clouds is important because information from dense images can enhance sparse point clouds.
1 code implementation • NeurIPS 2018 • Liwei Wang, Lunjia Hu, Jiayuan Gu, Yue Wu, Zhiqiang Hu, Kun He, John Hopcroft
The theory gives a complete characterization of the structure of neuron activation subspace matches, where the core concepts are maximum match and simple match which describe the overall and the finest similarity between sets of neurons in two networks respectively.
no code implementations • ECCV 2018 • Jiayuan Gu, Han Hu, Li-Wei Wang, Yichen Wei, Jifeng Dai
While most steps in the modern object detection methods are learnable, the region feature extraction step remains largely hand-crafted, featured by RoI pooling methods.
6 code implementations • CVPR 2018 • Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei
Although it has long been believed that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era.