no code implementations • 5 May 2025 • Lu Ling, Chen-Hsuan Lin, Tsung-Yi Lin, Yifan Ding, Yu Zeng, Yichen Sheng, Yunhao Ge, Ming-Yu Liu, Aniket Bera, Zhaoshuo Li
Our key insight is that vision perception can bridge this gap by providing realistic spatial guidance that LLMs lack.
no code implementations • 22 Apr 2025 • Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui
We introduce the Describe Anything Model (DAM), a model designed for detailed localized captioning (DLC).
2 code implementations • 18 Mar 2025 • Nvidia, :, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Xinglong Sun, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng
We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge.
5 code implementations • 7 Jan 2025 • Nvidia, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups.
no code implementations • 23 Nov 2024 • Bingxin Xu, Yuzhang Shang, Yunhao Ge, Qian Lou, Yan Yan
Large Multimodal Models (LMMs) have demonstrated impressive capabilities in visual-language tasks but face significant deployment challenges due to their high computational demands.
no code implementations • 11 Nov 2024 • Nvidia, :, Yuval Atzmon, Maciej Bala, Yogesh Balaji, Tiffany Cai, Yin Cui, Jiaojiao Fan, Yunhao Ge, Siddharth Gururani, Jacob Huffman, Ronald Isaac, Pooya Jannaty, Tero Karras, Grace Lam, J. P. Lewis, Aaron Licata, Yen-Chen Lin, Ming-Yu Liu, Qianli Ma, Arun Mallya, Ashlee Martino-Tarr, Doug Mendez, Seungjun Nah, Chris Pruett, Fitsum Reda, Jiaming Song, Ting-Chun Wang, Fangyin Wei, Xiaohui Zeng, Yu Zeng, Qinsheng Zhang
We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy.
no code implementations • 11 Nov 2024 • Nvidia, :, Maciej Bala, Yin Cui, Yifan Ding, Yunhao Ge, Zekun Hao, Jon Hasselgren, Jacob Huffman, Jingyi Jin, J. P. Lewis, Zhaoshuo Li, Chen-Hsuan Lin, Yen-Chen Lin, Tsung-Yi Lin, Ming-Yu Liu, Alice Luo, Qianli Ma, Jacob Munkberg, Stella Shi, Fangyin Wei, Donglai Xiang, Jiashu Xu, Xiaohui Zeng, Qinsheng Zhang
We introduce Edify 3D, an advanced solution designed for high-quality 3D asset generation.
no code implementations • CVPR 2024 • Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K.
no code implementations • CVPR 2024 • Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui
VFC consists of three steps: 1) proposal, where image-to-text captioning models propose multiple initial captions; 2) verification, where a large language model (LLM) utilizes tools such as object detection and VQA models to fact-check proposed captions; 3) captioning, where an LLM generates the final caption by summarizing caption proposals and the fact check verification results.
no code implementations • 21 Dec 2023 • Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge
We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution.
1 code implementation • NeurIPS 2023 • Yunhao Ge, Hong-Xing Yu, Cheng Zhao, Yuliang Guo, Xinyu Huang, Liu Ren, Laurent Itti, Jiajun Wu
A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets.
no code implementations • 22 Nov 2023 • Kiran Lekkala, Eshan Bhargava, Yunhao Ge, Laurent Itti
We create a novel benchmark for evaluating a Deployable Lifelong Learning system for Visual Reinforcement Learning (RL) that is pretrained on a curated dataset, and propose a novel Scalable Lifelong Learning system capable of retaining knowledge from the previously learnt RL tasks.
1 code implementation • 12 Sep 2023 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet
A foreground-background segmentation algorithm is then used to generate foreground object masks.
1 code implementation • ICCV 2023 • Yunhao Ge, Yuecheng Li, Shuo Ni, Jiaping Zhao, Ming-Hsuan Yang, Laurent Itti
Reprogramming parameters are task-specific and exclusive to each task, which makes our method immune to catastrophic forgetting.
no code implementations • 15 Jun 2023 • Madeline Chantry Schiappa, Shehreen Azad, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet
In this work, we perform a robustness analysis of Visual Foundation Models (VFMs) for segmentation tasks and focus on robustness against real-world distribution shift inspired perturbations.
no code implementations • 26 May 2023 • Yunhao Ge, Jie Ren, Jiaping Zhao, KaiFeng Chen, Andrew Gallagher, Laurent Itti, Balaji Lakshminarayanan
Despite considerable effort, the problem remains significantly challenging in deep learning models due to their propensity to output over-confident predictions for OOD inputs.
1 code implementation • 24 May 2023 • Yunhao Ge, Yuecheng Li, Di wu, Ao Xu, Adam M. Jones, Amanda Sofie Rios, Iordanis Fostiropoulos, Shixian Wen, Po-Hsuan Huang, Zachary William Murdock, Gozde Sahin, Shuo Ni, Kiran Lekkala, Sumedh Anand Sontakke, Laurent Itti
We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentralized population of LL agents that each sequentially learn different tasks, with all agents operating independently and in parallel.
1 code implementation • 15 Dec 2022 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Laurent Itti, Vibhav Vineet
Finally, the third component creates a large-scale pseudo-labeled instance segmentation training dataset by compositing the foreground object masks onto the original and generated background images.
1 code implementation • CVPR 2023 • Yunhao Ge, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao
We also show that our method improves across ImageNet shifted datasets, four other datasets, and other model architectures such as LiT.
1 code implementation • 22 Jul 2022 • Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet
However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control; this requires rendering large amounts of random data variations, which is slow and is often suboptimal for the target domain.
1 code implementation • 19 Jul 2022 • Yunhao Ge, Yao Xiao, Zhi Xu, Xingrui Wang, Laurent Itti
We use human experiments to confirm that both HVE and humans predominantly use some specific features to support the classification of specific classes (e. g., texture is the dominant feature to distinguish a zebra from other quadrupeds, both for humans and HVE).
no code implementations • 20 Jun 2022 • Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet
For foreground object mask generation, we use a simple textual template with object class name as input to DALL-E to generate a diverse set of foreground images.
no code implementations • 13 Jun 2022 • Yunhao Ge, Sercan Ö. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister
ISL splits the data into different environments, and learns a structure that is invariant to the target across different environments by imposing a consistency constraint.
no code implementations • 6 Dec 2021 • Yunhao Ge, Zhi Xu, Yao Xiao, Gan Xin, Yunkui Pang, Laurent Itti
(2) They lack convexity constraints, which is important for meaningfully manipulating specific attributes for downstream tasks.
no code implementations • 29 Sep 2021 • Yunhao Ge, Yao Xiao, Zhi Xu, Linwei Li, Ziyan Wu, Laurent Itti
Take image classification as an example, HNI visualizes the reasoning logic of a NN with class-specific Structural Concept Graphs (c-SCG), which are human-interpretable.
no code implementations • CVPR 2021 • Yunhao Ge, Yao Xiao, Zhi Xu, Meng Zheng, Srikrishna Karanam, Terrence Chen, Laurent Itti, Ziyan Wu
Despite substantial progress in applying neural networks (NN) to a wide variety of areas, they still largely suffer from a lack of transparency and interpretability.
1 code implementation • ICLR Workshop Neural_Compression 2021 • Yunhao Ge, Yunkui Pang, Linwei Li, Laurent Itti
We consider the problem of graph data compression and representation.
no code implementations • 1 Jan 2021 • Yunhao Ge, Gan Xin, Zhi Xu, Yao Xiao, Yunkui Pang, Yining HE, Laurent Itti
DEAE can become a generative model and synthesis semantic controllable samples by interpolating latent code, which can even synthesis novel attribute value never is shown in the original dataset.
no code implementations • 27 Sep 2020 • Shixian Wen, Amanda Rios, Yunhao Ge, Laurent Itti
Continual learning of multiple tasks in artificial neural networks using gradient descent leads to catastrophic forgetting, whereby a previously learned mapping of an old task is erased when learning new mappings for new tasks.
1 code implementation • ICLR 2021 • Yunhao Ge, Sami Abu-El-Haija, Gan Xin, Laurent Itti
Visual cognition of primates is superior to that of artificial neural networks in its ability to 'envision' a visual object, even a newly-introduced one, in different attributes including pose, position, color, texture, etc.
1 code implementation • ECCV 2020 • Yunhao Ge, Jiaping Zhao, Laurent Itti
After training on unbalanced discrete poses (5 classes with 6 poses per object instance, plus 5 classes with only 2 poses), we show that OPT-Net can synthesize balanced continuous new poses along yaw and pitch axes with high quality.
no code implementations • 30 Jul 2019 • Dongming Wei, Sahar Ahmad, Jiayu Huo, Wen Peng, Yunhao Ge, Zhong Xue, Pew-Thian Yap, Wentao Li, Dinggang Shen, Qian Wang
Then, an unsupervised registration network is used to efficiently align the pre-procedural CT (pCT) with the inpainted iCT (inpCT) image.