1 code implementation • 24 Oct 2024 • Lawrence Jang, Yinheng Li, Charles Ding, Justin Lin, Paul Pu Liang, Dan Zhao, Rogerio Bonatti, Kazuhito Koishida
Videos are often used to learn or extract the necessary information to complete tasks in ways different than what text and static imagery alone can provide.
1 code implementation • 12 Sep 2024 • Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui
To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi.
no code implementations • 27 Jun 2024 • Yinheng Li, Rogerio Bonatti, Sara Abdali, Justin Wagle, Kazuhito Koishida
Using Large Language Models (LLMs) to generate synthetic data for model training has become increasingly popular in recent years.
1 code implementation • 3 Oct 2023 • Anish Bhattacharya, Ratnesh Madaan, Fernando Cladera, Sai Vemprala, Rogerio Bonatti, Kostas Daniilidis, Ashish Kapoor, Vijay Kumar, Nikolai Matni, Jayesh K. Gupta
We present EvDNeRF, a pipeline for generating event data and training an event-based dynamic NeRF, for the purpose of faithfully reconstructing eventstreams on scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera.
1 code implementation • ICCV 2023 • Yao Wei, Yanchao Sun, Ruijie Zheng, Sai Vemprala, Rogerio Bonatti, Shuhang Chen, Ratnesh Madaan, Zhongjie Ba, Ashish Kapoor, Shuang Ma
We introduce DualMind, a generalist agent designed to tackle various decision-making tasks that addresses challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning.
no code implementations • 7 Mar 2023 • Yue Meng, Sai Vemprala, Rogerio Bonatti, Chuchu Fan, Ashish Kapoor
In this work, we propose Control Barrier Transformer (ConBaT), an approach that learns safe behaviors from demonstrations in a self-supervised fashion.
1 code implementation • 20 Feb 2023 • Sai Vemprala, Rogerio Bonatti, Arthur Bucker, Ashish Kapoor
This paper presents an experimental study regarding the use of OpenAI's ChatGPT for robotics applications.
no code implementations • 24 Jan 2023 • Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor
Self-supervised pretraining has been extensively studied in language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels.
no code implementations • 22 Sep 2022 • Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, Ashish Kapoor
Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge.
2 code implementations • 4 Aug 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, Rogerio Bonatti
Natural language is one of the most intuitive ways to express human intent.
no code implementations • 25 Mar 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Rogerio Bonatti
However, using language is seldom an easy task when humans need to express their intent towards robots, since most of the current language interfaces require rigid templates with a static set of action targets and commands.
no code implementations • 9 Aug 2021 • Cherie Ho, Andrew Jong, Harry Freeman, Rohan Rao, Rogerio Bonatti, Sebastian Scherer
Aerial vehicles are revolutionizing applications that require capturing the 3D structure of dynamic targets in the wild, such as sports, medicine, and entertainment.
no code implementations • 19 Nov 2020 • Rogerio Bonatti, Arthur Bucker, Sebastian Scherer, Mustafa Mukadam, Jessica Hodgins
First, we generate a database of video clips with a diverse range of shots in a photo-realistic simulator, and use hundreds of participants in a crowd-sourcing framework to obtain scores for a set of semantic descriptors for each clip.
no code implementations • 10 Nov 2020 • Arthur Bucker, Rogerio Bonatti, Sebastian Scherer
We validate our approach in multiple cluttered environments of a photo-realistic simulator, and deploy the system using two UAVs in real-world experiments.
no code implementations • 15 Oct 2019 • Rogerio Bonatti, Wenshan Wang, Cherie Ho, Aayush Ahuja, Mirko Gschwindt, Efe Camci, Erdal Kayacan, Sanjiban Choudhury, Sebastian Scherer
In this work, we address the problem in its entirety and propose a complete system for real-time aerial cinematography that for the first time combines: (1) vision-based target estimation; (2) 3D signed-distance mapping for occlusion estimation; (3) efficient trajectory optimization for long time-horizon camera motion; and (4) learning-based artistic shot selection.
2 code implementations • 16 Sep 2019 • Rogerio Bonatti, Ratnesh Madaan, Vibhav Vineet, Sebastian Scherer, Ashish Kapoor
We analyze the rich latent spaces learned with our proposed representations, and show that the use of our cross-modal architecture significantly improves control policy performance as compared to end-to-end learning or purely unsupervised feature extractors.
no code implementations • 8 Jul 2019 • Rogerio Bonatti, Arthur Gola de Paula
We study the automatic reply of email business messages in Brazilian Portuguese.
no code implementations • 4 Apr 2019 • Mirko Gschwindt, Efe Camci, Rogerio Bonatti, Wenshan Wang, Erdal Kayacan, Sebastian Scherer
Aerial filming is constantly gaining importance due to the recent advances in drone technology.
no code implementations • 4 Apr 2019 • Rogerio Bonatti, Cherie Ho, Wenshan Wang, Sanjiban Choudhury, Sebastian Scherer
In this work, we overcome such limitations and propose a complete system for aerial cinematography that combines: (1) a vision-based algorithm for target localization; (2) a real-time incremental 3D signed-distance map algorithm for occlusion and safety computation; and (3) a real-time camera motion planner that optimizes smoothness, collisions, occlusions and artistic guidelines.
no code implementations • 26 Mar 2019 • Wenshan Wang, Aayush Ahuja, Yanfu Zhang, Rogerio Bonatti, Sebastian Scherer
We show that by leveraging unlabeled sequences, the amount of labeled data required can be significantly reduced.
1 code implementation • 16 Oct 2018 • Yanfu Zhang, Wenshan Wang, Rogerio Bonatti, Daniel Maturana, Sebastian Scherer
The first-stage network learns feature representations of the environment using low-level LiDAR statistics and the second-stage network combines those learned features with kinematics data.
no code implementations • 28 Aug 2018 • Rogerio Bonatti, Yanfu Zhang, Sanjiban Choudhury, Wenshan Wang, Sebastian Scherer
Autonomous aerial cinematography has the potential to enable automatic capture of aesthetically pleasing videos without requiring human intervention, empowering individuals with the capability of high-end film studios.