Search Results for author: James M. Rehg

Found 71 papers, 26 papers with code

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

no code implementations 4 Mar 2024 Sangmin Lee, Bolin Lai, Fiona Ryan, Bikram Boote, James M. Rehg

Furthermore, we propose a novel multimodal baseline that leverages densely aligned language-visual representations by synchronizing visual features with their corresponding utterances.

coreference-resolution

ZeroShape: Regression-based Zero-shot Shape Reconstruction

no code implementations 21 Dec 2023 Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg

In contrast, the traditional approach to this problem is regression-based, where deterministic models are trained to directly regress the object shape.

3D Shape Reconstruction Computational Efficiency +1

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

no code implementations 20 Dec 2023 Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

We propose a unified multi-modal, multi-task framework -- Audio-Visual Conversational Attention (Av-CONV), for the joint prediction of conversation behaviors -- speaking and listening -- for both the camera wearer as well as all other social partners present in the egocentric video.

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

no code implementations 7 Dec 2023 Yunsheng Ma, Can Cui, Xu Cao, Wenqian Ye, Peiran Liu, Juanwu Lu, Amr Abdelraouf, Rohit Gupta, Kyungtae Han, Aniket Bera, James M. Rehg, Ziran Wang

We present LaMPilot, a novel framework for planning in the field of autonomous driving, rethinking the task as a code-generation process that leverages established behavioral primitives.

Autonomous Driving Code Generation +1

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

no code implementations 6 Dec 2023 Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu

Additionally, existing diffusion-based image manipulation models are sub-optimal in controlling the state transition of an action in egocentric image pixel space because of the domain gap.

Image Manipulation Language Modelling +1

Low-shot Object Learning with Mutual Exclusivity Bias

1 code implementation NeurIPS 2023 Anh Thai, Ahmad Humayun, Stefan Stojanov, Zixuan Huang, Bikram Boote, James M. Rehg

This paper introduces Low-shot Object Learning with Mutual Exclusivity Bias (LSME), the first computational framing of mutual exclusivity bias, a phenomenon commonly observed in infants during word learning.

Object

Which way is `right'?: Uncovering limitations of Vision-and-Language Navigation model

no code implementations 30 Nov 2023 Meera Hahn, Amit Raj, James M. Rehg

The challenging task of Vision-and-Language Navigation (VLN) requires embodied agents to follow natural language instructions to reach a goal location or object (e.g. `walk down the hallway and turn left at the piano').

Vision and Language Navigation

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations 30 Nov 2023 Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

REBAR: Retrieval-Based Reconstruction for Time-series Contrastive Learning

1 code implementation 1 Nov 2023 Maxwell A. Xu, Alexander Moreno, Hui Wei, Benjamin M. Marlin, James M. Rehg

The success of self-supervised contrastive learning hinges on identifying positive data pairs, such that when they are pushed together in embedding space, the space encodes useful information for subsequent downstream tasks.

Contrastive Learning Retrieval +1

Explaining a machine learning decision to physicians via counterfactuals

1 code implementation 10 Jun 2023 Supriya Nagesh, Nina Mishra, Yonatan Naamad, James M. Rehg, Mehul A. Shah, Alexei Wagner

Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system.

Time Series

Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

no code implementations 6 May 2023 Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu, James M. Rehg

Motivated by this observation, we introduce the first model that leverages both the video and audio modalities for egocentric gaze anticipation.

Representation Learning

Egocentric Auditory Attention Localization in Conversations

no code implementations CVPR 2023 Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu

In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others.

Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games

no code implementations 16 Dec 2022 Bolin Lai, Hongxin Zhang, Miao Liu, Aryan Pariani, Fiona Ryan, Wenqi Jia, Shirley Anugrah Hayati, James M. Rehg, Diyi Yang

We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes.

Persuasion Strategies

PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation

1 code implementation 14 Dec 2022 Maxwell A. Xu, Alexander Moreno, Supriya Nagesh, V. Burak Aydemir, David W. Wetter, Santosh Kumar, James M. Rehg

The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions.

Imputation

Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization

1 code implementation 28 Nov 2022 Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg

A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes.

Novel View Synthesis Object +4

Transformer-based Localization from Embodied Dialog with Large-scale Pre-training

no code implementations 10 Oct 2022 Meera Hahn, James M. Rehg

We address the challenging task of Localization via Embodied Dialog (LED).

In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation

no code implementations 8 Aug 2022 Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg

To this end, we design the transformer encoder to embed the global context as one additional visual token and further propose a novel Global-Local Correlation (GLC) module to explicitly model the correlation of the global token and each local token.

Gaze Estimation

Planes vs. Chairs: Category-guided 3D shape learning without any 3D cues

no code implementations 21 Apr 2022 Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg

We present a novel 3D shape reconstruction method which learns to predict an implicit 3D shape representation from a single RGB image.

3D Shape Reconstruction 3D Shape Representation +1

Kernel Deformed Exponential Families for Sparse Continuous Attention

no code implementations 1 Nov 2021 Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg

Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has similar approximation capabilities to kernel exponential families.

Transformers for prompt-level EMA non-response prediction

no code implementations 1 Nov 2021 Supriya Nagesh, Alexander Moreno, Stephanie M. Carpenter, Jamie Yap, Soujanya Chatterjee, Steven Lloyd Lizotte, Neng Wan, Santosh Kumar, Cho Lam, David W. Wetter, Inbal Nahum-Shani, James M. Rehg

The transformer model achieves a non-response prediction AUC of 0.77 and is significantly better than classical ML and LSTM-based deep learning models.

No RL, No Simulation: Learning to Navigate without Navigating

1 code implementation NeurIPS 2021 Meera Hahn, Devendra Chaplot, Shubham Tulsiani, Mustafa Mukadam, James M. Rehg, Abhinav Gupta

Most prior methods for learning navigation policies require access to simulation environments, as they need online policy interaction and rely on ground-truth maps for rewards.

Navigate Reinforcement Learning (RL)

Ego4D: Around the World in 3,000 Hours of Egocentric Video

5 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Egocentric Activity Recognition and Localization on a 3D Map

no code implementations 20 May 2021 Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li

Given a video captured from a first person perspective and the environment context of where the video is recorded, can we recognize what the person is doing and identify where the action occurs in the 3D space?

Action Localization Action Recognition +2

The Surprising Positive Knowledge Transfer in Continual 3D Object Shape Reconstruction

3 code implementations 18 Jan 2021 Anh Thai, Stefan Stojanov, Zixuan Huang, Isaac Rehg, James M. Rehg

Continual learning has been extensively studied for classification tasks with methods developed to primarily avoid catastrophic forgetting, a phenomenon where earlier learned concepts are forgotten at the expense of more recent samples.

3D Shape Reconstruction Continual Learning +2

4D Human Body Capture from Egocentric Video via 3D Scene Grounding

no code implementations 26 Nov 2020 Miao Liu, Dexin Yang, Yan Zhang, Zhaopeng Cui, James M. Rehg, Siyu Tang

We introduce a novel task of reconstructing a time series of second-person 3D human body meshes from monocular egocentric videos.

Time Series Time Series Analysis

Where Are You? Localization from Embodied Dialog

2 code implementations EMNLP 2020 Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.

Navigate Visual Dialog

3D Reconstruction of Novel Object Shapes from Single Images

2 code implementations 14 Jun 2020 Anh Thai, Stefan Stojanov, Vijay Upadhya, James M. Rehg

This is challenging as it requires a model to learn a representation that can infer both the visible and occluded portions of any object using a limited training set.

3D Reconstruction 3D Shape Reconstruction +1

In the Eye of the Beholder: Gaze and Actions in First Person Video

no code implementations 31 May 2020 Yin Li, Miao Liu, James M. Rehg

Moving beyond the dataset, we propose a novel deep model for joint gaze estimation and action recognition in FPV.

Action Recognition Gaze Estimation

Neural Similarity Learning

1 code implementation NeurIPS 2019 Weiyang Liu, Zhen Liu, James M. Rehg, Le Song

By generalizing inner product with a bilinear matrix, we propose the neural similarity which serves as a learnable parametric similarity measure for CNNs.

Few-Shot Learning

Regularizing Neural Networks via Minimizing Hyperspherical Energy

1 code implementation CVPR 2020 Rongmei Lin, Weiyang Liu, Zhen Liu, Chen Feng, Zhiding Yu, James M. Rehg, Li Xiong, Le Song

Inspired by the Thomson problem in physics where the distribution of multiple propelling electrons on a unit sphere can be modeled via minimizing some potential energy, hyperspherical energy minimization has demonstrated its potential in regularizing neural networks and improving their generalization power.

Locally Weighted Regression Pseudo-Rehearsal for Online Learning of Vehicle Dynamics

no code implementations 13 May 2019 Grady Williams, Brian Goldfain, James M. Rehg, Evangelos A. Theodorou

We consider the problem of online adaptation of a neural network designed to represent vehicle dynamics.

regression

Tripping through time: Efficient Localization of Activities in Videos

no code implementations 22 Apr 2019 Meera Hahn, Asim Kadav, James M. Rehg, Hans Peter Graf

Localizing moments in untrimmed videos via language queries is a new and interesting task that requires the ability to accurately ground language into video.

Learning to Generate Synthetic Data via Compositing

no code implementations CVPR 2019 Shashank Tripathi, Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James M. Rehg, Visesh Chari

The synthesizer and target networks are trained in an adversarial manner wherein each network is updated with a goal to outdo the other.

Data Augmentation Human Detection +3

Attention Distillation for Learning Video Representations

no code implementations 5 Apr 2019 Miao Liu, Xin Chen, Yun Zhang, Yin Li, James M. Rehg

To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition.

Action Recognition Video Recognition

Action2Vec: A Crossmodal Embedding Approach to Action Learning

no code implementations 2 Jan 2019 Meera Hahn, Andrew Silva, James M. Rehg

We describe a novel cross-modal embedding space for actions, named Action2Vec, which combines linguistic cues from class labels with spatio-temporal features derived from video clips.

Action Recognition General Classification +2

Taking a Deeper Look at the Inverse Compositional Algorithm

1 code implementation CVPR 2019 Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger

In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment.

Motion Estimation regression

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

no code implementations 22 Sep 2018 Meera Hahn, Nataniel Ruiz, Jean-Baptiste Alayrac, Ivan Laptev, James M. Rehg

Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision.

Object Object Recognition

Multi-object Tracking with Neural Gating Using Bilinear LSTM

no code implementations ECCV 2018 Chanho Kim, Fuxin Li, James M. Rehg

We also propose novel data augmentation approaches to efficiently train recurrent models that score object tracks on both appearance and motion.

Data Augmentation Multi-Object Tracking +3

In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video

no code implementations ECCV 2018 Yin Li, Miao Liu, James M. Rehg

We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a headworn camera.

Action Recognition Gaze Estimation +1

3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare

no code implementations CVPR 2018 Abhijit Kundu, Yin Li, James M. Rehg

Our method produces a compact 3D representation of the scene, which can be readily used for applications like autonomous driving.

Ranked #3 on Vehicle Pose Estimation on KITTI Cars Hard (using extra training data)

3D Object Reconstruction Autonomous Driving +2

Decoupled Networks

1 code implementation CVPR 2018 Weiyang Liu, Zhen Liu, Zhiding Yu, Bo Dai, Rongmei Lin, Yisen Wang, James M. Rehg, Le Song

Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations.

Towards Black-box Iterative Machine Teaching

no code implementations ICML 2018 Weiyang Liu, Bo Dai, Xingguo Li, Zhen Liu, James M. Rehg, Le Song

We propose an active teacher model that can actively query the learner (i.e., make the learner take exams) for estimating the learner's status and provably guide the learner to achieve faster convergence.

Fine-Grained Head Pose Estimation Without Keypoints

13 code implementations 2 Oct 2017 Nataniel Ruiz, Eunji Chong, James M. Rehg

Estimating the head pose of a person is a crucial problem that has a large amount of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment.

Face Alignment Gaze Estimation +1

Dockerface: an Easy to Install and Use Faster R-CNN Face Detector in a Docker Container

1 code implementation 15 Aug 2017 Nataniel Ruiz, James M. Rehg

Face detection is a very important task and a necessary pre-processing step for many applications such as facial landmark detection, pose estimation, sentiment analysis and face recognition.

Face Detection Face Recognition +3

iSurvive: An Interpretable, Event-time Prediction Model for mHealth

no code implementations ICML 2017 Walter H. Dempsey, Alexander Moreno, Christy K. Scott, Michael L. Dennis, David H. Gustafson, Susan A. Murphy, James M. Rehg

We present a parameter learning method for GLM emissions and survival model fitting, and present promising results on both synthetic data and an mHealth drug use dataset.

Survival Analysis

Information Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving

2 code implementations 7 Jul 2017 Grady Williams, Paul Drews, Brian Goldfain, James M. Rehg, Evangelos A. Theodorou

We present an information theoretic approach to stochastic optimal control problems that can be used to derive general sampling based optimization schemes.

Robotics

Iterative Machine Teaching

2 code implementations ICML 2017 Weiyang Liu, Bo Dai, Ahmad Humayun, Charlene Tay, Chen Yu, Linda B. Smith, James M. Rehg, Le Song

Different from traditional machine teaching which views the learners as batch algorithms, we study a new paradigm where the learner uses an iterative algorithm and a teacher can feed examples sequentially and intelligently based on the current performance of the learner.

Automatic Variational ABC

no code implementations 28 Jun 2016 Alexander Moreno, Tameem Adel, Edward Meeds, James M. Rehg, Max Welling

Approximate Bayesian Computation (ABC) is a framework for performing likelihood-free posterior inference for simulation models.

Variational Inference

Multiple Hypothesis Tracking Revisited

no code implementations ICCV 2015 Chanho Kim, Fuxin Li, Arridhana Ciptadi, James M. Rehg

This paper revisits the classical multiple hypotheses tracking (MHT) algorithm in a tracking-by-detection framework.

Minimizing Human Effort in Interactive Tracking by Incremental Learning of Model Parameters

no code implementations ICCV 2015 Arridhana Ciptadi, James M. Rehg

We address the problem of minimizing human effort in interactive tracking by learning sequence-specific model parameters.

Incremental Learning

The Middle Child Problem: Revisiting Parametric Min-Cut and Seeds for Object Proposals

no code implementations ICCV 2015 Ahmad Humayun, Fuxin Li, James M. Rehg

We propose a new energy minimization framework incorporating geodesic distances between segments which solves this problem.

Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression

no code implementations NeurIPS 2015 Yu-Ying Liu, Shuang Li, Fuxin Li, Le Song, James M. Rehg

The Continuous-Time Hidden Markov Model (CT-HMM) is an attractive approach to modeling disease progression due to its ability to describe noisy observations arriving irregularly in time.

Unsupervised Learning of Edges

no code implementations CVPR 2016 Yin Li, Manohar Paluri, James M. Rehg, Piotr Dollár

In this work we present a simple yet effective approach for training edge detectors without human supervision.

Edge Detection Motion Estimation +2

The Secrets of Salient Object Segmentation

1 code implementation CVPR 2014 Yin Li, Xiaodi Hou, Christof Koch, James M. Rehg, Alan L. Yuille

The dataset design bias not only creates a discomforting disconnection between fixations and salient object segmentation, but also misleads algorithm design.

Object Segmentation +1

RIGOR: Reusing Inference in Graph Cuts for Generating Object Regions

no code implementations CVPR 2014 Ahmad Humayun, Fuxin Li, James M. Rehg

By precomputing a graph which can be used for parametric min-cuts over different seeds, we speed up the generation of the segment pool.

Object Recognition Segmentation

Modeling Actions through State Changes

no code implementations CVPR 2013 Alireza Fathi, James M. Rehg

The key to differentiating these actions is the ability to identify how they change the state of objects and materials in the environment.

Action Recognition Temporal Action Localization
