This paper introduces Activity Parsing as the overarching task of temporal segmentation and classification of activities, sub-activities, atomic actions, along with an instance-level understanding of actors, objects, and their relationships in videos.
The proposed method's performance by considering the improvements and adaptations required for the storm surge data is assessed and compared to the original GAIN and a few other techniques.
1 code implementation • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Kohd, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e. g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution.
However, there remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning.
Ranked #1 on Video Classification on Home Action Genome
Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods.
Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems.
Longitudinal MRIs are often used to capture the gradual deterioration of brain structure and function caused by aging or neurological diseases.
To address this issue, we propose a margin loss that regularizes the similarity in relationships of the representations across subjects and modalities.
Interpretability is a critical factor in applying complex deep learning models to advance the understanding of brain disorders in neuroimaging studies.
Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning.
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
While the GFLOPs of a 3D CNN can be decreased by reducing the temporal feature resolution within the network, there is no setting that is optimal for all input clips.
This is the first benchmark for classifying PD patients based on MDS-UPDRS gait severity and could be an objective biomarker for disease severity.
In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline.
Machine learning analysis of longitudinal neuroimaging data is typically based on supervised learning, which requires a large number of ground-truth labels to be informative.
Therefore, the proposed network has a dual-branch architecture that tackles two tasks: 1) a segmentation sub-network aiming to generate the prostate segmentation, and 2) a voxel-metric learning sub-network aiming to improve the quality of the learned feature space supervised by a metric loss.
In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction.
Ranked #1 on Multi-future Trajectory Prediction on ETH/UCY
Many neurological diseases are characterized by gradual deterioration of brain structure and function.
In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time.
The Blood-Oxygen-Level-Dependent (BOLD) signal of resting-state fMRI (rs-fMRI) records the temporal dynamics of intrinsic functional networks in the brain.
In addition, we introduce a new dataset designed specifically for autonomous-driving scenarios in areas with dense pedestrian populations: the Stanford-TRI Intent Prediction (STIP) dataset.
Action recognition has been a widely studied topic with a heavy focus on supervised learning involving sufficient labeled videos.
In contrast to the previous work that aims to solve either the task of pose prediction or trajectory forecasting in isolation, we propose a framework to unify the two problems and address the practically useful task of pedestrian locomotion prediction in the wild.
Presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications that has alluded to pivotal debates in recent years.
We apply our method to a synthetic, a medical diagnosis, and a gender classification (Gender Shades) dataset.
Modeling and prediction of human motion dynamics has long been a challenging problem in computer vision, and most existing methods rely on the end-to-end supervised training of various architectures of recurrent neural networks.
Ranked #3 on Human Pose Forecasting on Human3.6M
Conventional unsupervised learning methods only focused on training deep networks to understand the primitive characteristics of the visual data, mainly to be able to reconstruct the data from a latent space.
With recent advances in deep learning, neuroimaging studies increasingly rely on convolutional networks (ConvNets) to predict diagnosis based on MR images.
In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking.
Chest X-ray radiography is one of the earliest medical imaging technologies and remains one of the most widely-used for diagnosis, screening, and treatment follow up of diseases related to lungs and heart.
While prior work attempts to predict future video pixels, anticipate activities or forecast future scene semantic segments from segmentation of the preceding frames, methods that predict future semantic segmentation solely from the previous frame RGB data in a single end-to-end trainable model do not exist.
While unsupervised variational autoencoders (VAE) have become a powerful tool in neuroimage analysis, their application to supervised learning is under-explored.
In this paper we propose a novel generative process, in which we use a Gaussian-mixture to model a few major clusters in the data, and use a non-informative uniform distribution to capture the remaining data.
Various applications in different fields, such as gene expression analysis or computer vision, suffer from data sets with high-dimensional low-sample-size (HDLSS), which has posed significant challenges for standard statistical and modern machine learning methods.
In this paper, we propose a new action-agnostic method for short- and long-term human pose forecasting.
Ranked #4 on Human Pose Forecasting on Human3.6M
As shown in computer vision, the power of deep learning lies in automatically learning relevant and powerful features for any perdition task, which is made possible through end-to-end architectures.
In this work, we use a deep learning framework for simultaneous classification and regression of Parkinson disease diagnosis based on MR-Images and personal information (i. e. age, gender).
Real-time detection of irregularities in visual data is very invaluable and useful in many prospective applications including surveillance, patient monitoring systems, etc.
Our architecture is composed of two deep networks, each of which trained by competing with each other while collaborating to understand the underlying concept in the target class, and then classify the testing samples.
SimpNet outperforms the deeper and more complex architectures such as VGGNet, ResNet, WideResidualNet \etc, on several well-known benchmarks, while having 2 to 25 times fewer number of parameters and operations.
Ranked #90 on Image Classification on CIFAR-100