Search Results for author: Lorenzo Torresani

Found 65 papers, 13 papers with code

What You Say Is What You Show: Visual Narration Detection in Instructional Videos

no code implementations · 5 Jan 2023 · Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman

Narrated "how-to" videos have emerged as a promising data source for a wide range of learning problems, from learning visual representations to training robot policies.

HierVL: Learning Hierarchical Video-Language Embeddings

no code implementations · 5 Jan 2023 · Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman

Video-language embeddings are a promising avenue for injecting semantics into visual representations, but existing methods capture only short-term associations between seconds-long video clips and their accompanying text.

Ego-Only: Egocentric Action Detection without Exocentric Pretraining

no code implementations · 3 Jan 2023 · Huiyu Wang, Mitesh Kumar Singh, Lorenzo Torresani

On EPIC-Kitchens-100, our Ego-Only even outperforms exocentric pretraining (by 2.1% on verbs and by 1.8% on nouns), setting a new state of the art.

Action Detection, Temporal Action Localization

Egocentric Video Task Translation

no code implementations · 13 Dec 2022 · Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another).

Multi-Task Learning, Translation

HistoPerm: A Permutation-Based View Generation Approach for Learning Histopathologic Feature Representations

no code implementations · 13 Sep 2022 · Joseph DiPalma, Lorenzo Torresani, Saeed Hassanpour

In this paper, we introduce HistoPerm, a view generation approach designed for improving the performance of representation learning techniques on histology images in weakly supervised settings.

Classification, Representation Learning

Deformable Video Transformer

no code implementations · CVPR 2022 · Jue Wang, Lorenzo Torresani

Video transformers have recently emerged as an effective alternative to convolutional networks for action classification.

Action Classification

Calibrating Histopathology Image Classifiers using Label Smoothing

no code implementations · 28 Jan 2022 · Jerry Wei, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Moreover, we find that using model confidence as a proxy for annotator agreement also improves calibration and accuracy, suggesting that datasets without multiple annotators can still benefit from our proposed confidence-aware label smoothing methods.

Classification, Image Classification

Learning To Recognize Procedural Activities with Distant Supervision

1 code implementation · CVPR 2022 · Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani

In this paper we consider the problem of classifying fine-grained, multi-step activities (e.g., cooking different recipes, making disparate home improvements, creating various forms of arts and crafts) from long videos spanning up to several minutes.

Action Classification, Language Modelling

Label Hallucination for Few-Shot Classification

1 code implementation · 6 Dec 2021 · Yiren Jian, Lorenzo Torresani

At the same time, training a simple linear classifier on top of "frozen" features learned from the large labeled dataset fails to adapt the model to the properties of the novel classes, effectively inducing underfitting.

Classification, Few-Shot Learning

Ego4D: Around the World in 3,000 Hours of Egocentric Video

3 code implementations · CVPR 2022 · Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification, Ethics

Long-Short Temporal Contrastive Learning of Video Transformers

no code implementations · CVPR 2022 · Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani

Our approach, named Long-Short Temporal Contrastive Learning (LSTCL), enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent.

Action Recognition, Contrastive Learning

Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories

no code implementations · CVPR 2021 · Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry Davis, Heng Wang

The standard way of training video models entails sampling at each iteration a single clip from a video and optimizing the clip prediction with respect to the video-level label.

Action Detection, Action Recognition

A Multi-View Approach To Audio-Visual Speaker Verification

no code implementations · 11 Feb 2021 · Leda Sari, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf

Although speaker verification has conventionally been an audio-only task, some practical applications provide both audio and visual streams of input.

Speaker Verification

Is Space-Time Attention All You Need for Video Understanding?

12 code implementations · 9 Feb 2021 · Gedas Bertasius, Heng Wang, Lorenzo Torresani

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.

Action Classification, Action Recognition

A Petri Dish for Histopathology Image Analysis

no code implementations · 29 Jan 2021 · Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

With the rise of deep learning, there has been increased interest in using neural networks for histopathology image analysis, a field that investigates the properties of biopsy or resected specimens, which pathologists have traditionally examined manually under a microscope.

Natural Questions, Transfer Learning

Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks

no code implementations · 16 Jan 2021 · Maxwell Mbabilla Aladago, Lorenzo Torresani

By selecting a weight among a fixed set of random values for each individual connection, our method uncovers combinations of random weights that match the performance of traditionally-trained networks of the same capacity.
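The selection idea in the snippet above can be sketched for a single fully connected layer in NumPy. Each connection gets K fixed random candidate values, and a per-candidate score decides which value is used. This is an assumption-laden illustration, not the paper's code. The names `select_weights` and `slot_linear` are hypothetical, the scores are random placeholders, and the way the scores are actually trained is not shown.

```python
import numpy as np


def select_weights(candidates: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """For each connection, pick the candidate value with the highest score.

    candidates, scores: arrays of shape (out_features, in_features, K).
    Returns a weight matrix of shape (out_features, in_features).
    """
    idx = np.argmax(scores, axis=-1)  # (out, in): index of the chosen candidate
    return np.take_along_axis(candidates, idx[..., None], axis=-1)[..., 0]


def slot_linear(x: np.ndarray, candidates: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Linear layer whose weights are drawn from fixed random candidates."""
    w = select_weights(candidates, scores)
    return x @ w.T


rng = np.random.default_rng(0)
out_f, in_f, k = 3, 4, 8
candidates = rng.normal(size=(out_f, in_f, k))  # fixed random values, never updated
scores = rng.uniform(size=(out_f, in_f, k))     # placeholder for learned scores
x = rng.normal(size=(2, in_f))

y = slot_linear(x, candidates, scores)
print(y.shape)  # (2, 3)
```

In this sketch, only `scores` would change during training. The candidate values keep their random initialization, so every weight the layer ends up using is one of the original random values.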

Resolution-Based Distillation for Efficient Histology Image Classification

no code implementations · 11 Jan 2021 · Joseph DiPalma, Arief A. Suriawinata, Laura J. Tafe, Lorenzo Torresani, Saeed Hassanpour

Our results show that a combination of knowledge distillation (KD) and self-supervision allows the student model to approach, and in some cases surpass, the classification accuracy of the teacher while being much more efficient.

Classification, General Classification

Learn like a Pathologist: Curriculum Learning by Annotator Agreement for Histopathology Image Classification

no code implementations · 29 Sep 2020 · Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Mustafa Nasir-Moin, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Because of the nature of histopathology images, examples inherently vary in difficulty. Since medical datasets are often labeled by multiple annotators, annotator agreement can serve as a natural proxy for the difficulty of a given example.

General Classification, Image Classification

COBE: Contextualized Object Embeddings from Narrated Instructional Video

no code implementations · NeurIPS 2020 · Gedas Bertasius, Lorenzo Torresani

A fully-supervised approach to recognizing object states and their contexts in the real world is unfortunately marred by the long-tailed, open-ended distribution of the data, which would effectively require massive amounts of annotations to capture the appearance of objects in all their different forms.

Human-Object Interaction Detection, Object Detection

Video Understanding as Machine Translation

no code implementations · 12 Jun 2020 · Bruno Korbar, Fabio Petroni, Rohit Girdhar, Lorenzo Torresani

With the advent of large-scale multimodal video datasets, especially sequences with audio or transcribed speech, there has been a growing interest in self-supervised learning of video representations.

Machine Translation, Metric Learning

Stein Variational Inference for Discrete Distributions

no code implementations · 1 Mar 2020 · Jun Han, Fan Ding, Xianglong Liu, Lorenzo Torresani, Jian Peng, Qiang Liu

In addition, such a transform can be straightforwardly employed in the gradient-free kernelized Stein discrepancy to perform goodness-of-fit (GOF) tests on discrete distributions.

Variational Inference

STAR-Caps: Capsule Networks with Straight-Through Attentive Routing

no code implementations · NeurIPS 2019 · Karim Ahmed, Lorenzo Torresani

Capsule networks have been shown to be powerful models for image classification, thanks to their ability to represent and capture viewpoint variations of an object.

Classification, General Classification

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

1 code implementation · NeurIPS 2020 · Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, Du Tran

To the best of our knowledge, Cross-Modal Deep Clustering (XDC) is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.

Audio Classification, Deep Clustering

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

no code implementations · 19 Jul 2019 · Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

However, it has been observed that in current video datasets, action classes can often be recognized from a single frame of video, without any temporal information.

Motion Estimation, Video Understanding

UniDual: A Unified Model for Image and Video Understanding

no code implementations · 10 Jun 2019 · Yufei Wang, Du Tran, Lorenzo Torresani

It consists of a shared 2D spatial convolution followed by two parallel point-wise convolutional layers, one devoted to images and the other one used for videos.

Multi-Task Learning, Video Understanding
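The structure in the UniDual snippet can be sketched in NumPy: a shared 2D spatial convolution feeds two parallel point-wise (1x1) convolutions, one per modality. This is an illustrative assumption, not the paper's implementation. The channel sizes are arbitrary and the helper and class names are hypothetical. The snippet does not say how the video path models time, so here the video branch simply applies its own 1x1 convolution frame by frame.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stride-1 2D convolution (cross-correlation) with zero 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def pointwise(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: mixes channels independently at every pixel. w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


class DualBlock:
    """Hypothetical block: shared spatial conv, then modality-specific 1x1 convs."""

    def __init__(self, c_in: int, c_mid: int, c_out: int, k: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_spatial = rng.normal(size=(c_mid, c_in, k, k))  # shared by both modalities
        self.w_image = rng.normal(size=(c_out, c_mid))         # image-only branch
        self.w_video = rng.normal(size=(c_out, c_mid))         # video-only branch

    def forward_image(self, img: np.ndarray) -> np.ndarray:
        """img: (C_in, H, W) -> (C_out, H, W)."""
        return pointwise(conv2d_same(img, self.w_spatial), self.w_image)

    def forward_video(self, clip: np.ndarray) -> np.ndarray:
        """clip: (T, C_in, H, W) -> (T, C_out, H, W), processed frame by frame."""
        return np.stack(
            [pointwise(conv2d_same(f, self.w_spatial), self.w_video) for f in clip]
        )


block = DualBlock(c_in=3, c_mid=8, c_out=4)
img = np.ones((3, 5, 5))
clip = np.ones((2, 3, 5, 5))
print(block.forward_image(img).shape)   # (4, 5, 5)
print(block.forward_video(clip).shape)  # (2, 4, 5, 5)
```

In this sketch, image and video inputs update the same `w_spatial` filters, while each modality keeps its own cheap channel-mixing layer.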

Learning Temporal Pose Estimation from Sparsely-Labeled Videos

3 code implementations · NeurIPS 2019 · Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani

To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations (every k frames) to learn to perform dense temporal pose propagation and estimation.

Ranked #2 on Multi-Person Pose Estimation on PoseTrack2018 (using extra training data)

Multi-Person Pose Estimation, Optical Flow Estimation

Attentive Action and Context Factorization

no code implementations · 10 Apr 2019 · Yang Wang, Vinh Tran, Gedas Bertasius, Lorenzo Torresani, Minh Hoai

This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual elements.

Action Recognition

SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition

no code implementations · ICCV 2019 · Bruno Korbar, Du Tran, Lorenzo Torresani

We demonstrate that the computational cost of action recognition on untrimmed videos can be dramatically reduced by invoking recognition only on these most salient clips.

Action Recognition

Video Classification with Channel-Separated Convolutional Networks

5 code implementations · ICCV 2019 · Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli

It is natural to ask: 1) whether group convolution can help alleviate the high computational cost of video classification networks; 2) what factors matter most in 3D group convolutional networks; and 3) what good computation/accuracy trade-offs are achievable with 3D group convolutional networks.

Action Classification, Action Recognition
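The cost question above comes down to the weight count of a grouped 3D convolution (bias omitted). Splitting the channels into g groups divides both the weights and the multiply-accumulates by g. The sketch below computes these counts. The function names are illustrative, and the 64-channel, 3x3x3 example is hypothetical. It covers only the computation side; the accuracy side of the trade-off is what the paper studies empirically.

```python
def conv3d_weights(c_in: int, c_out: int, kernel: tuple, groups: int = 1) -> int:
    """Number of weights in a grouped 3D convolution (bias omitted)."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    kt, kh, kw = kernel
    # each output channel sees only c_in // groups input channels
    return (c_in // groups) * c_out * kt * kh * kw


def conv3d_macs(c_in: int, c_out: int, kernel: tuple, out_thw: tuple, groups: int = 1) -> int:
    """Multiply-accumulates for a stride-1 conv: every output position (t, h, w)
    uses each weight exactly once."""
    t, h, w = out_thw
    return conv3d_weights(c_in, c_out, kernel, groups) * t * h * w


c = 64
full = conv3d_weights(c, c, (3, 3, 3))                 # standard 3x3x3 conv
grouped = conv3d_weights(c, c, (3, 3, 3), groups=4)    # 4 channel groups
depthwise = conv3d_weights(c, c, (3, 3, 3), groups=c)  # one group per channel
pointwise = conv3d_weights(c, c, (1, 1, 1))            # 1x1x1 channel mixing
print(full, grouped, depthwise, depthwise + pointwise)  # 110592 27648 1728 5824
```

Replacing the full 3x3x3 convolution with a depthwise 3x3x3 convolution plus a 1x1x1 point-wise convolution cuts the weights in this example from 110592 to 5824, roughly 19x fewer.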

DistInit: Learning Video Representations Without a Single Labeled Video

no code implementations · ICCV 2019 · Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

In this work, we propose an alternative approach to learning video representations that requires no semantically labeled videos and instead leverages the years of effort spent collecting and labeling large and clean still-image datasets.

Ranked #69 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition, Video Recognition

Learning Discriminative Motion Features Through Detection

no code implementations · 11 Dec 2018 · Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani

Our network learns to spatially sample features from Frame B in order to maximize pose detection accuracy in Frame A.

Fine-grained Action Recognition, Pose Estimation

Object Detection in Video with Spatiotemporal Sampling Networks

no code implementations · ECCV 2018 · Gedas Bertasius, Lorenzo Torresani, Jianbo Shi

We propose a Spatiotemporal Sampling Network (STSN) that uses deformable convolutions across time for object detection in videos.

Object Detection, Optical Flow Estimation

Connectivity Learning in Multi-Branch Networks

no code implementations · ICLR 2018 · Karim Ahmed, Lorenzo Torresani

While much of the work in the design of convolutional networks over the last five years has revolved around the empirical investigation of the importance of depth, filter sizes, and number of feature channels, recent studies have shown that branching, i.e., splitting the computation along parallel but distinct threads and then aggregating their outputs, represents a new promising dimension for significant improvements in performance.

Image Classification

BranchConnect: Large-Scale Visual Recognition with Learned Branch Connections

no code implementations · 20 Apr 2017 · Karim Ahmed, Lorenzo Torresani

We introduce an architecture for large-scale image categorization that enables the end-to-end learning of separate visual features for the different classes to distinguish.

Image Categorization, Object Recognition

Deep-Learning for Classification of Colorectal Polyps on Whole-Slide Images

no code implementations · 5 Mar 2017 · Bruno Korbar, Andrea M. Olofson, Allen P. Miraflor, Katherine M. Nicka, Matthew A. Suriawinata, Lorenzo Torresani, Arief A. Suriawinata, Saeed Hassanpour

In this work, we built an automatic image-understanding method that can accurately classify different types of colorectal polyps in whole-slide histology images to help pathologists with histopathological characterization and diagnosis of colorectal polyps.

General Classification, Whole Slide Images

VideoMCC: a New Benchmark for Video Comprehension

no code implementations · 23 Jun 2016 · Du Tran, Maksim Bolonkin, Manohar Paluri, Lorenzo Torresani

Language has been exploited to sidestep the problem of defining video categories, by formulating video understanding as the task of captioning or description.

Multiple-choice, Video Description

Multiple Hypothesis Colorization

no code implementations · 20 Jun 2016 · Mohammad Haris Baig, Lorenzo Torresani

In the experiments we show that our proposed method outperforms traditional JPEG color coding by a large margin, producing colors that are nearly indistinguishable from the ground truth at the storage cost of just a few hundred bytes for high-resolution pictures!

Colorization, Image Compression

Convolutional Random Walk Networks for Semantic Image Segmentation

no code implementations · CVPR 2017 · Gedas Bertasius, Lorenzo Torresani, Stella X. Yu, Jianbo Shi

It combines these two objectives via a novel random walk layer that enforces consistent spatial grouping in the deep layers of the network.

Image Segmentation, Scene Labeling

Local Perturb-and-MAP for Structured Prediction

no code implementations · 24 May 2016 · Gedas Bertasius, Qiang Liu, Lorenzo Torresani, Jianbo Shi

In this work, we present a new Local Perturb-and-MAP (locPMAP) framework that replaces the global optimization with a local optimization by exploiting our observed connection between locPMAP and the pseudolikelihood of the original CRF model.

Combinatorial Optimization, Structured Prediction

Network of Experts for Large-Scale Image Categorization

no code implementations · 20 Apr 2016 · Karim Ahmed, Mohammad Haris Baig, Lorenzo Torresani

The training of our "network of experts" is completely end-to-end: the partition of categories into disjoint subsets is learned simultaneously with the parameters of the network trunk and the experts are trained jointly by minimizing a single learning objective over all classes.

Image Categorization, Image Classification

Recurrent Mixture Density Network for Spatiotemporal Visual Attention

no code implementations · 27 Mar 2016 · Loris Bazzani, Hugo Larochelle, Lorenzo Torresani

In this work, we propose a spatiotemporal attentional model that learns where to look in a video directly from human fixation data.

Action Classification, Saliency Prediction

Deep End2End Voxel2Voxel Prediction

no code implementations · 20 Nov 2015 · Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

Over the last few years, deep learning methods have emerged as one of the most prominent approaches for video analysis.

Neural Architecture Search, Optical Flow Estimation

Semantic Segmentation with Boundary Neural Fields

no code implementations · CVPR 2016 · Gedas Bertasius, Jianbo Shi, Lorenzo Torresani

To overcome these problems, we introduce a Boundary Neural Field (BNF), which is a global energy model integrating FCN predictions with boundary cues.

Boundary Detection, Object Localization

Coupled Depth Learning

no code implementations · 19 Jan 2015 · Mohammad Haris Baig, Lorenzo Torresani

Crucially, the depth basis and the regression function are coupled and jointly optimized by our learning scheme.

Depth Estimation, Regression

Learning Spatiotemporal Features with 3D Convolutional Networks

22 code implementations · ICCV 2015 · Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.

Action Recognition, Action Recognition In Videos

Self-taught Object Localization with Deep Networks

no code implementations · 13 Sep 2014 · Loris Bazzani, Alessandro Bergamo, Dragomir Anguelov, Lorenzo Torresani

This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional human supervision, i.e., without using any ground-truth bounding boxes for training.

Object Localization

EXMOVES: Classifier-based Features for Scalable Action Recognition

no code implementations · 20 Dec 2013 · Du Tran, Lorenzo Torresani

We show the generality of our approach by building our mid-level descriptors from two different low-level feature representations.

Action Recognition, General Classification

Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification

no code implementations · CVPR 2013 · Alessandro Bergamo, Sudipta N. Sinha, Lorenzo Torresani

In this paper we propose a new technique for learning a discriminative codebook for local feature descriptors, specifically designed for scalable landmark classification.

General Classification

PiCoDes: Learning a Compact Code for Novel-Category Recognition

no code implementations · NeurIPS 2011 · Alessandro Bergamo, Lorenzo Torresani, Andrew W. Fitzgibbon

In contrast to previous approaches to learn compact codes, we optimize explicitly for (an upper bound on) classification performance.

Object Recognition

Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

no code implementations · NeurIPS 2010 · Alessandro Bergamo, Lorenzo Torresani

In this paper we investigate and compare methods that learn image classifiers by combining very few manually annotated examples (e.g., 1-10 images per class) and a large number of weakly-labeled Web photos retrieved using keyword-based image search.

Domain Adaptation, General Classification
