Search Results for author: Zhuowen Tu

Found 69 papers, 28 papers with code

SkeleTR: Towards Skeleton-based Action Recognition in the Wild

no code implementations 20 Sep 2023 Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo

It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.

Action Classification Action Recognition +2

Object-Centric Multiple Object Tracking

1 code implementation 1 Sep 2023 Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.

Multiple Object Tracking object-detection +2

Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability

1 code implementation 29 Aug 2023 Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

We quantify the final surprisal, within-run variability, age of acquisition, forgettability, and cross-run variability of learning curves for individual tokens in context.

Language Modelling

Patched Denoising Diffusion Models For High-Resolution Image Synthesis

no code implementations 2 Aug 2023 Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu

Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space.

Denoising Image Generation

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

1 code implementation 6 Jul 2023 Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su

Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution.

Few-Shot Image Classification Knowledge Distillation +7

Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

no code implementations 11 May 2023 Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto

We present a sequence-to-sequence vision-language model whose parameters are jointly trained on all tasks (all for one) and fully shared among multiple tasks (one for all), resulting in a single model which we named Musketeer.

Language Modelling

DiffusionRig: Learning Personalized Priors for Facial Appearance Editing

no code implementations CVPR 2023 Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang

On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person.

Guided Recommendation for Model Fine-Tuning

no code implementations CVPR 2023 Hao Li, Charless Fowlkes, Hao Yang, Onkar Dabeer, Zhuowen Tu, Stefano Soatto

With thousands of historical training jobs, a recommendation system can be learned to predict the model selection score given the features of the dataset and the model as input.

Model Selection Transfer Learning

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

1 code implementation 19 Oct 2022 Yifan Xu, Nicklas Hansen, ZiRui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu

Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so.

Atari Games 100k Model-based Reinforcement Learning +2

Point Cloud Recognition with Position-to-Structure Attention Transformers

no code implementations 5 Oct 2022 Zheng Ding, James Hou, Zhuowen Tu

In this paper, we present Position-to-Structure Attention Transformers (PS-Former), a Transformer-based algorithm for 3D point cloud recognition.

Feature Engineering Scene Segmentation

An In-depth Study of Stochastic Backpropagation

1 code implementation 30 Sep 2022 Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe

In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.

Image Classification object-detection +1

Open-Vocabulary Universal Image Segmentation with MaskCLIP

no code implementations 18 Aug 2022 Zheng Ding, Jieke Wang, Zhuowen Tu

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, that aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions at inference time.

Image Segmentation Instance Segmentation +3

Semi-supervised Vision Transformers at Scale

1 code implementation 11 Aug 2022 Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto

We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks.

Inductive Bias Semi-Supervised Image Classification

The Geometry of Multilingual Language Model Representations

1 code implementation 22 May 2022 Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.

Cross-Lingual Transfer Transfer Learning +1

ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training

no code implementations 12 May 2022 Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto

We present a method to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model.

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

no code implementations 12 Apr 2022 Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto

In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.

Text Spotting Transformers

1 code implementation CVPR 2022 Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu

In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild.

Text Detection Text Spotting

MeMOT: Multi-Object Tracking with Memory

no code implementations CVPR 2022 Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span.

Multi-Object Tracking object-detection +1

Contrastive Neighborhood Alignment

no code implementations 6 Jan 2022 Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.

Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers

no code implementations CVPR 2022 Justin Lazarow, Weijian Xu, Zhuowen Tu

In this paper, we present an end-to-end instance segmentation method that regresses a polygonal boundary for each object instance.

Instance Segmentation Semantic Segmentation

ViTGAN: Training GANs with Vision Transformers

3 code implementations ICLR 2022 Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring fewer vision-specific inductive biases.

Image Generation

Long Short-Term Transformer for Online Action Detection

1 code implementation NeurIPS 2021 Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto

We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.

Online Action Detection Playing the Game of 2048

Compatibility-aware Heterogeneous Visual Search

no code implementations CVPR 2021 Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.

Neural Architecture Search Retrieval

Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

no code implementations ICCV 2021 Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto

Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.

Human-Object Interaction Detection Relationship Detection +1

Pose Recognition with Cascade Transformers

2 code implementations CVPR 2021 Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu

Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design compared with the existing approaches.

Keypoint Detection regression

Co-Scale Conv-Attentional Image Transformers

9 code implementations ICCV 2021 Weijian Xu, Yifan Xu, Tyler Chang, Zhuowen Tu

In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms.

Instance Segmentation object-detection +2

Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning

1 code implementation CVPR 2021 Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto

We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.

Self-Supervised Learning Semi-Supervised Image Classification

Line Segment Detection Using Transformers without Edges

2 code implementations CVPR 2021 Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu

In this paper, we present a joint end-to-end line segment detection algorithm using Transformers that is post-processing and heuristics-guided intermediate processing (edge/junction/region detection) free.

Line Segment Detection Multi-Task Learning

Constellation Nets for Few-Shot Learning

1 code implementation ICLR 2021 Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu

The success of deep convolutional neural networks builds on top of the learning of effective convolution operations, capturing a hierarchy of structured features via filtering, activation, and pooling.

Clustering Few-Shot Image Classification +1

Dual Contradistinctive Generative Autoencoder

no code implementations CVPR 2021 Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu

Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive.

Image Generation Image Reconstruction +1

One-pixel Signature: Characterizing CNN Models for Backdoor Detection

no code implementations ECCV 2020 Shanjiaoyang Huang, Weiqi Peng, Zhiwei Jia, Zhuowen Tu

One-pixel signature is a general representation that can be used to characterize CNN models beyond backdoor detection.

Guided Variational Autoencoder for Disentanglement Learning

no code implementations CVPR 2020 Zheng Ding, Yifan Xu, Weijian Xu, Gaurav Parmar, Yang Yang, Max Welling, Zhuowen Tu

We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.

Disentanglement General Classification +1

Neural Program Synthesis By Self-Learning

no code implementations 13 Oct 2019 Yifan Xu, Lu Dai, Udaikaran Singh, Kening Zhang, Zhuowen Tu

Neural inductive program synthesis is a task generating instructions that can produce desired outputs from given inputs.

Program Synthesis Reinforcement Learning (RL) +1

Rethinking Exposure Bias In Language Modeling

no code implementations 13 Oct 2019 Yifan Xu, Kening Zhang, Haoyu Dong, Yuezhou Sun, Wenlong Zhao, Zhuowen Tu

Exposure bias describes the phenomenon that a language model trained under the teacher forcing schema may perform poorly at the inference stage when its predictions are conditioned on its previous predictions unseen from the training corpus.

Language Modelling Reinforcement Learning (RL)

Learning Instance Occlusion for Panoptic Segmentation

1 code implementation CVPR 2020 Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu

Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.

Instance Segmentation Panoptic Segmentation

Local Binary Pattern Networks for Character Recognition

no code implementations ICLR 2019 Jeng-Hau Lin, Yunfan Yang, Rajesh K. Gupta, Zhuowen Tu

Memory and computation efficient deep learning architectures are crucial to the continued proliferation of machine learning capabilities to new platforms and systems.


Local Binary Pattern Networks

no code implementations 19 Mar 2018 Jeng-Hau Lin, Yunfan Yang, Rajesh Gupta, Zhuowen Tu

In this paper, we tackle the problem using a strategy different from the existing literature by proposing local binary pattern networks or LBPNet, that is able to learn and perform binary operations in an end-to-end fashion.


Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

1 code implementation ECCV 2018 Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy

Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.

Ranked #24 on Action Recognition on UCF101 (using extra training data)

Action Classification Action Detection +6

Controllable Top-down Feature Transformer

no code implementations 6 Dec 2017 Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu

We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.

Data Augmentation Style Transfer

DeepRadiologyNet: Radiologist Level Pathology Detection in CT Head Images

no code implementations 26 Nov 2017 Jameson Merkow, Robert Lufkin, Kim Nguyen, Stefano Soatto, Zhuowen Tu, Andrea Vedaldi

Thus, DeepRadiologyNet enables significant reduction in the workload of human radiologists by automatically filtering studies and reporting on the high-confidence ones at an operating point well below the literal error rate for US Board Certified radiologists, estimated at 0.82%.

Wasserstein Introspective Neural Networks

1 code implementation CVPR 2018 Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu

We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.

General Classification

Introspective Neural Networks for Generative Modeling

no code implementations ICCV 2017 Justin Lazarow, Long Jin, Zhuowen Tu

We study unsupervised learning by developing a generative model built from progressively learned deep convolutional neural networks.

General Classification

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

no code implementations 15 Jul 2017 Jeng-Hau Lin, Tianwei Xing, Ritchie Zhao, Zhiru Zhang, Mani Srivastava, Zhuowen Tu, Rajesh K. Gupta

State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution.

Introspective Classification with Convolutional Nets

no code implementations NeurIPS 2017 Long Jin, Justin Lazarow, Zhuowen Tu

We propose introspective convolutional networks (ICN) that emphasize the importance of having convolutional neural networks empowered with generative capabilities.

Classification General Classification

Introspective Generative Modeling: Decide Discriminatively

no code implementations 25 Apr 2017 Justin Lazarow, Long Jin, Zhuowen Tu

We study unsupervised learning by developing introspective generative modeling (IGM) that attains a generator using progressively learned deep convolutional neural networks.

General Classification

Object Detection Free Instance Segmentation With Labeling Transformations

no code implementations 28 Nov 2016 Long Jin, Zeyu Chen, Zhuowen Tu

Instance segmentation has attracted recent attention in computer vision and existing methods in this domain mostly have an object detection stage.

Instance Segmentation object-detection +2

Deep Convolutional Neural Networks with Merge-and-Run Mappings

4 code implementations 23 Nov 2016 Liming Zhao, Jingdong Wang, Xi Li, Zhuowen Tu, Wen-Jun Zeng

A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow.

Deeply supervised salient object detection with short connections

2 code implementations CVPR 2017 Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr

Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs).

Boundary Detection object-detection +4

Deep FisherNet for Object Classification

no code implementations 31 Jul 2016 Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu

Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.

Classification General Classification +1

Dense Volume-to-Volume Vascular Boundary Detection

1 code implementation 26 May 2016 Jameson Merkow, David Kriegman, Alison Marsden, Zhuowen Tu

In this work, we present a novel 3D-Convolutional Neural Network (CNN) architecture called I2I-3D that predicts boundary location in volumetric data.

Boundary Detection

Top-Down Learning for Structured Labeling with Convolutional Pseudoprior

no code implementations 23 Nov 2015 Saining Xie, Xun Huang, Zhuowen Tu

Current practice in convolutional neural networks (CNN) remains largely bottom-up and the role of top-down process in CNN for pattern analysis and visual inference is not very clear.

What Happened to My Dog in That Network: Unraveling Top-down Generators in Convolutional Neural Networks

no code implementations 23 Nov 2015 Patrick W. Gallagher, Shuai Tang, Zhuowen Tu

Top-down information plays a central role in human perception, but plays relatively little role in many current state-of-the-art deep networks, such as Convolutional Neural Networks (CNNs).

Data Augmentation Zero-Shot Learning

Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree

2 code implementations 30 Sep 2015 Chen-Yu Lee, Patrick W. Gallagher, Zhuowen Tu

We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures.

Image Classification

Training Deeper Convolutional Networks with Deep Supervision

1 code implementation 11 May 2015 Liwei Wang, Chen-Yu Lee, Zhuowen Tu, Svetlana Lazebnik

One of the most promising ways of improving the performance of deep convolutional neural networks is by increasing the number of convolutional layers.

General Classification

Holistically-Nested Edge Detection

16 code implementations ICCV 2015 Saining Xie, Zhuowen Tu

We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning.

Boundary Detection Edge Detection

Deeply-Supervised Nets

1 code implementation 18 Sep 2014 Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu

Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.

Classification General Classification +1

MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation

no code implementations CVPR 2014 Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, Zhuowen Tu

Interactive segmentation, in which a user provides a bounding box to an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging.

Image Segmentation Interactive Segmentation +2

Layered Logic Classifiers: Exploring the `And' and `Or' Relations

no code implementations 27 May 2014 Zhuowen Tu, Piotr Dollar, Ying-Nian Wu

Many of the solutions to the problem require performing logic operations such as `and', `or', and `not'.

Pedestrian Detection Semantic Segmentation

Scalable $k$-NN graph construction

no code implementations 30 Jul 2013 Jingdong Wang, Jing Wang, Gang Zeng, Zhuowen Tu, Rui Gan, Shipeng Li

The $k$-NN graph has played a central role in increasingly popular data-driven techniques for various learning and vision tasks; yet, finding an efficient and effective way to construct $k$-NN graphs remains a challenge, especially for large-scale high-dimensional data.

graph construction

Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

no code implementations CVPR 2013 Quannan Li, Jiajun Wu, Zhuowen Tu

Obtaining effective mid-level representations has become an increasingly important task in computer vision.

Image Classification

Robust Estimation of Nonrigid Transformation for Point Set Registration

no code implementations CVPR 2013 Jiayi Ma, Ji Zhao, Jinwen Tian, Zhuowen Tu, Alan L. Yuille

In the second step, we estimate the transformation using a robust estimator called $L_2E$. This is the main novelty of our approach and it enables us to deal with the noise and outliers which arise in the correspondence step.

Sparse Subspace Denoising for Image Manifolds

no code implementations CVPR 2013 Bo Wang, Zhuowen Tu

With the increasing availability of high-dimensional data and demand for sophisticated data analysis algorithms, manifold learning has become a critical technique for performing dimensionality reduction and unraveling the intrinsic data structure.

Denoising Dimensionality Reduction
