no code implementations • 20 Sep 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
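A minimal sketch of the two-stage design described above, assuming PyTorch-style modules, per-person skeleton tensors of shape (persons, time, joints, coords), and a placeholder joint adjacency; the module names and sizes are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class SimpleSkeletonGCN(nn.Module):
    # One graph-convolution step over the joint adjacency, then temporal/joint pooling.
    def __init__(self, in_dim, hid_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)               # (J, J) joint adjacency
        self.proj = nn.Linear(in_dim, hid_dim)

    def forward(self, x):                              # x: (P, T, J, C)
        x = self.proj(torch.einsum("ptjc,jk->ptkc", x, self.adj))
        return x.mean(dim=(1, 2))                      # (P, hid_dim): one embedding per person

class SkeletonInteractionModel(nn.Module):
    def __init__(self, hid_dim=128, joints=17, n_classes=60):
        super().__init__()
        self.gcn = SimpleSkeletonGCN(3, hid_dim, torch.eye(joints))   # placeholder adjacency (self-loops only)
        layer = nn.TransformerEncoderLayer(d_model=hid_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)     # person-interaction modeling
        self.head = nn.Linear(hid_dim, n_classes)

    def forward(self, skeletons):                      # (P, T, J, 3)
        tokens = self.gcn(skeletons).unsqueeze(0)      # persons become tokens: (1, P, hid_dim)
        return self.head(self.encoder(tokens).mean(dim=1))            # clip-level action logits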
1 code implementation • 1 Sep 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
1 code implementation • 29 Aug 2023 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
We quantify the final surprisal, within-run variability, age of acquisition, forgettability, and cross-run variability of learning curves for individual tokens in context.
no code implementations • 2 Aug 2023 • Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu
Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space.
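A toy, hedged illustration of the collage idea in the sentence above: partial features of two horizontally adjacent patches are cropped and stitched to stand in for the feature of a patch shifted by half a patch width. The (B, C, H, W) layout and the 50% overlap are assumptions, not the paper's exact scheme.

import torch

def collage_horizontal(feat_left, feat_right):
    # feat_left, feat_right: (B, C, H, W) features of two neighboring patches.
    w = feat_left.shape[-1]
    right_half_of_left = feat_left[..., w // 2:]       # the part overlapping the shifted patch
    left_half_of_right = feat_right[..., :w // 2]
    return torch.cat([right_half_of_left, left_half_of_right], dim=-1)

# The collaged feature can then condition generation of the shifted patch,
# keeping neighboring patches consistent in their overlapping regions.
shifted_feat = collage_horizontal(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))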
no code implementations • 16 Jul 2023 • Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
We present a new formulation for structured information extraction (SIE) from visually rich documents.
1 code implementation • 6 Jul 2023 • Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su
Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction toward a solution.
no code implementations • 11 May 2023 • Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto
We present a sequence-to-sequence vision-language model whose parameters are jointly trained on all tasks (all for one) and fully shared among multiple tasks (one for all), resulting in a single model which we named Musketeer.
1 code implementation • 13 Apr 2023 • Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
no code implementations • CVPR 2023 • Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
At a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person.
no code implementations • CVPR 2023 • Hao Li, Charless Fowlkes, Hao Yang, Onkar Dabeer, Zhuowen Tu, Stefano Soatto
With thousands of historical training jobs, a recommendation system can be trained to predict the model selection score given the features of the dataset and the model as input.
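A generic, hedged illustration of such a recommendation system, assuming scikit-learn and made-up dataset/model descriptors; the actual features and learner used in the paper are not specified here.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each historical row: dataset descriptors (e.g. number of classes and images)
# concatenated with model descriptors (e.g. parameter count, pretraining source);
# the target is the observed model-selection score from a past training job.
X_history = np.random.rand(1000, 6)
y_history = np.random.rand(1000)

ranker = GradientBoostingRegressor().fit(X_history, y_history)

# At selection time, pair the new dataset's features with each candidate model
# and pick the candidate with the highest predicted model-selection score.
candidates = np.random.rand(5, 6)
best_index = int(np.argmax(ranker.predict(candidates)))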
1 code implementation • 19 Oct 2022 • Yifan Xu, Nicklas Hansen, ZiRui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so.
no code implementations • 5 Oct 2022 • Zheng Ding, James Hou, Zhuowen Tu
In this paper, we present Position-to-Structure Attention Transformers (PS-Former), a Transformer-based algorithm for 3D point cloud recognition.
1 code implementation • 30 Sep 2022 • Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe
In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.
no code implementations • 18 Aug 2022 • Zheng Ding, Jieke Wang, Zhuowen Tu
In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic, instance, and panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary text-described categories at inference time.
1 code implementation • 11 Aug 2022 • Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks.
1 code implementation • 22 May 2022 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
no code implementations • 12 May 2022 • Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
We present a method to train a classification system that achieves paragon performance in both error rate and NFR (negative flip rate), at the inference cost of a single model.
no code implementations • 12 Apr 2022 • Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto
In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.
1 code implementation • CVPR 2022 • Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu
In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild.
Ranked #6 on Text Spotting on ICDAR 2015
no code implementations • CVPR 2022 • Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span.
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
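A minimal sketch of the neighborhood-alignment idea stated above, assuming normalized features and a KL term between teacher and student similarity distributions; the temperature and the exact loss form are illustrative, not the paper's.

import torch
import torch.nn.functional as F

def neighborhood_alignment_loss(teacher_feats, student_feats, tau=0.1):
    # teacher_feats, student_feats: (N, D) features of the same N samples.
    t = F.normalize(teacher_feats, dim=1)
    s = F.normalize(student_feats, dim=1)
    mask = torch.eye(t.size(0), dtype=torch.bool, device=t.device)   # exclude self-similarity
    t_sim = (t @ t.t() / tau).masked_fill(mask, float("-inf"))
    s_sim = (s @ s.t() / tau).masked_fill(mask, float("-inf"))
    # Make each point's neighborhood under the student match its neighborhood under the teacher.
    return F.kl_div(F.log_softmax(s_sim, dim=1), F.softmax(t_sim, dim=1), reduction="batchmean")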
no code implementations • CVPR 2022 • Justin Lazarow, Weijian Xu, Zhuowen Tu
In this paper, we present an end-to-end instance segmentation method that regresses a polygonal boundary for each object instance.
no code implementations • 4 Nov 2021 • Sainan Liu, Vincent Nguyen, Yuan Gao, Subarna Tripathi, Zhuowen Tu
Our proposed panoptic 3D parsing framework points to a promising direction in computer vision.
3 code implementations • ICLR 2022 • Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
Ranked #54 on Image Generation on CIFAR-10
1 code implementation • NeurIPS 2021 • Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.
Ranked #2 on Online Action Detection on TVSeries
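A rough, hedged sketch of a long/short-term memory head in the spirit described above: long-range frame features are compressed into a few latent tokens, and a short recent window attends to that compressed memory to classify the current frame. All sizes and the exact attention wiring are assumptions, not the LSTR architecture.

import torch
import torch.nn as nn

class LongShortTermHead(nn.Module):
    def __init__(self, dim=512, n_latents=16, n_classes=31):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim))            # learned compression queries
        self.compress = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.fuse = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, long_mem, short_mem):
        # long_mem: (B, T_long, dim) distant frames; short_mem: (B, T_short, dim) recent frames.
        q = self.latents.unsqueeze(0).expand(long_mem.size(0), -1, -1)
        compressed, _ = self.compress(q, long_mem, long_mem)                 # (B, n_latents, dim)
        fused, _ = self.fuse(short_mem, compressed, compressed)              # recent frames query the memory
        return self.head(fused[:, -1])                                       # logits for the latest frame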
1 code implementation • ACL 2021 • Tyler A. Chang, Yifan Xu, Weijian Xu, Zhuowen Tu
In this paper, we detail the relationship between convolutions and self-attention in natural language tasks.
no code implementations • CVPR 2021 • Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.
no code implementations • ICCV 2021 • Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto
Computer vision applications such as visual relationship detection and human-object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.
Human-Object Interaction Detection, Relationship Detection
2 code implementations • CVPR 2021 • Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu
Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design compared with the existing approaches.
9 code implementations • ICCV 2021 • Weijian Xu, Yifan Xu, Tyler Chang, Zhuowen Tu
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms.
1 code implementation • CVPR 2021 • Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto
We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.
Self-Supervised Learning, Semi-Supervised Image Classification
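A hedged sketch of the exponential-moving-average idea behind such a teacher: both its parameters and its normalization buffers (e.g. BatchNorm running statistics) track the student by EMA, so the teacher never recomputes batch statistics of its own. The momentum value and the assumption of identical student/teacher architectures are illustrative.

import torch

@torch.no_grad()
def eman_update(student, teacher, momentum=0.999):
    # Standard EMA of the weights.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
    # Also EMA the buffers, which include BatchNorm running_mean / running_var.
    for bs, bt in zip(student.buffers(), teacher.buffers()):
        if bt.dtype.is_floating_point:
            bt.mul_(momentum).add_(bs, alpha=1.0 - momentum)
        else:
            bt.copy_(bs)                                # e.g. num_batches_tracked counters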
2 code implementations • CVPR 2021 • Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
In this paper, we present a joint end-to-end line segment detection algorithm using Transformers that is post-processing and heuristics-guided intermediate processing (edge/junction/region detection) free.
Ranked #1 on Line Segment Detection on York Urban Dataset (FH metric)
1 code implementation • ICLR 2021 • Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu
The success of deep convolutional neural networks builds on top of the learning of effective convolution operations, capturing a hierarchy of structured features via filtering, activation, and pooling.
Ranked #15 on Few-Shot Image Classification on FC100 5-way (5-shot)
no code implementations • CVPR 2021 • Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu
Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive.
Ranked #1 on Image Generation on LSUN Bedroom 128 x 128
no code implementations • ECCV 2020 • Shanjiaoyang Huang, Weiqi Peng, Zhiwei Jia, Zhuowen Tu
One-pixel signature is a general representation that can be used to characterize CNN models beyond backdoor detection.
no code implementations • CVPR 2020 • Zheng Ding, Yifan Xu, Weijian Xu, Gaurav Parmar, Yang Yang, Max Welling, Zhuowen Tu
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.
no code implementations • 13 Oct 2019 • Yifan Xu, Lu Dai, Udaikaran Singh, Kening Zhang, Zhuowen Tu
Neural inductive program synthesis is the task of generating instructions that can produce desired outputs from given inputs.
no code implementations • 13 Oct 2019 • Yifan Xu, Kening Zhang, Haoyu Dong, Yuezhou Sun, Wenlong Zhao, Zhuowen Tu
Exposure bias describes the phenomenon that a language model trained under the teacher-forcing schema may perform poorly at inference time, when its predictions are conditioned on its own previous predictions, which were never seen in the training corpus.
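A small, hedged illustration of the mismatch described above: training feeds the model ground-truth prefixes (teacher forcing), while inference conditions on the model's own previous predictions. Here `model` stands for any autoregressive language model returning per-position vocabulary logits; greedy decoding is used only for illustration.

import torch

def teacher_forced_logits(model, gold_tokens):          # gold_tokens: (B, T)
    return model(gold_tokens[:, :-1])                   # predict token t from gold tokens < t

@torch.no_grad()
def free_running_decode(model, bos_tokens, steps):      # bos_tokens: (B, 1)
    seq = bos_tokens
    for _ in range(steps):
        next_tok = model(seq)[:, -1].argmax(dim=-1, keepdim=True)   # condition on own predictions
        seq = torch.cat([seq, next_tok], dim=1)
    return seq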
no code implementations • ICLR 2020 • Siyang Wang, Justin Lazarow, Kwonjoon Lee, Zhuowen Tu
We tackle the problem of modeling sequential visual phenomena.
1 code implementation • CVPR 2020 • Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu
Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.
Ranked #21 on Panoptic Segmentation on COCO test-dev
no code implementations • ICLR 2019 • Jeng-Hau Lin, Yunfan Yang, Rajesh K. Gupta, Zhuowen Tu
Memory and computation efficient deep learning architectures are crucial to the continued proliferation of machine learning capabilities to new platforms and systems.
no code implementations • CVPR 2018 • Saining Xie, Sainan Liu, Zeyu Chen, Zhuowen Tu
We tackle the problem of point cloud recognition.
no code implementations • 19 Mar 2018 • Jeng-Hau Lin, Yunfan Yang, Rajesh Gupta, Zhuowen Tu
In this paper, we tackle the problem using a strategy different from the existing literature by proposing local binary pattern networks (LBPNet), which are able to learn and perform binary operations in an end-to-end fashion.
1 code implementation • ECCV 2018 • Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.
Ranked #24 on Action Recognition on UCF101 (using extra training data)
no code implementations • 6 Dec 2017 • Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.
no code implementations • 26 Nov 2017 • Jameson Merkow, Robert Lufkin, Kim Nguyen, Stefano Soatto, Zhuowen Tu, Andrea Vedaldi
Thus, DeepRadiologyNet enables a significant reduction in the workload of human radiologists by automatically filtering studies and reporting on the high-confidence ones at an operating point well below the literal error rate for US Board Certified radiologists, estimated at 0.82%.
1 code implementation • CVPR 2018 • Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu
We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.
no code implementations • ICCV 2017 • Justin Lazarow, Long Jin, Zhuowen Tu
We study unsupervised learning by developing a generative model built from progressively learned deep convolutional neural networks.
no code implementations • 15 Jul 2017 • Jeng-Hau Lin, Tianwei Xing, Ritchie Zhao, Zhiru Zhang, Mani Srivastava, Zhuowen Tu, Rajesh K. Gupta
State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution.
no code implementations • NeurIPS 2017 • Long Jin, Justin Lazarow, Zhuowen Tu
We propose introspective convolutional networks (ICN) that emphasize the importance of having convolutional neural networks empowered with generative capabilities.
no code implementations • 25 Apr 2017 • Justin Lazarow, Long Jin, Zhuowen Tu
We study unsupervised learning by developing introspective generative modeling (IGM) that attains a generator using progressively learned deep convolutional neural networks.
no code implementations • 28 Nov 2016 • Long Jin, Zeyu Chen, Zhuowen Tu
Instance segmentation has attracted recent attention in computer vision and existing methods in this domain mostly have an object detection stage.
4 code implementations • 23 Nov 2016 • Liming Zhao, Jingdong Wang, Xi Li, Zhuowen Tu, Wen-Jun Zeng
A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow.
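For reference, a minimal residual block showing the identity skip the sentence above refers to (the paper itself studies variants built from such blocks); a PyTorch illustration, not the paper's code.

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.branch(x))   # identity mapping skips the residual branch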
52 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.
Ranked #3 on Image Classification on GasHisSDB
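A hedged sketch of the aggregated-transformations design: the homogeneous multi-branch block can be realized as a grouped 3x3 convolution inside a bottleneck, with cardinality as the extra hyper-parameter. The channel sizes below are illustrative, not taken from the paper's configuration tables.

import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    def __init__(self, channels=256, bottleneck=128, cardinality=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1,
                      groups=cardinality, bias=False),      # 32 parallel branches as one grouped conv
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.block(x))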
2 code implementations • CVPR 2017 • Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr
Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs).
Ranked #4 on RGB Salient Object Detection on SBU
no code implementations • 31 Jul 2016 • Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu
Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.
1 code implementation • 26 May 2016 • Jameson Merkow, David Kriegman, Alison Marsden, Zhuowen Tu
In this work, we present a novel 3D-Convolutional Neural Network (CNN) architecture called I2I-3D that predicts boundary location in volumetric data.
no code implementations • 23 Nov 2015 • Saining Xie, Xun Huang, Zhuowen Tu
Current practice in convolutional neural networks (CNNs) remains largely bottom-up, and the role of top-down processes in CNNs for pattern analysis and visual inference remains unclear.
no code implementations • 23 Nov 2015 • Patrick W. Gallagher, Shuai Tang, Zhuowen Tu
Top-down information plays a central role in human perception, but plays relatively little role in many current state-of-the-art deep networks, such as Convolutional Neural Networks (CNNs).
2 code implementations • 30 Sep 2015 • Chen-Yu Lee, Patrick W. Gallagher, Zhuowen Tu
We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures.
Ranked #17 on Image Classification on MNIST
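One simple instance of generalized pooling, as a hedged sketch: a learned mixing between max- and average-pooling (the paper also studies gated and tree-structured variants not shown here); the sigmoid parameterization is an assumption for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedPool2d(nn.Module):
    def __init__(self, kernel_size=2, stride=2):
        super().__init__()
        self.k, self.s = kernel_size, stride
        self.mix_logit = nn.Parameter(torch.zeros(1))   # learned mixing proportion

    def forward(self, x):
        a = torch.sigmoid(self.mix_logit)               # keep the proportion in [0, 1]
        return a * F.max_pool2d(x, self.k, self.s) + (1 - a) * F.avg_pool2d(x, self.k, self.s)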
1 code implementation • 11 May 2015 • Liwei Wang, Chen-Yu Lee, Zhuowen Tu, Svetlana Lazebnik
One of the most promising ways of improving the performance of deep convolutional neural networks is by increasing the number of convolutional layers.
16 code implementations • ICCV 2015 • Saining Xie, Zhuowen Tu
We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning.
1 code implementation • 18 Sep 2014 • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.
Ranked #23 on Image Classification on MNIST
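A hedged sketch of the deep-supervision idea: companion classifiers attached to hidden layers contribute auxiliary losses alongside the final objective, making hidden-layer learning more direct. The layer sizes and equal loss weights are assumptions, not the DSN configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.aux1 = nn.Linear(32, n_classes)             # companion classifier on stage 1
        self.aux2 = nn.Linear(64, n_classes)             # companion classifier on stage 2
        self.final = nn.Linear(64, n_classes)

    def forward(self, x, y=None):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        pooled1, pooled2 = h1.mean(dim=(2, 3)), h2.mean(dim=(2, 3))
        logits = self.final(pooled2)
        if y is None:
            return logits
        loss = (F.cross_entropy(logits, y)
                + F.cross_entropy(self.aux1(pooled1), y)    # direct supervision on hidden layers
                + F.cross_entropy(self.aux2(pooled2), y))
        return logits, loss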
no code implementations • CVPR 2014 • Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, Zhuowen Tu
Interactive segmentation, in which a user provides a bounding box around an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging.
no code implementations • 27 May 2014 • Zhuowen Tu, Piotr Dollar, Ying-Nian Wu
Many of the solutions to the problem require performing logic operations such as 'and', 'or', and 'not'.
no code implementations • 30 Jul 2013 • Jingdong Wang, Jing Wang, Gang Zeng, Zhuowen Tu, Rui Gan, Shipeng Li
The $k$-NN graph has played a central role in increasingly popular data-driven techniques for various learning and vision tasks; yet, finding an efficient and effective way to construct $k$-NN graphs remains a challenge, especially for large-scale high-dimensional data.
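For contrast with the efficient construction the paper targets, a naive brute-force k-NN graph baseline (O(n^2) pairwise distances), written with NumPy as an illustration only.

import numpy as np

def knn_graph(points, k):
    # points: (n, d) array; returns (n, k) indices of each point's nearest neighbors.
    sq_norms = (points ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dists, np.inf)                      # exclude self-edges
    return np.argsort(dists, axis=1)[:, :k]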
no code implementations • CVPR 2013 • Quannan Li, Jiajun Wu, Zhuowen Tu
Obtaining effective mid-level representations has become an increasingly important task in computer vision.
no code implementations • CVPR 2013 • Jiayi Ma, Ji Zhao, Jinwen Tian, Zhuowen Tu, Alan L. Yuille
In the second step, we estimate the transformation using a robust estimator called L2E. This is the main novelty of our approach, and it enables us to deal with the noise and outliers that arise in the correspondence step.
no code implementations • CVPR 2013 • Bo Wang, Zhuowen Tu
With the increasing availability of high dimensional data and demand in sophisticated data analysis algorithms, manifold learning becomes a critical technique to perform dimensionality reduction, unraveling the intrinsic data structure.