Search Results for author: Taojiannan Yang

Found 25 papers, 17 papers with code

Dense Connector for MLLMs

no code implementations • 22 May 2024 • Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang

We witness the rise of larger and higher-quality instruction datasets, as well as the involvement of larger-sized LLMs.


AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

no code implementations • 24 Apr 2024 • Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning.

AutoML • Image Segmentation +4


ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

1 code implementation • 11 Apr 2024 • Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls.

SSIM



A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition

1 code implementation • ICCV 2023 • Andong Deng, Taojiannan Yang, Chen Chen

The goal of building a benchmark (suite of datasets) is to provide a unified protocol for fair evaluation and thus facilitate the evolution of a specific area.

Action Recognition • Representation Learning +3


AIM: Adapting Image Models for Efficient Video Action Recognition

1 code implementation • 6 Feb 2023 • Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li

Recent vision transformer based video models mostly follow the "image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks.

Ranked #2 on Action Recognition on Diving-48 (using extra training data)

Action Classification • Action Recognition +2



Language-Assisted Deep Learning for Autistic Behaviors Recognition

no code implementations • 17 Nov 2022 • Andong Deng, Taojiannan Yang, Chen Chen, Qian Chen, Leslie Neely, Sakiko Oyama

In such cases, automatic recognition systems based on computer vision and machine learning (in particular deep learning) technology can alleviate this issue to a large extent.

Action Recognition • Multimodal Deep Learning +1


Revisiting Training-free NAS Metrics: An Efficient Training-based Method

1 code implementation • 16 Nov 2022 • Taojiannan Yang, Linjie Yang, Xiaojie Jin, Chen Chen

In this paper, we revisit these training-free metrics and find that: (1) the number of parameters (#Param), which is the most straightforward training-free metric, is overlooked in previous works but is surprisingly effective, (2) recent training-free metrics largely rely on the #Param information to rank networks.

Neural Architecture Search


Conquering the Communication Constraints to Enable Large Pre-Trained Models in Federated Learning

no code implementations • 4 Oct 2022 • Guangyu Sun, Umar Khalid, Matias Mendieta, Taojiannan Yang, Chen Chen

Recently, the use of small pre-trained models has been shown effective in federated learning optimization and improving convergence.

Federated Learning


FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER

1 code implementation • CVPR 2023 • Ce Zheng, Matias Mendieta, Taojiannan Yang, Guo-Jun Qi, Chen Chen

Recently, vision transformers have shown great success in a set of human reconstruction tasks such as 2D human pose estimation (2D HPE), 3D human pose estimation (3D HPE), and human mesh reconstruction (HMR) tasks.

Ranked #29 on 3D Human Pose Estimation on 3DPW

2D Human Pose Estimation • 3D Human Pose Estimation


Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning

1 code implementation • CVPR 2022 • Matias Mendieta, Taojiannan Yang, Pu Wang, Minwoo Lee, Zhengming Ding, Chen Chen

To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model.

Federated Learning • Privacy Preserving


BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

1 code implementation • 16 May 2021 • Yu Shen, Sijie Zhu, Taojiannan Yang, Chen Chen, Delu Pan, Jianyu Chen, Liang Xiao, Qian Du

With a pair of pre- and post-disaster satellite images, building damage assessment aims at predicting the extent of damage to buildings.

Ranked #2 on 2D Semantic Segmentation on xBD

2D Semantic Segmentation • Data Augmentation +1


MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

1 code implementation • 14 May 2021 • Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen

MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.

Action Recognition • Image Classification +2



Consistency-based Active Learning for Object Detection

1 code implementation • 18 Mar 2021 • Weiping Yu, Sijie Zhu, Taojiannan Yang, Chen Chen

Unlike most recent works that focused on applying active learning for image classification, we propose an effective Consistency-based Active Learning method for object Detection (CALD), which fully explores the consistency between original and augmented data.

Active Learning • Classification +5


3D Human Pose Estimation with Spatial and Temporal Transformers

3 code implementations • ICCV 2021 • Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding

Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation.

Ranked #13 on Monocular 3D Human Pose Estimation on Human3.6M

Image Classification • Monocular 3D Human Pose Estimation +3



Deep Learning-Based Human Pose Estimation: A Survey

1 code implementation • 24 Dec 2020 • Ce Zheng, Wenhan Wu, Chen Chen, Taojiannan Yang, Sijie Zhu, Ju Shen, Nasser Kehtarnavaz, Mubarak Shah

Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included.

2D Human Pose Estimation • 3D Human Pose Estimation +1



VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval

1 code implementation • CVPR 2021 • Sijie Zhu, Taojiannan Yang, Chen Chen

In this paper, we redefine this problem with a more realistic assumption that the query image can be arbitrary in the area of interest and the reference images are captured before the queries emerge.

Image-Based Localization • Image Retrieval


A3D: Adaptive 3D Networks for Video Action Recognition

no code implementations • 24 Nov 2020 • Sijie Zhu, Taojiannan Yang, Matias Mendieta, Chen Chen

Even under the same computational constraints, the performance of our adaptive networks can be significantly boosted over the baseline counterparts by the mutual training along three dimensions.

Action Recognition • Temporal Action Localization


Towards Resolving the Challenge of Long-tail Distribution in UAV Images for Object Detection

1 code implementation • 7 Nov 2020 • Weiping Yu, Taojiannan Yang, Chen Chen

To this end, we rethink long-tailed object detection in UAV images and propose the Dual Sampler and Head detection Network (DSHNet), which is the first work that aims to resolve long-tail distribution in UAV images.

Head Detection • Image Cropping +2


Cross-directional Feature Fusion Network for Building Damage Assessment from Satellite Imagery

no code implementations • 27 Oct 2020 • Yu Shen, Sijie Zhu, Taojiannan Yang, Chen Chen

Fast and effective responses are required when a natural disaster (e.g., earthquake, hurricane, etc.)

Ranked #3 on 2D Semantic Segmentation on xBD

2D Semantic Segmentation • Data Augmentation


GradAug: A New Regularization Method for Deep Neural Networks

1 code implementation • NeurIPS 2020 • Taojiannan Yang, Sijie Zhu, Chen Chen

The key idea is utilizing randomly transformed training samples to regularize a set of sub-networks, which are originated by sampling the width of the original network, in the training process.

Instance Segmentation • object-detection +2


Revisiting Street-to-Aerial View Image Geo-localization and Orientation Estimation

no code implementations • 23 May 2020 • Sijie Zhu, Taojiannan Yang, Chen Chen

Street-to-aerial image geo-localization, which matches a query street-view image to the GPS-tagged aerial images in a reference set, has attracted increasing attention recently.

Metric Learning


Density Map Guided Object Detection in Aerial Images

1 code implementation • 12 Apr 2020 • Changlin Li, Taojiannan Yang, Sijie Zhu, Chen Chen, Shanyue Guan

Specifically, we propose a Density-Map guided object detection Network (DMNet), which is inspired from the observation that the object density map of an image presents how objects distribute in terms of the pixel intensity of the map.

Image Cropping • Object +3


MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

2 code implementations • ECCV 2020 • Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis

We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime.

Instance Segmentation • object-detection +3



Visual Explanation for Deep Metric Learning

1 code implementation • 27 Sep 2019 • Sijie Zhu, Taojiannan Yang, Chen Chen

This work explores the visual explanation for deep metric learning and its applications.

Metric Learning • Retrieval


A closer look at network resolution for efficient network design

no code implementations • 25 Sep 2019 • Taojiannan Yang, Sijie Zhu, Yan Shen, Mi Zhang, Andrew Willis, Chen Chen

We propose a framework to mutually learn from different input resolutions and network widths.

Instance Segmentation • object-detection +3

