Search Results for author: David Doermann

Found 58 papers, 15 papers with code

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

1 code implementation 15 Jul 2024 Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, JianFeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

First, to enable dual-modal generation and maximize the information exchange between video and depth generation, we propose a unified dual-modal U-Net, a parameter-sharing framework for joint video and depth denoising, wherein a modality label guides the denoising target, and cross-modal attention enables the mutual information flow.

Denoising Monocular Depth Estimation +2

ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

1 code implementation 17 Jun 2024 Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye

Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in training MLLMs to communicate between language and vision.

Decoder Visual Reasoning

Artemis: Towards Referential Understanding in Complex Videos

1 code implementation 1 Jun 2024 Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian

Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short in referential understanding scenarios such as video-based referring.

Text Summarization Video Grounding

ChartReformer: Natural Language-Driven Chart Image Editing

1 code implementation 1 Mar 2024 Pengyu Yan, Mahesh Bhosale, Jay Lal, Bikhyat Adhikari, David Doermann

Chart visualizations are essential for data interpretation and communication; however, most charts are only accessible in image format and lack the corresponding data tables and supplementary information, making it difficult to alter their appearance for different application scenarios.

Federated Learning via Input-Output Collaborative Distillation

1 code implementation 22 Dec 2023 Xuan Gong, Shanglin Li, Yuxiang Bao, Barry Yao, Yawen Huang, Ziyan Wu, Baochang Zhang, Yefeng Zheng, David Doermann

Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data.

Federated Learning Image Classification

The Analysis and Extraction of Structure from Organizational Charts

no code implementations 16 Nov 2023 Nikhil Manali, David Doermann, Mahesh Desai

Organizational charts, also known as org charts, are critical representations of an organization's structure and the hierarchical relationships between its components and positions.

Player Re-Identification Using Body Part Appearences

no code implementations 23 Oct 2023 Mahesh Bhosale, Abhishek Kumar, David Doermann

Our model consists of a two-stream network (one stream for appearance map extraction and the other for body part map extraction) and a bilinear-pooling layer that generates and spatially pools the body part map.

Pose Estimation Triplet

SOAR: Scene-debiasing Open-set Action Recognition

1 code implementation ICCV 2023 Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua

The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.

Decoder Open Set Action Recognition +1

Language-guided Human Motion Synthesis with Atomic Actions

1 code implementation 18 Aug 2023 Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan

In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and employing a curriculum learning strategy to learn atomic action composition.

Diversity Motion Synthesis

SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding

no code implementations 3 Aug 2023 Saleem Ahmed, Pengyu Yan, David Doermann, Srirangaraj Setlur, Venu Govindaraju

A combination of sparse and dense per-pixel objectives coupled with a uni-modal self-attention-based feature-fusion layer is applied to learn KP embeddings.

Chart Understanding Keypoint Estimation +1

Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler

1 code implementation 1 Jul 2023 Shaohui Lin, Wenxuan Huang, Jiao Xie, Baochang Zhang, Yunhang Shen, Zhou Yu, Jungong Han, David Doermann

In this paper, we propose a novel Knowledge-driven Differential Filter Sampler (KDFS) with Masked Filter Modeling (MFM) framework for filter pruning, which globally prunes the redundant filters based on the prior knowledge of a pre-trained model in a differential and non-alternative optimization.

Decoder Image Classification +1

Context-Aware Chart Element Detection

1 code implementation 7 May 2023 Pengyu Yan, Saleem Ahmed, David Doermann

As a prerequisite of chart data extraction, the accurate detection of chart basic elements is essential and mandatory.

Data Visualization Document AI +2

LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation

1 code implementation 3 May 2023 Jay Lal, Aditya Mitkari, Mahesh Bhosale, David Doermann

Existing works, however, are not robust to all these variations, either taking an all-chart unified approach or relying on auxiliary information such as legends for line data extraction.

Data Visualization document understanding +2

Progressive Multi-view Human Mesh Recovery with Self-Supervision

no code implementations 10 Dec 2022 Xuan Gong, Liangchen Song, Meng Zheng, Benjamin Planche, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu

To date, little attention has been given to multi-view 3D human mesh estimation, despite real-life applicability (e.g., motion capture, sport analysis) and robustness to single-view ambiguities.

Benchmarking Diversity +1

Self-supervised Human Mesh Recovery with Cross-Representation Alignment

no code implementations 10 Sep 2022 Xuan Gong, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, David Doermann, Ziyan Wu

However, on synthetic dense correspondence maps (i.e., IUV) few have been explored since the domain gap between synthetic training data and real testing data is hard to address for 2D dense representation.

Diversity Human Mesh Recovery

Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation

no code implementations 10 Sep 2022 Xuan Gong, Abhishek Sharma, Srikrishna Karanam, Ziyan Wu, Terrence Chen, David Doermann, Arun Innanje

Federated Learning (FL) is a machine learning paradigm where local nodes collaboratively train a central model while the training data remains decentralized.

Federated Learning Image Classification +4

Associative Adversarial Learning Based on Selective Attack

no code implementations 28 Dec 2021 Runqi Wang, Xiaoyue Duan, Baochang Zhang, Song Xue, Wentao Zhu, David Doermann, Guodong Guo

We show that our method improves the recognition accuracy of adversarial training on ImageNet by 8.32% compared with the baseline.

Adversarial Robustness Few-Shot Learning +2

Semantic Text-to-Face GAN -ST^2FG

no code implementations 22 Jul 2021 Manan Oza, Sukalpa Chanda, David Doermann

Our approach is capable of generating images that are very accurately aligned to the exhaustive textual descriptions of faces with many fine detail features of the face and helps in generating better images.

Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track

no code implementations 21 Jun 2021 Yuanhao Zhai, Le Wang, David Doermann, Junsong Yuan

The base model training encourages the model to make reliable predictions based on a single modality (i.e., RGB or optical flow), based on the fusion of which a pseudo ground truth is generated and in turn used as supervision to train the base models.

Optical Flow Estimation Weakly-supervised Learning +2

Cogradient Descent for Dependable Learning

no code implementations 20 Jun 2021 Runqi Wang, Baochang Zhang, Li'an Zhuo, Qixiang Ye, David Doermann

Conventional gradient descent methods compute the gradients for multiple variables through the partial derivative.

Image Inpainting Image Reconstruction +1

Layer-Wise Searching for 1-Bit Detectors

no code implementations CVPR 2021 Sheng Xu, Junhe Zhao, Jinhu Lu, Baochang Zhang, Shumin Han, David Doermann

At each layer, it exploits a differentiable binarization search (DBS) to minimize the angular error in a student-teacher framework.

Binarization

Oriented Object Detection with Transformer

no code implementations 6 Jun 2021 Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN.

Object object-detection +2

Multi-UAV Mobile Edge Computing and Path Planning Platform based on Reinforcement Learning

no code implementations 3 Feb 2021 Huan Chang, Yicheng Chen, Baochang Zhang, David Doermann

Unmanned aerial vehicles (UAVs) are widely used as network processors in mobile networks, but more recently, UAVs have been used in Mobile Edge Computing as mobile servers.

Edge-computing reinforcement-learning +2

IDARTS: Interactive Differentiable Architecture Search

no code implementations ICCV 2021 Song Xue, Runqi Wang, Baochang Zhang, Tian Wang, Guodong Guo, David Doermann

Differentiable Architecture Search (DARTS) improves the efficiency of architecture search by learning the architecture and network parameters end-to-end.

Ensemble Attention Distillation for Privacy-Preserving Federated Learning

no code implementations ICCV 2021 Xuan Gong, Abhishek Sharma, Srikrishna Karanam, Ziyan Wu, Terrence Chen, David Doermann, Arun Innanje

Such decentralized training naturally leads to issues of imbalanced or differing data distributions among the local models and challenges in fusing them into a central model.

Federated Learning Privacy Preserving

A Review of Recent Advances of Binary Neural Networks for Edge Computing

no code implementations 24 Nov 2020 Wenyu Zhao, Teli Ma, Xuan Gong, Baochang Zhang, David Doermann

Edge computing is promising to become one of the next hottest topics in artificial intelligence because it benefits various evolving domains such as real-time unmanned aerial systems, industrial applications, and the demand for privacy protection.

Edge-computing Neural Architecture Search +3

Binarized Neural Architecture Search for Efficient Object Recognition

no code implementations 8 Sep 2020 Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, Rongrong Ji, David Doermann, Guodong Guo

In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing.

Edge-computing Face Recognition +3

Anti-Bandit Neural Architecture Search for Model Defense

no code implementations ECCV 2020 Hanlin Chen, Baochang Zhang, Song Xue, Xuan Gong, Hong Liu, Rongrong Ji, David Doermann

Deep convolutional neural networks (DCNNs) have dominated as the best performers in machine learning, but can be challenged by adversarial attacks.

Denoising Neural Architecture Search

iffDetector: Inference-aware Feature Filtering for Object Detection

1 code implementation 23 Jun 2020 Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann

In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages.

Object object-detection +1

Cogradient Descent for Bilinear Optimization

no code implementations CVPR 2020 Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji

Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure.

Image Reconstruction Network Pruning

CP-NAS: Child-Parent Neural Architecture Search for Binary Neural Networks

no code implementations 30 Apr 2020 Li'an Zhuo, Baochang Zhang, Hanlin Chen, Linlin Yang, Chen Chen, Yanjun Zhu, David Doermann

To this end, a Child-Parent (CP) model is introduced to a differentiable NAS to search the binarized architecture (Child) under the supervision of a full-precision model (Parent).

Neural Architecture Search

NAS-Count: Counting-by-Density with Neural Architecture Search

no code implementations ECCV 2020 Yutao Hu, Xiao-Long Jiang, Xuhui Liu, Baochang Zhang, Jungong Han, Xian-Bin Cao, David Doermann

Most of the recent advances in crowd counting have evolved from hand-designed density estimation networks, where multi-scale features are leveraged to address the scale variation problem, but at the expense of demanding design efforts.

Crowd Counting Decoder +2

Binarized Neural Architecture Search

no code implementations 25 Nov 2019 Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, David Doermann, Rongrong Ji

A variant, binarized neural architecture search (BNAS), with a search space of binarized convolutions, can produce extremely compressed models.

Neural Architecture Search

Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation

no code implementations CVPR 2019 Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, David Doermann

The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs).

Diversity

Towards Optimal Structured CNN Pruning via Generative Adversarial Learning

1 code implementation CVPR 2019 Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann

In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner.

Crowd Counting and Density Estimation by Trellis Encoder-Decoder Network

no code implementations 3 Mar 2019 Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xian-Tong Zhen, Xian-Bin Cao, David Doermann, Ling Shao

In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps.

Crowd Counting Decoder +1

Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression

1 code implementation CVPR 2019 Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji

The relationship between the input feature maps and 2D kernels is revealed in a theoretical framework, based on which a kernel sparsity and entropy (KSE) indicator is proposed to quantitate the feature map importance in a feature-agnostic manner to guide model compression.

Clustering Model Compression

Projection Convolutional Neural Networks for 1-bit CNNs via Discrete Back Propagation

no code implementations 30 Nov 2018 Jiaxin Gu, Ce Li, Baochang Zhang, Jungong Han, Xian-Bin Cao, Jianzhuang Liu, David Doermann

The advancement of deep convolutional neural networks (DCNNs) has driven significant improvement in the accuracy of recognition systems for many computer vision tasks.

IOD-CNN: Integrating Object Detection Networks for Event Recognition

no code implementations 21 Mar 2017 Sungmin Eum, Hyungtae Lee, Heesung Kwon, David Doermann

Many previous methods have shown the importance of considering semantically relevant objects for performing event recognition, yet none of the methods have exploited the power of deep convolutional neural networks to directly integrate relevant object information into a unified network.

Object object-detection +1

SHOE: Supervised Hashing with Output Embeddings

no code implementations 30 Jan 2015 Sravanthi Bondugula, Varun Manjunatha, Larry S. Davis, David Doermann

We present a supervised binary encoding scheme for image retrieval that learns projections by taking into account similarity between classes obtained from output embeddings.

Attribute Image Retrieval +2

Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling

no code implementations 29 Oct 2014 Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, Peng Ye

Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards.

Language Modelling

Correcting Errors in Digital Lexicographic Resources Using a Dictionary Manipulation Language

no code implementations 28 Oct 2014 David Zajic, Michael Maxwell, David Doermann, Paul Rodrigues, Michael Bloodgood

We describe a paradigm for combining manual and automatic error correction of noisy structured lexicographic data.

Active Sampling for Subjective Image Quality Assessment

no code implementations CVPR 2014 Peng Ye, David Doermann

Subjective tests based on the Mean Opinion Score (MOS) have been widely used in previous studies, but have many known problems such as an ambiguous scale definition and dissimilar interpretations of the scale among subjects.

Image Quality Assessment

Orientation Robust Text Line Detection in Natural Images

no code implementations CVPR 2014 Le Kang, Yi Li, David Doermann

In this paper, higher-order correlation clustering (HOCC) is used for text line detection in natural images.

Clustering graph partitioning +1

Convolutional Neural Networks for No-Reference Image Quality Assessment

no code implementations CVPR 2014 Le Kang, Peng Ye, Yi Li, David Doermann

In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image.

No-Reference Image Quality Assessment

Beyond Human Opinion Scores: Blind Image Quality Assessment based on Synthetic Scores

no code implementations CVPR 2014 Peng Ye, Jayant Kumar, David Doermann

Instead of training on human opinion scores, we propose to train BIQA models on synthetic scores derived from Full-Reference (FR) IQA measures.

Blind Image Quality Assessment

Linguistic Resources for Handwriting Recognition and Translation Evaluation

no code implementations LREC 2012 Zhiyi Song, Safa Ismael, Stephen Grimes, David Doermann, Stephanie Strassel

LDC has developed a stable pipeline and infrastructures for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT.

Document Classification Handwriting Recognition +1
