1 code implementation • 15 Jul 2024 • Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, JianFeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang
First, to enable dual-modal generation and maximize the information exchange between video and depth generation, we propose a unified dual-modal U-Net, a parameter-sharing framework for joint video and depth denoising, wherein a modality label guides the denoising target, and cross-modal attention enables mutual information flow.
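The cross-modal attention mentioned above can be illustrated with a minimal NumPy sketch, assuming a simplified single-head, unbatched form in which video tokens attend to depth tokens (the function names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(video_tokens, depth_tokens):
    """One direction of cross-modal attention: video queries attend to depth.

    video_tokens: (N, d) array of video features (queries).
    depth_tokens: (M, d) array of depth features (keys/values).
    Returns (N, d) video features enriched with depth information.
    """
    d = video_tokens.shape[-1]
    scores = video_tokens @ depth_tokens.T / np.sqrt(d)  # (N, M) similarities
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ depth_tokens                        # weighted depth values

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))   # 4 video tokens, dim 8
z = rng.normal(size=(6, 8))   # 6 depth tokens, dim 8
out = cross_modal_attention(v, z)
print(out.shape)  # (4, 8)
```

In the full model the same operation would run in the opposite direction as well, so depth denoising also sees video features.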
1 code implementation • 17 Jun 2024 • Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye
Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in training MLLMs to communicate between language and vision.
1 code implementation • 11 Jun 2024 • Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang
Extensive experiments show that our MCM achieves state-of-the-art video diffusion distillation performance.
1 code implementation • 1 Jun 2024 • Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian
Videos carry rich visual information including object description, action, interaction, etc., but existing multimodal large language models (MLLMs) fall short in referential understanding scenarios such as video-based referring.
1 code implementation • 1 Mar 2024 • Pengyu Yan, Mahesh Bhosale, Jay Lal, Bikhyat Adhikari, David Doermann
Chart visualizations are essential for data interpretation and communication; however, most charts are only accessible in image format and lack the corresponding data tables and supplementary information, making it difficult to alter their appearance for different application scenarios.
1 code implementation • 22 Dec 2023 • Xuan Gong, Shanglin Li, Yuxiang Bao, Barry Yao, Yawen Huang, Ziyan Wu, Baochang Zhang, Yefeng Zheng, David Doermann
Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data.
no code implementations • 16 Nov 2023 • Nikhil Manali, David Doermann, Mahesh Desai
Organizational charts, also known as org charts, are critical representations of an organization's structure and the hierarchical relationships between its components and positions.
no code implementations • 23 Oct 2023 • Mahesh Bhosale, Abhishek Kumar, David Doermann
Our model consists of a two-stream network (one stream for appearance map extraction and the other for body part map extraction) and a bilinear-pooling layer that generates and spatially pools the body part map.
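Bilinear pooling of a two-stream network's outputs can be sketched in a few lines; this is a simplified, hypothetical form (the shapes and function name are assumptions, not the paper's code), taking the outer product of the two streams' features at each spatial location and summing over space:

```python
import numpy as np

def bilinear_pool(appearance, parts):
    """Bilinear pooling of two stream outputs (simplified sketch).

    appearance: (H*W, Ca) appearance features at each spatial location.
    parts:      (H*W, Cp) body-part map values at the same locations.
    Returns a (Ca, Cp) descriptor: the outer product of the two feature
    vectors at each location, summed (spatially pooled) over all locations.
    """
    return appearance.T @ parts  # equals sum_i outer(appearance[i], parts[i])

rng = np.random.default_rng(1)
a = rng.normal(size=(49, 16))   # 7x7 spatial grid, 16 appearance channels
p = rng.normal(size=(49, 4))    # 4 body-part channels
desc = bilinear_pool(a, p)
print(desc.shape)  # (16, 4)
```

The matrix product form is just a compact way to write the per-location outer products followed by sum pooling.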
1 code implementation • ICCV 2023 • Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua
The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.
1 code implementation • ICCV 2023 • Yuanhao Zhai, Tianyu Luan, David Doermann, Junsong Yuan
To improve the generalization ability, we propose weakly-supervised self-consistency learning (WSCL) to leverage the weakly annotated images.
1 code implementation • 18 Aug 2023 • Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan
In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and employing a curriculum learning strategy to learn atomic action composition.
no code implementations • 3 Aug 2023 • Saleem Ahmed, Pengyu Yan, David Doermann, Srirangaraj Setlur, Venu Govindaraju
A combination of sparse and dense per-pixel objectives coupled with a uni-modal self-attention-based feature-fusion layer is applied to learn KP embeddings.
1 code implementation • 1 Jul 2023 • Shaohui Lin, Wenxuan Huang, Jiao Xie, Baochang Zhang, Yunhang Shen, Zhou Yu, Jungong Han, David Doermann
In this paper, we propose a novel Knowledge-driven Differential Filter Sampler (KDFS) with Masked Filter Modeling (MFM) framework for filter pruning, which globally prunes the redundant filters based on the prior knowledge of a pre-trained model in a differential and non-alternative optimization.
1 code implementation • 7 May 2023 • Pengyu Yan, Saleem Ahmed, David Doermann
As a prerequisite for chart data extraction, accurate detection of basic chart elements is essential.
1 code implementation • 3 May 2023 • Jay Lal, Aditya Mitkari, Mahesh Bhosale, David Doermann
Existing works, however, are not robust to all these variations, either taking an all-chart unified approach or relying on auxiliary information such as legends for line data extraction.
no code implementations • 10 Dec 2022 • Xuan Gong, Liangchen Song, Meng Zheng, Benjamin Planche, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu
To date, little attention has been given to multi-view 3D human mesh estimation, despite real-life applicability (e.g., motion capture, sports analysis) and robustness to single-view ambiguities.
no code implementations • 16 Oct 2022 • Xuan Gong, Liangchen Song, Rishi Vedula, Abhishek Sharma, Meng Zheng, Benjamin Planche, Arun Innanje, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu
We propose a privacy-preserving FL framework leveraging unlabeled public data for one-way offline knowledge distillation in this work.
no code implementations • 21 Sep 2022 • Liangchen Song, Xuan Gong, Benjamin Planche, Meng Zheng, David Doermann, Junsong Yuan, Terrence Chen, Ziyan Wu
We propose to regularize the estimated motion to be predictable.
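One simple way to make "predictable motion" concrete is a constant-velocity prediction error; the sketch below is an illustrative stand-in under that assumption, not the paper's actual regularizer:

```python
import numpy as np

def predictability_loss(traj):
    """Penalize motion that deviates from constant-velocity extrapolation.

    traj: (T, D) sequence of poses/positions. For each t >= 2 the predicted
    pose is 2*x[t-1] - x[t-2] (linear extrapolation); the loss is the mean
    squared prediction error over the sequence.
    """
    pred = 2 * traj[1:-1] - traj[:-2]
    return float(np.mean((traj[2:] - pred) ** 2))

t = np.linspace(0, 1, 10)[:, None]
line = np.hstack([t, 3 * t])          # perfectly linear motion
print(predictability_loss(line))      # ~0 for constant-velocity motion
```

Minimizing such a term pushes the estimated motion toward trajectories that a simple predictor can anticipate.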
no code implementations • 10 Sep 2022 • Xuan Gong, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, David Doermann, Ziyan Wu
However, synthetic dense correspondence maps (i.e., IUV) have been little explored, since the domain gap between synthetic training data and real testing data is hard to address for 2D dense representations.
no code implementations • 10 Sep 2022 • Xuan Gong, Abhishek Sharma, Srikrishna Karanam, Ziyan Wu, Terrence Chen, David Doermann, Arun Innanje
Federated Learning (FL) is a machine learning paradigm where local nodes collaboratively train a central model while the training data remains decentralized.
no code implementations • 17 Mar 2022 • Runqi Wang, Linlin Yang, Baochang Zhang, Wentao Zhu, David Doermann, Guodong Guo
Research on the generalization ability of deep neural networks (DNNs) has recently attracted a great deal of attention.
no code implementations • 28 Dec 2021 • Runqi Wang, Xiaoyue Duan, Baochang Zhang, Song Xue, Wentao Zhu, David Doermann, Guodong Guo
We show that our method improves the recognition accuracy of adversarial training on ImageNet by 8.32% compared with the baseline.
no code implementations • 22 Jul 2021 • Manan Oza, Sukalpa Chanda, David Doermann
Our approach generates images that are accurately aligned with exhaustive textual descriptions of faces, capturing many fine facial details and producing higher-quality images.
no code implementations • 21 Jun 2021 • Yuanhao Zhai, Le Wang, David Doermann, Junsong Yuan
Base model training encourages the model to make reliable predictions from a single modality (i.e., RGB or optical flow); a pseudo ground truth is then generated from the fusion of these predictions and in turn used as supervision to train the base models.
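The fusion-to-pseudo-label step can be sketched minimally; this assumes simple score averaging plus thresholding (the function name, shapes, and threshold are illustrative assumptions, not the paper's exact fusion rule):

```python
import numpy as np

def fuse_pseudo_labels(rgb_scores, flow_scores, threshold=0.5):
    """Fuse per-snippet action scores from two modalities into pseudo labels.

    rgb_scores, flow_scores: (T,) per-snippet action probabilities from the
    RGB and optical-flow base models. The scores are averaged and thresholded
    to produce a binary pseudo ground truth, which can then supervise further
    training of both base models.
    """
    fused = (rgb_scores + flow_scores) / 2.0
    return (fused >= threshold).astype(int)

rgb = np.array([0.9, 0.2, 0.7, 0.1])
flow = np.array([0.8, 0.3, 0.4, 0.2])
print(fuse_pseudo_labels(rgb, flow))  # [1 0 1 0]
```

Snippets where the two modalities agree strongly survive the threshold, which is the intuition behind using the fusion as supervision.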
no code implementations • 20 Jun 2021 • Runqi Wang, Baochang Zhang, Li'an Zhuo, Qixiang Ye, David Doermann
Conventional gradient descent methods compute the gradients for multiple variables through partial derivatives.
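The conventional scheme the sentence refers to can be shown with a small self-contained example: estimate each partial derivative numerically (central differences) and take gradient steps. This is a generic textbook sketch for context, not the paper's proposed method:

```python
def numerical_gradient(f, x, eps=1e-6):
    """Gradient of f at x: one central-difference partial derivative per variable."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad

def gradient_descent(f, x0, lr=0.1, steps=100):
    """Plain gradient descent driven by the per-variable partial derivatives."""
    x = list(x0)
    for _ in range(steps):
        g = numerical_gradient(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Minimize f(x, y) = (x - 1)^2 + (y + 2)^2, whose minimum is at (1, -2).
f = lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2
x = gradient_descent(f, [0.0, 0.0])
print([round(c, 3) for c in x])  # ~[1.0, -2.0]
```

Each variable gets its own partial derivative and update, which is the baseline the paper contrasts against.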
no code implementations • CVPR 2021 • Sheng Xu, Junhe Zhao, Jinhu Lu, Baochang Zhang, Shumin Han, David Doermann
At each layer, it exploits a differentiable binarization search (DBS) to minimize the angular error in a student-teacher framework.
no code implementations • 6 Jun 2021 • Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann
Object detection with Transformers (DETR) has achieved competitive performance over traditional detectors, such as Faster R-CNN.
no code implementations • 7 May 2021 • Mingyuan Mao, Baochang Zhang, David Doermann, Jie Guo, Shumin Han, Yuan Feng, Xiaodi Wang, Errui Ding
This leads to a new problem of confidence discrepancy for the detector ensembles.
no code implementations • 3 Feb 2021 • Huan Chang, Yicheng Chen, Baochang Zhang, David Doermann
Unmanned aerial vehicles (UAVs) are widely used as network processors in mobile networks, but more recently, UAVs have been used in Mobile Edge Computing as mobile servers.
no code implementations • ICCV 2021 • Song Xue, Runqi Wang, Baochang Zhang, Tian Wang, Guodong Guo, David Doermann
Differentiable Architecture Search (DARTS) improves the efficiency of architecture search by learning the architecture and network parameters end-to-end.
no code implementations • ICCV 2021 • Xuan Gong, Abhishek Sharma, Srikrishna Karanam, Ziyan Wu, Terrence Chen, David Doermann, Arun Innanje
Such decentralized training naturally leads to issues of imbalanced or differing data distributions among the local models and challenges in fusing them into a central model.
no code implementations • 7 Dec 2020 • Xuan Gong, Xin Xia, Wentao Zhu, Baochang Zhang, David Doermann, Lian Zhuo
In recent years, deep learning has dominated progress in the field of medical image analysis.
no code implementations • 24 Nov 2020 • Wenyu Zhao, Teli Ma, Xuan Gong, Baochang Zhang, David Doermann
Edge computing is poised to become one of the most important topics in artificial intelligence, as it benefits evolving domains such as real-time unmanned aerial systems and industrial applications, and addresses the demand for privacy protection.
no code implementations • 8 Sep 2020 • Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, Rongrong Ji, David Doermann, Guodong Guo
In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing.
no code implementations • ECCV 2020 • Hanlin Chen, Baochang Zhang, Song Xue, Xuan Gong, Hong Liu, Rongrong Ji, David Doermann
Deep convolutional neural networks (DCNNs) have dominated as the best performers in machine learning, but can be challenged by adversarial attacks.
1 code implementation • 23 Jun 2020 • Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann
In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages.
no code implementations • CVPR 2020 • Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji
Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure.
no code implementations • 30 Apr 2020 • Li'an Zhuo, Baochang Zhang, Hanlin Chen, Linlin Yang, Chen Chen, Yanjun Zhu, David Doermann
To this end, a Child-Parent (CP) model is introduced to a differentiable NAS to search the binarized architecture (Child) under the supervision of a full-precision model (Parent).
no code implementations • ECCV 2020 • Yutao Hu, Xiao-Long Jiang, Xuhui Liu, Baochang Zhang, Jungong Han, Xian-Bin Cao, David Doermann
Most of the recent advances in crowd counting have evolved from hand-designed density estimation networks, where multi-scale features are leveraged to address the scale variation problem, but at the expense of demanding design efforts.
no code implementations • 25 Nov 2019 • Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, David Doermann, Rongrong Ji
A variant, binarized neural architecture search (BNAS), with a search space of binarized convolutions, can produce extremely compressed models.
no code implementations • CVPR 2019 • Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, David Doermann
The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs).
1 code implementation • CVPR 2019 • Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann
In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner.
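As background for the filter-pruning setting, here is a minimal sketch of the classic magnitude criterion: rank filters by L1 norm and keep the strongest. This illustrates filter pruning generally; the paper's contribution is pruning filters *and* other structures jointly, end-to-end, which this deliberately does not capture (the function name and `keep_ratio` are illustrative assumptions):

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio=0.5):
    """Rank conv filters by L1 norm and keep the strongest ones.

    weights: (F, C, K, K) conv-layer weights. Returns the pruned weights and
    the sorted indices of kept filters, which the next layer would use to
    drop its now-missing input channels.
    """
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # top-n by magnitude
    return weights[keep], keep

rng = np.random.default_rng(2)
w = rng.normal(size=(8, 3, 3, 3))      # 8 filters, 3 input channels, 3x3
pruned, kept = prune_filters_l1(w, keep_ratio=0.5)
print(pruned.shape, len(kept))  # (4, 3, 3, 3) 4
```

Per-layer magnitude pruning like this treats structures independently, which is exactly the limitation joint end-to-end approaches aim to remove.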
no code implementations • 3 Mar 2019 • Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xian-Tong Zhen, Xian-Bin Cao, David Doermann, Ling Shao
In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps.
1 code implementation • CVPR 2019 • Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji
The relationship between the input feature maps and 2D kernels is revealed in a theoretical framework, based on which a kernel sparsity and entropy (KSE) indicator is proposed to quantitate the feature map importance in a feature-agnostic manner to guide model compression.
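A sparsity-plus-entropy importance score can be sketched as follows. This is a loose, simplified stand-in for the KSE indicator (the histogram-based entropy and the exact combining formula here are assumptions, not the paper's density-based definition), scoring one input feature map from the 2D kernels that read it:

```python
import numpy as np

def kse_score(kernels, bins=10, alpha=1.0):
    """Simplified kernel-sparsity-and-entropy style score for one input channel.

    kernels: (F, K, K) array -- the 2D kernels of every output filter that
    reads this input feature map. "Sparsity" is the total kernel magnitude;
    "entropy" is computed over a histogram of per-kernel L1 norms (a rough
    stand-in for the paper's density-based entropy). Higher score suggests a
    more important feature map; low scores mark candidates for compression.
    """
    norms = np.abs(kernels).reshape(kernels.shape[0], -1).sum(axis=1)
    sparsity = norms.sum()
    hist, _ = np.histogram(norms, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()
    return float(np.sqrt(sparsity / (1.0 + alpha * entropy)))

rng = np.random.default_rng(3)
strong = rng.normal(size=(16, 3, 3))        # varied, high-magnitude kernels
weak = np.zeros((16, 3, 3))                 # near-dead input channel
print(kse_score(strong) > kse_score(weak))  # True
```

Notably, the score needs only the kernel weights, which is what makes such an indicator feature-agnostic: no activations or data are required.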
no code implementations • 30 Nov 2018 • Jiaxin Gu, Ce Li, Baochang Zhang, Jungong Han, Xian-Bin Cao, Jianzhuang Liu, David Doermann
The advancement of deep convolutional neural networks (DCNNs) has driven significant improvement in the accuracy of recognition systems for many computer vision tasks.
no code implementations • 21 Mar 2017 • Sungmin Eum, Hyungtae Lee, Heesung Kwon, David Doermann
Many previous methods have shown the importance of considering semantically relevant objects for performing event recognition, yet none of the methods have exploited the power of deep convolutional neural networks to directly integrate relevant object information into a unified network.
no code implementations • CVPR 2015 • Xianzhi Du, David Doermann, Wael Abd-Almageed
In this paper, we present a novel partial signature matching method using graphical models.
no code implementations • 30 Jan 2015 • Sravanthi Bondugula, Varun Manjunatha, Larry S. Davis, David Doermann
We present a supervised binary encoding scheme for image retrieval that learns projections by taking into account similarity between classes obtained from output embeddings.
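The retrieval side of a projection-based binary encoding scheme can be sketched compactly: binarize a linear projection of the features, then rank by Hamming distance. The projection matrix below is random for illustration (in the paper it is learned from class-similarity output embeddings), and the function names are assumptions:

```python
import numpy as np

def binary_codes(X, W):
    """Binary codes from projections: the sign pattern of a linear map.

    X: (N, D) features; W: (D, B) projection matrix. Returns an (N, B)
    array of 0/1 bits; retrieval ranks database items by Hamming distance
    to the query's code.
    """
    return (X @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.sum(a != b))

rng = np.random.default_rng(4)
W = rng.normal(size=(5, 16))                      # 5-d features -> 16 bits
x = rng.normal(size=(1, 5))
codes = binary_codes(np.vstack([x, x + 0.01, -x]), W)
# a slightly perturbed copy stays closer in Hamming space than the negation
print(hamming(codes[0], codes[1]) <= hamming(codes[0], codes[2]))  # True
```

Short binary codes with Hamming ranking are what make such schemes fast at retrieval time; the learning part only decides which projections `W` to use.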
no code implementations • WS 2012 • Michael Bloodgood, Peng Ye, Paul Rodrigues, David Zajic, David Doermann
We investigate combining methods and show that using random forests is a promising approach.
no code implementations • 29 Oct 2014 • Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, Peng Ye
Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards.
no code implementations • 28 Oct 2014 • David Zajic, Michael Maxwell, David Doermann, Paul Rodrigues, Michael Bloodgood
We describe a paradigm for combining manual and automatic error correction of noisy structured lexicographic data.
no code implementations • CVPR 2014 • Peng Ye, David Doermann
Subjective tests based on the Mean Opinion Score (MOS) have been widely used in previous studies, but have many known problems such as an ambiguous scale definition and dissimilar interpretations of the scale among subjects.
no code implementations • CVPR 2014 • Le Kang, Yi Li, David Doermann
In this paper, higher-order correlation clustering (HOCC) is used for text line detection in natural images.
no code implementations • CVPR 2014 • Le Kang, Peng Ye, Yi Li, David Doermann
In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image.
no code implementations • CVPR 2014 • Peng Ye, Jayant Kumar, David Doermann
Instead of training on human opinion scores, we propose to train BIQA models on synthetic scores derived from Full-Reference (FR) IQA measures.
no code implementations • CVPR 2013 • Peng Ye, Jayant Kumar, Le Kang, David Doermann
Second, the proposed method has the potential to be used in multiple image domains.
no code implementations • LREC 2012 • Zhiyi Song, Safa Ismael, Stephen Grimes, David Doermann, Stephanie Strassel
LDC has developed a stable pipeline and infrastructure for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT.