Search Results for author: Dong Zhang

Found 53 papers, 23 papers with code

Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection

1 code implementation EMNLP 2021 Xincheng Ju, Dong Zhang, Rong Xiao, Junhui Li, Shoushan Li, Min Zhang, Guodong Zhou

Therefore, in this paper, we are the first to jointly perform multi-modal ATE (MATE) and multi-modal ASC (MASC), and we propose a multi-modal joint learning approach with auxiliary cross-modal relation detection for multi-modal aspect-level sentiment analysis (MALSA).

Sentiment Analysis Sentiment Classification

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

4 code implementations 31 Aug 2023 Xin Zhang, Dong Zhang, ShiMin Li, Yaqian Zhou, Xipeng Qiu

Therefore, we propose SpeechTokenizer, a unified speech tokenizer for speech large language models.

Language Modelling Quantization

Training-Free Instance Segmentation from Semantic Image Segmentation Masks

1 code implementation 2 Aug 2023 Yuchen Shen, Dong Zhang, Yuhui Zheng, Zechao Li, Liyong Fu, Qiaolin Ye

TFISeg does not require training a semantic or/and instance segmentation model and avoids the need for instance-level image annotations.

Image Segmentation Instance Segmentation +2

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards

no code implementations 25 Jun 2023 Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen

A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantic-similar reference images, i.e., reference-based DIC (Ref-DIC).

Benchmarking Contrastive Learning +1

DUB: Discrete Unit Back-translation for Speech Translation

1 code implementation 19 May 2023 Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

The key point is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST.

Machine Translation Speech-to-Text Translation +1

SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

1 code implementation 18 May 2023 Dong Zhang, ShiMin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

Multi-modal large language models are regarded as a crucial step towards Artificial General Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT.

Language Modelling Large Language Model +2

Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era

no code implementations 4 May 2023 Dong Zhang

Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program.

Discrepancy-Guided Reconstruction Learning for Image Forgery Detection

no code implementations 26 Apr 2023 Zenan Shi, Haipeng Chen, Long Chen, Dong Zhang

In this paper, we propose a novel image forgery detection paradigm for boosting the model learning capacity on both forgery-sensitive and genuine compact visual patterns.

Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation

no code implementations 23 Mar 2023 Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen

Recently, the advent of vision Transformer (ViT) has brought substantial advancements in 3D dataset benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg).

Image Segmentation Semantic Segmentation +1

Semantic Scene Completion with Cleaner Self

1 code implementation CVPR 2023 Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun

SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF).

Vessel-Promoted OCT to OCTA Image Translation by Heuristic Contextual Constraints

1 code implementation 13 Mar 2023 Shuhan Li, Dong Zhang, Xiaomeng Li, Chubin Ou, Lin An, Yanwu Xu, Kwang-Ting Cheng

In this paper, we propose a novel framework, TransPro, that translates 3D Optical Coherence Tomography (OCT) images into exclusive 3D OCTA images using an image translation pattern.


Protocol selection for second-order consensus against disturbance

no code implementations 10 Dec 2022 Jiamin Wang, Liqi Zhou, Dong Zhang, Jian Liu, Yuanshi Zheng

Noticing that both the absolute and relative velocity protocols can solve the second-order consensus of multi-agent systems, this paper aims to investigate which of the above two protocols has better anti-disturbance capability, in which the anti-disturbance capability is measured by the L2 gain from the disturbance to the consensus error.

Centralized Feature Pyramid for Object Detection

1 code implementation 5 Oct 2022 Yu Quan, Dong Zhang, Liyan Zhang, Jinhui Tang

To address this problem, in this paper, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation.

object-detection Object Detection

Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions

1 code implementation 21 Sep 2022 Dong Zhang, Yi Lin, Hao Chen, Zhuotao Tian, Xin Yang, Jinhui Tang, Kwang-Ting Cheng

Over the past few years, the rapid development of deep learning technologies for computer vision has significantly improved the performance of medical image segmentation (MedISeg).

Data Augmentation Domain Adaptation +3

Graph Reasoning Transformer for Image Parsing

no code implementations 20 Sep 2022 Dong Zhang, Jinhui Tang, Kwang-Ting Cheng

In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.

Rethinking the Reference-based Distinctive Image Captioning

1 code implementation 22 Jul 2022 Yangjun Mao, Long Chen, Zhihong Jiang, Dong Zhang, Zhimeng Zhang, Jian Shao, Jun Xiao

Unfortunately, reference images used by existing Ref-DIC works are easy to distinguish: these reference images only resemble the target image at scene-level and have few common objects, such that a Ref-DIC model can trivially generate distinctive captions even without considering the reference images.

Benchmarking Image Captioning

FedMix: Mixed Supervised Federated Learning for Medical Image Segmentation

1 code implementation 4 May 2022 Jeffry Wicaksana, Zengqiang Yan, Dong Zhang, Xijie Huang, Huimin Wu, Xin Yang, Kwang-Ting Cheng

To relax this assumption, in this work, we propose a label-agnostic unified federated learning framework, named FedMix, for medical image segmentation based on mixed image labels.

Federated Learning Image Segmentation +3

Learning to Reduce Information Bottleneck for Object Detection in Aerial Images

1 code implementation 5 Apr 2022 Yuchen Shen, Dong Zhang, Zhihao Song, Xuesong Jiang, Qiaolin Ye

In this letter, we first underline the importance of the neck network in object detection from the perspective of information bottleneck.

object-detection Object Detection In Aerial Images

FaceAtlasAR: Atlas of Facial Acupuncture Points in Augmented Reality

1 code implementation 29 Nov 2021 Menghe Zhang, Jurgen Schulze, Dong Zhang

Acupuncture is a technique in which practitioners stimulate specific points on the body.

Face Alignment

Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal

1 code implementation 11 Nov 2021 Yadong Li, Dongheng Zhang, Jinbo Chen, Jinwei Wan, Dong Zhang, Yang Hu, Qibin Sun, Yan Chen

To enhance the robustness of the system and reduce data collecting efforts, we design a data augmentation framework for mmWave signals based on correlations between signal patterns and gesture variations.

Data Augmentation Gesture Recognition

Cell-Level State of Charge Estimation for Battery Packs Under Minimal Sensing

no code implementations 17 Sep 2021 Dong Zhang, Luis D. Couto, Ross Drummond, Shashank Sripad, Venkatasubramanian Viswanathan

This manuscript presents an algorithm for individual Lithium-ion (Li-ion) battery cell state of charge (SOC) estimation in a large-scale battery pack under minimal sensing, where only pack-level voltage and current are measured.

Region-Aware Network: Model Human's Top-Down Visual Perception Mechanism for Crowd Counting

no code implementations 23 Jun 2021 Yuehai Chen, Jing Yang, Dong Zhang, Kun Zhang, Badong Chen, Shaoyi Du

More specifically, we scan the whole input image and its priority maps in the form of column vectors to obtain a relevance matrix estimating their similarity.

Crowd Counting

Learning Calibrated-Guidance for Object Detection in Aerial Images

1 code implementation 21 Mar 2021 Zongqi Wei, Dong Liang, Dong Zhang, Liyan Zhang, Qixiang Geng, Mingqiang Wei, Huiyu Zhou

Specifically, for a given set of feature maps, CG first computes the feature similarity between each channel and the remaining channels as the intermediary calibration guidance.

object-detection Object Detection In Aerial Images +1

Machine Learning based Malicious Payload Identification in Software-Defined Networking

no code implementations 4 Jan 2021 Qiumei Cheng, Chunming Wu, Haifeng Zhou, Dezhang Kong, Dong Zhang, Junchi Xing, Wei Ruan

In this paper, a novel OpenFlow-enabled deep packet inspection (OFDPI) approach is proposed based on the SDN paradigm to provide adaptive and efficient packet inspection.

Networking and Internet Architecture

Dual-SLAM: A framework for robust single camera navigation

no code implementations 23 Sep 2020 Huajian Huang, Wen-Yan Lin, Siying Liu, Dong Zhang, Sai-Kit Yeung

As local pose estimation is ill-conditioned, local pose estimation failures happen regularly, making the overall SLAM system brittle.

Pose Estimation Simultaneous Localization and Mapping

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

no code implementations 12 Aug 2020 Haiwei Wu, Lin Zhang, Lin Yang, Xuyang Wang, Jun-Jie Wang, Dong Zhang, Ming Li

This paper introduces our approaches for the Mask and Breathing Sub-Challenge in the Interspeech COMPARE Challenge 2020.

Data Augmentation

Feature Pyramid Transformer

1 code implementation ECCV 2020 Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xiansheng Hua, Qianru Sun

Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales.

Instance Segmentation object-detection +2

Reconstructing undersampled photoacoustic microscopy images using deep learning

2 code implementations 30 May 2020 Anthony DiSpirito III, Daiwei Li, Tri Vu, Maomao Chen, Dong Zhang, Jianwen Luo, Roarke Horstmeyer, Junjie Yao

One primary technical challenge in photoacoustic microscopy (PAM) is the necessary compromise between spatial resolution and imaging speed.

3D Action Recognition

Direct Quantification for Coronary Artery Stenosis Using Multiview Learning

no code implementations 20 Jul 2019 Dong Zhang, Guang Yang, Shu Zhao, Yanping Zhang, Heye Zhang, Shuo Li

The proposed DMQCA model consists of a multiview module with two attention mechanisms, a key-frame module, and a regression module, to achieve direct accurate multiple-index estimation.

Multiview Learning regression

Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds

no code implementations ECCV 2018 Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, Mubarak Shah

With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision.

Crowd Counting Management +1

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

1 code implementation ICCV 2017 Amir Mazaheri, Dong Zhang, Mubarak Shah

Since the source sentence is broken into two fragments: the sentence's left fragment (before the blank) and the sentence's right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately because of many possible variations of the missing word in terms of the location and type of the word in the source sentence.

ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information

no code implementations CVPR 2018 Rodney LaLonde, Dong Zhang, Mubarak Shah

To reduce the large search space, the first stage (ClusterNet) takes in a set of extremely large video frames, combines the motion and appearance information within the convolutional architecture, and proposes regions of objects of interest (ROOBI).

object-detection Object Detection

Unsupervised Action Proposal Ranking through Proposal Recombination

no code implementations 3 Apr 2017 Waqas Sultani, Dong Zhang, Mubarak Shah

Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are ranked properly.

Action Detection Action Recognition +1

Two-View Label Propagation to Semi-supervised Reader Emotion Classification

no code implementations COLING 2016 Shoushan Li, Jian Xu, Dong Zhang, Guodong Zhou

In this paper, we propose a two-view label propagation approach to semi-supervised reader emotion classification by exploiting two views, namely source text and response text in a label propagation algorithm.

Classification Emotion Classification +2

Video Fill in the Blank with Merging LSTMs

no code implementations 13 Oct 2016 Amir Mazaheri, Dong Zhang, Mubarak Shah

In the experiments, we have demonstrated the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.

Local feature hierarchy for face recognition across pose and illumination

no code implementations 12 Jul 2016 Xiaoyue Jiang, Dong Zhang, Xiaoyi Feng

Accordingly we propose an end-to-end face recognition method to deal with pose and illumination simultaneously based on convolutional networks where the discriminative nonlinear features that are invariant to pose and illumination are extracted.

Face Recognition

A Framework for Human Pose Estimation in Videos

no code implementations 26 Apr 2016 Dong Zhang, Mubarak Shah

A sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization.

Pose Estimation

Robust Scene Text Recognition Using Sparse Coding based Features

no code implementations 29 Dec 2015 Da-Han Wang, Hanzi Wang, Dong Zhang, Jonathan Li, David Zhang

For character detection, we use the HSC features instead of using the Histograms of Oriented Gradients (HOG) features.

Scene Text Recognition

Human Pose Estimation in Videos

no code implementations ICCV 2015 Dong Zhang, Mubarak Shah

Using the idea of 'Association', the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames.

Pose Estimation

Face Verification Using Boosted Cross-Image Features

no code implementations 28 Sep 2013 Dong Zhang, Omar Oreifej, Mubarak Shah

In contrast, we propose to extract cross-image features, i.e., features across the pair of images, which, as we demonstrate, is more discriminative to the similarity and the dissimilarity of faces.

Face Detection Face Recognition +1

Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions

no code implementations CVPR 2013 Dong Zhang, Omar Javed, Mubarak Shah

The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video.

Optical Flow Estimation Semantic Segmentation +3
