1 code implementation • 4 Apr 2024 • Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng
Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL).
1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu
With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.
no code implementations • ICCV 2023 • Baoshuo Kan, Teng Wang, Wenpeng Lu, XianTong Zhen, Weili Guan, Feng Zheng
Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated a great capacity for transfer learning.
1 code implementation • ICCV 2023 • Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng
In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios.
1 code implementation • ICCV 2023 • Dong Lu, Zhiqiang Wang, Teng Wang, Weili Guan, Hongchang Gao, Feng Zheng
Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks.
1 code implementation • 26 Jun 2023 • Chen Li, Xutan Peng, Teng Wang, Yixiao Ge, Mengyang Liu, Xuyuan Xu, Yexin Wang, Ying Shan
Art forms such as movies and television (TV) dramas are reflections of the real world and have recently attracted much attention from the multimodal learning community.
1 code implementation • 17 Jun 2023 • Yunlong Tang, Jinrui Zhang, Xiangchen Wang, Teng Wang, Feng Zheng
This paper proposes an effective model, LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) we utilize a pretrained LLM to generate high-quality, human-like captions.
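To make the video-adapter idea above concrete, here is a minimal PyTorch sketch that compresses frozen video features into a fixed set of prefix tokens in an LLM's embedding space; all dimensions, module choices, and names (VideoAdapter, num_prefix) are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class VideoAdapter(nn.Module):
    """Compresses frame features into a fixed number of LLM prefix tokens."""

    def __init__(self, video_dim: int = 768, llm_dim: int = 4096,
                 num_prefix: int = 32, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_prefix, video_dim))
        self.attn = nn.MultiheadAttention(video_dim, heads, batch_first=True)
        self.proj = nn.Linear(video_dim, llm_dim)

    def forward(self, video_feats: torch.Tensor) -> torch.Tensor:
        # video_feats: (B, T, video_dim) features from a frozen video encoder
        q = self.queries.unsqueeze(0).expand(video_feats.size(0), -1, -1)
        pooled, _ = self.attn(q, video_feats, video_feats)
        return self.proj(pooled)  # (B, num_prefix, llm_dim) prefix embeddings
```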
1 code implementation • 4 May 2023 • Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao
Controllable image captioning is an emerging multimodal topic that aims to describe an image in natural language according to human intent, $\textit{e.g.}$, looking at specified regions or describing in a particular text style.
1 code implementation • 27 Apr 2023 • Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo
Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks.
1 code implementation • CVPR 2023 • Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, XiaoHu Qie, Ping Luo
FLM frees the prediction rate from its tie to the corruption rate, while allowing the corruption spans to be customized for each token to be predicted.
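A toy sketch of this decoupling, under the assumption of a simple per-target span corruption (the rates, span size, and function names are illustrative, not FLM's actual recipe): the set of tokens to predict is sampled independently of how much context is corrupted for each prediction.

```python
import random

def flm_style_views(tokens: list[str], predict_rate: float = 0.5,
                    span: int = 2) -> list[tuple[list[str], int]]:
    """For each sampled target position, build its own corrupted view.

    The fraction of predicted tokens (predict_rate) is chosen independently
    of how much context is corrupted (span) for each prediction.
    """
    n = len(tokens)
    targets = [i for i in range(n) if random.random() < predict_rate]
    views = []
    for t in targets:
        corrupted = list(tokens)
        for j in range(max(0, t - span), min(n, t + span + 1)):
            corrupted[j] = "[MASK]"  # corrupt a span around the target only
        views.append((corrupted, t))
    return views

print(flm_style_views("a cat sat on the mat".split()))
```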
1 code implementation • CVPR 2023 • Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng
To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video.
Ranked #1 on Audio-Visual Event Localization on UnAV-100
1 code implementation • 11 Mar 2023 • Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo
Our framework is easily extensible to tasks covering visually-grounded language understanding and generation.
1 code implementation • 2 Mar 2023 • Xiaoguang Chang, Teng Wang, Shaowei Cai, Changyin Sun
In addition, representation-level unbiasing strategies give LANDMARK the advantage of compatibility with other methods.
1 code implementation • 25 Sep 2022 • Yunlong Tang, Siting Xu, Teng Wang, Qin Lin, Qinglin Lu, Feng Zheng
The existing method performs well at the video segmentation stage but depends on extra cumbersome models and performs poorly at the segment assemblage stage.
1 code implementation • 3 Jul 2022 • Jinrui Zhang, Teng Wang, Feng Zheng, Ran Cheng, Ping Luo
Previous methods process the information of only a single boundary at a time and thus underutilize video context.
1 code implementation • 17 Jun 2022 • Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo
Existing vision-language pre-training (VLP) methods rely primarily on paired image-text datasets, which are either annotated with enormous human labor or crawled from the internet and then subjected to elaborate data cleaning.
no code implementations • 21 Apr 2022 • Teng Wang, Shujuan Fan, Daikun Liu, Changyin Sun
Furthermore, we design a dual-branch Transformer head network to combine image features from multi-scale windows in order to improve details of the global feature representation.
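A minimal PyTorch sketch of one way such a dual-branch head could combine multi-scale window features; the window sizes, cross-scale attention, and class name are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchHead(nn.Module):
    """Pools backbone features over two window sizes and fuses the branches."""

    def __init__(self, dim: int, num_classes: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc = nn.Linear(2 * dim, num_classes)

    @staticmethod
    def window_tokens(feat: torch.Tensor, k: int) -> torch.Tensor:
        # split (B, C, H, W) into k x k windows, average each into a token
        pooled = F.avg_pool2d(feat, kernel_size=k, stride=k)
        return pooled.flatten(2).transpose(1, 2)  # (B, num_windows, C)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        fine = self.window_tokens(feat, 2)        # small-window tokens
        coarse = self.window_tokens(feat, 4)      # large-window tokens
        fused, _ = self.attn(coarse, fine, fine)  # cross-scale attention
        return self.fc(torch.cat([fused.mean(1), fine.mean(1)], dim=-1))
```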
no code implementations • 13 Apr 2022 • Teng Wang, Zhu Liu, Feng Zheng, Zhichao Lu, Ran Cheng, Ping Luo
This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021.
1 code implementation • 17 Mar 2022 • Xiaoguang Chang, Teng Wang, Changyin Sun, Wenzhe Cai
Scene graph generation is a sophisticated task because there is no specific recognition pattern (e.g., "looking at" and "near" show no conspicuous visual difference, whereas "near" can occur between entities of different morphology).
Ranked #1 on Predicate Classification on Visual Genome (mean Recall@20 metric)
1 code implementation • 5 Jan 2022 • Teng Wang, Bolun Sun, Yijie Tong
In this paper, we propose a method that uses an auxiliary sentence about the aspects a sentence contains to help sentiment prediction.
Aspect-Based Sentiment Analysis (ABSA) +2
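A minimal sketch of the auxiliary-sentence idea, assuming a BERT-style sentence-pair classifier downstream; the aspect set and question template are hypothetical.

```python
ASPECTS = ["food", "service", "price"]  # hypothetical aspect set

def build_auxiliary(aspect: str) -> str:
    """Turn an aspect into a simple auxiliary question sentence."""
    return f"What do you think of the {aspect}?"

def make_pairs(sentence: str) -> list[tuple[str, str]]:
    """Pair the review sentence with one auxiliary sentence per aspect;
    each pair can then be fed to a standard sentence-pair classifier
    (e.g., a BERT-style model) to predict the aspect's sentiment."""
    return [(sentence, build_auxiliary(a)) for a in ASPECTS]

for s, aux in make_pairs("The food was great but the service was slow."):
    print(f"[CLS] {s} [SEP] {aux} [SEP]")
```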
1 code implementation • Pattern Recognition 2021 • Teng Wang, Lin Wu, Changyin Sun
Using the coarse predicted image, we explicitly infer a more accurate dynamic mask to identify both dynamic objects and their shadows, so that the task can be effectively converted into an image inpainting problem.
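An illustrative sketch of the mask-then-inpaint conversion, where a simple per-pixel difference threshold and an off-the-shelf OpenCV inpainter stand in for the paper's learned components.

```python
import numpy as np
import cv2

def dynamic_mask(frame: np.ndarray, coarse_pred: np.ndarray,
                 thresh: float = 25.0) -> np.ndarray:
    """Threshold the per-pixel difference between the input frame and the
    coarse static prediction to flag dynamic regions (both uint8 HxWx3)."""
    diff = np.abs(frame.astype(np.float32) - coarse_pred.astype(np.float32))
    return (diff.mean(axis=2) > thresh).astype(np.uint8) * 255

def remove_dynamic(frame: np.ndarray, coarse_pred: np.ndarray) -> np.ndarray:
    """Inpaint the masked (dynamic) pixels of the frame."""
    mask = dynamic_mask(frame, coarse_pred)
    return cv2.inpaint(frame, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
```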
2 code implementations • ICCV 2021 • Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo
Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.
Ranked #5 on Dense Video Captioning on YouCook2
1 code implementation • 23 Jun 2021 • Libo Wang, Rui Li, Dongzhi Wang, Chenxi Duan, Teng Wang, Xiaoliang Meng
Specifically, the dependency path is built upon ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on stacked convolution operations.
Ranked #4 on Semantic Segmentation on UAVid
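As a rough sketch of this two-path layout, the snippet below pairs a stacked-convolution texture path with a coarse self-attention dependency path and fuses them for segmentation; all layer sizes are assumptions, and the attention shown is standard rather than ResT's memory-efficient variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoPathSeg(nn.Module):
    def __init__(self, in_ch: int = 3, dim: int = 64, num_classes: int = 8):
        super().__init__()
        self.texture = nn.Sequential(                    # local-detail path
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        self.embed = nn.Conv2d(in_ch, dim, 8, stride=8)  # coarse token grid
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Conv2d(2 * dim, num_classes, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tex = self.texture(x)                # (B, D, H, W)
        g = self.embed(x)                    # (B, D, H/8, W/8)
        b, d, h, w = g.shape
        tok = g.flatten(2).transpose(1, 2)   # (B, N, D) coarse tokens
        ctx, _ = self.attn(tok, tok, tok)    # long-range dependencies
        ctx = ctx.transpose(1, 2).reshape(b, d, h, w)
        ctx = F.interpolate(ctx, size=tex.shape[-2:], mode="bilinear",
                            align_corners=False)
        return self.head(torch.cat([tex, ctx], dim=1))
```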
no code implementations • 17 May 2021 • Lin Wu, Teng Wang, Changyin Sun
In this letter, we explore for the first time the use of multi-modal fusion of semantic and visual modalities in a dynamics-invariant space to improve place recognition in dynamic environments.
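A toy sketch of the fusion step, assuming each place is already summarized by one semantic and one visual descriptor; the weighting and cosine matching below are illustrative choices, not the letter's method.

```python
import numpy as np

def fuse(semantic_desc: np.ndarray, visual_desc: np.ndarray,
         alpha: float = 0.5) -> np.ndarray:
    """L2-normalize each modality, then weight and concatenate."""
    s = semantic_desc / (np.linalg.norm(semantic_desc) + 1e-8)
    v = visual_desc / (np.linalg.norm(visual_desc) + 1e-8)
    return np.concatenate([alpha * s, (1.0 - alpha) * v])

def best_match(query: np.ndarray, database: list[np.ndarray]) -> int:
    """Return the index of the most similar fused descriptor (cosine)."""
    sims = [float(query @ d) /
            (np.linalg.norm(query) * np.linalg.norm(d) + 1e-8)
            for d in database]
    return int(np.argmax(sims))
```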
1 code implementation • 21 Jun 2020 • Teng Wang, Huicheng Zheng, Mingjing Yu
This technical report presents a brief description of our submission to the dense video captioning task of ActivityNet Challenge 2020.
Ranked #5 on Dense Video Captioning on ActivityNet Captions
no code implementations • 19 Apr 2020 • Yang Zhao, Jun Zhao, Mengmeng Yang, Teng Wang, Ning Wang, Lingjuan Lyu, Dusit Niyato, Kwok-Yan Lam
To avoid privacy threats and reduce communication costs, in this paper we propose to integrate federated learning and local differential privacy (LDP) to enable crowdsourcing applications to train machine learning models.
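A minimal sketch of the client-side idea, assuming per-coordinate Laplace perturbation of a clipped local update before upload; the clipping bound, noise scale, and privacy accounting are simplifications, not the paper's protocol.

```python
import numpy as np

def ldp_perturb(update: np.ndarray, epsilon: float, clip: float) -> np.ndarray:
    """Clip each coordinate to [-clip, clip], then add Laplace noise.

    With per-coordinate sensitivity 2*clip, Laplace scale 2*clip/epsilon
    gives epsilon-LDP for each coordinate in isolation (a simplification;
    vector-level accounting would compose across coordinates).
    """
    clipped = np.clip(update, -clip, clip)
    scale = 2.0 * clip / epsilon
    return clipped + np.random.laplace(0.0, scale, size=clipped.shape)
```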
no code implementations • 27 Nov 2019 • Jun Zhao, Teng Wang, Tao Bai, Kwok-Yan Lam, Zhiying Xu, Shuyu Shi, Xuebin Ren, Xinyu Yang, Yang Liu, Han Yu
Although both classical Gaussian mechanisms [1, 2] assume $0 < \epsilon \leq 1$, our review finds that many studies in the literature have used them with values of $\epsilon$ and $\delta$ for which the added noise amounts of [1, 2] do not achieve $(\epsilon,\delta)$-DP.
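For context, the classical Gaussian mechanism calibrates noise as $\sigma = \Delta \sqrt{2\ln(1.25/\delta)} / \epsilon$, an analysis proven only for $0 < \epsilon \leq 1$; the sketch below makes that assumption explicit.

```python
import math
import random

def classical_gaussian_mechanism(value: float, sensitivity: float,
                                 epsilon: float, delta: float) -> float:
    """Add Gaussian noise calibrated by the classical analysis.

    The bound sigma = sensitivity * sqrt(2 * ln(1.25/delta)) / epsilon is
    only proven for 0 < epsilon <= 1; using it with epsilon > 1 (as the
    review above observes many studies do) does not guarantee
    (epsilon, delta)-DP.
    """
    if not (0 < epsilon <= 1):
        raise ValueError("classical analysis requires 0 < epsilon <= 1")
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return value + random.gauss(0.0, sigma)

print(classical_gaussian_mechanism(10.0, sensitivity=1.0,
                                   epsilon=0.5, delta=1e-5))
```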
no code implementations • 4 Jun 2019 • Teng Wang, Jun Zhao, Han Yu, Jinyan Liu, Xinyu Yang, Xuebin Ren, Shuyu Shi
To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision.
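A toy example of centralized preference aggregation using a Borda count over voters' rankings; the ballot format and scoring rule are illustrative, since the paper treats aggregation more generally.

```python
from collections import defaultdict

def borda_winner(ballots: list[list[str]]) -> str:
    """Aggregate ranked ballots: the top choice on an n-item ballot earns
    n-1 points, the next n-2, and so on; the highest total wins."""
    scores = defaultdict(int)
    for ranking in ballots:
        n = len(ranking)
        for pos, decision in enumerate(ranking):
            scores[decision] += n - 1 - pos
    return max(scores, key=scores.get)

ballots = [["swerve", "stay"], ["stay", "swerve"], ["swerve", "stay"]]
print(borda_winner(ballots))  # -> "swerve"
```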
no code implementations • 25 Dec 2018 • Teng Wang, Chao Wang, Xuehai Zhou, Huaping Chen
With the rapid development of deep learning, neural networks and deep learning algorithms have been widely applied in various fields, e.g., image, video, and speech processing.