Search Results for author: Teng Wang

Found 29 papers, 21 papers with code

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

1 code implementation • 4 Apr 2024 • Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng

Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL).

audio-visual event localization · Event Detection · +2

Video Understanding with Large Language Models: A Survey

1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Video Understanding

Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

no code implementations • ICCV 2023 • Baoshuo Kan, Teng Wang, Wenpeng Lu, XianTong Zhen, Weili Guan, Feng Zheng

Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity for transfer learning.

Few-Shot Image Classification · Transfer Learning

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

1 code implementation • ICCV 2023 • Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng

In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios.

Caption Generation · Hallucination · +2

PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas

1 code implementation • 26 Jun 2023 • Chen Li, Xutan Peng, Teng Wang, Yixiao Ge, Mengyang Liu, Xuyuan Xu, Yexin Wang, Ying Shan

Art forms such as movies and television (TV) dramas are reflections of the real world, which have attracted much attention from the multimodal learning community recently.

Genre classification · Retrieval · +1

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning

1 code implementation • 17 Jun 2023 • Yunlong Tang, Jinrui Zhang, Xiangchen Wang, Teng Wang, Feng Zheng

This paper proposes an effective model, LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) we utilize a pretrained LLM to generate high-quality, human-like captions.

Boundary Captioning · Language Modelling · +1

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

1 code implementation • 4 May 2023 • Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao

Controllable image captioning is an emerging multimodal topic that aims to describe an image in natural language according to human intent, $\textit{e.g.}$, looking at specified regions or telling in a particular text style.

controllable image captioning · Instruction Following

$π$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

1 code implementation • 27 Apr 2023 • Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo

Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks.

Multi-Task Learning

Accelerating Vision-Language Pretraining with Free Language Modeling

1 code implementation • CVPR 2023 • Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, XiaoHu Qie, Ping Luo

FLM successfully frees the prediction rate from the tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted.

Language Modelling · Masked Language Modeling

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

1 code implementation • CVPR 2023 • Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng

To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video.

audio-visual event localization

LANDMARK: Language-guided Representation Enhancement Framework for Scene Graph Generation

1 code implementation • 2 Mar 2023 • Xiaoguang Chang, Teng Wang, Shaowei Cai, Changyin Sun

Besides, representation-level unbiased strategies endow LANDMARK with the advantage of compatibility with other methods.

Graph Generation · Object · +3

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

1 code implementation • 25 Sep 2022 • Yunlong Tang, Siting Xu, Teng Wang, Qin Lin, Qinglin Lu, Feng Zheng

The existing method performs well at the video segmentation stage but suffers from dependence on extra cumbersome models and poor performance at the segment assemblage stage.

Video Editing · Video Segmentation · +1

Exploiting Context Information for Generic Event Boundary Captioning

1 code implementation • 3 Jul 2022 • Jinrui Zhang, Teng Wang, Feng Zheng, Ran Cheng, Ping Luo

Previous methods process the information of only a single boundary at a time, failing to utilize video context information.

Boundary Captioning

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

1 code implementation • 17 Jun 2022 • Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo

Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated through enormous human labor or crawled from the internet and then subjected to elaborate data cleaning.

Contrastive Learning · Data Augmentation · +2
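For background, here is a minimal sketch of the original image-space CutMix augmentation that VLMixer's cross-modal variant alludes to (illustrative only; the paper's CutMix mixes across vision and language modalities, not two images, and all names below are our own):

```python
import random

def cutmix(img_a, img_b, label_a, label_b, lam=None):
    """Paste a random box from img_b into img_a; mix labels by pasted area.

    img_a, img_b: 2D lists of equal shape (H x W pixel values).
    lam: fraction of img_a to keep (drawn uniformly if not given).
    """
    h, w = len(img_a), len(img_a[0])
    lam = random.random() if lam is None else lam
    # Box side lengths chosen so the pasted area is roughly (1 - lam) * H * W.
    cut_h = int(h * (1 - lam) ** 0.5)
    cut_w = int(w * (1 - lam) ** 0.5)
    y0 = random.randint(0, h - cut_h)
    x0 = random.randint(0, w - cut_w)
    mixed = [row[:] for row in img_a]
    for y in range(y0, y0 + cut_h):
        for x in range(x0, x0 + cut_w):
            mixed[y][x] = img_b[y][x]
    area = (cut_h * cut_w) / (h * w)
    # Soft label weighted by how much of each source survives in the mix.
    return mixed, {label_a: 1.0 - area, label_b: area}
```

The label mixing is what lets a classifier train on the composite without treating either source as ground truth alone.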

Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization

no code implementations • 21 Apr 2022 • Teng Wang, Shujuan Fan, Daikun Liu, Changyin Sun

Furthermore, we design a dual-branch Transformer head network to combine image features from multi-scale windows in order to improve details of the global feature representation.

Representation Learning

Semantic-Aware Pretraining for Dense Video Captioning

no code implementations • 13 Apr 2022 • Teng Wang, Zhu Liu, Feng Zheng, Zhichao Lu, Ran Cheng, Ping Luo

This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021.

Dense Captioning · Dense Video Captioning

Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation

1 code implementation • 17 Mar 2022 • Xiaoguang Chang, Teng Wang, Changyin Sun, Wenzhe Cai

Scene graph generation is a sophisticated task because there is no specific recognition pattern (e.g., "looking at" and "near" have no conspicuous visual difference, whereas "near" could occur between entities with different morphology).

 Ranked #1 on Predicate Classification on Visual Genome (mean Recall @20 metric)

Graph Generation · Predicate Classification · +2

Auto-ABSA: Automatic Detection of Aspects in Aspect-Based Sentiment Analysis

1 code implementation • 5 Jan 2022 • Teng Wang, Bolun Sun, Yijie Tong

In this paper, we propose a method that uses an auxiliary sentence about the aspects a sentence contains to aid sentiment prediction.

Aspect-Based Sentiment Analysis · Aspect-Based Sentiment Analysis (ABSA) · +2
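The auxiliary-sentence idea can be sketched as pairing the review with a per-aspect question and handing each pair to a sentence-pair classifier. This is a minimal illustration; the question template and function names below are hypothetical, not taken from Auto-ABSA:

```python
def build_auxiliary_pairs(review, candidate_aspects):
    """Pair a review with one auxiliary question per candidate aspect.

    Each (review, question) pair can then be fed to a sentence-pair
    sentiment classifier to predict that aspect's polarity.
    The template is hypothetical; the paper's actual prompts may differ.
    """
    template = "what do you think of the {} of it ?"
    return [(review, template.format(aspect)) for aspect in candidate_aspects]
```

Reframing aspect sentiment as sentence-pair classification lets a single model score any aspect without aspect-specific heads.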

A coarse-to-fine approach for dynamic-to-static image translation

1 code implementation • Pattern Recognition 2021 • Teng Wang, Lin Wu, Changyin Sun

Using the coarse predicted image, we explicitly infer a more accurate dynamic mask to identify both dynamic objects and their shadows, so that the task could be effectively converted to an image inpainting problem.

Image Inpainting · Image-to-Image Translation · +2

Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images

1 code implementation • 23 Jun 2021 • Libo Wang, Rui Li, Dongzhi Wang, Chenxi Duan, Teng Wang, Xiaoliang Meng

Specifically, the dependency path is conducted based on the ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on the stacked convolution operation.

Autonomous Driving · Decision Making · +3

Multi-modal Visual Place Recognition in Dynamics-Invariant Perception Space

no code implementations • 17 May 2021 • Lin Wu, Teng Wang, Changyin Sun

In this letter, we explore, for the first time, multi-modal fusion of semantic and visual modalities in a dynamics-invariant space to improve place recognition in dynamic environments.

Segmentation · Semantic Segmentation · +1

Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020

1 code implementation • 21 Jun 2020 • Teng Wang, Huicheng Zheng, Mingjing Yu

This technical report presents a brief description of our submission to the dense video captioning task of ActivityNet Challenge 2020.

Dense Captioning · Dense Video Captioning

Local Differential Privacy based Federated Learning for Internet of Things

no code implementations • 19 Apr 2020 • Yang Zhao, Jun Zhao, Mengmeng Yang, Teng Wang, Ning Wang, Lingjuan Lyu, Dusit Niyato, Kwok-Yan Lam

To mitigate the privacy threat and reduce communication cost, in this paper we propose integrating federated learning with local differential privacy (LDP) so that crowdsourcing applications can train machine learning models.

BIG-bench Machine Learning · Federated Learning · +1
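A common shape for LDP in federated learning is for each client to clip its model update and add noise locally before uploading, so the server never sees a raw update. The sketch below is illustrative only (clip-then-Gaussian with a caller-chosen `sigma`), not the paper's specific mechanism:

```python
import math
import random

def clip_l2(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm.

    Clipping bounds the sensitivity of a single client's contribution,
    which is what the noise scale must be calibrated against.
    """
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def perturb_update(update, clip_norm, sigma):
    """Clip, then add Gaussian noise locally before sending to the server."""
    clipped = clip_l2(update, clip_norm)
    return [u + random.gauss(0.0, sigma) for u in clipped]
```

Because the noise is added on-device, privacy holds even against an untrusted aggregation server, at the cost of more noise than central DP for the same budget.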

Reviewing and Improving the Gaussian Mechanism for Differential Privacy

no code implementations • 27 Nov 2019 • Jun Zhao, Teng Wang, Tao Bai, Kwok-Yan Lam, Zhiying Xu, Shuyu Shi, Xuebin Ren, Xinyu Yang, Yang Liu, Han Yu

Although both classical Gaussian mechanisms [1, 2] assume $0 < \epsilon \leq 1$, our review finds that many studies in the literature have used the classical Gaussian mechanisms under values of $\epsilon$ and $\delta$ where the added noise amounts of [1, 2] do not achieve $(\epsilon,\delta)$-DP.
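For context, the classical Gaussian mechanism calibrates the noise scale as $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$, and that calibration is only proven for $0 < \epsilon \leq 1$, which is exactly the constraint the abstract says is often violated in practice. A minimal sketch:

```python
import math
import random

def classical_gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classical Gaussian mechanism.

    The bound sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    gives (epsilon, delta)-DP only when 0 < epsilon <= 1; outside that
    range the guarantee does not follow from the classical analysis.
    """
    if not (0 < epsilon <= 1):
        raise ValueError("classical Gaussian mechanism requires 0 < epsilon <= 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(value, epsilon, delta, sensitivity):
    """Release a scalar query answer with calibrated Gaussian noise."""
    sigma = classical_gaussian_sigma(epsilon, delta, sensitivity)
    return value + random.gauss(0.0, sigma)
```

Using this formula with $\epsilon > 1$, as the review finds many studies have done, yields noise that is not backed by the classical proof.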

Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas

no code implementations • 4 Jun 2019 • Teng Wang, Jun Zhao, Han Yu, Jinyan Liu, Xinyu Yang, Xuebin Ren, Shuyu Shi

To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision.

Autonomous Vehicles · Decision Making · +1

A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities

no code implementations • 25 Dec 2018 • Teng Wang, Chao Wang, Xuehai Zhou, Huaping Chen

With the rapid development of deep learning, neural networks and deep learning algorithms have been widely used in various fields, e.g., image, video, and voice processing.
