Search Results for author: Wenhao Wu

Found 47 papers, 21 papers with code

Learn and Review: Enhancing Continual Named Entity Recognition via Reviewing Synthetic Samples

no code implementations Findings (ACL) 2022 Yu Xia, Quan Wang, Yajuan Lyu, Yong Zhu, Wenhao Wu, Sujian Li, Dai Dai

However, the existing method depends on the relevance between tasks and is prone to inter-type confusion. In this paper, we propose a novel two-stage framework, Learn-and-Review (L&R), for continual NER under the type-incremental setting to alleviate the above issues. Specifically, for the learning stage, we distill the old knowledge from a teacher to a student on the current dataset.

Continual Named Entity Recognition named-entity-recognition +2

MetaSplit: Meta-Split Network for Limited-Stock Product Recommendation

no code implementations 11 Mar 2024 Wenhao Wu, Jialiang Zhou, Ailong He, Shuguang Han, Jufeng Chen, Bo Zheng

Due to limited user interactions for each product (i.e., item), the corresponding item embedding in the CTR model may not easily converge.

Click-Through Rate Prediction Meta-Learning +1

GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition

no code implementations 18 Jan 2024 Guangzhao Dai, Xiangbo Shu, Wenhao Wu

Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks.

Action Recognition Text Matching

Deep Structure and Attention Aware Subspace Clustering

1 code implementation 25 Dec 2023 Wenhao Wu, Weiwei Wang, Shengjiang Kong

However, previous deep clustering methods, especially image clustering, focus on the features of the data itself and ignore the relationship between the data, which is crucial for clustering.

Clustering Deep Clustering +2

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

2 code implementations 27 Nov 2023 Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang

Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks: Firstly, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training.

Zero-Shot Learning

Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

2 code implementations 27 Nov 2023 Huanjin Yao, Wenhao Wu, Zhiheng Li

In this paper, we present a novel Spatial-Temporal Side Network, named Side4Video, for memory-efficient fine-tuning of large image models for video understanding.

Action Classification Action Recognition +3

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

1 code implementation 19 Sep 2023 Dawei Zhu, Nan Yang, Liang Wang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li

To decouple train length from target length for efficient context window extension, we propose Positional Skip-wisE (PoSE) training that smartly simulates long inputs using a fixed context window.

Position

What Can Simple Arithmetic Operations Do for Temporal Modeling?

2 code implementations ICCV 2023 Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

Action Classification Action Recognition +1

RestGPT: Connecting Large Language Models with Real-World RESTful APIs

no code implementations 11 Jun 2023 YiFan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, Ye Tian, Sujian Li

To address the practical challenges of tackling complex instructions, we propose RestGPT, which exploits the power of LLMs and conducts a coarse-to-fine online planning mechanism to enhance the abilities of task decomposition and API selection.

UATVR: Uncertainty-Adaptive Text-Video Retrieval

1 code implementation ICCV 2023 Bo Fang, Wenhao Wu, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang

In the refined embedding space, we represent text-video pairs as probabilistic distributions where prototypes are sampled for matching evaluation.

Retrieval Semantic correspondence +1

Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus

no code implementations CVPR 2023 Wenhao Wu, Hau San Wong, Si Wu

Stereo-based 3D object detection, which aims at detecting 3D objects with stereo cameras, shows great potential in low-cost deployment compared to LiDAR-based methods and excellent performance compared to monocular-based algorithms.

3D Object Detection Depth Estimation +3

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations CVPR 2023 Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Action Classification Action Recognition +3

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

4 code implementations CVPR 2023 Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences.

Data Augmentation Retrieval +2

WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

1 code implementation 20 Dec 2022 Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Sujian Li, Yajuan Lv

As a result, they perform poorly on the real generated text and are biased heavily by their single-source upstream tasks.

Natural Language Inference Question Answering +2

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer

no code implementations 3 Dec 2022 Tianwei Lin, Honglin Lin, Fu Li, Dongliang He, Wenhao Wu, Meiling Wang, Xin Li, Yong Liu

Then, in AdaCM, we adopt a CNN encoder to adaptively predict all parameters for the ColorMLP conditioned on each input content and style image pair.

Style Transfer

FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness

no code implementations 1 Nov 2022 Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Ziqiang Cao, Sujian Li, Hua Wu

We first measure a model's factual robustness by its success rate to defend against adversarial attacks when generating factual information.

Abstractive Text Summarization

Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation

no code implementations 22 Oct 2022 Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Sujian Li, Yajuan Lyu

Though model robustness has been extensively studied in language understanding, the robustness of Seq2Seq generation remains understudied.

Informativeness Text Generation

It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training

no code implementations 11 Oct 2022 Yuxin Song, Min Yang, Wenhao Wu, Dongliang He, Fu Li, Jingdong Wang

In order to guide the encoder to fully excavate spatial-temporal features, two separate decoders are used for two pretext tasks of disentangled appearance and motion prediction.

motion prediction

Effective Invertible Arbitrary Image Rescaling

no code implementations 26 Sep 2022 Zhihong Pan, Baopu Li, Dongliang He, Wenhao Wu, Errui Ding

To increase its real world applicability, numerous models have also been proposed to restore SR images with arbitrary scale factors, including asymmetric ones where images are resized to different scales along horizontal and vertical directions.

Image Super-Resolution

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

no code implementations 21 Aug 2022 Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang

We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.

Clustering Contrastive Learning +4

Temporal Saliency Query Network for Efficient Video Recognition

no code implementations 21 Jul 2022 Boyang Xia, Zhihao Wang, Wenhao Wu, Haoran Wang, Jungong Han

For each category, its common pattern is employed as a query, and the most salient frames respond to it.

Action Recognition Video Recognition

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

no code implementations 21 Jul 2022 Boyang Xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations.

Action Recognition Video Classification +1

Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

no code implementations 10 Mar 2022 Wei Li, Wenhao Wu, Moye Chen, Jiachen Liu, Xinyan Xiao, Hua Wu

In this survey, we provide a systematic overview of the research progress on the faithfulness problem of NLG, including problem analysis, evaluation metrics and optimization methods.

Abstractive Text Summarization Data-to-Text Generation +2

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

no code implementations CVPR 2022 Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding

Deep learning based single image super-resolution models have been widely studied and superb results are achieved in upscaling low-resolution images with fixed scale factor and downscaling degradation kernel.

Image Super-Resolution

Temporal Action Proposal Generation with Background Constraint

1 code implementation 15 Dec 2021 Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang

To evaluate the confidence of proposals, existing works typically predict an action score for each proposal, supervised by the temporal Intersection-over-Union (tIoU) between the proposal and the ground truth.

Temporal Action Proposal Generation

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations 9 Aug 2021 Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network

no code implementations 27 Jul 2021 Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye

Specifically, at the coarse-grained stage, we design a dual-discriminator strategy to adapt the source domain to be close to the targets from the perspectives of both global and local feature space via adversarial learning.

Crowd Counting Transfer Learning

Color2Embed: Fast Exemplar-Based Image Colorization using Color Embeddings

3 code implementations 15 Jun 2021 Hengyuan Zhao, Wenhao Wu, Yihao Liu, Dongliang He

In this paper, we present a fast exemplar-based image colorization approach using color embeddings named Color2Embed.

Colorization Image Colorization +1

Temporal Action Proposal Generation with Transformers

no code implementations 25 May 2021 Lining Wang, Haosen Yang, Wenhao Wu, Hongxun Yao, Hujie Huang

Conventionally, the temporal action proposal generation (TAPG) task is divided into two main sub-tasks: boundary prediction and proposal confidence prediction, which rely on the frame-level dependencies and proposal-level relationships separately.

Temporal Action Proposal Generation

BASS: Boosting Abstractive Summarization with Unified Semantic Graph

no code implementations ACL 2021 Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Ziqiang Cao, Sujian Li, Hua Wu, Haifeng Wang

Abstractive summarization for long-document or multi-document remains challenging for the Seq2Seq architecture, as Seq2Seq is not good at analyzing long-distance relations in text.

Abstractive Text Summarization Document Summarization +2

Good Practices and A Strong Baseline for Traffic Anomaly Detection

1 code implementation 9 May 2021 Yuxiang Zhao, Wenhao Wu, Yue He, YingYing Li, Xiao Tan, Shifeng Chen

In this paper, we propose a straightforward and efficient framework that includes pre-processing, a dynamic track module, and post-processing.

Anomaly Detection Management +1

A Comprehensive Attempt to Research Statement Generation

no code implementations 25 Apr 2021 Wenhao Wu, Sujian Li

For a researcher, writing a good research statement is crucial but costs a lot of time and effort.

Clustering

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

3 code implementations 13 Dec 2020 Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

Existing state-of-the-art methods have achieved excellent accuracy regardless of complexity, while efficient spatiotemporal modeling solutions are slightly inferior in performance.

Action Classification Action Recognition +2

Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition

1 code implementation ECCV 2020 Jin Ye, Junjun He, Xiaojiang Peng, Wenhao Wu, Yu Qiao

To this end, we propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) to dynamically generate a specific graph for each image.

Multi-Label Classification

Composing Elementary Discourse Units in Abstractive Summarization

no code implementations ACL 2020 Zhenwen Li, Wenhao Wu, Sujian Li

In this paper, we argue that elementary discourse unit (EDU) is a more appropriate textual unit of content selection than the sentence unit in abstractive summarization.

Abstractive Text Summarization reinforcement-learning +2

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition

no code implementations 9 Feb 2020 Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen

In a nutshell, we treat input frames and network depth of the computational graph as a 2-dimensional grid, and several checkpoints are placed on this grid in advance with a prediction module.

Action Recognition In Videos Temporal Action Localization

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation ECCV 2018 Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition Semantic Segmentation +2

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

3 code implementations ECCV 2018 Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao

Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks.

Curved Text Detection Text Detection
