Search Results for author: Wenhao Wu

Found 51 papers, 23 papers with code

Learn and Review: Enhancing Continual Named Entity Recognition via Reviewing Synthetic Samples

no code implementations • Findings (ACL) 2022 • Yu Xia, Quan Wang, Yajuan Lyu, Yong Zhu, Wenhao Wu, Sujian Li, Dai Dai

However, the existing method depends on the relevance between tasks and is prone to inter-type confusion. In this paper, we propose a novel two-stage framework, Learn-and-Review (L&R), for continual NER under the type-incremental setting to alleviate the above issues. Specifically, in the learning stage, we distill the old knowledge from the teacher to a student on the current dataset.

Continual Named Entity Recognition, named-entity-recognition, and 2 more

Retrieval Head Mechanistically Explains Long-Context Factuality

no code implementations • 24 Apr 2024 • Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu

Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context.

Continual Pretraining, Hallucination, and 2 more

LongEmbed: Extending Embedding Models for Long Context Retrieval

1 code implementation • 18 Apr 2024 • Dawei Zhu, Liang Wang, Nan Yang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li

This paper explores context window extension of existing embedding models, pushing the limit to 32k without requiring additional training.

4k, 8k, and 3 more

CoUDA: Coherence Evaluation via Unified Data Augmentation

1 code implementation • 31 Mar 2024 • Dawei Zhu, Wenhao Wu, YiFan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li

Due to the scarcity of annotated data, data augmentation is commonly used for training coherence evaluation models.

Coherence Evaluation, Data Augmentation

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

no code implementations • 19 Mar 2024 • Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.

Object, object-detection, and 3 more

MetaSplit: Meta-Split Network for Limited-Stock Product Recommendation

no code implementations • 11 Mar 2024 • Wenhao Wu, Jialiang Zhou, Ailong He, Shuguang Han, Jufeng Chen, Bo Zheng

Due to limited user interactions for each product (i.e., item), the corresponding item embedding in the CTR model may not converge easily.

Click-Through Rate Prediction, Meta-Learning, and 1 more

GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition

no code implementations • 18 Jan 2024 • Guangzhao Dai, Xiangbo Shu, Wenhao Wu

Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks.

Action Recognition, Text Matching

Deep Structure and Attention Aware Subspace Clustering

1 code implementation • 25 Dec 2023 • Wenhao Wu, Weiwei Wang, Shengjiang Kong

However, previous deep clustering methods, especially those for image clustering, focus on the features of the data themselves and ignore the relationships between data points, which are crucial for clustering.

Clustering, Deep Clustering, and 2 more

Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

2 code implementations • 27 Nov 2023 • Huanjin Yao, Wenhao Wu, Zhiheng Li

In this paper, we present a novel Spatial-Temporal Side Network, named Side4Video, for memory-efficient fine-tuning of large image models for video understanding.

Ranked #3 on Action Recognition on Something-Something V1

Action Classification, Action Recognition, and 3 more

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

2 code implementations • 27 Nov 2023 • Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang

Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. First, we explore whether the rich textual descriptions it generates for various categories can enhance recognition performance without any training.

Zero-Shot Learning

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

1 code implementation • 19 Sep 2023 • Dawei Zhu, Nan Yang, Liang Wang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li

To decouple train length from target length for efficient context window extension, we propose Positional Skip-wisE (PoSE) training that smartly simulates long inputs using a fixed context window.
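As a rough illustration of the skip-wise idea in the excerpt above, the sketch below builds position ids for one training sample: a sequence of length `train_len` is split into two chunks, and the second chunk receives a random skip offset, so the ids can reach up to `target_len` while the actual input stays short. The two-chunk split, the uniform sampling of both the split point and the offset, and the function name `skipwise_position_ids` are assumptions for illustration only, not the paper's exact recipe.

```python
import random


def skipwise_position_ids(train_len, target_len, rng=None):
    """Hypothetical sketch: position ids of length train_len that span up to target_len.

    The first chunk keeps its natural positions; the second chunk is shifted by a
    random skip so that the largest id is still strictly below target_len.
    """
    if train_len < 2:
        raise ValueError("train_len must be at least 2 to form two chunks")
    if target_len < train_len:
        raise ValueError("target_len must be >= train_len")
    rng = rng or random.Random()
    # Random split point inside the fixed context window (both chunks non-empty).
    split = rng.randint(1, train_len - 1)
    # Skip bias for the second chunk; its last id is train_len + skip - 1 <= target_len - 1.
    skip = rng.randint(0, target_len - train_len)
    first = list(range(0, split))
    second = list(range(split + skip, train_len + skip))
    return first + second
```

For instance, `skipwise_position_ids(2048, 16384, random.Random(0))` yields 2,048 strictly increasing ids whose maximum stays below 16,384, so the model is exposed to relative distances larger than its 2k input. A real implementation would likely use more chunks and a different sampling scheme.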

2k, Position

What Can Simple Arithmetic Operations Do for Temporal Modeling?

2 code implementations • ICCV 2023 • Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

Ranked #4 on Action Recognition on Something-Something V1

Action Classification, Action Recognition, and 1 more

RestGPT: Connecting Large Language Models with Real-World RESTful APIs

no code implementations • 11 Jun 2023 • YiFan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, Ye Tian, Sujian Li

To address the practical challenges of tackling complex instructions, we propose RestGPT, which exploits the power of LLMs and conducts a coarse-to-fine online planning mechanism to enhance the abilities of task decomposition and API selection.

UATVR: Uncertainty-Adaptive Text-Video Retrieval

1 code implementation • ICCV 2023 • Bo Fang, Wenhao Wu, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang

In the refined embedding space, we represent text-video pairs as probabilistic distributions where prototypes are sampled for matching evaluation.

Retrieval, Semantic correspondence, and 1 more

Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus

no code implementations • CVPR 2023 • Wenhao Wu, Hau San Wong, Si Wu

Stereo-based 3D object detection, which aims at detecting 3D objects with stereo cameras, shows great potential in low-cost deployment compared to LiDAR-based methods and excellent performance compared to monocular-based algorithms.

3D Object Detection, Depth Estimation, and 3 more

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

4 code implementations • CVPR 2023 • Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences.

Ranked #7 on Video Retrieval on VATEX

Data Augmentation, Retrieval, and 2 more

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations • CVPR 2023 • Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Ranked #1 on Zero-Shot Action Recognition on ActivityNet

Action Classification, Action Recognition, and 3 more

WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

1 code implementation • 20 Dec 2022 • Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Sujian Li, Yajuan Lv

As a result, they perform poorly on real generated text and are heavily biased by their single-source upstream tasks.

Natural Language Inference, Question Answering, and 2 more

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer

no code implementations • 3 Dec 2022 • Tianwei Lin, Honglin Lin, Fu Li, Dongliang He, Wenhao Wu, Meiling Wang, Xin Li, Yong Liu

Then, in AdaCM, we adopt a CNN encoder to adaptively predict all parameters for the ColorMLP conditioned on each input content and style image pair.

4k, Style Transfer

FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness

no code implementations • 1 Nov 2022 • Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Ziqiang Cao, Sujian Li, Hua Wu

We first measure a model's factual robustness by its success rate in defending against adversarial attacks when generating factual information.

Abstractive Text Summarization

Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation

no code implementations • 22 Oct 2022 • Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Sujian Li, Yajuan Lyu

Though model robustness has been extensively studied in language understanding, the robustness of Seq2Seq generation remains understudied.

Informativeness, Text Generation

It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training

no code implementations • 11 Oct 2022 • Yuxin Song, Min Yang, Wenhao Wu, Dongliang He, Fu Li, Jingdong Wang

To guide the encoder to fully exploit spatial-temporal features, two separate decoders are used for the two pretext tasks of disentangled appearance and motion prediction.

motion prediction

Effective Invertible Arbitrary Image Rescaling

no code implementations • 26 Sep 2022 • Zhihong Pan, Baopu Li, Dongliang He, Wenhao Wu, Errui Ding

To increase its real-world applicability, numerous models have also been proposed to restore SR images with arbitrary scale factors, including asymmetric ones where images are resized to different scales along the horizontal and vertical directions.

Image Super-Resolution

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

no code implementations • 21 Aug 2022 • Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang

We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.

Clustering, Contrastive Learning, and 4 more

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

no code implementations • 21 Jul 2022 • Boyang Xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations.

Ranked #4 on Action Recognition on ActivityNet

Action Recognition, Video Classification, and 1 more

Temporal Saliency Query Network for Efficient Video Recognition

no code implementations • 21 Jul 2022 • Boyang Xia, Zhihao Wang, Wenhao Wu, Haoran Wang, Jungong Han

For each category, its common pattern is employed as a query, and the most salient frames respond to it.

Ranked #5 on Action Recognition on ActivityNet

Action Recognition, Video Recognition

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

5 code implementations • 4 Jul 2022 • Wenhao Wu, Zhun Sun, Wanli Ouyang

In this study, we focus on transferring knowledge for video classification tasks.

Ranked #1 on Action Recognition on ActivityNet

Action Classification, Action Recognition, and 5 more

Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

1 code implementation • CVPR 2022 • Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich

The first one lets T compete with G to achieve maximum perturbation.

Contrastive Learning, Image-to-Image Translation, and 1 more

Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

no code implementations • 10 Mar 2022 • Wei Li, Wenhao Wu, Moye Chen, Jiachen Liu, Xinyan Xiao, Hua Wu

In this survey, we provide a systematic overview of the research progress on the faithfulness problem of NLG, including problem analysis, evaluation metrics and optimization methods.

Abstractive Text Summarization, Data-to-Text Generation, and 2 more

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

no code implementations • CVPR 2022 • Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding

Deep-learning-based single image super-resolution models have been widely studied, and superb results have been achieved in upscaling low-resolution images with a fixed scale factor and downscaling degradation kernel.

Image Super-Resolution

Temporal Action Proposal Generation with Background Constraint

1 code implementation • 15 Dec 2021 • Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang

To evaluate the confidence of proposals, existing works typically predict an action score for each proposal, supervised by the temporal Intersection-over-Union (tIoU) between the proposal and the ground truth.
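The temporal Intersection-over-Union mentioned above is the length of the overlap between two time intervals divided by the length of their union. A minimal sketch, assuming proposals and ground-truth segments are given as `(start, end)` pairs in seconds; the function name `temporal_iou` is hypothetical.

```python
def temporal_iou(proposal, ground_truth):
    """Temporal IoU of two (start, end) intervals; returns a value in [0, 1]."""
    p_start, p_end = proposal
    g_start, g_end = ground_truth
    # Length of the overlapping segment (zero if the intervals are disjoint).
    inter = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    # Union length = sum of both interval lengths minus the overlap.
    union = (p_end - p_start) + (g_end - g_start) - inter
    return inter / union if union > 0 else 0.0
```

For example, `temporal_iou((2.0, 6.0), (4.0, 10.0))` gives an overlap of 2 s and a union of 8 s, so it returns 0.25. In the setting the excerpt describes, a value like this would serve as the supervision target for a proposal's confidence score.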

Temporal Action Proposal Generation

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations • 9 Aug 2021 • Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Discovering Distinctive "Semantics" in Super-Resolution Networks

no code implementations • 1 Aug 2021 • Yihao Liu, Anran Liu, Jinjin Gu, Zhipeng Zhang, Wenhao Wu, Yu Qiao, Chao Dong

We show that a well-trained deep SR network is naturally a good descriptor of degradation information.

Dimensionality Reduction, Image Super-Resolution

Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network

no code implementations • 27 Jul 2021 • Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye

Specifically, at the coarse-grained stage, we design a dual-discriminator strategy that adapts the source domain to be close to the target domain from the perspectives of both the global and local feature spaces via adversarial learning.

Crowd Counting, Transfer Learning

Color2Embed: Fast Exemplar-Based Image Colorization using Color Embeddings

3 code implementations • 15 Jun 2021 • Hengyuan Zhao, Wenhao Wu, Yihao Liu, Dongliang He

In this paper, we present a fast exemplar-based image colorization approach using color embeddings named Color2Embed.

Colorization, Image Colorization, and 1 more

ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency

no code implementations • ICCV 2021 • Deng Huang, Wenhao Wu, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding

Specifically, we propose two tasks to learn the appearance and speed consistency, respectively.

Action Recognition, Representation Learning, and 2 more

BASS: Boosting Abstractive Summarization with Unified Semantic Graph

no code implementations • ACL 2021 • Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Ziqiang Cao, Sujian Li, Hua Wu, Haifeng Wang

Abstractive summarization of long documents or multiple documents remains challenging for the Seq2Seq architecture, as Seq2Seq models are not good at analyzing long-distance relations in text.

Abstractive Text Summarization, Document Summarization, and 2 more

Temporal Action Proposal Generation with Transformers

no code implementations • 25 May 2021 • Lining Wang, Haosen Yang, Wenhao Wu, Hongxun Yao, Hujie Huang

Conventionally, the temporal action proposal generation (TAPG) task is divided into two main sub-tasks, boundary prediction and proposal confidence prediction, which rely on frame-level dependencies and proposal-level relationships, respectively.

Temporal Action Proposal Generation

DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning

1 code implementation • 25 May 2021 • Wenhao Wu, Yuxiang Zhao, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, YingYing Li, Mingde Yao, ZiChao Dong, Yifeng Shi

Long-range and short-range temporal modeling are two complementary and crucial aspects of video recognition.

Ranked #6 on Action Recognition on ActivityNet

Action Recognition, Long-range modeling, and 2 more

Good Practices and A Strong Baseline for Traffic Anomaly Detection

1 code implementation • 9 May 2021 • Yuxiang Zhao, Wenhao Wu, Yue He, YingYing Li, Xiao Tan, Shifeng Chen

In this paper, we propose a straightforward and efficient framework that includes pre-processing, a dynamic track module, and post-processing.

Anomaly Detection, Management, and 1 more

A Comprehensive Attempt to Research Statement Generation

no code implementations • 25 Apr 2021 • Wenhao Wu, Sujian Li

For a researcher, writing a good research statement is crucial but costs a lot of time and effort.

Clustering

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

3 code implementations • 13 Dec 2020 • Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

Existing state-of-the-art methods achieve excellent accuracy regardless of their complexity, whereas efficient spatiotemporal modeling solutions are slightly inferior in performance.

Ranked #33 on Action Recognition on Something-Something V1

Action Classification, Action Recognition, and 2 more

Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition

1 code implementation • ECCV 2020 • Jin Ye, Junjun He, Xiaojiang Peng, Wenhao Wu, Yu Qiao

To this end, we propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) to dynamically generate a specific graph for each image.

Ranked #22 on Multi-Label Classification on MS-COCO

Multi-Label Classification

Composing Elementary Discourse Units in Abstractive Summarization

no code implementations • ACL 2020 • Zhenwen Li, Wenhao Wu, Sujian Li

In this paper, we argue that the elementary discourse unit (EDU) is a more appropriate textual unit for content selection than the sentence in abstractive summarization.

Abstractive Text Summarization, reinforcement-learning, and 2 more

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

no code implementations • 3 May 2020 • Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, Wenhao Wu, Yukang Ding, Chao Li, Fu Li, Shilei Wen, Jianwei Li, Fuzhi Yang, Huan Yang, Jianlong Fu, Byung-Hoon Kim, JaeHyun Baek, Jong Chul Ye, Yuchen Fan, Thomas S. Huang, Junyeop Lee, Bokyeung Lee, Jungki Min, Gwantae Kim, Kanghyu Lee, Jaihyun Park, Mykola Mykhailych, Haoyu Zhong, Yukai Shi, Xiaojun Yang, Zhijing Yang, Liang Lin, Tongtong Zhao, Jinjia Peng, Huibing Wang, Zhi Jin, Jiahao Wu, Yifu Chen, Chenming Shang, Huanrong Zhang, Jeongki Min, Hrishikesh P. S, Densen Puthussery, Jiji C. V

This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results.

Image Super-Resolution

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition

no code implementations • 9 Feb 2020 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen

In a nutshell, we treat the input frames and the network depth of the computational graph as a two-dimensional grid, and place several checkpoints, each with a prediction module, on this grid in advance.

Action Recognition In Videos, Temporal Action Localization

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we separately investigate the recognition module of our method, which significantly outperforms state-of-the-art methods on both regular and irregular scene text recognition datasets.

Scene Text Recognition, Semantic Segmentation, and 2 more

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition

no code implementations • ICCV 2019 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen

Video recognition has drawn great research interest, and great progress has been made.

Ranked #7 on Action Recognition on ActivityNet

Action Recognition, General Classification, and 5 more

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition.

Ranked #3 on Scene Text Detection on ICDAR 2013

Scene Text Detection, Semantic Segmentation, and 2 more

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

3 code implementations • ECCV 2018 • Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao

Driven by deep neural networks and large-scale datasets, scene text detection methods have progressed substantially over the past years, continuously setting new performance records on various standard benchmarks.

Ranked #2 on Curved Text Detection on SCUT-CTW1500

Curved Text Detection, Text Detection

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

1 code implementation • CVPR 2018 • Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai

We propose to detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions.

Ranked #2 on Scene Text Detection on ICDAR 2017 MLT

Multi-Oriented Scene Text Detection, object-detection, and 2 more
