Search Results for author: Wenhao Wu

Found 63 papers, 32 papers with code

Learn and Review: Enhancing Continual Named Entity Recognition via Reviewing Synthetic Samples

no code implementations Findings (ACL) 2022 Yu Xia, Quan Wang, Yajuan Lyu, Yong Zhu, Wenhao Wu, Sujian Li, Dai Dai

However, the existing method depends on the relevance between tasks and is prone to inter-type confusion. In this paper, we propose a novel two-stage framework Learn-and-Review (L&R) for continual NER under the type-incremental setting to alleviate the above issues. Specifically, for the learning stage, we distill the old knowledge from teacher to a student on the current dataset.

Continual Named Entity Recognition named-entity-recognition +2

Kimi k1.5: Scaling Reinforcement Learning with LLMs

no code implementations 22 Jan 2025 Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haotian Yao, Haotian Zhao, Haoyu Lu, Haoze Li, Haozhen Yu, Hongcheng Gao, Huabin Zheng, Huan Yuan, Jia Chen, Jianhang Guo, Jianlin Su, Jianzhou Wang, Jie Zhao, Jin Zhang, Jingyuan Liu, Junjie Yan, Junyan Wu, Lidong Shi, Ling Ye, Longhui Yu, Mengnan Dong, Neo Zhang, Ningchen Ma, Qiwei Pan, Qucheng Gong, Shaowei Liu, Shengling Ma, Shupeng Wei, Sihan Cao, Siying Huang, Tao Jiang, Weihao Gao, Weimin Xiong, Weiran He, Weixiao Huang, Wenhao Wu, Wenyang He, Xianghui Wei, Xianqing Jia, Xingzhe Wu, Xinran Xu, Xinxing Zu, Xinyu Zhou, Xuehai Pan, Y. Charles, Yang Li, Yangyang Hu, Yangyang Liu, Yanru Chen, Yejie Wang, Yibo Liu, Yidao Qin, Yifeng Liu, Ying Yang, Yiping Bao, Yulun Du, Yuxin Wu, Yuzhi Wang, Zaida Zhou, Zhaoji Wang, Zhaowei Li, Zhen Zhu, Zheng Zhang, Zhexu Wang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Ziyao Xu, Zonghan Yang

Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).

Math reinforcement-learning +2

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

2 code implementations 24 Dec 2024 Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, DaCheng Tao

Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question.

More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression

no code implementations 17 Dec 2024 Jiebin Zhang, Dawei Zhu, YiFan Song, Wenhao Wu, Chuqiao Kuang, Xiaoguang Li, Lifeng Shang, Qun Liu, Sujian Li

As large language models (LLMs) process increasing context windows, the memory usage of KV cache has become a critical bottleneck during inference.

Quantization

DistinctAD: Distinctive Audio Description Generation in Contexts

no code implementations 27 Nov 2024 Bo Fang, Wenhao Wu, Qiangqiang Wu, Yuxin Song, Antoni B. Chan

Audio Descriptions (ADs) aim to provide a narration of a movie in text form, describing non-dialogue-related narratives, such as characters, actions, or scene establishment.

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

1 code implementation 15 Oct 2024 Zhi Wang, Li Zhang, Wenhao Wu, Yuanheng Zhu, Dongbin Zhao, Chunlin Chen

We pretrain a context-aware world model to learn a compact task representation, and inject it as a contextual condition to the causal transformer to guide task-oriented sequence generation.

Disentanglement Inductive Bias +3

Dense Connector for MLLMs

1 code implementation 22 May 2024 Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang

We witness the rise of larger and higher-quality instruction datasets, as well as the involvement of larger-sized LLMs.

Video Understanding

FreeVA: Offline MLLM as Training-Free Video Assistant

1 code implementation 13 May 2024 Wenhao Wu

The study provides an essential, yet must-know baseline, and reveals several surprising findings: 1) FreeVA, leveraging only offline image-based MLLM without additional training, excels in zero-shot video question-answering (e.g., MSVD-QA, ActivityNet-QA, and MSRVTT-QA), even surpassing state-of-the-art methods that involve video instruction tuning.

Fairness Question Answering +1

Long Context Alignment with Short Instructions and Synthesized Positions

no code implementations 7 May 2024 Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li

Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources.

16k Instruction Following

Retrieval Head Mechanistically Explains Long-Context Factuality

1 code implementation 24 Apr 2024 Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu

Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context.

Continual Pretraining Hallucination +3

LongEmbed: Extending Embedding Models for Long Context Retrieval

1 code implementation 18 Apr 2024 Dawei Zhu, Liang Wang, Nan Yang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li

This paper explores context window extension of existing embedding models, pushing the limit to 32k without requiring additional training.

4k 8k +4

CoUDA: Coherence Evaluation via Unified Data Augmentation

1 code implementation 31 Mar 2024 Dawei Zhu, Wenhao Wu, YiFan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li

Due to the scarcity of annotated data, data augmentation is commonly used for training coherence evaluation models.

Coherence Evaluation Data Augmentation

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

1 code implementation 19 Mar 2024 Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Philip Torr, Jian Wu

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.

Object object-detection +3

MetaSplit: Meta-Split Network for Limited-Stock Product Recommendation

no code implementations 11 Mar 2024 Wenhao Wu, Jialiang Zhou, Ailong He, Shuguang Han, Jufeng Chen, Bo Zheng

Due to limited user interactions for each product (i.e., item), the corresponding item embedding in the CTR model may not easily converge.

Click-Through Rate Prediction Meta-Learning +1

GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition

no code implementations 18 Jan 2024 Guangzhao Dai, Xiangbo Shu, Wenhao Wu, Rui Yan, Jiachao Zhang

Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks.

Action Recognition Text Matching

Relational Matching for Weakly Semi-Supervised Oriented Object Detection

no code implementations CVPR 2024 Wenhao Wu, Hau-San Wong, Si Wu, Tianyou Zhang

Motivated by weakly supervised learning we introduce annotation-efficient point annotations for unannotated images and propose a weakly semi-supervised method for oriented object detection to balance the detection performance and annotation cost.

Graph Matching Object +4

Deep Structure and Attention Aware Subspace Clustering

1 code implementation 25 Dec 2023 Wenhao Wu, Weiwei Wang, Shengjiang Kong

However, previous deep clustering methods, especially image clustering, focus on the features of the data itself and ignore the relationship between the data, which is crucial for clustering.

Clustering Deep Clustering +2

Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

2 code implementations 27 Nov 2023 Huanjin Yao, Wenhao Wu, Zhiheng Li

In this paper, we present a novel Spatial-Temporal Side Network for memory-efficient fine-tuning large image models to video understanding, named Side4Video.

Action Classification Action Recognition +3

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

2 code implementations 27 Nov 2023 Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang

Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks: Firstly, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training.

Zero-Shot Learning

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

2 code implementations 19 Sep 2023 Dawei Zhu, Nan Yang, Liang Wang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li

To decouple train length from target length for efficient context window extension, we propose Positional Skip-wisE (PoSE) training that smartly simulates long inputs using a fixed context window.

2k Position

What Can Simple Arithmetic Operations Do for Temporal Modeling?

2 code implementations ICCV 2023 Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

Action Classification Action Recognition +1

RestGPT: Connecting Large Language Models with Real-World RESTful APIs

no code implementations 11 Jun 2023 YiFan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, Ye Tian, Sujian Li

To address the practical challenges of tackling complex instructions, we propose RestGPT, which exploits the power of LLMs and conducts a coarse-to-fine online planning mechanism to enhance the abilities of task decomposition and API selection.

UATVR: Uncertainty-Adaptive Text-Video Retrieval

1 code implementation ICCV 2023 Bo Fang, Wenhao Wu, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang

In the refined embedding space, we represent text-video pairs as probabilistic distributions where prototypes are sampled for matching evaluation.

Retrieval Semantic correspondence +1

Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus

no code implementations CVPR 2023 Wenhao Wu, Hau San Wong, Si Wu

Stereo-based 3D object detection, which aims at detecting 3D objects with stereo cameras, shows great potential in low-cost deployment compared to LiDAR-based methods and excellent performance compared to monocular-based algorithms.

3D Object Detection Depth Estimation +3

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations CVPR 2023 Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Action Classification Action Recognition +3

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

4 code implementations CVPR 2023 Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences.

Data Augmentation Retrieval +2

WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

1 code implementation 20 Dec 2022 Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Sujian Li, Yajuan Lv

As a result, they perform poorly on the real generated text and are biased heavily by their single-source upstream tasks.

Natural Language Inference Question Answering +2

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer

no code implementations 3 Dec 2022 Tianwei Lin, Honglin Lin, Fu Li, Dongliang He, Wenhao Wu, Meiling Wang, Xin Li, Yong Liu

Then, in AdaCM, we adopt a CNN encoder to adaptively predict all parameters for the ColorMLP conditioned on each input content and style image pair.

4k Style Transfer

FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness

no code implementations 1 Nov 2022 Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Ziqiang Cao, Sujian Li, Hua Wu

We first measure a model's factual robustness by its success rate to defend against adversarial attacks when generating factual information.

Abstractive Text Summarization

Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation

no code implementations 22 Oct 2022 Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Sujian Li, Yajuan Lyu

Though model robustness has been extensively studied in language understanding, the robustness of Seq2Seq generation remains understudied.

Informativeness Text Generation

It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training

no code implementations 11 Oct 2022 Yuxin Song, Min Yang, Wenhao Wu, Dongliang He, Fu Li, Jingdong Wang

In order to guide the encoder to fully excavate spatial-temporal features, two separate decoders are used for two pretext tasks of disentangled appearance and motion prediction.

Decoder motion prediction

Effective Invertible Arbitrary Image Rescaling

no code implementations 26 Sep 2022 Zhihong Pan, Baopu Li, Dongliang He, Wenhao Wu, Errui Ding

To increase its real world applicability, numerous models have also been proposed to restore SR images with arbitrary scale factors, including asymmetric ones where images are resized to different scales along horizontal and vertical directions.

Image Rescaling Image Super-Resolution

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

no code implementations 21 Aug 2022 Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang

We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.

Clustering Contrastive Learning +5

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

no code implementations 21 Jul 2022 Boyang Xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations.

Action Recognition Video Classification +1

Temporal Saliency Query Network for Efficient Video Recognition

no code implementations 21 Jul 2022 Boyang Xia, Zhihao Wang, Wenhao Wu, Haoran Wang, Jungong Han

For each category, the common pattern of it is employed as a query and the most salient frames are responded to it.

Action Recognition Video Recognition

Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

no code implementations 10 Mar 2022 Wei Li, Wenhao Wu, Moye Chen, Jiachen Liu, Xinyan Xiao, Hua Wu

In this survey, we provide a systematic overview of the research progress on the faithfulness problem of NLG, including problem analysis, evaluation metrics and optimization methods.

Abstractive Text Summarization Data-to-Text Generation +2

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

no code implementations CVPR 2022 Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding

Deep learning based single image super-resolution models have been widely studied and superb results are achieved in upscaling low-resolution images with fixed scale factor and downscaling degradation kernel.

Image Rescaling Image Super-Resolution

Temporal Action Proposal Generation with Background Constraint

1 code implementation 15 Dec 2021 Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang

To evaluate the confidence of proposals, the existing works typically predict action score of proposals that are supervised by the temporal Intersection-over-Union (tIoU) between proposal and the ground-truth.

Temporal Action Proposal Generation

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations 9 Aug 2021 Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network

no code implementations 27 Jul 2021 Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye

Specifically, at the coarse-grained stage, we design a dual-discriminator strategy to adapt the source domain to be close to the targets from the perspectives of both global and local feature space via adversarial learning.

Crowd Counting Transfer Learning

Color2Embed: Fast Exemplar-Based Image Colorization using Color Embeddings

3 code implementations 15 Jun 2021 Hengyuan Zhao, Wenhao Wu, Yihao Liu, Dongliang He

In this paper, we present a fast exemplar-based image colorization approach using color embeddings named Color2Embed.

Colorization Image Colorization +1

Temporal Action Proposal Generation with Transformers

no code implementations 25 May 2021 Lining Wang, Haosen Yang, Wenhao Wu, Hongxun Yao, Hujie Huang

Conventionally, the temporal action proposal generation (TAPG) task is divided into two main sub-tasks: boundary prediction and proposal confidence prediction, which rely on the frame-level dependencies and proposal-level relationships separately.

Temporal Action Proposal Generation

BASS: Boosting Abstractive Summarization with Unified Semantic Graph

no code implementations ACL 2021 Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Ziqiang Cao, Sujian Li, Hua Wu, Haifeng Wang

Abstractive summarization for long-document or multi-document remains challenging for the Seq2Seq architecture, as Seq2Seq is not good at analyzing long-distance relations in text.

Abstractive Text Summarization Decoder +3

Good Practices and A Strong Baseline for Traffic Anomaly Detection

1 code implementation 9 May 2021 Yuxiang Zhao, Wenhao Wu, Yue He, YingYing Li, Xiao Tan, Shifeng Chen

In this paper, we propose a straightforward and efficient framework that includes pre-processing, a dynamic track module, and post-processing.

Anomaly Detection Management +1

A Comprehensive Attempt to Research Statement Generation

no code implementations 25 Apr 2021 Wenhao Wu, Sujian Li

For a researcher, writing a good research statement is crucial but costs a lot of time and effort.

Clustering

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

3 code implementations 13 Dec 2020 Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity, while efficient spatiotemporal modeling solutions are slightly inferior in performance.

Action Classification Action Recognition +2

Composing Elementary Discourse Units in Abstractive Summarization

no code implementations ACL 2020 Zhenwen Li, Wenhao Wu, Sujian Li

In this paper, we argue that elementary discourse unit (EDU) is a more appropriate textual unit of content selection than the sentence unit in abstractive summarization.

Abstractive Text Summarization reinforcement-learning +2

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition

no code implementations 9 Feb 2020 Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen

In a nutshell, we treat input frames and network depth of the computational graph as a 2-dimensional grid, and several checkpoints are placed on this grid in advance with a prediction module.

Action Recognition In Videos Temporal Action Localization

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation ECCV 2018 Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition Semantic Segmentation +2

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

3 code implementations ECCV 2018 Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao

Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks.

Curved Text Detection Text Detection
