Search Results for author: Haiyang Xu

Found 26 papers, 10 papers with code

Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning

no code implementations • 24 May 2023 Zhen-Ru Zhang, Chuanqi Tan, Haiyang Xu, Chengyu Wang, Jun Huang, Songfang Huang

In addition, using the gate as a probe, we validate the efficiency and effectiveness of the variable prefix.

Language Modelling NER

Transforming Visual Scene Graphs to Image Captions

no code implementations • 3 May 2023 Xu Yang, Jiawei Peng, Zihua Wang, Haiyang Xu, Qinghao Ye, Chenliang Li, Ming Yan, Fei Huang, Zhangzikang Li, Yu Zhang

In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.
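The idea of using attention to build a GNN over a scene graph can be sketched as follows. This is a minimal single-head illustration, not the paper's implementation: the weight shapes, the chain-graph adjacency, and the function names are all illustrative assumptions. Each node attends only to its graph neighbors by masking the attention scores with the adjacency matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(nodes, adj, Wq, Wk, Wv):
    """One attention head over a scene graph: each node attends
    only to its neighbors (adjacency mask), as in an attention-based GNN."""
    q, k, v = nodes @ Wq, nodes @ Wk, nodes @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(adj > 0, scores, -1e9)  # mask out non-edges
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d = 4, 8
nodes = rng.normal(size=(n, d))                     # node embeddings
# Toy chain graph with self-loops as the adjacency structure.
adj = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = graph_attention(nodes, adj, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

A full multi-head version would run several such heads with separate projections and concatenate their outputs.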

Image Captioning

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

1 code implementation • 27 Apr 2023 Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

Our code, pre-trained model, instruction-tuned models, and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl.

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

1 code implementation • 16 Apr 2023 Junfeng Tian, Hehong Chen, Guohai Xu, Ming Yan, Xing Gao, Jianhai Zhang, Chenliang Li, Jiayi Liu, Wenshen Xu, Haiyang Xu, Qi Qian, Wei Wang, Qinghao Ye, Jiejing Zhang, Ji Zhang, Fei Huang, Jingren Zhou

In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that is instruction fine-tuned on a wide range of dialogue tasks in a unified internet-augmented format.

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

4 code implementations • 1 Feb 2023 Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou

In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.
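The composition idea described above (modality-specific modules kept disentangled, plus shared universal modules for collaboration) can be sketched schematically. This is a hypothetical toy, not mPLUG-2's actual network: the module functions, activations, and shapes are all placeholders chosen for illustration.

```python
import numpy as np

def shared_module(x, W):
    """A 'universal' module whose weights are shared across modalities."""
    return np.maximum(x @ W, 0.0)  # shared projection + ReLU

def modality_module(x, W):
    """A modality-specific module, kept separate to avoid entanglement."""
    return np.tanh(x @ W)

rng = np.random.default_rng(2)
d = 8
W_shared = rng.normal(size=(d, d))                 # one set of shared weights
W_text = rng.normal(size=(d, d))                   # text-only weights
W_image = rng.normal(size=(d, d))                  # image-only weights

text_feat = rng.normal(size=(3, d))                # 3 token features
image_feat = rng.normal(size=(5, d))               # 5 patch features

# Each modality passes through its own module, then the shared one.
text_out = shared_module(modality_module(text_feat, W_text), W_shared)
image_out = shared_module(modality_module(image_feat, W_image), W_shared)
print(text_out.shape, image_out.shape)  # (3, 8) (5, 8)
```

The point of the design is visible even in this toy: `W_shared` is updated by every modality (collaboration), while `W_text` and `W_image` only ever see their own inputs (disentanglement).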

Action Classification Image Classification +7

Learning Trajectory-Word Alignments for Video-Language Tasks

no code implementations • 5 Jan 2023 Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang

To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.
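Trajectory-to-word attention is, at its core, a cross-attention in which trajectory features query word embeddings. The sketch below assumes that reading of the abbreviation; the query/key/value roles, dimensions, and names are illustrative guesses, not TW-BERT's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def t2w_attention(traj, words, Wq, Wk, Wv):
    """Trajectory-to-word cross-attention (sketch): trajectory
    features act as queries over word embeddings."""
    q, k, v = traj @ Wq, words @ Wk, words @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (n_traj, n_words)
    return attn @ v, attn

rng = np.random.default_rng(3)
n_traj, n_words, d = 2, 6, 8
traj = rng.normal(size=(n_traj, d))    # object trajectories across frames
words = rng.normal(size=(n_words, d))  # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = t2w_attention(traj, words, Wq, Wk, Wv)
print(out.shape, attn.shape)  # (2, 8) (2, 6)
```

Each row of `attn` is a distribution over words, so every trajectory is summarized by a weighted mixture of word representations.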

Question Answering Retrieval +4

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

no code implementations • 30 Dec 2022 Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang

We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label) with 8.6% and 11.1% improvement respectively.

TGIF-Action TGIF-Frame +7

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

2 code implementations • 24 May 2022 Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si

Large-scale pretrained foundation models have been an emerging paradigm for building artificial intelligence (AI) systems, which can be quickly adapted to a wide range of downstream tasks.

Image Captioning Question Answering +5

Image Captioning In the Transformer Age

1 code implementation • 15 Apr 2022 Yang Xu, Li Li, Haiyang Xu, Songfang Huang, Fei Huang, Jianfei Cai

This drawback inspires researchers to develop a homogeneous architecture that facilitates end-to-end training. The Transformer, having proven its huge potential in both the vision and language domains, is well suited to serve as the basic component of both the visual encoder and the language decoder in an IC pipeline.

Image Captioning Self-Supervised Learning

Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training

no code implementations • 21 Aug 2021 Ming Yan, Haiyang Xu, Chenliang Li, Bin Bi, Junfeng Tian, Min Gui, Wei Wang

Existing approaches to vision-language pre-training (VLP) heavily rely on an object detector based on bounding boxes (regions), where salient objects are first detected from images and then a Transformer-based model is used for cross-modal fusion.

Object Detection

We Know What You Want: An Advertising Strategy Recommender System for Online Advertising

no code implementations • 25 May 2021 Liyi Guo, Junqi Jin, Haoqi Zhang, Zhenzhe Zheng, Zhiye Yang, Zhizhuang Xing, Fei Pan, Lvyin Niu, Fan Wu, Haiyang Xu, Chuan Yu, Yuning Jiang, Xiaoqiang Zhu

To achieve this goal, the advertising platform needs to identify the advertiser's optimization objectives, and then recommend the corresponding strategies to fulfill the objectives.

Recommendation Systems

SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels

no code implementations • 14 Mar 2021 Chenliang Li, Ming Yan, Haiyang Xu, Fuli Luo, Wei Wang, Bin Bi, Songfang Huang

Vision-language pre-training (VLP) on large-scale image-text pairs has recently witnessed rapid progress for learning cross-modal representations.

Learning to Infer User Hidden States for Online Sequential Advertising

no code implementations • 3 Sep 2020 Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Wei-Nan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai

To drive purchases in online advertising, it is in the advertiser's great interest to optimize the sequential advertising strategy, whose performance and interpretability are both important.

A Deep Prediction Network for Understanding Advertiser Intent and Satisfaction

no code implementations • 20 Aug 2020 Liyi Guo, Rui Lu, Haoqi Zhang, Junqi Jin, Zhenzhe Zheng, Fan Wu, Jin Li, Haiyang Xu, Han Li, Wenkai Lu, Jian Xu, Kun Gai

For e-commerce platforms such as Taobao and Amazon, advertisers play an important role in the entire digital ecosystem: their behaviors explicitly influence users' browsing and shopping experience; more importantly, advertiser's expenditure on advertising constitutes a primary source of platform revenue.


Neural Topic Modeling with Bidirectional Adversarial Training

1 code implementation • ACL 2020 Rui Wang, Xuemeng Hu, Deyu Zhou, Yulan He, Yuxuan Xiong, Chenchen Ye, Haiyang Xu

Recent years have witnessed a surge of interest in using neural topic models for automatic topic extraction from text, since they avoid the complicated mathematical derivations required for model inference in traditional topic models such as Latent Dirichlet Allocation (LDA).

Text Clustering Topic Models

Learning Syntactic and Dynamic Selective Encoding for Document Summarization

no code implementations • 25 Mar 2020 Haiyang Xu, Yahao He, Kun Han, Junwen Chen, Xiangang Li

Our approach makes the following contributions: first, we incorporate syntactic information such as constituency parse trees into the encoding sequence to learn both the semantic and syntactic information from the document, resulting in more accurate summaries; second, we propose a dynamic gate network that selects salient information based on the context of the decoder state, which is essential for document summarization.
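The second contribution, a gate conditioned on the decoder state, can be illustrated with a minimal sketch. This is a generic sigmoid gate, not the paper's exact network; the weight names, shapes, and the elementwise gating choice are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_gate(enc_states, dec_state, Wg, Ug, bg):
    """Scale each encoder state by a sigmoid gate computed from
    that state and the current decoder state (salience selection)."""
    g = sigmoid(enc_states @ Wg + dec_state @ Ug + bg)  # (T, d), values in (0, 1)
    return g * enc_states

rng = np.random.default_rng(1)
T, d = 5, 6
enc = rng.normal(size=(T, d))       # encoder states for T source tokens
dec = rng.normal(size=(d,))         # current decoder state
Wg = rng.normal(size=(d, d))
Ug = rng.normal(size=(d, d))
bg = np.zeros(d)
gated = dynamic_gate(enc, dec, Wg, Ug, bg)
print(gated.shape)  # (5, 6)
```

Because the gate is recomputed from `dec` at every decoding step, the same source token can be emphasized at one step and suppressed at another, which is what makes the selection "dynamic".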

Constituency Parsing Document Summarization

Adversarial Multi-Binary Neural Network for Multi-class Classification

no code implementations • 25 Mar 2020 Haiyang Xu, Junwen Chen, Kun Han, Xiangang Li

Multi-class text classification is one of the key problems in machine learning and natural language processing.

General Classification Multi-class Classification +3

Selective Attention Encoders by Syntactic Graph Convolutional Networks for Document Summarization

no code implementations • 18 Mar 2020 Haiyang Xu, Yun Wang, Kun Han, Baochang Ma, Junwen Chen, Xiangang Li

Abstractive text summarization is a challenging task, and one needs to design a mechanism to effectively extract salient information from the source text and then generate a summary.

Abstractive Text Summarization Document Summarization

Learning Alignment for Multimodal Emotion Recognition from Speech

1 code implementation • 6 Sep 2019 Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

Further, although emotion recognition can benefit from audio-textual multimodal information, it is not trivial to build a system that learns from multiple modalities.

Multimodal Emotion Recognition Speech Emotion Recognition +2
