Search Results for author: Ming Yan

Found 107 papers, 47 papers with code

PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

no code implementations • EMNLP 2020 • Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks covering generative question answering (Rank 1 on the official MARCO leaderboard), abstractive summarization on CNN/DailyMail as well as Gigaword, question generation on SQuAD, and conversational response generation on Cornell Movie Dialogues.

Abstractive Text Summarization Conversational Response Generation +8

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

1 code implementation • 25 Apr 2024 • Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

Charts are important for presenting and explaining complex data relationships.

876

Adaptive Feature Fusion Neural Network for Glaucoma Segmentation on Unseen Fundus Images

no code implementations • 2 Apr 2024 • Jiyuan Zhong, Hu Ke, Ming Yan

Fundus image segmentation on unseen domains is challenging, especially for the over-parameterized deep models trained on the small medical datasets.

Domain Generalization Image Segmentation +4

Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning

no code implementations • 30 Mar 2024 • Xiaopeng Xie, Ming Yan, Xiwen Zhou, Chenlong Zhao, Suli Wang, Yong Zhang, Joey Tianyi Zhou

In addressing this issue, we are inspired by the notion that a backdoor acts as a shortcut and posit that this shortcut stems from the contrast between the trigger and the data utilized for poisoning.

Data Augmentation Few-Shot Text Classification +1

RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method

no code implementations • 28 Mar 2024 • Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang

Comprehensive capturing of human motions requires both accurate captures of complex poses and precise localization of the human within scenes.

Collaborative Knowledge Infusion for Low-resource Stance Detection

no code implementations • 28 Mar 2024 • Ming Yan, Joey Tianyi Zhou, Ivor W. Tsang

Specifically, our stance detection approach leverages target background knowledge collaboratively from different knowledge sources with the help of knowledge alignment.

Stance Detection

ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy

no code implementations • 21 Mar 2024 • Zonghan Yang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

In WebShop, the 1-shot performance of the A$^3$T agent matches human average, and 4 rounds of iterative refinement lead to the performance approaching human experts.

Policy Gradient Methods

RoleInteract: Evaluating the Social Interaction of Role-Playing Agents

1 code implementation • 20 Mar 2024 • Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Xing Gao, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang, Jingren Zhou

In this paper, we introduce RoleInteract, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions.

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

1 code implementation • 19 Mar 2024 • Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

In this work, we emphasize the importance of structure information in Visual Document Understanding and propose the Unified Structure Learning to boost the performance of MLLMs.

document understanding Optical Character Recognition (OCR)

876

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

no code implementations • 1 Mar 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment.

Representation Learning

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

no code implementations • 26 Feb 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval.

Retrieval Text Retrieval +1

Budget-Constrained Tool Learning with Planning

1 code implementation • 25 Feb 2024 • Yuanhang Zheng, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

Despite intensive efforts devoted to tool learning, the problem of budget-constrained tool learning, which focuses on resolving user queries within a specific budget constraint, has been widely overlooked.

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

no code implementations • 24 Feb 2024 • Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang

Large Vision Language Models exhibit remarkable capabilities but struggle with hallucinations, that is, inconsistencies between images and their descriptions.

Hallucination Hallucination Evaluation

Model Composition for Multimodal Large Language Models

no code implementations • 20 Feb 2024 • Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.

PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs

no code implementations • 20 Feb 2024 • An Liu, Zonghan Yang, Zhenhe Zhang, Qingyuan Hu, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

While large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models.

Text Classification

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

1 code implementation • 19 Feb 2024 • Ziyue Wang, Chi Chen, Yiqi Zhu, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks.

Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement

1 code implementation • 19 Feb 2024 • Zijun Liu, Boqun Kou, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

Although Large Language Models (LLMs) have demonstrated strong performance on a wide range of tasks, they still face reliability challenges such as hallucination.

Hallucination

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

1 code implementation • 29 Jan 2024 • Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

To assess the performance of Mobile-Agent, we introduced Mobile-Eval, a benchmark for evaluating mobile device operations.

1,825

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

1 code implementation • 14 Jan 2024 • Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang

Each component is implemented by a single LLM that focuses on a specific capability and collaborates with others to accomplish the task.

Language Modelling Large Language Model

137

Knowledge Distillation for Closed-Source Language Models

no code implementations • 13 Jan 2024 • Hongzhan Chen, Xiaojun Quan, Hehong Chen, Ming Yan, Ji Zhang

The prior estimation aims to derive a prior distribution by utilizing the corpus generated by closed-source language models, while the posterior estimation employs a proxy model to update the prior distribution and derive a posterior distribution.

Knowledge Distillation

Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection

no code implementations • 11 Jan 2024 • Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao Ye, Chenliang Li, Ming Yan, Shikun Zhang, Songhang Huang, Fei Huang

Vision Transformers (ViTs) have become increasingly popular in large-scale Vision and Language Pre-training (VLP) models.

LARP: Language-Agent Role Play for Open-World Games

no code implementations • 24 Dec 2023 • Ming Yan, Ruihao Li, Hao Zhang, Hao Wang, Zhilan Yang, Ji Yan

Language agents have shown impressive problem-solving skills within defined settings and brief timelines.

Decision Making

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

1 code implementation • 14 Dec 2023 • Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities.

Contrastive Learning Data Augmentation

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

1 code implementation • 12 Dec 2023 • Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

We first analyzed the representation distribution of textual and visual tokens in MLLM, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts that contain and do not contain hallucinations are entangled, making it challenging to distinguish them.

Ranked #74 on Visual Question Answering on MM-Vet

Contrastive Learning Hallucination +4

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

1 code implementation • 30 Nov 2023 • Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

In this work, towards a more versatile copilot for academic paper writing, we mainly focus on strengthening the multi-modal diagram analysis ability of Multimodal LLMs.

Language Modelling Large Language Model

876

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

1 code implementation • 13 Nov 2023 • Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang

Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.

Attribute Hallucination +2

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

2 code implementations • 7 Nov 2023 • Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.

Ranked #11 on Visual Question Answering (VQA) on InfiMM-Eval

Language Modelling Large Language Model +1

1,938

MCC-KD: Multi-CoT Consistent Knowledge Distillation

1 code implementation • 23 Oct 2023 • Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting.

Knowledge Distillation Mathematical Reasoning

Physical Information Neural Networks for Solving High-index Differential-algebraic Equation Systems Based on Radau Methods

no code implementations • 19 Oct 2023 • Jiasheng Chen, Juan Tang, Ming Yan, Shuai Lai, Kun Liang, Jianguang Lu, Wenqiang Yang

In this paper, we propose a PINN computational framework that combines the Radau IIA numerical method with a neural network structure via attention mechanisms to directly solve high-index DAEs.

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

2 code implementations • 8 Oct 2023 • Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs.

Language Modelling Large Language Model +1

876

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

1 code implementation • 2 Sep 2023 • Chenliang Li, Hehong Chen, Ming Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou

Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior.

1,868

Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection

no code implementations • 31 Aug 2023 • Kairui Hu, Ming Yan, Joey Tianyi Zhou, Ivor W. Tsang, Wen Haw Chong, Yong Keong Yap

In response to these identified gaps, we introduce the Ladder-of-Thought (LoT) for the stance detection task.

Stance Detection

Evaluation and Analysis of Hallucination in Large Vision-Language Models

1 code implementation • 29 Aug 2023 • Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang

In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.

Hallucination Hallucination Evaluation

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

1 code implementation • 19 Jul 2023 • Guohai Xu, Jiayi Liu, Ming Yan, Haotian Xu, Jinghui Si, Zhuoran Zhou, Peng Yi, Xing Gao, Jitao Sang, Rong Zhang, Ji Zhang, Chao Peng, Fei Huang, Jingren Zhou

In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria.

410

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

no code implementations • 17 Jul 2023 • Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang

Specifically, we incorporate a Text-Semantics-Aware Patch Selector (TSPS) into the ViT backbone to perform a coarse-grained visual token extraction and then attach a flexible Transformer-based Patch Abstraction Decoder (PAD) upon the backbone for top-level visual abstraction.

Text Summarization

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

1 code implementation • 4 Jul 2023 • Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding.

document understanding Language Modelling +2

876

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

1 code implementation • 7 Jun 2023 • Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

In addition, to facilitate a comprehensive evaluation of video-language models, we carefully build the largest human-annotated Chinese benchmarks covering three popular video-language tasks of cross-modal retrieval, video captioning, and video category classification.

Cross-Modal Retrieval Language Modelling +3

256

CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding

1 code implementation • 15 May 2023 • Linhui Xiao, Xiaoshan Yang, Fang Peng, Ming Yan, YaoWei Wang, Changsheng Xu

In order to utilize vision and language pre-trained models to address the grounding problem, and reasonably take advantage of pseudo-labels, we propose CLIP-VG, a novel method that can conduct self-paced curriculum adapting of CLIP with pseudo-language labels.

Transfer Learning Visual Grounding

Distinguish Before Answer: Generating Contrastive Explanation as Knowledge for Commonsense Question Answering

no code implementations • 14 May 2023 • Qianglong Chen, Guohai Xu, Ming Yan, Ji Zhang, Fei Huang, Luo Si, Yin Zhang

Existing knowledge-enhanced methods have achieved remarkable results in certain QA tasks via obtaining diverse knowledge from different knowledge bases.

Explanation Generation Question Answering

AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference

no code implementations • 13 May 2023 • Qianglong Chen, Feng Ji, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang

To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers to a single student.

Knowledge Distillation

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

1 code implementation • 27 Apr 2023 • Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

Our code, pre-trained model, instruction-tuned models, and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl.

Ranked #3 on Visual Question Answering (VQA) on HallusionBench

Visual Question Answering (VQA) Zero-Shot Video Question Answer

1,938

From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping

1 code implementation • 26 Apr 2023 • Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang

Although previous works have given CLIP generation capacity through additional language models, a modality gap remains between the CLIP representations of different modalities, and CLIP is unable to model the offset of this gap, which prevents concepts from transferring across modalities.

Image Captioning Image Classification +3

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

1 code implementation • 16 Apr 2023 • Junfeng Tian, Hehong Chen, Guohai Xu, Ming Yan, Xing Gao, Jianhai Zhang, Chenliang Li, Jiayi Liu, Wenshen Xu, Haiyang Xu, Qi Qian, Wei Wang, Qinghao Ye, Jiejing Zhang, Ji Zhang, Fei Huang, Jingren Zhou

In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format.

World Knowledge

301

Improved Visual Fine-tuning with Natural Language Supervision

1 code implementation • ICCV 2023 • Junyang Wang, Yuanhong Xu, Juhua Hu, Ming Yan, Jitao Sang, Qi Qian

Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with limited training examples.

CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions

no code implementations • CVPR 2023 • Ming Yan, Xin Wang, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang

The core of this dataset is a blending optimization process, which corrects for the pose as it drifts and is affected by the magnetic conditions.

Pose Prediction

Correspondence-Free Domain Alignment for Unsupervised Cross-Domain Image Retrieval

1 code implementation • 13 Feb 2023 • Xu Wang, Dezhong Peng, Ming Yan, Peng Hu

Thanks to the ISS and CCA, our method could encode the discrimination into the domain-invariant embedding space for unsupervised cross-domain image retrieval.

Image Retrieval Retrieval

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

4 code implementations • 1 Feb 2023 • Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou

In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.

Ranked #1 on Video Captioning on MSR-VTT

Action Classification Image Classification +7

6,055

Adaptively Clustering Neighbor Elements for Image Captioning

no code implementations • 5 Jan 2023 • Zihua Wang, Xu Yang, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Weiwei Sun, Ming Yan, Songfang Huang, Fei Huang, Yu Zhang

We design a novel global-local Transformer named Ada-ClustFormer (ACF) to generate captions.

Clustering Image Captioning

Learning Trajectory-Word Alignments for Video-Language Tasks

no code implementations • ICCV 2023 • Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang

To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.

Question Answering Retrieval +4

BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization

no code implementations • ICCV 2023 • Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang

In this paper, we propose a Bottom-Up Patch Summarization approach named BUS which is inspired by the Document Summarization Task in NLP to learn a concise visual summary of lengthy visual token sequences, guided by textual semantics.

Abstractive Text Summarization Document Summarization

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

no code implementations • ICCV 2023 • Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang

We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label) with 8.6% and 11.1% improvement respectively.

Ranked #1 on Visual Question Answering (VQA) on TGIF-QA

TGIF-Action TGIF-Frame +7

FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction

3 code implementations • 3 Dec 2022 • Samiul Alam, Luyang Liu, Ming Yan, Mi Zhang

Most cross-device federated learning (FL) studies focus on the model-homogeneous setting where the global server model and local client models are identical.

Federated Learning Model extraction

Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment

no code implementations • 14 Nov 2022 • Junyang Wang, Yi Zhang, Ming Yan, Ji Zhang, Jitao Sang

We further propose Anchor Augment to guide the generative model's attention to the fine-grained information in the representation of CLIP.

Computational Efficiency Image Captioning +2

Eye-tracking based classification of Mandarin Chinese readers with and without dyslexia using neural sequence models

1 code implementation • 18 Oct 2022 • Patrick Haller, Andreas Säuberli, Sarah Elisabeth Kiener, Jinger Pan, Ming Yan, Lena Jäger

Eye movements are known to reflect cognitive processes in reading, and psychological reading research has shown that eye gaze patterns differ between readers with and without dyslexia.

Sentence Word Embeddings

Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

1 code implementation • 14 Oct 2022 • Zhuoqing Song, Weijian Li, Kexin Jin, Lei Shi, Ming Yan, Wotao Yin, Kun Yuan

In the proposed family, EquiStatic has a degree of $\Theta(\ln(n))$, where $n$ is the network size, and a series of time-dependent one-peer topologies, EquiDyn, has a constant degree of 1.

Construction and Applications of Billion-Scale Pre-Trained Multimodal Business Knowledge Graph

1 code implementation • 30 Sep 2022 • Shumin Deng, Chengming Wang, Zhoubo Li, Ningyu Zhang, Zelin Dai, Hehong Chen, Feiyu Xiong, Ming Yan, Qiang Chen, Mosha Chen, Jiaoyan Chen, Jeff Z. Pan, Bryan Hooi, Huajun Chen

We release all the open resources (OpenBG benchmarks) derived from it for the community and report experimental results of KG-centric tasks.

Knowledge Graphs

DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning

no code implementations • 1 Aug 2022 • Qianglong Chen, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang

We evaluate our approach on a variety of knowledge driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE.

Contrastive Learning Language Modelling +2

X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval

1 code implementation • 15 Jul 2022 • Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, Rongrong Ji

However, cross-grained contrast, which is the contrast between coarse-grained representations and fine-grained representations, has rarely been explored in prior research.

Ranked #12 on Video Retrieval on MSVD

Contrastive Learning Retrieval +2

111

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

3 code implementations • 24 May 2022 • Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si

Large-scale pretrained foundation models have been an emerging paradigm for building artificial intelligence (AI) systems, which can be quickly adapted to a wide range of downstream tasks.

Ranked #1 on Image Captioning on COCO Captions

Computational Efficiency Image Captioning +6

6,055

WikiDiverse: A Multimodal Entity Linking Dataset with Diversified Contextual Topics and Entity Types

3 code implementations • ACL 2022 • Xuwu Wang, Junfeng Tian, Min Gui, Zhixu Li, Rui Wang, Ming Yan, Lihan Chen, Yanghua Xiao

In this paper, we present WikiDiverse, a high-quality human-annotated MEL dataset with diversified contextual topics and entity types from Wikinews, which uses Wikipedia as the corresponding knowledge base.

Entity Linking

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

1 code implementation • CVPR 2022 • Jiabo Ye, Junfeng Tian, Ming Yan, Xiaoshan Yang, Xuwu Wang, Ji Zhang, Liang He, Xin Lin

Moreover, since the backbones are query-agnostic, it is difficult to completely avoid the inconsistency issue by training the visual backbone end-to-end in the visual grounding framework.

Multimodal Reasoning Visual Grounding

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

1 code implementation • 27 Jan 2022 • Chen Wu, Ming Yan

Unlike typical information retrieval tasks, code search requires bridging the semantic gap between the programming language and natural language, for better describing intrinsic concepts and semantics.

Code Search Information Retrieval +4

Achieving Human Parity on Visual Question Answering

no code implementations • 17 Nov 2021 • Ming Yan, Haiyang Xu, Chenliang Li, Junfeng Tian, Bin Bi, Wei Wang, Weihua Chen, Xianzhe Xu, Fan Wang, Zheng Cao, Zhicheng Zhang, Qiyu Zhang, Ji Zhang, Songfang Huang, Fei Huang, Luo Si, Rong Jin

The Visual Question Answering (VQA) task utilizes both visual image and language analysis to answer a textual question with respect to an image.

Ranked #7 on Visual Question Answering (VQA) on VQA v2 test-dev

Question Answering Visual Question Answering

Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

no code implementations • 15 Nov 2021 • Zhu Li, Yuqing Zhang, Mengxi Nie, Ming Yan, Mengnan He, Ruixiong Zhang, Caixia Gong

Recent advancements in end-to-end speech synthesis have made it possible to generate highly natural speech.

Chinese Word Segmentation Multi-Task Learning +5

Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training

no code implementations • 21 Aug 2021 • Ming Yan, Haiyang Xu, Chenliang Li, Bin Bi, Junfeng Tian, Min Gui, Wei Wang

Existing approaches to vision-language pre-training (VLP) heavily rely on an object detector based on bounding boxes (regions), where salient objects are first detected from images and then a Transformer-based model is used for cross-modal fusion.

Object object-detection +1

Decentralized Composite Optimization with Compression

no code implementations • 10 Aug 2021 • Yao Li, Xiaorui Liu, Jiliang Tang, Ming Yan, Kun Yuan

Decentralized optimization and communication compression have exhibited their great potential in accelerating distributed machine learning by mitigating the communication bottleneck in practice.

MinD at SemEval-2021 Task 6: Propaganda Detection using Transfer Learning and Multimodal Fusion

no code implementations • SEMEVAL 2021 • Junfeng Tian, Min Gui, Chenliang Li, Ming Yan, Wenming Xiao

We describe our systems of subtask1 and subtask3 for SemEval-2021 Task 6 on Detection of Persuasion Techniques in Texts and Images.

Optical Character Recognition (OCR) Propaganda detection +1

Addressing Semantic Drift in Generative Question Answering with Auxiliary Extraction

no code implementations • ACL 2021 • Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang

This work focuses on generative QA which aims to generate an abstractive answer to a given question instead of extracting an answer span from a provided passage.

Generative Question Answering Machine Reading Comprehension

Provably Accelerated Decentralized Gradient Method Over Unbalanced Directed Graphs

no code implementations • 26 Jul 2021 • Zhuoqing Song, Lei Shi, Shi Pu, Ming Yan

We consider the decentralized optimization problem, where a network of $n$ agents aims to collaboratively minimize the average of their individual smooth and convex objective functions through peer-to-peer communication in a directed graph.

Elastic Graph Neural Networks

1 code implementation • 5 Jul 2021 • Xiaorui Liu, Wei Jin, Yao Ma, Yaxin Li, Hua Liu, Yiqi Wang, Ming Yan, Jiliang Tang

While many existing graph neural networks (GNNs) have been proven to perform $\ell_2$-based graph smoothing that enforces smoothness globally, in this work we aim to further enhance the local smoothness adaptivity of GNNs via $\ell_1$-based graph smoothing.

Paper
Code

Compressed Gradient Tracking for Decentralized Optimization Over General Directed Networks

no code implementations • 14 Jun 2021 • Zhuoqing Song, Lei Shi, Shi Pu, Ming Yan

The second algorithm is a broadcast-like version of CPP (B-CPP), and it also achieves a linear convergence rate under the same conditions on the objective functions.

Paper
Add Code

E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning

no code implementations • ACL 2021 • Haiyang Xu, Ming Yan, Chenliang Li, Bin Bi, Songfang Huang, Wenming Xiao, Fei Huang

Vision-language pre-training (VLP) on large-scale image-text pairs has achieved huge success for the cross-modal downstream tasks.

Image Captioning Object +2

Paper
Add Code

StructuralLM: Structural Pre-training for Form Understanding

1 code implementation • ACL 2021 • Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks.

Document Image Classification Question Answering +1

1,934

Paper
Code

RCT: Resource Constrained Training for Edge AI

no code implementations • 26 Mar 2021 • Tian Huang, Tao Luo, Ming Yan, Joey Tianyi Zhou, Rick Goh

For example, the quantisation-aware training (QAT) method involves two copies of model parameters, which is usually beyond the capacity of on-chip memory in edge devices.

Paper
Add Code

SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels

no code implementations • 14 Mar 2021 • Chenliang Li, Ming Yan, Haiyang Xu, Fuli Luo, Wei Wang, Bin Bi, Songfang Huang

Vision-language pre-training (VLP) on large-scale image-text pairs has recently witnessed rapid progress for learning cross-modal representations.

Paper
Add Code

CoRe: An Efficient Coarse-refined Training Framework for BERT

no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Yuechuan Li, Chao Yang, Ming Yan, Jingqiao Zhang, Fangquan Lin

In the second phase, we transform the trained relaxed BERT model into the original BERT and further retrain the model.

Paper
Add Code

Deep Neural Networks with Short Circuits for Improved Gradient Learning

no code implementations • 23 Sep 2020 • Ming Yan, Xueli Xiao, Joey Tianyi Zhou, Yi Pan

Deep neural networks have achieved great success both in computer vision and natural language processing tasks.

Paper
Add Code

Fast algorithms for robust principal component analysis with an upper bound on the rank

1 code implementation • 18 Aug 2020 • Ningyu Sha, Lei Shi, Ming Yan

The first type of algorithm applies regularization terms on the singular values of a matrix to obtain a low-rank matrix.

Vocal Bursts Type Prediction

Paper
Code

Linear Convergent Decentralized Optimization with Compression

no code implementations • ICLR 2021 • Xiaorui Liu, Yao Li, Rongrong Wang, Jiliang Tang, Ming Yan

Communication compression has become a key strategy to speed up distributed optimization.

Distributed Optimization

Paper
Add Code

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

no code implementations • ACL 2020 • Ming Yan, Hao Zhang, Di Jin, Joey Tianyi Zhou

Multiple-choice question answering (MCQA) is one of the most challenging tasks in machine reading comprehension since it requires more advanced reading comprehension skills such as logical reasoning, summarization, and arithmetic operations.

Logical Reasoning Machine Reading Comprehension +4

Paper
Add Code

Efficient Hyperparameter Optimization in Deep Learning Using a Variable Length Genetic Algorithm

1 code implementation • 23 Jun 2020 • Xueli Xiao, Ming Yan, Sunitha Basodi, Chunyan Ji, Yi Pan

However, traditional genetic algorithms with fixed-length chromosomes may not be a good fit for optimizing deep learning hyperparameters, because deep learning models have a variable number of hyperparameters depending on the model depth.

Hyperparameter Optimization

Paper
Code

A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features

no code implementations • 15 Jun 2020 • Sulaiman A. Alghunaim, Ming Yan, Ali H. Sayed

This work studies multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents.

regression

Paper
Add Code

PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

2 code implementations • 14 Apr 2020 • Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

Ranked #1 on Text Generation on CNN/Daily Mail

Abstractive Text Summarization Conversational Response Generation +8

1,934

Paper
Code

Manifold Denoising by Nonlinear Robust Principal Component Analysis

1 code implementation • NeurIPS 2019 • He Lyu, Ningyu Sha, Shuyang Qin, Ming Yan, Yuying Xie, Rongrong Wang

This paper extends robust principal component analysis (RPCA) to nonlinear manifolds.

Denoising Dimensionality Reduction

Paper
Code

A Double Residual Compression Algorithm for Efficient Distributed Learning

no code implementations • 16 Oct 2019 • Xiaorui Liu, Yao Li, Jiliang Tang, Ming Yan

Large-scale machine learning models are often trained by parallel stochastic gradient descent algorithms.

Paper
Add Code

Symmetric Regularization based BERT for Pair-wise Semantic Reasoning

1 code implementation • 8 Sep 2019 • Weidi Xu, Xingyi Cheng, Kunlong Chen, Wei Wang, Bin Bi, Ming Yan, Chen Wu, Luo Si, Wei Chu, Taifeng Wang

To remedy this, we propose to augment the NSP task to a 3-class categorization task, which includes a category for previous sentence prediction (PSP).

Machine Reading Comprehension Natural Language Inference +2

Paper
Code

Incorporating External Knowledge into Machine Reading for Generative Question Answering

no code implementations • IJCNLP 2019 • Bin Bi, Chen Wu, Ming Yan, Wei Wang, Jiangnan Xia, Chenliang Li

Different from existing work on knowledge-aware QA, we focus on a more challenging task of leveraging external knowledge to generate answers in natural language for a given question with context.

Answer Generation Generative Question Answering +1

Paper
Add Code

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

no code implementations • ICLR 2020 • Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering.

Ranked #1 on Natural Language Inference on QNLI

Language Modelling Linguistic Acceptability +7

Paper
Add Code

Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning

no code implementations • 13 Aug 2019 • Jiangnan Xia, Chen Wu, Ming Yan

This paper focuses on how to take advantage of external relational knowledge to improve machine reading comprehension (MRC) with multi-task learning.

Language Modelling Machine Reading Comprehension +2

Paper
Add Code

On linear convergence of two decentralized algorithms

no code implementations • 17 Jun 2019 • Yao Li, Ming Yan

In addition, we relax the requirement for the objective functions and the mixing matrices.

Vocal Bursts Valence Prediction

Paper
Add Code

Multi-granularity hierarchical attention fusion networks for reading comprehension and question answering

1 code implementation • ACL 2018 • Wei Wang, Ming Yan, Chen Wu

Extensive experiments on the large-scale SQuAD and TriviaQA datasets validate the effectiveness of the proposed method.

Question Answering Reading Comprehension +1

1,934

Paper
Code

A Deep Cascade Model for Multi-Document Reading Comprehension

no code implementations • 28 Nov 2018 • Ming Yan, Jiangnan Xia, Chen Wu, Bin Bi, Zhongzhou Zhao, Ji Zhang, Luo Si, Rui Wang, Wei Wang, Haiqing Chen

To address this problem, we develop a novel deep cascade learning model, which progressively evolves from the document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension.

Ranked #2 on Question Answering on MS MARCO

Machine Reading Comprehension Question Answering +2

Paper
Add Code

$D^2$: Decentralized Training over Decentralized Data

no code implementations • ICML 2018 • Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, Ji Liu

While training a machine learning model using multiple workers, each of which collects data from its own data source, it would be useful when the data collected from different workers are unique and different.

Ranked #3 on Multi-view Subspace Clustering on ORL

Image Classification Multi-view Subspace Clustering

Paper
Add Code

D$^2$: Decentralized Training over Decentralized Data

no code implementations • 19 Mar 2018 • Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, Ji Liu

While training a machine learning model using multiple workers, each of which collects data from their own data sources, it would be most useful when the data collected from different workers can be unique and different.

Image Classification

Paper
Add Code

Exploring Outliers in Crowdsourced Ranking for QoE

no code implementations • 18 Jul 2017 • Qianqian Xu, Ming Yan, Chendi Huang, Jiechao Xiong, Qingming Huang, Yuan YAO

Outlier detection is a crucial part of robust evaluation for crowdsourceable assessment of Quality of Experience (QoE) and has attracted much attention in recent years.

Outlier Detection

Paper
Add Code

Nonconvex penalties with analytical solutions for one-bit compressive sensing

no code implementations • 4 Jun 2017 • Xiaolin Huang, Ming Yan

For several nonconvex penalties, including minimax concave penalty (MCP), $\ell_0$ norm, and sorted $\ell_1$ penalty, we provide fast algorithms for finding the analytical solutions by solving the dual problem.

Compressive Sensing Learning Theory

Paper
Add Code

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

no code implementations • 25 Apr 2017 • Zhi Li, Wei Shi, Ming Yan

This paper proposes a novel proximal-gradient algorithm for a decentralized optimization problem with a composite objective containing smooth and non-smooth terms.

Paper
Add Code

Mixed one-bit compressive sensing with applications to overexposure correction for CT reconstruction

no code implementations • 3 Jan 2017 • Xiaolin Huang, Yan Xia, Lei Shi, Yixing Huang, Ming Yan, Joachim Hornegger, Andreas Maier

Aiming at overexposure correction for computed tomography (CT) reconstruction, we propose in this paper a mixed one-bit compressive sensing (M1bit-CS) to acquire information from both regular and saturated measurements.

Compressive Sensing Computed Tomography (CT) +1

Paper
Add Code

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

no code implementations • 13 Dec 2016 • Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin

Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables.

Paper
Add Code

A new primal-dual algorithm for minimizing the sum of three functions with a linear operator

1 code implementation • 29 Nov 2016 • Ming Yan

For the general convex case, we prove the convergence of this new algorithm in terms of the distance to a fixed point by showing that the iteration is a nonexpansive operator.

Paper
Code

Asynchronous Multi-Task Learning

1 code implementation • 30 Sep 2016 • Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou

The models for each hospital may be different because of the inherent differences in the distributions of the patient populations.

Multi-Task Learning

Paper
Code

Coordinate Friendly Structures, Algorithms and Applications

no code implementations • 5 Jan 2016 • Zhimin Peng, Tianyu Wu, Yangyang Xu, Ming Yan, Wotao Yin

To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates.

Paper
Add Code

ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates

1 code implementation • 8 Jun 2015 • Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin

The agents share $x$ through either global memory or communication.

Paper
Code

Pinball Loss Minimization for One-bit Compressive Sensing: Convex Models and Algorithms

no code implementations • 14 May 2015 • Xiaolin Huang, Lei Shi, Ming Yan, Johan A. K. Suykens

The one-sided $\ell_1$ loss and the linear loss are two popular loss functions for 1bit-CS.

Compressive Sensing Quantization

Paper
Add Code

A Multiphase Image Segmentation Based on Fuzzy Membership Functions and L1-norm Fidelity

no code implementations • 9 Apr 2015 • Fang Li, Stanley Osher, Jing Qin, Ming Yan

In this paper, we propose a variational multiphase image segmentation model based on fuzzy membership functions and L1-norm fidelity.

Image Segmentation Semantic Segmentation

Paper
Add Code

The Continuity of Images by Transmission Imaging Revisited

no code implementations • 8 Jan 2014 • Zhitao Fan, Feng Guan, Chunlin Wu, Ming Yan

In transmission imaging, it was shown very recently in [49] that almost all images are continuous functions.

Astronomy Medical Diagnosis +1

Paper
Add Code

Restoration of Images Corrupted by Impulse Noise and Mixed Gaussian Impulse Noise using Blind Inpainting

no code implementations • 4 Apr 2013 • Ming Yan

In addition, we provide convergence analysis for these methods, showing that these algorithms converge to coordinatewise minimum points.

Image Restoration

Paper
Add Code
