Search Results for author: Ji Zhang

Found 119 papers, 48 papers with code

Turn-Level User Satisfaction Estimation in E-commerce Customer Service

no code implementations · ACL (ECNLP) 2021 · Runze Liang, Ryuichi Takanobu, Feng-Lin Li, Ji Zhang, Haiqing Chen, Minlie Huang

To this end, we formalize the turn-level satisfaction estimation as a reinforcement learning problem, in which the model can be optimized with only session-level satisfaction labels.

Incorporating Causal Analysis into Diversified and Logical Response Generation

no code implementations · COLING 2022 · Jiayi Liu, Wei Wei, Zhixuan Chu, Xing Gao, Ji Zhang, Tan Yan, Yulin Kang

Although the Conditional Variational Auto-Encoder (CVAE) model can generate more diversified responses than the traditional Seq2Seq model, the responses often have low relevance to the input words or are logically inconsistent with the question.

Response Generation

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

1 code implementation · 5 Sep 2024 · Anwen Hu, Haiyang Xu, Liang Zhang, Jiabo Ye, Ming Yan, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

Multimodal Large Language Models (MLLMs) have achieved promising OCR-free document understanding performance by increasing the supported resolution of document images.

document understanding Optical Character Recognition (OCR) +1

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

no code implementations · 22 Aug 2024 · Chaoya Jiang, Jia Hongrui, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang

This paper presents MaVEn, an innovative Multi-granularity Visual Encoding framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning.

Language Modelling Large Language Model +1

ProFuser: Progressive Fusion of Large Language Models

no code implementations · 9 Aug 2024 · Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang

While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to constructing more powerful and versatile models, a fundamental challenge is to properly select the advantageous model during training.

8k

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

1 code implementation · 9 Aug 2024 · Jiabo Ye, Haiyang Xu, Haowei Liu, Anwen Hu, Ming Yan, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in executing instructions for a variety of single-image tasks.

Language Modelling Large Language Model +1

MIBench: Evaluating Multimodal Large Language Models over Multiple Images

no code implementations · 21 Jul 2024 · Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu

However, most existing MLLMs and benchmarks focus primarily on single-image input scenarios, leaving the performance of MLLMs on realistic multi-image inputs underexplored.

In-Context Learning Multiple-choice

Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

no code implementations · 13 Jun 2024 · Yuhao Dan, Junfeng Tian, Jie Zhou, Ming Yan, Ji Zhang, Qin Chen, Liang He

Noting the data scarcity problem, we construct a Chinese Comparative Logical Relation Dataset (CLRD), a high-quality human-annotated dataset that is challenging for text generation, with descriptions of multiple entities and annotations of their comparative logical relations.

Contrastive Learning Data-to-Text Generation +2

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

2 code implementations · 3 Jun 2024 · Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work.

Neural Dynamic Data Valuation

1 code implementation · 30 Apr 2024 · Zhangyong Liang, Huanhuan Gao, Ji Zhang

Data constitute the foundational component of the data economy and its marketplaces.

Computational Efficiency Data Valuation +1

ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy

no code implementations · 21 Mar 2024 · Zonghan Yang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

In WebShop, the 1-shot performance of the A³T agent matches the human average, and 4 rounds of iterative refinement bring its performance close to that of human experts.

Policy Gradient Methods

SocialBench: Sociality Evaluation of Role-Playing Conversational Agents

2 code implementations · 20 Mar 2024 · Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Xing Gao, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang, Jingren Zhou

In this paper, we introduce SocialBench, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions.

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

1 code implementation · 19 Mar 2024 · Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

In this work, we emphasize the importance of structure information in Visual Document Understanding and propose Unified Structure Learning to boost the performance of MLLMs.

document understanding Optical Character Recognition (OCR)

From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News

no code implementations · 14 Mar 2024 · YuHan Liu, Xiuying Chen, Xiaoqing Zhang, Xing Gao, Ji Zhang, Rui Yan

Our simulation results uncover patterns in fake news propagation related to topic relevance and individual traits, aligning with real-world observations.

OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction

no code implementations · 8 Mar 2024 · Ji Zhang, Yiran Ding, Zixin Liu

To address these issues, we propose OccFusion, a depth estimation free multi-modal fusion framework.

Autonomous Driving Decoder +2

Improving Cross-lingual Representation for Semantic Retrieval with Code-switching

no code implementations · 3 Mar 2024 · Mieradilijiang Maimaiti, Yuanhang Zheng, Ji Zhang, Fei Huang, Yue Zhang, Wenpei Luo, Kaiyu Huang

Semantic Retrieval (SR) has become an indispensable part of the FAQ system in the task-oriented question-answering (QA) dialogue scenario.

Question Answering Retrieval +3

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

no code implementations · 26 Feb 2024 · Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval.

Text Retrieval Video-Text Retrieval

Budget-Constrained Tool Learning with Planning

2 code implementations · 25 Feb 2024 · Yuanhang Zheng, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

Despite intensive efforts devoted to tool learning, the problem of budget-constrained tool learning, which focuses on resolving user queries within a specific budget constraint, has been widely overlooked.

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

no code implementations · 24 Feb 2024 · Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang

Large Vision Language Models exhibit remarkable capabilities but struggle with hallucinations, i.e., inconsistencies between images and their descriptions.

Hallucination Hallucination Evaluation

Model Composition for Multimodal Large Language Models

1 code implementation · 20 Feb 2024 · Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.

PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs

1 code implementation · 20 Feb 2024 · An Liu, Zonghan Yang, Zhenhe Zhang, Qingyuan Hu, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

While large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models.

text-classification Text Classification

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

1 code implementation · 19 Feb 2024 · Ziyue Wang, Chi Chen, Yiqi Zhu, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks.

Enabling Weak LLMs to Judge Response Reliability via Meta Ranking

no code implementations · 19 Feb 2024 · Zijun Liu, Boqun Kou, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

In model cascading, we combine open- and closed-source LLMs to achieve performance comparable to GPT-4-turbo with lower costs.

Hallucination In-Context Learning

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

1 code implementation · 29 Jan 2024 · Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

To assess the performance of Mobile-Agent, we introduced Mobile-Eval, a benchmark for evaluating mobile device operations.

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

1 code implementation · 14 Jan 2024 · Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang

Each component is implemented by a single LLM that focuses on a specific capability and collaborates with others to accomplish the task.

Language Modelling Large Language Model

Knowledge Distillation for Closed-Source Language Models

no code implementations · 13 Jan 2024 · Hongzhan Chen, Xiaojun Quan, Hehong Chen, Ming Yan, Ji Zhang

The prior estimation aims to derive a prior distribution by utilizing the corpus generated by closed-source language models, while the posterior estimation employs a proxy model to update the prior distribution and derive a posterior distribution.

Knowledge Distillation

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

1 code implementation · 14 Dec 2023 · Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities.

Contrastive Learning Data Augmentation

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

1 code implementation · CVPR 2024 · Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

We first analyzed the representation distribution of textual and visual tokens in MLLM, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts that contain and do not contain hallucinations are entangled, making it challenging to distinguish them.

Contrastive Learning Hallucination +5

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

1 code implementation · 30 Nov 2023 · Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

In this work, towards a more versatile copilot for academic paper writing, we mainly focus on strengthening the multi-modal diagram analysis ability of Multimodal LLMs.

Language Modelling Large Language Model +1

Class Gradient Projection For Continual Learning

1 code implementation · 25 Nov 2023 · Cheng Chen, Ji Zhang, Jingkuan Song, Lianli Gao

Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).

Continual Learning Contrastive Learning

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

1 code implementation · 13 Nov 2023 · Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang

Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.

Attribute Hallucination +2

CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment

no code implementations · 25 Oct 2023 · Jixiang Hong, Quan Tu, Changyu Chen, Xing Gao, Ji Zhang, Rui Yan

With in-context learning (ICL) as the core of the cycle, the black-box models are able to rank the model-generated responses, guided by human-crafted instructions and demonstrations of their preferences.

In-Context Learning Instruction Following +2

MCC-KD: Multi-CoT Consistent Knowledge Distillation

1 code implementation · 23 Oct 2023 · Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting.

Diversity Knowledge Distillation +1

Improving Seq2Seq Grammatical Error Correction via Decoding Interventions

1 code implementation · 23 Oct 2023 · Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang

In this paper, we propose a unified decoding intervention framework that employs an external critic to incrementally assess the appropriateness of each token to be generated, and then dynamically influences the choice of the next token.

Decoder Grammatical Error Correction +1

DePT: Decoupled Prompt Tuning

1 code implementation · CVPR 2024 · Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song

Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the base-new tradeoff (BNT) stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks.

Prompt Engineering Zero-shot Generalization

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

1 code implementation · 2 Sep 2023 · Chenliang Li, Hehong Chen, Ming Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou

Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and exhibit planning-like behavior.

Evaluation and Analysis of Hallucination in Large Vision-Language Models

1 code implementation · 29 Aug 2023 · Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang

In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.

Hallucination Hallucination Evaluation

From Global to Local: Multi-scale Out-of-distribution Detection

1 code implementation · 20 Aug 2023 · Ji Zhang, Lianli Gao, Bingguang Hao, Hao Huang, Jingkuan Song, HengTao Shen

Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.

Out-of-Distribution Detection Out of Distribution (OOD) Detection +1

Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment

no code implementations · 16 Aug 2023 · Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li

Anomaly segmentation plays a pivotal role in identifying atypical objects in images, which is crucial for hazard detection in autonomous driving systems.

Autonomous Driving Contrastive Learning

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

1 code implementation · 19 Jul 2023 · Guohai Xu, Jiayi Liu, Ming Yan, Haotian Xu, Jinghui Si, Zhuoran Zhou, Peng Yi, Xing Gao, Jitao Sang, Rong Zhang, Ji Zhang, Chao Peng, Fei Huang, Jingren Zhou

In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria.

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

1 code implementation · 4 Jul 2023 · Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding.

document understanding Language Modelling +3

DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

no code implementations · 29 Jun 2023 · Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan

In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts.

Data Augmentation Dialogue Generation +2

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

1 code implementation · 7 Jun 2023 · Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

In addition, to facilitate a comprehensive evaluation of video-language models, we carefully build the largest human-annotated Chinese benchmarks covering three popular video-language tasks of cross-modal retrieval, video captioning, and video category classification.

Cross-Modal Retrieval Language Modelling +4

Distinguish Before Answer: Generating Contrastive Explanation as Knowledge for Commonsense Question Answering

no code implementations · 14 May 2023 · Qianglong Chen, Guohai Xu, Ming Yan, Ji Zhang, Fei Huang, Luo Si, Yin Zhang

Existing knowledge-enhanced methods have achieved remarkable results in certain QA tasks via obtaining diverse knowledge from different knowledge bases.

Explanation Generation Question Answering

AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference

no code implementations · 13 May 2023 · Qianglong Chen, Feng Ji, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang

To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers to a single student.

Knowledge Distillation

ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds

no code implementations · 25 Apr 2023 · Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma

In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation.

Autonomous Driving Contrastive Learning +2

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

1 code implementation · 16 Apr 2023 · Junfeng Tian, Hehong Chen, Guohai Xu, Ming Yan, Xing Gao, Jianhai Zhang, Chenliang Li, Jiayi Liu, Wenshen Xu, Haiyang Xu, Qi Qian, Wei Wang, Qinghao Ye, Jiejing Zhang, Ji Zhang, Fei Huang, Jingren Zhou

In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format.

World Knowledge

DETA: Denoised Task Adaptation for Few-Shot Learning

2 code implementations · ICCV 2023 · Ji Zhang, Lianli Gao, Xu Luo, HengTao Shen, Jingkuan Song

Test-time task adaptation in few-shot learning aims to adapt a pre-trained task-agnostic model to capture task-specific knowledge of the test task, relying only on a few labeled support samples.

Denoising Few-Shot Learning

Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance

1 code implementation · 28 Feb 2023 · Xueyi Liu, Ji Zhang, Ruizhen Hu, Haibin Huang, He Wang, Li Yi

Category-level articulated object pose estimation aims to estimate a hierarchy of articulation-aware object poses of an unseen articulated object from a known category.

Disentanglement Object +1

Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits

no code implementations · 24 Feb 2023 · Siddharth Ancha, Gaurav Pathak, Ji Zhang, Srinivasa Narasimhan, David Held

To navigate in an environment safely and autonomously, robots must accurately estimate where obstacles are and how they move.

Multi-Armed Bandits Navigate +1

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

4 code implementations · 1 Feb 2023 · Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou

In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.

Action Classification Image Classification +8

A Closer Look at Few-shot Classification Again

3 code implementations · 28 Jan 2023 · Xu Luo, Hao Wu, Ji Zhang, Lianli Gao, Jing Xu, Jingkuan Song

Few-shot classification consists of a training phase where a model is learned on a relatively large dataset and an adaptation phase where the learned model is adapted to previously-unseen tasks with limited labeled samples.

Classification Representation Learning +1

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

no code implementations · ICCV 2023 · Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang

We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label), with improvements of 8.6% and 11.1%, respectively.

cross-modal alignment TGIF-Action +8

Intelligent Computing: The Latest Advances, Challenges and Future

no code implementations · 21 Nov 2022 · Shiqiang Zhu, Ting Yu, Tao Xu, Hongyang Chen, Schahram Dustdar, Sylvain Gigan, Deniz Gunduz, Ekram Hossain, Yaochu Jin, Feng Lin, Bo Liu, Zhiguo Wan, Ji Zhang, Zhifeng Zhao, Wentao Zhu, Zuoning Chen, Tariq Durrani, Huaimin Wang, Jiangxing Wu, Tongyi Zhang, Yunhe Pan

In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting the digital revolution in the era of big data, artificial intelligence, and the internet of things with new computing theories, architectures, methods, systems, and applications.

Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment

no code implementations · 14 Nov 2022 · Junyang Wang, Yi Zhang, Ming Yan, Ji Zhang, Jitao Sang

We further propose Anchor Augment to guide the generative model's attention to the fine-grained information in the representation of CLIP.

Computational Efficiency Image Captioning +2

MUI-TARE: Multi-Agent Cooperative Exploration with Unknown Initial Position

no code implementations · 22 Sep 2022 · Jingtian Yan, Xingqiao Lin, Zhongqiang Ren, Shiqi Zhao, Jieqiong Yu, Chao Cao, Peng Yin, Ji Zhang, Sebastian Scherer

To intelligently balance the robustness of sub-map merging and exploration efficiency, we develop a new approach for LiDAR-based multi-agent exploration, which can direct one agent to repeat another agent's trajectory in an adaptive manner based on the quality indicator of the sub-map merging process.

Position

Generating Persuasive Responses to Customer Reviews with Multi-Source Prior Knowledge in E-commerce

no code implementations · 20 Sep 2022 · Bo Chen, Jiayi Liu, Mieradilijiang Maimaiti, Xing Gao, Ji Zhang

A multi-aspect attentive network is proposed to automatically attend to different aspects in a review and ensure most of the issues are tackled.

Response Generation

Incorporating Causal Analysis into Diversified and Logical Response Generation

no code implementations · 20 Sep 2022 · Jiayi Liu, Wei Wei, Zhixuan Chu, Xing Gao, Ji Zhang, Tan Yan, Yulin Kang

Although the Conditional Variational AutoEncoder (CVAE) model can generate more diversified responses than the traditional Seq2Seq model, the responses often have low relevance to the input words or are logically inconsistent with the question.

Response Generation

iSimLoc: Visual Global Localization for Previously Unseen Environments with Simulated Images

no code implementations · 14 Sep 2022 · Peng Yin, Ivan Cisneros, Ji Zhang, Howie Choset, Sebastian Scherer

Visual cameras are attractive devices for beyond-visual-line-of-sight (B-VLOS) drone operation, since they are low in size, weight, power, and cost, and can provide a redundant modality in case of GPS failure.

Retrieval Visual Localization

Class-Level Logit Perturbation

1 code implementation · 13 Sep 2022 · Mengyang Li, Fengguang Su, Ou Wu, Ji Zhang

However, few studies have explicitly explored the perturbation of logit vectors.

Data Augmentation Image Classification +1

DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning

no code implementations · 1 Aug 2022 · Qianglong Chen, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang

We evaluate our approach on a variety of knowledge driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE.

Contrastive Learning Language Modelling +2

RCA: Ride Comfort-Aware Visual Navigation via Self-Supervised Learning

no code implementations · 29 Jul 2022 · Xinjie Yao, Ji Zhang, Jean Oh

Under shared autonomy, wheelchair users expect vehicles to provide safe and comfortable rides while following the users' high-level navigation plans.

Self-Supervised Learning Visual Navigation

Scene Recognition with Objectness, Attribute and Category Learning

no code implementations · 20 Jul 2022 · Ji Zhang, Jean-Paul Ainam, Li-hui Zhao, Wenai Song, Xin Wang

Based on the complementarity of attribute and category labels, we propose a Multi-task Attribute-Scene Recognition (MASR) network which learns a category embedding and at the same time predicts scene attributes.

Attribute Scene Classification +1

ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization

1 code implementation · 19 Jul 2022 · Ivan Cisneros, Peng Yin, Ji Zhang, Howie Choset, Sebastian Scherer

We present the ALTO dataset, a vision-focused dataset for the development and benchmarking of Visual Place Recognition and Localization methods for Unmanned Aerial Vehicles.

Benchmarking Image Registration +2

X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval

1 code implementation · 15 Jul 2022 · Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, Rongrong Ji

However, cross-grained contrast, which is the contrast between coarse-grained representations and fine-grained representations, has rarely been explored in prior research.

Contrastive Learning Text Retrieval +1

AutoMerge: A Framework for Map Assembling and Smoothing in City-scale Environments

no code implementations · 14 Jul 2022 · Peng Yin, Haowen Lai, Shiqi Zhao, Ruohai Ge, Ji Zhang, Howie Choset, Sebastian Scherer

We present AutoMerge, a LiDAR data processing framework for assembling a large number of map segments into a complete map.

Loop Closure Detection Retrieval

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

3 code implementations · 24 May 2022 · Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si

Large-scale pretrained foundation models have been an emerging paradigm for building artificial intelligence (AI) systems, which can be quickly adapted to a wide range of downstream tasks.

Computational Efficiency cross-modal alignment +7

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

1 code implementation · CVPR 2022 · Jiabo Ye, Junfeng Tian, Ming Yan, Xiaoshan Yang, Xuwu Wang, Ji Zhang, Liang He, Xin Lin

Moreover, since the backbones are query-agnostic, it is difficult to completely avoid the inconsistency issue by training the visual backbone end-to-end in the visual grounding framework.

Multimodal Reasoning Visual Grounding

LQoCo: Learning to Optimize Cache Capacity Overloading in Storage Systems

no code implementations · 21 Mar 2022 · Ji Zhang, Xijun Li, Xiyao Zhou, Mingxuan Yuan, Zhuo Cheng, Keji Huang, YiFan Li

Caches play an important role in maintaining high and stable performance (i.e., high throughput, low tail latency, and low throughput jitter) in storage systems.

Management

Deep Multi-Branch Aggregation Network for Real-Time Semantic Segmentation in Street Scenes

no code implementations · 8 Mar 2022 · Xi Weng, Yan Yan, Genshun Dong, Chang Shu, Biao Wang, Hanzi Wang, Ji Zhang

This shows that DMA-Net provides a good tradeoff between segmentation quality and speed for semantic segmentation in street scenes.

Decoder Real-Time Semantic Segmentation +1

A Generic Knowledge Based Medical Diagnosis Expert System

no code implementations · 9 Oct 2021 · Xin Huang, Xuejiao Tang, Wenbin Zhang, Shichao Pei, Ji Zhang, Mingli Zhang, Zhen Liu, Ruijun Chen, Yiyi Huang

The proposed disease diagnosis system also uses a graphical user interface (GUI) to help users interact with the expert system.

Medical Diagnosis

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

no code implementations · 22 Sep 2021 · Fu Sun, Feng-Lin Li, Ruize Wang, Qianglong Chen, Xingyi Cheng, Ji Zhang

Knowledge enhanced pre-trained language models (K-PLMs) are shown to be effective for many public tasks in the literature but few of them have been successfully applied in practice.

Knowledge Distillation Question Answering +4

GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation

no code implementations · 18 Aug 2021 · Xuming Lin, Shaobo Cui, Zhongzhou Zhao, Wei Zhou, Ji Zhang, Haiqing Chen

With these two synergic representations, we then regroup these phrases into a fine-grained plan, based on which we generate the final long text.

Story Generation

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

1 code implementation · 16 Aug 2021 · Yuhao Cui, Zhou Yu, Chunqi Wang, Zhongzhou Zhao, Ji Zhang, Meng Wang, Jun Yu

Nevertheless, most existing VLP approaches have not fully utilized the intrinsic knowledge within the image-text pairs, which limits the effectiveness of the learned alignments and further restricts the performance of their models.

Visual Reasoning

KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference

no code implementations · ACL 2021 · Qianglong Chen, Feng Ji, Xiangji Zeng, Feng-Lin Li, Ji Zhang, Haiqing Chen, Yin Zhang

In order to better understand the reason behind model behaviors (i.e., making predictions), most recent works have exploited generative models to provide complementary explanations.

counterfactual Language Modelling +1

Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss

no code implementations CVPR 2021 Lu Zhang, Shuigeng Zhou, Jihong Guan, Ji Zhang

Most object detection methods require huge amounts of annotated data and can detect only the categories that appear in the training set.

Few-Shot Object Detection object-detection

i3dLoc: Image-to-range Cross-domain Localization Robust to Inconsistent Environmental Conditions

no code implementations 27 May 2021 Peng Yin, Lingyun Xu, Ji Zhang, Howie Choset, Sebastian Scherer

Based on such features, we further design a spherical convolution network to learn viewpoint-invariant symmetric place descriptors.

3D geometry Generative Adversarial Network +1

State-Promoted Investment for Industrial Reforms: an Information Design Approach

no code implementations 20 May 2021 Keeyoung Rhee, Myungkyu Shim, Ji Zhang

We analyze the optimal strategy for a government to promote large-scale investment projects under information frictions.

AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss

1 code implementation 5 May 2021 Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto del Bimbo

Experimental results demonstrate that our adapted margin cosine loss greatly enhances the baseline models, with an absolute performance gain of 15% on average, supporting the potential of tackling the language prior problem in VQA from the angle of answer feature space learning.

Question Answering Visual Question Answering

LSTM Based Sentiment Analysis for Cryptocurrency Prediction

no code implementations 27 Mar 2021 Xin Huang, Wenbin Zhang, Xuejiao Tang, Mingli Zhang, Jayachander Surbiryala, Vasileios Iosifidis, Zhen Liu, Ji Zhang

Recent studies in big data analytics and natural language processing have developed automatic techniques for analyzing sentiment in social media.

Sentiment Analysis

OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop Approach

no code implementations 24 Feb 2021 Shaobo Cui, Xintong Bao, Xinxing Zu, Yangyang Guo, Zhongzhou Zhao, Ji Zhang, Haiqing Chen

This pipeline approach, however, is ill-suited to mining the most appropriate QA pairs from documents, since it ignores the connection between question generation and answer extraction, which may lead to incompatible QA pairs, i.e., the selected answer span is inappropriate for question generation.

Machine Reading Comprehension Question Answering +2

A Data-driven Human Responsibility Management System

no code implementations 6 Dec 2020 Xuejiao Tang, Jiong Qiu, Ruijun Chen, Wenbin Zhang, Vasileios Iosifidis, Zhen Liu, Wei Meng, Mingli Zhang, Ji Zhang

An ideal safe workplace is described as one where staff fulfill their responsibilities in a well-organized manner, potentially hazardous events are monitored in real time, and the number of accidents and related damages is minimized.

Management

Distant Supervision for E-commerce Query Segmentation via Attention Network

no code implementations 9 Nov 2020 Zhao Li, Donghui Ding, Pengcheng Zou, Yu Gong, Xi Chen, Ji Zhang, Jianliang Gao, Youxi Wu, Yucong Duan

Booming online e-commerce platforms demand highly accurate approaches to segmenting queries that carry consumers' product requirements.

Segmentation

AI Marker-based Large-scale AI Literature Mining

no code implementations 1 Nov 2020 Rujing Yao, Yingchun Ye, Ji Zhang, Shuxiao Li, Ou Wu

Inspired by the idea of molecular marker tracing in biochemistry, three types of named entities, namely methods, datasets, and metrics, are used as AI markers for AI literature.

Clustering Literature Mining +1

Method and Dataset Entity Mining in Scientific Literature: A CNN + Bi-LSTM Model with Self-attention

no code implementations 26 Oct 2020 Linlin Hou, Ji Zhang, Ou Wu, Ting Yu, Zhen Wang, Zhao Li, Jianliang Gao, Yingchun Ye, Rujing Yao

Finally, we apply our model to PAKDD papers published from 2009 to 2019 to mine insights from scientific papers over a longer time span.

Data Augmentation

AliMe KG: Domain Knowledge Graph Construction and Application in E-commerce

no code implementations 24 Sep 2020 Feng-Lin Li, Hehong Chen, Guohai Xu, Tian Qiu, Feng Ji, Ji Zhang, Haiqing Chen

Pre-sales customer service is important to e-commerce platforms because it helps optimize customers' buying process.

graph construction Question Answering

Character Matters: Video Story Understanding with Character-Aware Relations

no code implementations 9 May 2020 Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo

Without linking the people who appear on screen to character names, a model cannot obtain a genuine understanding of the plot.

Question Answering

Target-Guided Structured Attention Network for Target-Dependent Sentiment Analysis

no code implementations TACL 2020 Ji Zhang, Chengyao Chen, PengFei Liu, Chao He, Cane Wing-Ki Leung

Second, it shows a strong advantage in determining the sentiment of a target when the context sentence contains multiple semantic segments.

Sentence Sentiment Analysis +1

Method and Dataset Mining in Scientific Papers

no code implementations 29 Nov 2019 Rujing Yao, Linlin Hou, Yingchun Ye, Ou Wu, Ji Zhang, Jian Wu

In machine learning, the methods (M) and datasets (D) a paper uses are key information in that paper.

Following Social Groups: Socially Compliant Autonomous Navigation in Dense Crowds

no code implementations 27 Nov 2019 Xinjie Yao, Ji Zhang, Jean Oh

The underlying system incorporates a deep neural network that tracks social groups and joins the flow of a social group to facilitate navigation.

Autonomous Navigation Collision Avoidance +1

2nd Place Solution to the GQA Challenge 2019

no code implementations 16 Jul 2019 Shijie Geng, Ji Zhang, Hang Zhang, Ahmed Elgammal, Dimitris N. Metaxas

We present a simple method that achieves surprisingly strong performance on Visual Question Answering that involves complex reasoning.

Question Answering Visual Question Answering +1

Graphical Contrastive Losses for Scene Graph Parsing

3 code implementations CVPR 2019 Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro

The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups).

Relationship Detection Scene Graph Generation +1

A Deep Cascade Model for Multi-Document Reading Comprehension

no code implementations 28 Nov 2018 Ming Yan, Jiangnan Xia, Chen Wu, Bin Bi, Zhongzhou Zhao, Ji Zhang, Luo Si, Rui Wang, Wei Wang, Haiqing Chen

To address this problem, we develop a novel deep cascade learning model, which progressively evolves from the document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension.

Machine Reading Comprehension Question Answering +2

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

no code implementations 1 Nov 2018 Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

This article describes the model we built that achieved 1st place in the OpenImages Visual Relationship Detection Challenge on Kaggle.

Relationship Detection Visual Relationship Detection

Improving Multilingual Semantic Textual Similarity with Shared Sentence Encoder for Low-resource Languages

no code implementations 20 Oct 2018 Xin Tang, Shanbo Cheng, Loc Do, Zhiyu Min, Feng Ji, Heng Yu, Ji Zhang, Haiqin Chen

Our approach extends a basic monolingual STS framework to a shared multilingual encoder pretrained on a translation task, in order to incorporate data from rich-resource languages.

Machine Translation Semantic Similarity +4

Large-Scale Visual Relationship Understanding

2 code implementations 27 Apr 2018 Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny

Large-scale visual understanding is challenging, as it requires a model to handle the widely spread and imbalanced distribution of <subject, relation, object> triples.

Relationship Detection

Relationship Proposal Networks

no code implementations CVPR 2017 Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, Ahmed Elgammal

We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals.

Scene Understanding
