no code implementations • ACL (ECNLP) 2021 • Runze Liang, Ryuichi Takanobu, Feng-Lin Li, Ji Zhang, Haiqing Chen, Minlie Huang
To this end, we formalize the turn-level satisfaction estimation as a reinforcement learning problem, in which the model can be optimized with only session-level satisfaction labels.
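A minimal REINFORCE-style sketch of this general setup, learning a per-turn estimator from a session-level signal only, might look as follows; the turn encoder, the min-aggregation rule, and the reward shaping are illustrative assumptions rather than the paper's actual design.

```python
# Sketch: train a turn-level satisfaction policy with only a session-level label.
# The aggregation rule (session label = worst sampled turn) and reward are assumptions.
import torch
import torch.nn as nn

class TurnSatisfactionPolicy(nn.Module):
    def __init__(self, turn_dim=128, num_classes=3):
        super().__init__()
        self.scorer = nn.Linear(turn_dim, num_classes)

    def forward(self, turn_feats):                       # (num_turns, turn_dim)
        return torch.log_softmax(self.scorer(turn_feats), dim=-1)

def reinforce_step(policy, optimizer, turn_feats, session_label):
    log_probs = policy(turn_feats)                                  # (T, C)
    actions = torch.multinomial(log_probs.exp(), 1).squeeze(-1)     # sampled turn labels
    predicted_session = actions.min()                               # assumed aggregation
    reward = 1.0 if predicted_session.item() == session_label else -1.0
    chosen = log_probs.gather(1, actions.unsqueeze(-1)).squeeze(-1)
    loss = -(reward * chosen).mean()                                # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward

policy = TurnSatisfactionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
reinforce_step(policy, opt, torch.randn(5, 128), session_label=0)
```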
no code implementations • COLING 2022 • Guodun Li, Yuchen Zhai, Qianglong Chen, Xing Gao, Ji Zhang, Yin Zhang
Intent detection is at the core of task-oriented dialogue systems.
no code implementations • EMNLP 2021 • Mieradilijiang Maimaiti, Yang Liu, Yuanhang Zheng, Gang Chen, Kaiyu Huang, Ji Zhang, Huanbo Luan, Maosong Sun
Besides, the robustness of the previous neural methods is limited by the large-scale annotated data.
no code implementations • COLING 2022 • Jiayi Liu, Wei Wei, Zhixuan Chu, Xing Gao, Ji Zhang, Tan Yan, Yulin kang
Although the Conditional Variational Auto-Encoder (CVAE) model can generate more diversified responses than the traditional Seq2Seq model, the responses often have low relevance with the input words or are illogical with the question.
1 code implementation • 5 Sep 2024 • Anwen Hu, Haiyang Xu, Liang Zhang, Jiabo Ye, Ming Yan, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou
Multimodal Large Language Models (MLLMs) have achieved promising OCR-free document understanding performance by increasing the supported resolution of document images.
Document Understanding • Optical Character Recognition (OCR) • +1
no code implementations • 22 Aug 2024 • Chaoya Jiang, Jia Hongrui, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
This paper presents MaVEn, an innovative Multi-granularity Visual Encoding framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning.
no code implementations • 9 Aug 2024 • Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang
While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select advantageous models during training.
1 code implementation • 9 Aug 2024 • Jiabo Ye, Haiyang Xu, Haowei Liu, Anwen Hu, Ming Yan, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in executing instructions for a variety of single-image tasks.
Ranked #5 on Video Question Answering on NExT-QA
no code implementations • 21 Jul 2024 • Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu
However, most existing MLLMs and benchmarks primarily focus on single-image input scenarios, leaving the performance of MLLMs on realistic multi-image inputs largely underexplored.
no code implementations • 13 Jun 2024 • Yuhao Dan, Junfeng Tian, Jie zhou, Ming Yan, Ji Zhang, Qin Chen, Liang He
To address the data scarcity problem, we construct the Chinese Comparative Logical Relation Dataset (CLRD), a high-quality human-annotated dataset that is challenging for text generation, featuring descriptions of multiple entities and annotations of their comparative logical relations.
no code implementations • 12 Jun 2024 • Jun Bai, Di wu, Tristan Shelley, Peter Schubel, David Twine, John Russell, Xuesen Zeng, Ji Zhang
Material defects (MD) represent a primary challenge affecting product performance and giving rise to safety issues in related products.
2 code implementations • 3 Jun 2024 • Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang
However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work.
1 code implementation • 30 Apr 2024 • Zhangyong Liang, Huanhuan Gao, Ji Zhang
Data constitute the foundational component of the data economy and its marketplaces.
1 code implementation • 25 Apr 2024 • Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang
Charts are important for presenting and explaining complex data relationships.
no code implementations • 21 Mar 2024 • Zonghan Yang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
In WebShop, the 1-shot performance of the A$^3$T agent matches the human average, and 4 rounds of iterative refinement lead to performance approaching human experts.
2 code implementations • 20 Mar 2024 • Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Xing Gao, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang, Jingren Zhou
In this paper, we introduce SocialBench, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions.
1 code implementation • 19 Mar 2024 • Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou
In this work, we emphasize the importance of structure information in Visual Document Understanding and propose the Unified Structure Learning to boost the performance of MLLMs.
no code implementations • 14 Mar 2024 • YuHan Liu, Xiuying Chen, Xiaoqing Zhang, Xing Gao, Ji Zhang, Rui Yan
Our simulation results uncover patterns in fake news propagation related to topic relevance and individual traits, aligning with real-world observations.
no code implementations • 8 Mar 2024 • Ji Zhang, Yiran Ding, Zixin Liu
To address these issues, we propose OccFusion, a depth-estimation-free multi-modal fusion framework.
no code implementations • 3 Mar 2024 • Mieradilijiang Maimaiti, Yuanhang Zheng, Ji Zhang, Fei Huang, Yue Zhang, Wenpei Luo, Kaiyu Huang
Semantic Retrieval (SR) has become an indispensable part of the FAQ system in the task-oriented question-answering (QA) dialogue scenario.
no code implementations • 1 Mar 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment.
no code implementations • 26 Feb 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval.
2 code implementations • 25 Feb 2024 • Yuanhang Zheng, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
Despite intensive efforts devoted to tool learning, the problem of budget-constrained tool learning, which focuses on resolving user queries within a specific budget constraint, has been widely overlooked.
no code implementations • 24 Feb 2024 • Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang
Large Vision-Language Models exhibit remarkable capabilities but struggle with hallucinations: inconsistencies between images and their descriptions.
1 code implementation • 20 Feb 2024 • Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu
In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
1 code implementation • 20 Feb 2024 • An Liu, Zonghan Yang, Zhenhe Zhang, Qingyuan Hu, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
While large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models.
1 code implementation • 19 Feb 2024 • Ziyue Wang, Chi Chen, Yiqi Zhu, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu
With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks.
no code implementations • 19 Feb 2024 • Zijun Liu, Boqun Kou, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
In model cascading, we combine open- and closed-source LLMs to achieve performance comparable to GPT-4-turbo with lower costs.
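A toy sketch of the cascading idea, routing a query to a closed-source model only when the open-source draft looks uncertain; both backends and the confidence heuristic are hypothetical stand-ins, not the paper's components.

```python
# Illustrative LLM cascade: answer with a cheap open-source model first and
# escalate to a closed-source model only when confidence is low.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float   # e.g., mean token probability or a self-evaluation score

def open_source_llm(query: str) -> Answer:        # stand-in for a local model
    return Answer(text=f"draft answer to: {query}", confidence=0.72)

def closed_source_llm(query: str) -> str:         # stand-in for an API model
    return f"high-quality answer to: {query}"

def cascade(query: str, threshold: float = 0.8) -> str:
    draft = open_source_llm(query)
    if draft.confidence >= threshold:
        return draft.text                         # cheap path
    return closed_source_llm(query)               # costly fallback

print(cascade("Summarize the main finding of the paper."))
```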
1 code implementation • 29 Jan 2024 • Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang
To assess the performance of Mobile-Agent, we introduced Mobile-Eval, a benchmark for evaluating mobile device operations.
1 code implementation • 14 Jan 2024 • Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang
Each component is implemented by a single LLM that focuses on a specific capability and collaborates with others to accomplish the task.
no code implementations • 13 Jan 2024 • Hongzhan Chen, Xiaojun Quan, Hehong Chen, Ming Yan, Ji Zhang
The prior estimation aims to derive a prior distribution by utilizing the corpus generated by closed-source language models, while the posterior estimation employs a proxy model to update the prior distribution and derive a posterior distribution.
1 code implementation • 14 Dec 2023 • Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang
Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities.
1 code implementation • CVPR 2024 • Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang
We first analyzed the representation distribution of textual and visual tokens in MLLM, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts that contain and do not contain hallucinations are entangled, making it challenging to distinguish them.
Ranked #116 on Visual Question Answering on MM-Vet
1 code implementation • 30 Nov 2023 • Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang
In this work, towards a more versatile copilot for academic paper writing, we mainly focus on strengthening the multi-modal diagram analysis ability of Multimodal LLMs.
1 code implementation • 25 Nov 2023 • Cheng Chen, Ji Zhang, Jingkuan Song, Lianli Gao
Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).
1 code implementation • 13 Nov 2023 • Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang
Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.
2 code implementations • CVPR 2024 • Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.
Ranked #10 on Long-Context Understanding on MMNeedle
no code implementations • 25 Oct 2023 • Jixiang Hong, Quan Tu, Changyu Chen, Xing Gao, Ji Zhang, Rui Yan
With in-context learning (ICL) as the core of the cycle, the black-box models are able to rank the model-generated responses guided by human-crafted instructions and demonstrations of their preferences.
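A sketch of how such an ICL ranking prompt could be assembled; the instruction wording, demonstration format, and output convention are assumptions for illustration only.

```python
# Build a prompt that asks a black-box model to rank candidate responses,
# conditioned on an instruction and preference demonstrations (all illustrative).
def build_ranking_prompt(instruction, demonstrations, candidates):
    demo_text = "\n".join(
        f"Responses: {resp}\nPreferred order: {order}"
        for resp, order in demonstrations
    )
    cand_text = "\n".join(f"({i}) {c}" for i, c in enumerate(candidates, 1))
    return (
        f"{instruction}\n\n"
        f"Examples of preference ranking:\n{demo_text}\n\n"
        f"Rank the following responses from best to worst, "
        f"answering with their indices only:\n{cand_text}"
    )

prompt = build_ranking_prompt(
    instruction="You are ranking chatbot responses by helpfulness and harmlessness.",
    demonstrations=[("(1) rude reply (2) helpful reply", "2 > 1")],
    candidates=["It depends.", "Here is a step-by-step answer..."],
)
print(prompt)
```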
1 code implementation • 23 Oct 2023 • Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang
Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting.
1 code implementation • 23 Oct 2023 • Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang
In this paper, we propose a unified decoding intervention framework that employs an external critic to assess the appropriateness of the token to be generated incrementally, and then dynamically influence the choice of the next token.
Ranked #1 on Grammatical Error Correction on MuCGEC
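A minimal sketch of critic-guided decoding in this spirit: at each step the generator's next-token scores are adjusted by an external critic before the token is chosen. The additive mixing rule and the toy scorers are assumptions, not the framework's components.

```python
# Greedy decoding where an external critic rescores each candidate token.
def decode_with_critic(generator_logits, critic_score, alpha=1.0, max_steps=10):
    """generator_logits(prefix) -> {token: logit}; critic_score(prefix, token) -> float."""
    prefix = []
    for _ in range(max_steps):
        scores = {
            tok: logit + alpha * critic_score(prefix, tok)
            for tok, logit in generator_logits(prefix).items()
        }
        next_tok = max(scores, key=scores.get)   # greedy choice on intervened scores
        if next_tok == "<eos>":
            break
        prefix.append(next_tok)
    return prefix

# Toy scorers over a tiny vocabulary, just to make the loop runnable.
vocab = ["the", "cat", "sat", "<eos>"]
gen = lambda prefix: {t: (-1.0 * len(prefix) if t == "<eos>" else 0.5) for t in vocab}
critic = lambda prefix, tok: -1.0 if tok in prefix else 0.0   # discourage repetition
print(decode_with_critic(gen, critic))
```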
1 code implementation • 8 Oct 2023 • Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang
Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs.
1 code implementation • CVPR 2024 • Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song
Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks.
Ranked #2 on Prompt Engineering on Stanford Cars
1 code implementation • 2 Sep 2023 • Chenliang Li, Hehong Chen, Ming Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou
Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior.
1 code implementation • 29 Aug 2023 • Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang
In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.
1 code implementation • 20 Aug 2023 • Ji Zhang, Lianli Gao, Bingguang Hao, Hao Huang, Jingkuan Song, HengTao Shen
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.
Out-of-Distribution Detection • Out of Distribution (OOD) Detection • +1
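For context, the classic maximum-softmax-probability baseline below illustrates the OOD detection setup described above; it is a generic baseline, not the method proposed in this paper.

```python
# Maximum-softmax-probability (MSP) scoring: flag low-confidence inputs as "unknown".
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Higher score = more in-distribution; logits shape (batch, num_classes)."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def is_ood(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    return msp_score(logits) < threshold

print(is_ood(torch.tensor([[4.0, 0.1, 0.1], [0.6, 0.5, 0.4]])))
```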
no code implementations • 16 Aug 2023 • Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.
1 code implementation • 19 Jul 2023 • Guohai Xu, Jiayi Liu, Ming Yan, Haotian Xu, Jinghui Si, Zhuoran Zhou, Peng Yi, Xing Gao, Jitao Sang, Rong Zhang, Ji Zhang, Chao Peng, Fei Huang, Jingren Zhou
In this paper, we present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs in terms of both safety and responsibility criteria.
1 code implementation • 4 Jul 2023 • Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang
Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding.
no code implementations • 29 Jun 2023 • Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan
In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts.
1 code implementation • 7 Jun 2023 • Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang
In addition, to facilitate a comprehensive evaluation of video-language models, we carefully build the largest human-annotated Chinese benchmarks covering three popular video-language tasks of cross-modal retrieval, video captioning, and video category classification.
no code implementations • 14 May 2023 • Qianglong Chen, Guohai Xu, Ming Yan, Ji Zhang, Fei Huang, Luo Si, Yin Zhang
Existing knowledge-enhanced methods have achieved remarkable results in certain QA tasks via obtaining diverse knowledge from different knowledge bases.
no code implementations • 13 May 2023 • Qianglong Chen, Feng Ji, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang
To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers to a single student.
1 code implementation • 27 Apr 2023 • Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Our code, pre-trained model, instruction-tuned models, and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl.
Ranked #1 on Visual Question Answering (VQA) on HallusionBench (Question Pair Acc metric)
Visual Question Answering (VQA) • Zero-Shot Video Question Answer
no code implementations • 25 Apr 2023 • Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma
In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation.
1 code implementation • 16 Apr 2023 • Junfeng Tian, Hehong Chen, Guohai Xu, Ming Yan, Xing Gao, Jianhai Zhang, Chenliang Li, Jiayi Liu, Wenshen Xu, Haiyang Xu, Qi Qian, Wei Wang, Qinghao Ye, Jiejing Zhang, Ji Zhang, Fei Huang, Jingren Zhou
In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that is instruction-finetuned on a wide range of dialogue tasks in a unified internet-augmented format.
2 code implementations • ICCV 2023 • Ji Zhang, Lianli Gao, Xu Luo, HengTao Shen, Jingkuan Song
Test-time task adaptation in few-shot learning aims to adapt a pre-trained task-agnostic model for capturing task-specific knowledge of the test task, relying only on a few labeled support samples.
1 code implementation • 28 Feb 2023 • Xueyi Liu, Ji Zhang, Ruizhen Hu, Haibin Huang, He Wang, Li Yi
Category-level articulated object pose estimation aims to estimate a hierarchy of articulation-aware object poses of an unseen articulated object from a known category.
no code implementations • 24 Feb 2023 • Siddharth Ancha, Gaurav Pathak, Ji Zhang, Srinivasa Narasimhan, David Held
To navigate in an environment safely and autonomously, robots must accurately estimate where obstacles are and how they move.
4 code implementations • 1 Feb 2023 • Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.
Ranked #1 on Video Captioning on MSR-VTT
3 code implementations • 28 Jan 2023 • Xu Luo, Hao Wu, Ji Zhang, Lianli Gao, Jing Xu, Jingkuan Song
Few-shot classification consists of a training phase where a model is learned on a relatively large dataset and an adaptation phase where the learned model is adapted to previously-unseen tasks with limited labeled samples.
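One common way to realize the adaptation phase described above is prototype-based classification from the few labeled support samples; this is a generic illustration rather than this paper's specific procedure.

```python
# Nearest-prototype adaptation: build class centroids from the support set of
# an unseen task and assign each query to the closest centroid.
import numpy as np

def adapt_and_classify(support_feats, support_labels, query_feats):
    classes = np.unique(support_labels)
    prototypes = np.stack([support_feats[support_labels == c].mean(axis=0)
                           for c in classes])                        # (C, D)
    dists = np.linalg.norm(query_feats[:, None, :] - prototypes[None], axis=-1)
    return classes[dists.argmin(axis=1)]                             # nearest prototype

rng = np.random.default_rng(0)
support = rng.normal(size=(10, 16)); labels = np.repeat([0, 1], 5)
queries = rng.normal(size=(4, 16))
print(adapt_and_classify(support, labels, queries))
```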
no code implementations • ICCV 2023 • Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang
We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label) with 8.6% and 11.1% improvement respectively.
Ranked #1 on Visual Question Answering (VQA) on TGIF-QA
no code implementations • 21 Nov 2022 • Shiqiang Zhu, Ting Yu, Tao Xu, Hongyang Chen, Schahram Dustdar, Sylvain Gigan, Deniz Gunduz, Ekram Hossain, Yaochu Jin, Feng Lin, Bo Liu, Zhiguo Wan, Ji Zhang, Zhifeng Zhao, Wentao Zhu, Zuoning Chen, Tariq Durrani, Huaimin Wang, Jiangxing Wu, Tongyi Zhang, Yunhe Pan
In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications.
no code implementations • 14 Nov 2022 • Junyang Wang, Yi Zhang, Ming Yan, Ji Zhang, Jitao Sang
We further propose Anchor Augment to guide the generative model's attention to the fine-grained information in the representation of CLIP.
no code implementations • 22 Sep 2022 • Jingtian Yan, Xingqiao Lin, Zhongqiang Ren, Shiqi Zhao, Jieqiong Yu, Chao Cao, Peng Yin, Ji Zhang, Sebastian Scherer
To intelligently balance the robustness of sub-map merging and exploration efficiency, we develop a new approach for lidar-based multi-agent exploration, which can direct one agent to repeat another agent's trajectory in an adaptive manner based on the quality indicator of the sub-map merging process.
no code implementations • 20 Sep 2022 • Bo Chen, Jiayi Liu, Mieradilijiang Maimaiti, Xing Gao, Ji Zhang
A multi-aspect attentive network is proposed to automatically attend to different aspects in a review and ensure most of the issues are tackled.
no code implementations • 20 Sep 2022 • Jiayi Liu, Wei Wei, Zhixuan Chu, Xing Gao, Ji Zhang, Tan Yan, Yulin kang
Although the Conditional Variational AutoEncoder (CVAE) model can generate more diversified responses than the traditional Seq2Seq model, the responses often have low relevance with the input words or are illogical with the question.
no code implementations • 14 Sep 2022 • Peng Yin, Ivan Cisneros, Ji Zhang, Howie Choset, Sebastian Scherer
The visual camera is an attractive device for beyond-visual-line-of-sight (B-VLOS) drone operation, since it is low in size, weight, power, and cost, and can provide a redundant modality in case of GPS failures.
1 code implementation • 13 Sep 2022 • Mengyang Li, Fengguang Su, Ou wu, Ji Zhang
However, limited studies have explicitly explored the perturbation of logit vectors.
no code implementations • 1 Aug 2022 • Qianglong Chen, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang
We evaluate our approach on a variety of knowledge driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE.
no code implementations • 29 Jul 2022 • Xinjie Yao, Ji Zhang, Jean Oh
Under shared autonomy, wheelchair users expect vehicles to provide safe and comfortable rides while following users' high-level navigation plans.
no code implementations • 20 Jul 2022 • Ji Zhang, Jean-Paul Ainam, Li-hui Zhao, Wenai Song, Xin Wang
Based on the complementarity of attribute and category labels, we propose a Multi-task Attribute-Scene Recognition (MASR) network which learns a category embedding and at the same time predicts scene attributes.
1 code implementation • 19 Jul 2022 • Ivan Cisneros, Peng Yin, Ji Zhang, Howie Choset, Sebastian Scherer
We present the ALTO dataset, a vision-focused dataset for the development and benchmarking of Visual Place Recognition and Localization methods for Unmanned Aerial Vehicles.
1 code implementation • 15 Jul 2022 • Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, Rongrong Ji
However, cross-grained contrast, which is the contrast between coarse-grained representations and fine-grained representations, has rarely been explored in prior research.
Ranked #12 on Video Retrieval on MSVD
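A sketch of what a cross-grained similarity could look like, contrasting a coarse sentence embedding against fine-grained frame embeddings; dimensions and pooling choices are assumptions, not the paper's design.

```python
# Cross-grained score: a coarse (sentence) embedding is compared against
# fine-grained (frame) embeddings, then the per-frame similarities are pooled.
import torch
import torch.nn.functional as F

def cross_grained_similarity(sent_emb, frame_embs):
    """sent_emb: (D,), frame_embs: (num_frames, D)."""
    sent = F.normalize(sent_emb, dim=-1)
    frames = F.normalize(frame_embs, dim=-1)
    fine_sims = frames @ sent                      # sentence vs. each frame
    weights = torch.softmax(fine_sims, dim=0)      # attend to relevant frames
    return (weights * fine_sims).sum()             # pooled cross-grained score

print(cross_grained_similarity(torch.randn(64), torch.randn(8, 64)))
```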
no code implementations • 14 Jul 2022 • Peng Yin, Haowen Lai, Shiqi Zhao, Ruohai Ge, Ji Zhang, Howie Choset, Sebastian Scherer
We present AutoMerge, a LiDAR data processing framework for assembling a large number of map segments into a complete map.
no code implementations • 11 Jul 2022 • Jie Qin, Shuaihang Yuan, Jiaxin Chen, Boulbaba Ben Amor, Yi Fang, Nhat Hoang-Xuan, Chi-Bien Chu, Khoi-Nguyen Nguyen-Ngoc, Thien-Tri Cao, Nhat-Khang Ngo, Tuan-Luc Huynh, Hai-Dang Nguyen, Minh-Triet Tran, Haoyang Luo, Jianning Wang, Zheng Zhang, Zihao Xin, Yang Wang, Feng Wang, Ying Tang, Haiqin Chen, Yan Wang, Qunying Zhou, Ji Zhang, Hongyuan Wang
We define two SBSR tasks and construct two benchmarks consisting of more than 46,000 CAD models, 1,700 realistic models, and 145,000 sketches in total.
3 code implementations • 24 May 2022 • Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si
Large-scale pretrained foundation models have been an emerging paradigm for building artificial intelligence (AI) systems, which can be quickly adapted to a wide range of downstream tasks.
Ranked #1 on Image Captioning on COCO Captions
no code implementations • NAACL 2022 • Jianhai Zhang, Mieradilijiang Maimaiti, Xing Gao, Yuanhang Zheng, Ji Zhang
They also ignore the importance of capturing the inter-dependency between the query and the support set for few-shot text classification.
no code implementations • 30 Mar 2022 • Wenshen Xu, Mieradilijiang Maimaiti, Yuanhang Zheng, Xin Tang, Ji Zhang
Unexpectedly, MLM ignores sentence-level training, and CL also neglects extraction of internal information from the query.
1 code implementation • CVPR 2022 • Jiabo Ye, Junfeng Tian, Ming Yan, Xiaoshan Yang, Xuwu Wang, Ji Zhang, Liang He, Xin Lin
Moreover, since the backbones are query-agnostic, it is difficult to completely avoid the inconsistency issue by training the visual backbone end-to-end in the visual grounding framework.
no code implementations • 21 Mar 2022 • Ji Zhang, Xijun Li, Xiyao Zhou, Mingxuan Yuan, Zhuo Cheng, Keji Huang, YiFan Li
Cache plays an important role in maintaining high and stable performance (i.e., high throughput, low tail latency, and low throughput jitter) in storage systems.
no code implementations • 8 Mar 2022 • Xi Weng, Yan Yan, Genshun Dong, Chang Shu, Biao Wang, Hanzi Wang, Ji Zhang
This shows that DMA-Net provides a good tradeoff between segmentation quality and speed for semantic segmentation in street scenes.
no code implementations • 17 Nov 2021 • Ming Yan, Haiyang Xu, Chenliang Li, Junfeng Tian, Bin Bi, Wei Wang, Weihua Chen, Xianzhe Xu, Fan Wang, Zheng Cao, Zhicheng Zhang, Qiyu Zhang, Ji Zhang, Songfang Huang, Fei Huang, Luo Si, Rong Jin
The Visual Question Answering (VQA) task utilizes both visual image and language analysis to answer a textual question with respect to an image.
Ranked #8 on Visual Question Answering (VQA) on VQA v2 test-dev
no code implementations • 9 Oct 2021 • Xin Huang, Xuejiao Tang, Wenbin Zhang, Shichao Pei, Ji Zhang, Mingli Zhang, Zhen Liu, Ruijun Chen, Yiyi Huang
The proposed disease diagnosis system also uses a graphical user interface (GUI) to facilitate users to interact with the expert system.
no code implementations • 22 Sep 2021 • Fu Sun, Feng-Lin Li, Ruize Wang, Qianglong Chen, Xingyi Cheng, Ji Zhang
Knowledge enhanced pre-trained language models (K-PLMs) are shown to be effective for many public tasks in the literature but few of them have been successfully applied in practice.
no code implementations • 13 Sep 2021 • Guohai Xu, Hehong Chen, Feng-Lin Li, Fu Sun, Yunzhou Shi, Zhixiong Zeng, Wei Zhou, Zhongzhou Zhao, Ji Zhang
Live streaming is becoming an increasingly popular trend of sales in E-commerce.
no code implementations • 18 Aug 2021 • Xuming Lin, Shaobo Cui, Zhongzhou Zhao, Wei Zhou, Ji Zhang, Haiqing Chen
With these two synergic representations, we then regroup these phrases into a fine-grained plan, based on which we generate the final long text.
no code implementations • 17 Aug 2021 • Shaobo Cui, Xintong Bao, Xuming Lin, Zhongzhou Zhao, Ji Zhang, Wei Zhou, Haiqing Chen
Each one-to-one mapping is associated with a conditional generation pattern and is modeled with an expert in SPMoE.
1 code implementation • 16 Aug 2021 • Yuhao Cui, Zhou Yu, Chunqi Wang, Zhongzhou Zhao, Ji Zhang, Meng Wang, Jun Yu
Nevertheless, most existing VLP approaches have not fully utilized the intrinsic knowledge within the image-text pairs, which limits the effectiveness of the learned alignments and further restricts the performance of their models.
no code implementations • ACL 2021 • Qianglong Chen, Feng Ji, Xiangji Zeng, Feng-Lin Li, Ji Zhang, Haiqing Chen, Yin Zhang
In order to better understand the reason behind model behaviors (i.e., making predictions), most recent works have exploited generative models to provide complementary explanations.
1 code implementation • 4 Jul 2021 • Xuejiao Tang, Xin Huang, Wenbin Zhang, Travers B. Child, Qiong Hu, Zhen Liu, Ji Zhang
Moreover, the proposed model provides intuitive interpretation into visual commonsense reasoning.
no code implementations • CVPR 2021 • Lu Zhang, Shuigeng Zhou, Jihong Guan, Ji Zhang
Most object detection methods require huge amounts of annotated data and can detect only the categories that appear in the training set.
no code implementations • 27 May 2021 • Peng Yin, Lingyun Xu, Ji Zhang, Howie Choset, Sebastian Scherer
Based on such features, we further design a spherical convolution network to learn viewpoint-invariant symmetric place descriptors.
no code implementations • 20 May 2021 • Keeyoung Rhee, Myungkyu Shim, Ji Zhang
We analyze the optimal strategy for a government to promote large-scale investment projects under information frictions.
1 code implementation • 5 May 2021 • Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto del Bimbo
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models with an absolute performance gain of 15% on average, strongly verifying the potential of tackling the language prior problem in VQA from the angle of the answer feature space learning.
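For reference, a generic margin cosine loss (CosFace-style) is sketched below; the adapted variant used in the paper for answer-feature learning may differ in its details.

```python
# Margin cosine loss: subtract a margin m from the target-class cosine
# similarity before a scaled cross-entropy.
import torch
import torch.nn.functional as F

def margin_cosine_loss(features, class_weights, labels, s=30.0, m=0.35):
    """features: (B, D), class_weights: (C, D), labels: (B,)."""
    cos = F.normalize(features) @ F.normalize(class_weights).t()      # (B, C)
    margin = torch.zeros_like(cos).scatter_(1, labels.unsqueeze(1), m)
    return F.cross_entropy(s * (cos - margin), labels)                # margin on target class

loss = margin_cosine_loss(torch.randn(4, 128), torch.randn(10, 128),
                          torch.tensor([0, 3, 7, 2]))
print(loss.item())
```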
no code implementations • 27 Mar 2021 • Xin Huang, Wenbin Zhang, Xuejiao Tang, Mingli Zhang, Jayachander Surbiryala, Vasileios Iosifidis, Zhen Liu, Ji Zhang
Recent studies in big data analytics and natural language processing develop automatic techniques in analyzing sentiment in the social media information.
no code implementations • 24 Feb 2021 • Shaobo Cui, Xintong Bao, Xinxing Zu, Yangyang Guo, Zhongzhou Zhao, Ji Zhang, Haiqing Chen
This pipeline approach, however, is undesirable for mining the most appropriate QA pairs from documents since it ignores the connection between question generation and answer extraction, which may lead to incompatible QA pair generation, i.e., the selected answer span is inappropriate for question generation.
no code implementations • 6 Dec 2020 • Xuejiao Tang, Jiong Qiu, Ruijun Chen, Wenbin Zhang, Vasileios Iosifidis, Zhen Liu, Wei Meng, Mingli Zhang, Ji Zhang
An ideal safe workplace is described as a place where staff fulfill their responsibilities in a well-organized order, potentially hazardous events are monitored in real time, and the number of accidents and the resulting damages are minimized.
no code implementations • 6 Dec 2020 • Xuejiao Tang, Liuhua Zhang, Wenbin Zhang, Xin Huang, Vasileios Iosifidis, Zhen Liu, Mingli Zhang, Enza Messina, Ji Zhang
Early detection of breast cancer in X-ray mammography is believed to have effectively reduced the mortality rate.
no code implementations • 9 Nov 2020 • Zhao Li, Donghui Ding, Pengcheng Zou, Yu Gong, Xi Chen, Ji Zhang, Jianliang Gao, Youxi Wu, Yucong Duan
The booming online e-commerce platforms demand highly accurate approaches to segment queries that carry the product requirements of consumers.
no code implementations • 1 Nov 2020 • Rujing Yao, Yingchun Ye, Ji Zhang, Shuxiao Li, Ou wu
Inspired by the idea of molecular marker tracing in biochemistry, three types of named entities, namely methods, datasets, and metrics, are used as AI markers for AI literature.
no code implementations • 26 Oct 2020 • Linlin Hou, Ji Zhang, Ou wu, Ting Yu, Zhen Wang, Zhao Li, Jianliang Gao, Yingchun Ye, Rujing Yao
We finally apply our model to PAKDD papers published from 2009 to 2019 to mine insightful results from scientific papers published over a longer time span.
no code implementations • 24 Sep 2020 • Feng-Lin Li, Hehong Chen, Guohai Xu, Tian Qiu, Feng Ji, Ji Zhang, Haiqing Chen
Pre-sales customer service is of importance to E-commerce platforms as it contributes to optimizing customers' buying process.
no code implementations • 9 May 2020 • Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo
Without identifying the connection between appearing people and character names, a model is not able to obtain a genuine understanding of the plots.
1 code implementation • 1 Apr 2020 • Huai Yu, Weikun Zhen, Wen Yang, Ji Zhang, Sebastian Scherer
With the pose prediction from VIO, we can efficiently obtain coarse 2D-3D line correspondences.
no code implementations • TACL 2020 • Ji Zhang, Chengyao Chen, PengFei Liu, Chao He, Cane Wing-Ki Leung
Second, it shows a strong advantage in determining the sentiment of a target when the context sentence contains multiple semantic segments.
no code implementations • 29 Nov 2019 • Rujing Yao, Linlin Hou, Yingchun Ye, Ou wu, Ji Zhang, Jian Wu
In the field of machine learning, the involved methods (M) and datasets (D) are key information in papers.
no code implementations • 27 Nov 2019 • Xinjie Yao, Ji Zhang, Jean Oh
The underlying system incorporates a deep neural network to track social groups and join the flow of a social group in facilitating the navigation.
no code implementations • COLING 2020 • Ruize Wang, Zhongyu Wei, Ying Cheng, Piji Li, Haijun Shan, Ji Zhang, Qi Zhang, Xuanjing Huang
Visual storytelling aims to generate a narrative paragraph from a sequence of images automatically.
Ranked #9 on Visual Storytelling on VIST
no code implementations • 16 Jul 2019 • Shijie Geng, Ji Zhang, Hang Zhang, Ahmed Elgammal, Dimitris N. Metaxas
We present a simple method that achieves unexpectedly superior performance for Complex Reasoning involved Visual Question Answering.
3 code implementations • CVPR 2019 • Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro
The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups).
no code implementations • 28 Nov 2018 • Ming Yan, Jiangnan Xia, Chen Wu, Bin Bi, Zhongzhou Zhao, Ji Zhang, Luo Si, Rui Wang, Wei Wang, Haiqing Chen
To address this problem, we develop a novel deep cascade learning model, which progressively evolves from the document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension.
Ranked #2 on Question Answering on MS MARCO
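A toy sketch of the cascade idea, filtering at the document level, re-ranking at the paragraph level, and then extracting from the survivors; the keyword-overlap scorers and the final step are naive stand-ins, not the paper's learned rankers or reader.

```python
# Cascade retrieval-then-read: rank documents, then paragraphs, then "extract".
def keyword_overlap(question, text):
    q = set(question.lower().split())
    return len(q & set(text.lower().split())) / max(len(q), 1)

def cascade_qa(question, documents, top_docs=2, top_paras=3):
    docs = sorted(documents, key=lambda d: keyword_overlap(question, d),
                  reverse=True)[:top_docs]
    paragraphs = [p for d in docs for p in d.split("\n") if p.strip()]
    paragraphs = sorted(paragraphs, key=lambda p: keyword_overlap(question, p),
                        reverse=True)[:top_paras]
    # Final "extraction" stand-in: return the best paragraph as the answer context.
    return paragraphs[0] if paragraphs else ""

docs = ["MS MARCO is a reading comprehension dataset.\nIt contains real queries.",
        "Unrelated document about lidar odometry.\nMapping in 6-DOF."]
print(cascade_qa("what is MS MARCO", docs))
```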
no code implementations • 21 Nov 2018 • Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal
We propose an efficient and interpretable scene graph generator.
no code implementations • 1 Nov 2018 • Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal
This article describes the model we built that achieved 1st place in the OpenImage Visual Relationship Detection Challenge on Kaggle.
no code implementations • 20 Oct 2018 • Xin Tang, Shanbo Cheng, Loc Do, Zhiyu Min, Feng Ji, Heng Yu, Ji Zhang, Haiqin Chen
Our approach is extended from a basic monolingual STS framework to a shared multilingual encoder pretrained with a translation task to incorporate rich-resource language data.
1 code implementation • EMNLP 2018 • Chunqi Wang, Ji Zhang, Haiqing Chen
Existing approaches to neural machine translation are typically autoregressive models.
no code implementations • 8 Aug 2018 • Pengfei Liu, Ji Zhang, Cane Wing-Ki Leung, Chao He, Thomas L. Griffiths
Effective representation of a text is critical for various natural language processing tasks.
2 code implementations • 27 Apr 2018 • Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny
Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of <subject, relation, object> triples.
no code implementations • CVPR 2017 • Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, Ahmed Elgammal
We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals.
1 code implementation • Robotics: Science and Systems Conference 2014 • Ji Zhang, Sanjiv Singh
We propose a real-time method for odometry and mapping using range measurements from a 2-axis lidar moving in 6-DOF.