Search Results for author: Jie Zhou

Found 611 papers, 352 papers with code

RSGT: Relational Structure Guided Temporal Relation Extraction

no code implementations COLING 2022 Jie Zhou, Shenpo Dong, Hongkui Tu, Xiaodong Wang, Yong Dou

In this paper, we propose RSGT: Relational Structure Guided Temporal Relation Extraction, which extracts relational structure features that fit both inter-sentence and intra-sentence relations.

Graph Neural Network Natural Language Understanding +3

Rotation-robust Intersection over Union for 3D Object Detection

no code implementations ECCV 2020 Yu Zheng, Danyang Zhang, Sinan Xie, Jiwen Lu, Jie Zhou

In this paper, we propose a Rotation-robust Intersection over Union (RIoU) for 3D object detection, which aims to jointly learn the overlap of rotated bounding boxes.

3D Object Detection Object +1
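For reference, the plain axis-aligned 3D IoU that RIoU generalizes can be sketched as below. This is only the non-rotated baseline, not the paper's rotation-robust formulation; the box format (min/max corners) is an assumption for illustration.

```python
def aabb_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as
    (xmin, ymin, zmin, xmax, ymax, zmax) tuples."""
    # Intersection extent along each axis, clamped at zero.
    inter = 1.0
    for i in range(3):
        lo = max(box_a[i], box_b[i])
        hi = min(box_a[i + 3], box_b[i + 3])
        inter *= max(0.0, hi - lo)

    def volume(b):
        return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union if union > 0 else 0.0
```

The rotated case replaces the per-axis clamping with an overlap computation between rotated polyhedra, which is what makes RIoU non-trivial.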

Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?

no code implementations ECCV 2020 Guangyi Chen, Yongming Rao, Jiwen Lu, Jie Zhou

Specifically, we disentangle the video representation into the temporal coherence and motion parts and randomly change the scale of the temporal motion features as the adversarial noise.

Video-Based Person Re-Identification
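The disentangle-and-perturb idea can be sketched minimally: assuming a feature vector whose first half encodes temporal coherence and second half temporal motion (a hypothetical split for illustration), only the motion half is rescaled by a random factor acting as the adversarial noise.

```python
import random

def perturb_motion(feature, scale_range=(0.5, 2.0), rng=None):
    """Rescale only the motion half of a disentangled video feature."""
    rng = rng or random.Random(0)
    half = len(feature) // 2
    coherence, motion = feature[:half], feature[half:]
    s = rng.uniform(*scale_range)  # random adversarial scale
    return coherence + [m * s for m in motion]
```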

BMInf: An Efficient Toolkit for Big Model Inference and Tuning

1 code implementation ACL 2022 Xu Han, Guoyang Zeng, Weilin Zhao, Zhiyuan Liu, Zhengyan Zhang, Jie Zhou, Jun Zhang, Jia Chao, Maosong Sun

In recent years, large-scale pre-trained language models (PLMs) containing billions of parameters have achieved promising results on various NLP tasks.

Quantization Scheduling

Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification

no code implementations ECCV 2020 Guangyi Chen, Yuhao Lu, Jiwen Lu, Jie Zhou

Experimental results demonstrate that our DCML method explores credible and valuable training data and improves the performance of unsupervised domain adaptation.

Metric Learning Person Re-Identification +2

TAKE: Topic-shift Aware Knowledge sElection for Dialogue Generation

1 code implementation COLING 2022 Chenxu Yang, Zheng Lin, Jiangnan Li, Fandong Meng, Weiping Wang, Lanrui Wang, Jie Zhou

The knowledge selector generally constructs a query based on the dialogue context and selects the most appropriate knowledge to help response generation.

Dialogue Generation Knowledge Distillation +1
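The selection step the abstract describes (build a query from the dialogue context, pick the best knowledge snippet) can be sketched with a simple bag-of-words cosine similarity. This is not the paper's learned, topic-shift-aware selector; it only illustrates the query-then-rank pattern.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two whitespace-tokenized strings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_knowledge(context, candidates):
    """Return the candidate snippet most similar to the dialogue context."""
    return max(candidates, key=lambda k: cosine(context, k))
```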

Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation

no code implementations EMNLP 2020 Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou

Here, we deal with these issues on two aspects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection.

Decoder Dialogue Generation +1

Do Pre-trained Models Benefit Knowledge Graph Completion? A Reliable Evaluation and a Reasonable Approach

1 code implementation Findings (ACL) 2022 Xin Lv, Yankai Lin, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie Zhou

In recent years, pre-trained language models (PLMs) have been shown to capture factual knowledge from massive texts, which encourages the proposal of PLM-based knowledge graph completion (KGC) models.

Knowledge Graph Completion Link Prediction

Divide and Denoise: Learning from Noisy Labels in Fine-Grained Entity Typing with Cluster-Wise Loss Correction

no code implementations ACL 2022 Kunyuan Pang, Haoyu Zhang, Jie Zhou, Ting Wang

In this work, we propose a clustering-based loss correction framework named Feature Cluster Loss Correction (FCLC), to address these two problems.

Entity Typing

Unsupervised Dependency Graph Network

1 code implementation ACL 2022 Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville

We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task.

Language Modeling Language Modelling +4

Constructing Emotional Consensus and Utilizing Unpaired Data for Empathetic Dialogue Generation

no code implementations Findings (EMNLP) 2021 Lei Shen, Jinchao Zhang, Jiao Ou, Xiaofang Zhao, Jie Zhou

To address the above issues, we propose a dual-generative model, Dual-Emp, to simultaneously construct the emotional consensus and utilize some external unpaired data.

Dialogue Generation

Structural Deep Metric Learning for Room Layout Estimation

no code implementations ECCV 2020 Wenzhao Zheng, Jiwen Lu, Jie Zhou

We employ a metric model and a layout encoder to map the RGB images and the ground-truth layouts to the embedding space, respectively, and a layout decoder to map the embeddings to the corresponding layouts, where the whole framework is trained in an end-to-end manner.

Decoder Metric Learning +1

MovieChats: Chat like Humans in a Closed Domain

no code implementations EMNLP 2020 Hui Su, Xiaoyu Shen, Zhou Xiao, Zheng Zhang, Ernie Chang, Cheng Zhang, Cheng Niu, Jie Zhou

In this work, we take a close look at the movie domain and present a large-scale high-quality corpus with fine-grained annotations in hope of pushing the limit of movie-domain chatbots.

Chatbot Retrieval

CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild

1 code implementation EMNLP 2021 Yuan Yao, Jiaju Du, Yankai Lin, Peng Li, Zhiyuan Liu, Jie Zhou, Maosong Sun

Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents.

Relation Relation Extraction

Deep Hashing with Active Pairwise Supervision

no code implementations ECCV 2020 Ziwei Wang, Quan Zheng, Jiwen Lu, Jie Zhou

In this paper, we propose a Deep Hashing method with Active Pairwise Supervision (DH-APS).

Deep Hashing

欺骗类动词的句法语义研究(On the Syntax and Semantics of Verbs of Cheating)

no code implementations CCL 2021 Shan Wang, Jie Zhou

"Deception is a common social phenomenon, yet research on verbs of cheating remains very limited. This paper selects simple sentences containing 'cheating'-class verbs and performs large-scale syntactic and semantic dependency analysis on them. The study shows that when these verbs serve as dependents, they can act as different syntactic constituents and semantic roles while exhibiting high similarity in syntactic function. When acting as governors, they display different syntactic co-occurrence patterns depending on the syntactic function they bear. Semantically, the paper describes and explains in detail the dependency characteristics of these verbs along dimensions such as semantic density, agent and patient roles, situational roles, and event relations. Although the syntax and semantics of 'cheating'-class verbs are diverse, the dominant sentence pattern is subject-verb-object, in which the most common semantic collocation is an agent performing the act of deception upon an affected party and thereby influencing it. Combining dependency grammar with frame semantics, and integrating quantitative statistics with qualitative analysis, this study explores the syntax and semantics of verbs of cheating, deepening research on the verbal cues of deception and on verbs of saying."

A Dual-Space Framework for General Knowledge Distillation of Large Language Models

no code implementations 15 Apr 2025 Xue Zhang, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

However, we reveal that the current white-box KD framework exhibits two limitations: a) bridging probability distributions from different output spaces will limit the similarity between the teacher model and the student model; b) this framework cannot be applied to LLMs with different vocabularies.

Code Generation General Knowledge +3

Deep Reasoning Translation via Reinforcement Learning

1 code implementation 14 Apr 2025 Jiaan Wang, Fandong Meng, Jie Zhou

In this paper, we introduce DeepTrans, a deep reasoning translation model that learns free translation via reinforcement learning.

reinforcement-learning Reinforcement Learning +2

RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation

no code implementations 22 Mar 2025 Zhiqiang Yuan, Ting Zhang, Ying Deng, Jiapei Zhang, Yeshuang Zhu, Zexi Jia, Jie Zhou, Jinchao Zhang

In this paper, we argue that under constrained resources, training a smaller video generation model from scratch using only million-level samples can outperform parameter-efficient tuning on larger models in downstream applications: the core lies in the effective utilization of data and curriculum strategy.

Video Generation

D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens

no code implementations 21 Mar 2025 Panpan Wang, Liqiang Niu, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou

In contrast, diffusion models take advantage of the continuous-valued tokenizer to achieve better generation quality but are subject to low efficiency and complexity.

Conditional Image Generation

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models

no code implementations 19 Mar 2025 Yinan Liang, Ziwei Wang, Xiuwei Xu, Jie Zhou, Jiwen Lu

While multimodal large language models demonstrate strong performance in complex reasoning tasks, they pose significant challenges related to model complexity during deployment, especially for resource-limited devices.

MM-Vet Multimodal Reasoning +2

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

no code implementations 18 Mar 2025 Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai

Diffusion models have demonstrated remarkable success in various image generation tasks, but their performance is often limited by the uniform processing of inputs across varying conditions and noise levels.

Text-to-Image Generation

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

no code implementations 13 Mar 2025 Hang Yin, Xiuwei Xu, Lingqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu

Specifically, we conduct graph matching between the scene graph and goal graph at each time instant and propose different strategies to generate long-term goal of exploration according to different matching states.

Graph Matching

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

no code implementations 4 Mar 2025 Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Jinsong Su

To deal with this issue, we propose a simple yet effective framework that dynamically improves the embedding model's representation learning for negative pairs based on their discriminative difficulty.

Contrastive Learning Image-text Retrieval +4

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

no code implementations 1 Mar 2025 Tianyu Huai, Jie Zhou, Xingjiao Wu, Qin Chen, Qingchun Bai, Ze Zhou, Liang He

Multimodal large language models (MLLMs) have garnered widespread attention from researchers due to their remarkable understanding and generation capabilities in visual language tasks (e.g., visual question answering).

Continual Learning Language Modeling +6

DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance

no code implementations 24 Feb 2025 Xuanfan Ni, Liyan Xu, Chenyang Lyu, Longyue Wang, Mo Yu, Lemao Liu, Fandong Meng, Jie Zhou, Piji Li

To alleviate memory burden during inference of large language models (LLMs), numerous studies have focused on compressing the KV cache by exploring aspects such as attention sparsity.
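The attention-sparsity idea the abstract mentions can be made concrete with a simple eviction heuristic: keep only the cache entries that received the most attention mass. This is a generic sparsity baseline, not DBudgetKV's dynamic-budget method, and the score inputs are hypothetical.

```python
def compress_kv(keys, values, attn_scores, budget):
    """Keep the `budget` KV-cache entries with the highest accumulated
    attention scores and evict the rest, preserving positional order."""
    order = sorted(range(len(keys)), key=lambda i: attn_scores[i], reverse=True)
    keep = sorted(order[:budget])  # re-sort kept indices by position
    return [keys[i] for i in keep], [values[i] for i in keep]
```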

Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking

1 code implementation 19 Feb 2025 Yanzeng Li, Yunfan Xiong, Jialun Zhong, Jinchao Zhang, Jie Zhou, Lei Zou

We investigate LLMs' safety mechanisms and their recent applications, revealing a new threat model targeting structured output interfaces, which enable attackers to manipulate the inner logit during LLM generation, requiring only API access permissions.

Prompt Engineering Safety Alignment

Control-CLIP: Decoupling Category and Style Guidance in CLIP for Specific-Domain Generation

no code implementations 17 Feb 2025 Zexi Jia, Chuanwei Huang, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang, Jie Zhou

As a result, generation models tend to fail on prompts like "a photo of a cat in Pokemon style" in terms of simply producing images depicting "a photo of a cat".

Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation

1 code implementation 17 Feb 2025 Zengkui Sun, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

However, the conventional KD methods endure the distribution mismatch issue between the teacher and student models, leading to the poor performance of distillation.

Knowledge Distillation Math
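The distribution mismatch the paper targets arises in the standard distillation objective, which matches the student's output distribution to the teacher's via a temperature-softened KL divergence; a minimal sketch of that classic objective:

```python
import math

def softmax(logits, t=1.0):
    """Numerically stable temperature-scaled softmax."""
    m = max(logits)
    exps = [math.exp((x - m) / t) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits, t=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

When teacher and student distributions diverge badly (the mismatch), this loss is large and its gradients are poorly informative, which is what a warm-up stage is meant to alleviate.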

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs

1 code implementation 17 Feb 2025 Yuxiang Huang, Mingye Li, Xu Han, Chaojun Xiao, Weilin Zhao, Sun Ao, Hao Zhou, Jie Zhou, Zhiyuan Liu, Maosong Sun

While long-context inference is crucial for advancing large language model (LLM) applications, its prefill speed remains a significant bottleneck.

Language Modeling Language Modelling +1

Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping

1 code implementation 16 Feb 2025 Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

As for the cross-tokenizer KD, the differences in the tokenizers give rise to two fundamental challenges: (1) sequence misalignment caused by divergent tokenization strategies, and (2) mismatched vocabulary size and composition.

Code Generation Instruction Following +3
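Challenge (1), sequence misalignment, can be made concrete: two tokenizers split the same string differently, so tokens can only be compared through their character spans. A hypothetical sketch of the minimal span-overlap alignment such a method has to reason about:

```python
def char_spans(tokens):
    """Map each token to its (start, end) character span in the
    concatenated string."""
    spans, pos = [], 0
    for t in tokens:
        spans.append((pos, pos + len(t)))
        pos += len(t)
    return spans

def aligned_pairs(tokens_a, tokens_b):
    """Pairs of token indices (from two tokenizations of the same string)
    whose character spans overlap."""
    sa, sb = char_spans(tokens_a), char_spans(tokens_b)
    return [(i, j) for i, (a0, a1) in enumerate(sa)
                   for j, (b0, b1) in enumerate(sb)
                   if a0 < b1 and b0 < a1]
```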

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

1 code implementation 14 Feb 2025 Wenxuan Guo, Xiuwei Xu, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu

To this end, we propose text-guided pruning (TGP) and completion-based addition (CBA) to deeply fuse 3D scene representation and text features in an efficient way by gradual region pruning and target completion.

3D Object Detection 3D visual grounding +1

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

1 code implementation 13 Feb 2025 Mo Yu, Lemao Liu, Junjie Wu, Tsz Ting Chung, Shunchi Zhang, Jiangnan Li, Dit-yan Yeung, Jie Zhou

In a systematic way, we investigate a widely asked question: Do LLMs really understand what they say?, which relates to the more familiar term Stochastic Parrot.

In-Context Learning Memorization

Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task

no code implementations 11 Feb 2025 Junjie Wu, Mo Yu, Lemao Liu, Dit-yan Yeung, Jie Zhou

While LLMs have exhibited strong performance on various NLP tasks, it is noteworthy that most of these tasks rely on utilizing the vast amount of knowledge encoded in LLMs' parameters, rather than solving new problems without prior knowledge.

ARC

Semantic to Structure: Learning Structural Representations for Infringement Detection

no code implementations 11 Feb 2025 Chuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang, Jie Zhou

Structural information in images is crucial for aesthetic assessment, and it is widely recognized in the artistic field that imitating the structure of other works significantly infringes on creators' rights.

DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

1 code implementation 3 Feb 2025 Xinyan Guan, Jiali Zeng, Fandong Meng, Chunlei Xin, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Jie Zhou

Large Language Models (LLMs) have shown remarkable potential in reasoning while they still suffer from severe factual hallucinations due to timeliness, accuracy, and coverage of parametric knowledge.

RAG Retrieval

NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild

1 code implementation 23 Jan 2025 Yongxiang Liu, Weijie Li, Li Liu, Jie Zhou, Xuying Xiong, Bowen Peng, Yafei Song, Wei Yang, Tianpeng Liu, Zhen Liu, Xiang Li

This paper introduces NUDT4MSTAR, a large-scale SAR dataset for remote sensing target recognition in the wild, including 40 vehicle target types and various imaging conditions across 5 realistic scenes.

Earth Observation Object Recognition +1

Learning with Open-world Noisy Data via Class-independent Margin in Dual Representation Space

no code implementations 19 Jan 2025 Linchao Pan, Can Gao, Jie Zhou, Jinbao Wang

Learning with Noisy Labels (LNL) aims to improve the model generalization when facing data with noisy labels, and existing methods generally assume that noisy labels come from known classes, called closed-set noise.

Contrastive Learning Learning with noisy labels +1

Personalized Language Model Learning on Text Data Without User Identifiers

1 code implementation 10 Jan 2025 Yucheng Ding, Yangwenjian Tan, Xiangyu Liu, Chaoyue Niu, Fandong Meng, Jie Zhou, Ning Liu, Fan Wu, Guihai Chen

In many practical natural language applications, user data are highly sensitive, requiring anonymous uploads of text data from mobile devices to the cloud without user identifiers.

Language Modeling Language Modelling

Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls

1 code implementation 6 Jan 2025 Can Gao, Xiaofeng Tan, Jie Zhou, Weiping Ding, Witold Pedrycz

Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data and has been extensively studied and used in a variety of practical tasks.

Outlier Detection

The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters

no code implementations 3 Jan 2025 Chulun Zhou, Qiujing Wang, Mo Yu, Xiaoqian Yue, Rui Lu, Jiangnan Li, Yifan Zhou, Shunchi Zhang, Jie Zhou, Wai Lam

Theory-of-Mind (ToM) is a fundamental psychological capability that allows humans to understand and interpret the mental states of others.

Question Answering

WalkVLM: Aid Visually Impaired People Walking by Vision Language Model

no code implementations 30 Dec 2024 Zhiqiang Yuan, Ting Zhang, Ying Deng, Jiapei Zhang, Yeshuang Zhu, Zexi Jia, Jie Zhou, Jinchao Zhang

Moreover, in blind walking task, it is necessary to perform real-time streaming video parsing and generate concise yet informative reminders, which poses a great challenge for VLMs that suffer from redundant responses and low inference efficiency.

Language Modeling Language Modelling +1

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

1 code implementation 23 Dec 2024 Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou

Using Qwen2.5 and LLama-3.1 as the backbones, DRT-o1 models can learn the thought process during machine translation and outperform vanilla LLMs as well as existing O1-like LLMs, showing their effectiveness. The project is available at https://github.com/krystalan/DRT-o1

Machine Translation Math +1

Preventing Local Pitfalls in Vector Quantization via Optimal Transport

1 code implementation 19 Dec 2024 Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates the training process due to the necessity for techniques such as subtle initialization and model distillation.

Image Reconstruction Quantization
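The local pitfall the title refers to shows up in the standard VQ assignment step, where each vector independently snaps to its nearest codebook entry; a sketch of that greedy step (the optimal-transport formulation replaces it with a globally balanced assignment):

```python
def quantize(vectors, codebook):
    """Assign each vector to its nearest codebook entry (squared L2).
    This greedy per-vector assignment is the step prone to codebook
    collapse, where a few entries absorb most vectors."""
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: d2(v, codebook[k]))
            for v in vectors]
```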

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

no code implementations 16 Dec 2024 Kun Ouyang, Yuanxin Liu, Shicheng Li, Yi Liu, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

To provide a comprehensive evaluation, PunchBench incorporates diverse question formats and image-captions from various domains.

Benchmarking Image Captioning +1

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

1 code implementation 13 Dec 2024 Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu

3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings.

Autonomous Driving Prediction

Doe-1: Closed-Loop Autonomous Driving with Large World Model

1 code implementation 12 Dec 2024 Wenzhao Zheng, Zetian Xia, Yuanhui Huang, Sicheng Zuo, Jie Zhou, Jiwen Lu

In this paper, we explore a closed-loop framework for autonomous driving and propose a large Driving wOrld modEl (Doe-1) for unified perception, prediction, and planning.

Autonomous Driving Decision Making +4

Owl-1: Omni World Model for Consistent Long Video Generation

1 code implementation 12 Dec 2024 Yuanhui Huang, Wenzhao Zheng, Yuan Gao, Xin Tao, Pengfei Wan, Di Zhang, Jie Zhou, Jiwen Lu

As videos are observations of the underlying evolving world, we propose to model the long-term developments in a latent space and use VGMs to film them into videos.

Video Generation

POINTS1.5: Building a Vision-Language Model towards Real World Applications

no code implementations 11 Dec 2024 Yuan Liu, Le Tian, Xiao Zhou, Xinyu Gao, Kavio Yu, Yang Yu, Jie Zhou

Due to the scarcity of open-source Chinese datasets for vision-language models, we collect numerous images from the Internet and annotate them using a combination of manual and automatic methods.

Language Modeling Language Modelling +1

GPD-1: Generative Pre-training for Driving

2 code implementations 11 Dec 2024 Zixun Xie, Sicheng Zuo, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jie Zhou, Jiwen Lu, Shanghang Zhang

We represent each scene with ego, agent, and map tokens and formulate autonomous driving as a unified token generation problem.

Autonomous Driving Decision Making +4

Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model

1 code implementation 6 Dec 2024 Lening Wang, Wenzhao Zheng, Dalong Du, Yunpeng Zhang, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jie Zhou, Jiwen Lu, Shanghang Zhang

To address these limitations, we propose a Spatial-Temporal simulAtion for drivinG (Stag-1) model to reconstruct real-world scenes and design a controllable generative network to achieve 4D simulation.

Autonomous Driving Scene Understanding +1

Compound Gaussian Radar Clutter Model With Positive Tempered Alpha-Stable Texture

no code implementations 6 Dec 2024 Xingxing Liao, Junhao Xie, Jie Zhou

This work develops a flexible-tailed CG model to improve generality in clutter modeling, by introducing the positive tempered α-stable (PTαS) distribution to model clutter texture.

Densing Law of LLMs

no code implementations 5 Dec 2024 Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Biyuan Lin, Jie Zhou, Zhi Zheng, Xu Han, Zhiyuan Liu, Maosong Sun

This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of the LLMs across different scales and describes the trend of LLMs in terms of both effectiveness and efficiency.

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding

2 code implementations 5 Dec 2024 Yuqi Wu, Wenzhao Zheng, Sicheng Zuo, Yuanhui Huang, Jie Zhou, Jiwen Lu

3D occupancy prediction provides a comprehensive description of the surrounding scenes and has become an essential task for 3D perception.

Prediction Scene Understanding

Retrieval-Augmented Machine Translation with Unstructured Knowledge

1 code implementation 5 Dec 2024 Jiaan Wang, Fandong Meng, Yingxue Zhang, Jie Zhou

In machine translation (MT), previous work typically retrieves in-context examples from paired MT corpora, or domain-specific knowledge from knowledge graphs, to enhance models' MT ability.

Knowledge Graphs Machine Translation +4

From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents

1 code implementation 4 Dec 2024 Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, Libo Sun, Jiayu Lin, Jie Zhou, Xuanjing Huang, Zhongyu Wei

We categorize the simulations into three types: (1) Individual Simulation, which mimics specific individuals or demographic groups; (2) Scenario Simulation, where multiple agents collaborate to achieve goals within specific contexts; and (3) Society Simulation, which models interactions within agent societies to reflect the complexity and variety of real-world dynamics.

Language Modeling Language Modelling +1

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation

1 code implementation 20 Nov 2024 Ziyi Wang, Yanbo Wang, Xumin Yu, Jie Zhou, Jiwen Lu

In our approach, we developed a mask generator based on the denoising UNet from a pre-trained diffusion model, leveraging its capability for precise textual control over dense pixel representations and enhancing the open-world adaptability of the generated masks.

3D geometry 3D Semantic Segmentation +3

MaDiNet: Mamba Diffusion Network for SAR Target Detection

1 code implementation 12 Nov 2024 Jie Zhou, Chao Xiao, Bowen Peng, Tianpeng Liu, Zhen Liu, Yongxiang Liu, Li Liu

The fundamental challenge in SAR target detection lies in developing discriminative, efficient, and robust representations of target characteristics within intricate non-cooperative environments.

Mamba

CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models

no code implementations 28 Oct 2024 Meiqi Chen, Fandong Meng, Yingxue Zhang, Yan Zhang, Jie Zhou

In this paper, we propose CRAT, a novel multi-agent translation framework that leverages RAG and causality-enhanced self-reflection to address these challenges.

Machine Translation RAG +1

MiniPLM: Knowledge Distillation for Pre-Training Language Models

1 code implementation 22 Oct 2024 Yuxian Gu, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang

For effectiveness, MiniPLM leverages the differences between large and small LMs to enhance the difficulty and diversity of the training data, helping student LMs acquire versatile and sophisticated knowledge.

Diversity Knowledge Distillation +2

GlobalMamba: Global Image Serialization for Vision Mamba

1 code implementation 14 Oct 2024 Chengkun Wang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

In this paper, we propose a global image serialization method to transform the image into a sequence of causal tokens, which contain global information of the 2D image.

Image Classification Mamba +3

V2M: Visual 2-Dimensional Mamba for Image Representation Learning

1 code implementation 14 Oct 2024 Chengkun Wang, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu

Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences based on the state space model (SSM).

Instance Segmentation Mamba +4

On the token distance modeling ability of higher RoPE attention dimension

no code implementations 11 Oct 2024 Xiangyu Hong, Che Jiang, Biqing Qi, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

We further demonstrate the correlation between the efficiency of length extrapolation and the extension of the high-dimensional attention allocation of these heads.

Position Reading Comprehension
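Why higher RoPE dimensions relate to longer token distances can be seen from the rotation frequencies: dimension pair i rotates at θ_i = base^(-2i/d), so its wavelength in token positions grows geometrically with i. A sketch of this standard RoPE property (not the paper's head-level analysis):

```python
import math

def rope_wavelengths(dim, base=10000.0):
    """Wavelength (in token positions) of each RoPE dimension pair:
    2*pi / theta_i with theta_i = base**(-2*i/dim). Higher pairs rotate
    more slowly, hence can encode longer token distances."""
    return [2 * math.pi * base ** (2 * i / dim) for i in range(dim // 2)]
```

For dim=128 and base=10000, the last pair's wavelength is tens of thousands of positions, which is why long-range attention heads concentrate on those dimensions.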

Q-VLM: Post-training Quantization for Large Vision-Language Models

1 code implementation 10 Oct 2024 Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

On the contrary, we mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into optimal quantization strategy searching with low search cost.

Language Modeling Language Modelling +1

SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation

no code implementations 10 Oct 2024 Hang Yin, Xiuwei Xu, Zhenyu Wu, Jie Zhou, Jiwen Lu

Existing zero-shot object navigation methods prompt LLM with the text of spatially closed objects, which lacks enough scene context for in-depth reasoning.

Object

DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models

no code implementations 8 Oct 2024 Ranchi Zhao, Zhen Leng Thai, Yifan Zhang, Shengding Hu, Yunqi Ba, Jie Zhou, Jie Cai, Zhiyuan Liu, Maosong Sun

The performance of Large Language Models (LLMs) is substantially influenced by the pretraining corpus, which consists of vast quantities of unsupervised data processed by the models.

Language Modeling Language Modelling +2

Exploring the Benefit of Activation Sparsity in Pre-training

1 code implementation 4 Oct 2024 Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

SSD adaptively switches between the Mixtures-of-Experts (MoE) based sparse training and the conventional dense training during the pre-training process, leveraging the efficiency of sparse training and avoiding the static activation correlation of sparse training.

OPONeRF: One-Point-One NeRF for Robust Neural Rendering

1 code implementation 30 Sep 2024 Yu Zheng, Yueqi Duan, Kangfu Zheng, Hongru Yan, Jiwen Lu, Jie Zhou

In this paper, we propose a One-Point-One NeRF (OPONeRF) framework for robust scene rendering.

NeRF Neural Rendering

MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation

no code implementations 30 Sep 2024 Wenchao Chen, Liqiang Niu, Ziyao Lu, Fandong Meng, Jie Zhou

Image generation models have encountered challenges related to scalability and quadratic complexity, primarily due to the reliance on Transformer-based backbones.

Mamba Text-to-Image Generation

A Survey on the Honesty of Large Language Models

2 code implementations 27 Sep 2024 Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam

Honesty is a fundamental principle for aligning large language models (LLMs) with human values, requiring these models to recognize what they know and don't know and be able to faithfully express their knowledge.

Survey

FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner

1 code implementation 26 Sep 2024 Wenliang Zhao, Minglei Shi, Xumin Yu, Jie Zhou, Jiwen Lu

By integrating FlowTurbo into different flow-based models, we obtain an acceleration ratio of 53.1%~58.3% on class-conditional generation and 29.8%~38.5% on text-to-image generation.

Text-to-Image Generation

AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity

1 code implementation 20 Sep 2024 Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su

Recently, when dealing with high-resolution images, dominant LMMs usually divide them into multiple local images and one global image, which will lead to a large number of visual tokens.

Avg

POINTS: Improving Your Vision-language Model with Affordable Strategies

no code implementations 7 Sep 2024 Yuan Liu, Zhongyin Zhao, Ziyuan Zhuang, Le Tian, Xiao Zhou, Jie Zhou

To address these issues, we propose the following contributions: 1) We trained a robust baseline model using the latest advancements in vision-language models, introducing effective improvements and conducting comprehensive ablation and validation for each technique.

Language Modeling Language Modelling +2

DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation

1 code implementation 5 Sep 2024 Wenliang Zhao, Haolin Wang, Jie Zhou, Jiwen Lu

Diffusion probabilistic models (DPMs) have shown remarkable performance in visual synthesis but are computationally expensive due to the need for multiple evaluations during the sampling.

MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient

no code implementations 22 Aug 2024 Yanzeng Li, Cheng Zeng, Jinchao Zhang, Jie Zhou, Lei Zou

Additionally, a well-tuned Diffusion Transformer (DiT) model is incorporated to generate medical images according to the specified patient attributes in the KG.

Diagnostic Hallucination +4

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

1 code implementation 21 Aug 2024 Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu

In this paper, we aim to leverage Segment Anything Model (SAM) for real-time 3D instance segmentation in an online setting.

3D Instance Segmentation Semantic Segmentation

Large Language Model as a Catalyst: A Paradigm Shift in Base Station Siting Optimization

no code implementations 7 Aug 2024 Yanhu Wang, Muhammad Muzammil Afzal, Zhengyang Li, Jie Zhou, Chenyuan Feng, Shuaishuai Guo, Tony Q. S. Quek

Traditional base station siting (BSS) methods rely heavily on drive testing and user feedback, which are laborious and require extensive expertise in communication, networking, and optimization.

Language Modeling Language Modelling +2

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

2 code implementations 3 Aug 2024 Yuan YAO, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone.

Hallucination Multiple-choice +3

Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction

1 code implementation 3 Aug 2024 Wenhao Li, Jie zhou, Chuan Luo, Chao Tang, Kun Zhang, Shixiong Zhao

In the realm of modern mobile E-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital.

Click-Through Rate Prediction Diversity

UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

1 code implementation 29 Jul 2024 Chaoqun Du, Yulin Wang, Jiayi Guo, Yizeng Han, Jie zhou, Gao Huang

To this end, we propose a Unified Test-Time Adaptation (UniTTA) benchmark, which is comprehensive and widely applicable.

Test-time Adaptation

Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching

1 code implementation 24 Jul 2024 Yuyang Ding, Hanglei Hu, Jie zhou, Qin Chen, Bo Jiang, Liang He

With the introduction of large language models (LLMs), automatic math reasoning has seen tremendous success.

Math

Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words

1 code implementation 23 Jul 2024 Yijie Chen, Yijin Liu, Fandong Meng, Jinan Xu, Yufeng Chen, Jie zhou

This study presents a benchmark AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), which assesses gender bias beyond binary gender.

Machine Translation Translation

Large Language Model for Verilog Generation with Code-Structure-Guided Reinforcement Learning

1 code implementation 21 Jul 2024 Ning Wang, Bingkun Yao, Jie zhou, Xi Wang, Zhe Jiang, Nan Guan

Recent advancements in large language models (LLMs) have sparked significant interest in the automatic generation of Register Transfer Level (RTL) designs, particularly using Verilog.

Code Generation Language Modeling +4

Patch-Level Training for Large Language Models

1 code implementation 17 Jul 2024 Chenze Shao, Fandong Meng, Jie zhou

As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern.

Language Modeling Language Modelling

Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task

1 code implementation 9 Jul 2024 Yiran Yang, Jinchao Zhang, Ying Deng, Jie zhou

However, the traditional 3D-Unet operates in a serial mode, with the temporal layers following the spatial layers, which results in high GPU memory consumption and long training times due to its serial feature flow.

Text-to-Video Generation Video Generation

Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation

1 code implementation 3 Jul 2024 Zhibin Lan, LiQiang Niu, Fandong Meng, Jie zhou, Min Zhang, Jinsong Su

Among them, the target text decoder is used to alleviate the language alignment burden, and the image tokenizer converts long sequences of pixels into shorter sequences of visual tokens, preventing the model from focusing on low-level visual features.

Decoder Machine Translation

Camera-LiDAR Cross-modality Gait Recognition

no code implementations 2 Jul 2024 Wenxuan Guo, Yingping Liang, Zhiyu Pan, Ziheng Xi, Jianjiang Feng, Jie zhou

In this work, we propose the first cross-modality gait recognition framework between Camera and LiDAR, namely CL-Gait.

Gait Recognition

C-LLM: Learn to Check Chinese Spelling Errors Character by Character

1 code implementation 24 Jun 2024 Kunting Li, Yong Hu, Liang He, Fandong Meng, Jie zhou

To address this issue, we propose C-LLM, a Large Language Model-based Chinese Spell Checking method that learns to check errors Character by Character.

Chinese Spell Checking Language Modeling +2

Multilingual Knowledge Editing with Language-Agnostic Factual Neurons

1 code implementation 24 Jun 2024 Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie zhou

To address this issue, we first investigate how LLMs process multilingual factual knowledge and discover that the same factual knowledge in different languages generally activates a shared set of neurons, which we call language-agnostic factual neurons (LAFNs).

knowledge editing

P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts

no code implementations 18 Jun 2024 Yuhao Dan, Jie zhou, Qin Chen, Junfeng Tian, Liang He

Personalized large language models (LLMs) have attracted great attention in many applications, such as intelligent education and emotional support.

Mixture-of-Experts

Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

no code implementations 13 Jun 2024 Yuhao Dan, Junfeng Tian, Jie zhou, Ming Yan, Ji Zhang, Qin Chen, Liang He

Noting the data scarcity problem, we construct a Chinese Comparative Logical Relation Dataset (CLRD), which is a high-quality human-annotated dataset and challenging for text generation with descriptions of multiple entities and annotations on their comparative logical relations.

Contrastive Learning Data-to-Text Generation +2

TasTe: Teaching Large Language Models to Translate through Self-Reflection

1 code implementation 12 Jun 2024 Yutong Wang, Jiali Zeng, Xuebo Liu, Fandong Meng, Jie zhou, Min Zhang

The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.

Instruction Following Machine Translation +2

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

1 code implementation 11 Jun 2024 Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie zhou, Yu Qiao, Jifeng Dai

Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data.

Contrastive Learning

Think out Loud: Emotion Deducing Explanation in Dialogues

no code implementations 7 Jun 2024 Jiangnan Li, Zheng Lin, Lanrui Wang, Qingyi Si, Yanan Cao, Mo Yu, Peng Fu, Weiping Wang, Jie zhou

Besides, EDEN can help LLMs achieve better recognition of emotions and causes, which explores a new research direction of explainable emotion understanding in dialogues.

Common Sense Reasoning Emotion Cause Extraction

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

1 code implementation 6 Jun 2024 Fangfu Liu, HanYang Wang, Shunyu Yao, Shengjun Zhang, Jie zhou, Yueqi Duan

In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors.

3D Generation

Learning 1D Causal Visual Representation with De-focus Attention Networks

1 code implementation 6 Jun 2024 Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie zhou, Jifeng Dai

The issue of "over-focus" hinders the model's ability to extract diverse visual features and to receive effective gradients for optimization.

Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge

1 code implementation 5 Jun 2024 Zengkui Sun, Yijin Liu, Jiaan Wang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie zhou

Consequently, on the reasoning questions, we discover that existing methods struggle to utilize the edited knowledge to reason the new answer, and tend to retain outdated responses, which are generated by the original models utilizing original knowledge.

knowledge editing

LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation

1 code implementation 5 Jun 2024 Zengkui Sun, Yijin Liu, Fandong Meng, Jinan Xu, Yufeng Chen, Jie zhou

Multilingual neural machine translation models generally distinguish translation directions by the language tag (LT) in front of the source or target sentences.

Decoder Machine Translation +2
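The language-tag (LT) mechanism this abstract refers to is, in its common form, just a special token prepended to the input to steer the translation direction. A minimal sketch; the `<2xx>` tag format follows the widely used multilingual-NMT convention and is an assumption, not necessarily this paper's notation:

```python
def add_language_tag(source: str, target_lang: str) -> str:
    # Prepend a target-language tag so a single multilingual model
    # knows which direction to translate, e.g. "<2de>" for German.
    return f"<2{target_lang}> {source}"

print(add_language_tag("Hello world", "de"))  # <2de> Hello world
```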

FlowIE: Efficient Image Enhancement via Rectified Flow

1 code implementation CVPR 2024 Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang, Jie zhou, Jiwen Lu

Image enhancement holds extensive applications in real-world scenarios due to complex environments and limitations of imaging devices.

Image Enhancement

Language Generation with Strictly Proper Scoring Rules

1 code implementation 29 May 2024 Chenze Shao, Fandong Meng, Yijin Liu, Jie zhou

Leveraging this strategy, we train language generation models using two classic strictly proper scoring rules, the Brier score and the Spherical score, as alternatives to the logarithmic score.

Language Modeling Language Modelling +2
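The two scoring rules named in this abstract have standard closed forms. A minimal sketch of the textbook definitions for a categorical prediction (how the paper plugs them into sequence training is not shown here):

```python
import math

def log_score(p, y):
    # Logarithmic score: log-probability assigned to the true class (higher is better).
    return math.log(p[y])

def brier_score(p, y):
    # Brier score as a loss: squared error against the one-hot target (lower is better).
    return sum((pi - (1.0 if i == y else 0.0)) ** 2 for i, pi in enumerate(p))

def spherical_score(p, y):
    # Spherical score: true-class probability normalized by the L2 norm (higher is better).
    return p[y] / math.sqrt(sum(pi * pi for pi in p))

p = [0.7, 0.2, 0.1]  # predicted distribution; true class is index 0
print(round(brier_score(p, 0), 4))      # 0.14
print(round(spherical_score(p, 0), 4))  # 0.9526
```

All three are strictly proper: the expected score is uniquely maximized (or the Brier loss minimized) by reporting the true distribution.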

Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective

no code implementations 29 May 2024 Chenze Shao, Fandong Meng, Jiali Zeng, Jie zhou

Building upon this analysis, we propose employing the confidence of predicting EOS as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.

Machine Translation NMT +2

Recent Advances of Foundation Language Models-based Continual Learning: A Survey

no code implementations 28 May 2024 Yutao Yang, Jie zhou, Xuanwen Ding, Tianyu Huai, Shunyu Liu, Qin Chen, Yuan Xie, Liang He

Recently, foundation language models (LMs) have marked significant achievements in the domains of natural language processing (NLP) and computer vision (CV).

class-incremental learning Class Incremental Learning +2

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

1 code implementation 27 May 2024 Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie zhou, Jiwen Lu

To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features.

3D Semantic Occupancy Prediction Autonomous Driving +2

NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation

1 code implementation 23 May 2024 Chaokang Jiang, Dalong Du, Jiuming Liu, Siting Zhu, Zhenqiang Liu, Zhuang Ma, Zhujin Liang, Jie zhou

Point Cloud Interpolation confronts challenges from point sparsity, complex spatiotemporal dynamics, and the difficulty of deriving complete 3D point clouds from sparse temporal information.

Autonomous Driving

Rethinking Overlooked Aspects in Vision-Language Models

no code implementations 20 May 2024 YuAn Liu, Le Tian, Xiao Zhou, Jie zhou

Recent advancements in large vision-language models (LVLMs), such as GPT4-V and LLaVA, have been substantial.

Multi-level Shared Knowledge Guided Learning for Knowledge Graph Completion

no code implementations 8 May 2024 Yongxue Shan, Jie zhou, Jie Peng, Xin Zhou, Jiaqian Yin, Xiaodong Wang

In the task of Knowledge Graph Completion (KGC), the existing datasets and their inherent subtasks carry a wealth of shared knowledge that can be utilized to enhance the representation of knowledge triplets and overall performance.

Knowledge Graph Completion Multi-Task Learning +3

Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors

no code implementations 2 May 2024 Wenxuan Guo, Zhiyu Pan, Ziheng Xi, Alapati Tuerxun, Jianjiang Feng, Jie zhou

The visualization results showcase the immense potential of our sports visualization system on the domain of watching games on VR/AR devices.

Pose Estimation

Latent Fingerprint Matching via Dense Minutia Descriptor

1 code implementation 2 May 2024 Zhiyu Pan, Yongjie Duan, Xiongjun Guan, Jianjiang Feng, Jie zhou

Latent fingerprint matching is a daunting task, primarily due to the poor quality of latent fingerprints.

valid

Phase-aggregated Dual-branch Network for Efficient Fingerprint Dense Registration

1 code implementation 26 Apr 2024 Xiongjun Guan, Jianjiang Feng, Jie zhou

Fingerprint dense registration aims to finely align fingerprint pairs at the pixel level, thereby reducing intra-class differences caused by distortion.

Pose-Specific 3D Fingerprint Unfolding

no code implementations 26 Apr 2024 Xiongjun Guan, Jianjiang Feng, Jie zhou

The problem with this method is that there may be large elastic deformation between the unfolded rolled fingerprint and flat fingerprint, which affects the recognition rate.

Direct Regression of Distortion Field from a Single Fingerprint Image

1 code implementation 26 Apr 2024 Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie zhou

However, existing rectification methods are based on the principal component representation of distortion fields, which is not accurate and is very sensitive to finger pose.

regression

Regression of Dense Distortion Field from a Single Fingerprint Image

1 code implementation 26 Apr 2024 Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie zhou

However, existing rectification methods are based on the principal component representation of distortion fields, which is not accurate and is very sensitive to finger pose.

regression

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

no code implementations 25 Apr 2024 Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, HaoNing Wu, Yixuan Gao, Yuqin Cao, ZiCheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, Jianquan Yang, Weigang Wang, Xi Fang, Xiaoxin Lv, Jun Yan, Tianwu Zhi, Yabin Zhang, Yaohui Li, Yang Li, Jingwen Xu, Jianzhao Liu, Yiting Liao, Junlin Li, Zihao Yu, Yiting Lu, Xin Li, Hossein Motamednia, S. Farhad Hosseini-Benvidi, Fengbin Guan, Ahmad Mahmoudi-Aznaveh, Azadeh Mansouri, Ganzorig Gankhuyag, Kihwan Yoon, Yifang Xu, Haotian Fan, Fangyuan Kong, Shiling Zhao, Weifeng Dong, Haibing Yin, Li Zhu, Zhiling Wang, Bingchen Huang, Avinab Saha, Sandeep Mishra, Shashank Gupta, Rajesh Sureddi, Oindrila Saha, Luigi Celona, Simone Bianco, Paolo Napoletano, Raimondo Schettini, Junfeng Yang, Jing Fu, Wei zhang, Wenzhi Cao, Limei Liu, Han Peng, Weijun Yuan, Zhan Li, Yihang Cheng, Yifan Deng, Haohui Li, Bowen Qu, Yao Li, Shuqing Luo, Shunzhou Wang, Wei Gao, Zihao Lu, Marcos V. Conde, Xinrui Wang, Zhibo Chen, Ruling Liao, Yan Ye, Qiulin Wang, Bing Li, Zhaokun Zhou, Miao Geng, Rui Chen, Xin Tao, Xiaoyu Liang, Shangkun Sun, Xingyuan Ma, Jiaze Li, Mengduo Yang, Haoran Xu, Jie zhou, Shiding Zhu, Bohan Yu, Pengfei Chen, Xinrui Xu, Jiabin Shen, Zhichao Duan, Erfan Asadi, Jiahe Liu, Qi Yan, Youran Qu, Xiaohui Zeng, Lele Wang, Renjie Liao

A total of 196 participants have registered in the video track.

Image Quality Assessment Image Restoration +2

Automatic Knowledge Graph Construction for Judicial Cases

no code implementations 15 Apr 2024 Jie zhou, Xin Chen, Hang Zhang, Zhe Li

Building on these results, we detail the automatic construction process of case knowledge graphs for judicial cases, enabling the assembly of knowledge graphs for hundreds of thousands of judgments.

graph construction Knowledge Graphs

UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs

1 code implementation 11 Apr 2024 Chaoqun He, Renjie Luo, Shengding Hu, Yuanqian Zhao, Jie zhou, Hanghao Wu, Jiajie Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

The rapid development of LLMs calls for a lightweight and easy-to-use framework for swift evaluation deployment.

Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective

1 code implementation 11 Apr 2024 Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie zhou

In this paper, we suggest that code comments are the natural logic pivot between natural language and code language and propose using comments to boost the code generation ability of code LLMs.

Code Generation HumanEval +1

Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy

1 code implementation 10 Apr 2024 Yijin Liu, Fandong Meng, Jie zhou

Recently, dynamic computation methods have shown notable acceleration for Large Language Models (LLMs) by skipping several layers of computations through elaborate heuristics or additional predictors.

Machine Translation Text Summarization
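As a rough illustration of the general idea of layer skipping (the function name and the fixed every-other-layer schedule here are hypothetical; the paper's Unified Layer Skipping strategy chooses which layers to skip differently):

```python
def forward_with_skips(layers, hidden, skip_every=2):
    # Run a stack of layers, skipping every `skip_every`-th one.
    # Skipped layers act as identity shortcuts: the hidden state
    # passes through unchanged, trading accuracy for speed.
    for i, layer in enumerate(layers):
        if skip_every and (i + 1) % skip_every == 0:
            continue  # identity shortcut: reuse the current hidden state
        hidden = layer(hidden)
    return hidden

layers = [lambda x: x + 1] * 4                       # four toy "layers"
print(forward_with_skips(layers, 0, skip_every=2))   # 2: only layers 0 and 2 run
```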

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

1 code implementation CVPR 2024 Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie zhou, Jiwen Lu

The recovery of occluded human meshes presents challenges for current methods due to the difficulty in extracting effective image features under severe occlusion.

Denoising Human Mesh Recovery

On Large Language Models' Hallucination with Regard to Known Facts

1 code implementation 29 Mar 2024 Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, BoWen Zhou, Jie zhou

We reveal the different dynamics of the output token probabilities along the depths of layers between the correct and hallucinated cases.

Hallucination Triplet

Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check

no code implementations 27 Mar 2024 Linhao Ye, Zhikai Lei, Jianghao Yin, Qin Chen, Jie zhou, Liang He

Retrieval-Augmented Generation (RAG) aims to generate more reliable and accurate responses by augmenting large language models (LLMs) with vast and dynamic external knowledge.

Conversational Question Answering RAG +1

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

1 code implementation 19 Mar 2024 Zuyan Liu, Yuhao Dong, Yongming Rao, Jie zhou, Jiwen Lu

In the realm of vision-language understanding, the proficiency of models in interpreting and reasoning over visual content has become a cornerstone for numerous applications.

visual instruction following Visual Question Answering

RCdpia: A Renal Carcinoma Digital Pathology Image Annotation dataset based on pathologists

no code implementations 17 Mar 2024 Qingrong Sun, Weixiang Zhong, Jie zhou, Chong Lai, Xiaodong Teng, Maode Lai

The annotation of digital pathological slide data for renal cell carcinoma is of paramount importance for correct diagnosis by artificial intelligence models due to the heterogeneous nature of the tumor.

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution

1 code implementation 16 Mar 2024 Zhiheng Li, Muheng Li, Jixuan Fan, Lei Chen, Yansong Tang, Jiwen Lu, Jie zhou

The appearance embedding models the characteristics of low-resolution inputs to deal with photometric variations at different scales, and the pixel-based deformation field learns RGB differences which result from the deviations between the real-world and simulated degradations at arbitrary coordinates.

Super-Resolution

Enhancing Depression-Diagnosis-Oriented Chat with Psychological State Tracking

no code implementations 12 Mar 2024 Yiyang Gu, Yougen Zhou, Qin Chen, Ningning Zhou, Jie zhou, Aimin Zhou, Liang He

Depression-diagnosis-oriented chat aims to guide patients in self-expression to collect key symptoms for depression detection.

Depression Detection Language Modeling +3

Memory-based Adapters for Online 3D Scene Perception

no code implementations CVPR 2024 Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie zhou, Jiwen Lu

To this end, we propose an adapter-based plug-and-play module for the backbone of 3D scene perception model, which constructs memory to cache and aggregate the extracted RGB-D features to empower offline models with temporal learning ability.

DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models

1 code implementation 1 Mar 2024 Kedi Chen, Qin Chen, Jie zhou, Yishen He, Liang He

Although large language models (LLMs) have achieved significant success in recent years, the hallucination issue remains a challenge, and numerous benchmarks have been proposed to detect hallucinations.

Hallucination Hallucination Evaluation +1

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

1 code implementation 29 Feb 2024 Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Zexu Sun, Bowen Sun, Huimin Chen, Ruobing Xie, Jie zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax", a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness).

Navigate

Let's Rectify Step by Step: Improving Aspect-based Sentiment Analysis with Diffusion Models

1 code implementation 23 Feb 2024 Shunyu Liu, Jie zhou, Qunxi Zhu, Qin Chen, Qingchun Bai, Jun Xiao, Liang He

Aspect-Based Sentiment Analysis (ABSA) stands as a crucial task in predicting the sentiment polarity associated with identified aspects within text.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

Domain Generalization via Causal Adjustment for Cross-Domain Sentiment Analysis

no code implementations 22 Feb 2024 Siyin Wang, Jie zhou, Qin Chen, Qi Zhang, Tao Gui, Xuanjing Huang

Domain adaptation has been widely adopted for cross-domain sentiment analysis to transfer knowledge from the source domain to the target domain.

Domain Generalization Sentiment Analysis

LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition

no code implementations 22 Feb 2024 Junjie Ye, Nuo Xu, Yikun Wang, Jie zhou, Qi Zhang, Tao Gui, Xuanjing Huang

To overcome the limitations of existing data augmentation methods that compromise semantic integrity and address the uncertainty inherent in LLM-generated text, we leverage the distinctive characteristics of the NER task by augmenting the original data at both the contextual and entity levels.

Data Augmentation few-shot-ner +5

Fine-Grained Modeling of Narrative Context: A Coherence Perspective via Retrospective Questions

no code implementations 21 Feb 2024 Liyan Xu, Jiangnan Li, Mo Yu, Jie zhou

This work introduces an original and practical paradigm for narrative comprehension, stemming from the observation that individual passages within narratives tend to be more cohesively related than isolated.

Retrieval

3D Vascular Segmentation Supervised by 2D Annotation of Maximum Intensity Projection

1 code implementation 19 Feb 2024 Zhanqiang Guo, Zimeng Tan, Jianjiang Feng, Jie zhou

To alleviate this issue, we employ maximum intensity projection (MIP) to decrease the dimensionality of the 3D volume to a 2D image for efficient annotation, and the 2D labels are utilized to provide guidance and oversight for training the 3D vessel segmentation model.

Organ Segmentation Segmentation
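Maximum intensity projection itself is a simple reduction: along each viewing ray, keep the brightest voxel, collapsing the 3D volume into a 2D image that is cheap to annotate. A toy sketch (sizes and values are illustrative; the paper applies this to real vascular volumes):

```python
# Toy 3D volume as nested lists [depth][row][col]; a real CT/MR scan would be
# a large array loaded from file.
depth, rows, cols = 4, 5, 5
volume = [[[0.0] * cols for _ in range(rows)] for _ in range(depth)]
volume[1][2][2] = 0.9  # a bright "vessel" voxel
volume[3][0][4] = 0.5

# Maximum intensity projection along the depth axis: each output pixel is the
# maximum of all voxels along the corresponding ray.
mip = [[max(volume[d][r][c] for d in range(depth)) for c in range(cols)]
       for r in range(rows)]

print(mip[2][2])  # 0.9
print(mip[0][4])  # 0.5
```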

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents

1 code implementation 17 Feb 2024 Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie zhou, Xu sun

In this work, we take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents.

Backdoor Attack backdoor defense +1

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

1 code implementation 14 Feb 2024 Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.

Language Modeling Language Modelling

Previously on the Stories: Recap Snippet Identification for Story Reading

no code implementations 11 Feb 2024 Jiangnan Li, Qiujing Wang, Liyan Xu, Wenjie Pang, Mo Yu, Zheng Lin, Weiping Wang, Jie zhou

Similar to the "previously-on" scenes in TV shows, recaps can help book reading by recalling the readers' memory about the important elements in previous texts to better understand the ongoing plot.

NPSVC++: Nonparallel Classifiers Encounter Representation Learning

1 code implementation 8 Feb 2024 Junhong Zhang, Zhihui Lai, Jie zhou, Guangfei Liang

This paper focuses on a specific family of classifiers called nonparallel support vector classifiers (NPSVCs).

Representation Learning

On Prompt-Driven Safeguarding for Large Language Models

2 code implementations 31 Jan 2024 Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

In this work, we investigate how LLMs' behavior (i.e., complying with or refusing user queries) is affected by safety prompts from the perspective of model representation.

Path Choice Matters for Clear Attribution in Path Methods

1 code implementation 19 Jan 2024 Borui Zhang, Wenzhao Zheng, Jie zhou, Jiwen Lu

Rigorousness and clarity are both essential for interpretations of DNNs to engender human trust.

Generative Multi-Modal Knowledge Retrieval with Large Language Models

1 code implementation 16 Jan 2024 Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, Kaiyan Zhang, BoWen Zhou, Jie zhou

Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications.

Retrieval

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

2 code implementations CVPR 2024 Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie zhou, Jifeng Dai

The advancements in speed and efficiency of DCNv4, combined with its robust performance across diverse vision tasks, show its potential as a foundational building block for future vision models.

Image Classification Image Generation +1

Domain Similarity-Perceived Label Assignment for Domain Generalized Underwater Object Detection

no code implementations 20 Dec 2023 Xisheng Li, Wei Li, Pinhao Song, Mingjun Zhang, Jie zhou

The inherent characteristics and light fluctuations of water bodies give rise to the huge difference between different layers and regions in underwater environments.

Data Augmentation object-detection +1

MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA

1 code implementation 19 Dec 2023 Lang Yu, Qin Chen, Jie zhou, Liang He

Large language models (LLMs) have shown great success in various Natural Language Processing (NLP) tasks, yet they still need updates after deployment to fix errors or keep pace with the changing knowledge in the world.

Document Classification Hallucination +2

A Soft Contrastive Learning-based Prompt Model for Few-shot Sentiment Analysis

no code implementations 16 Dec 2023 Jingyi Zhou, Jie zhou, Jiabao Zhao, Siyin Wang, Haijun Shan, Gui Tao, Qi Zhang, Xuanjing Huang

Few-shot text classification has attracted great interest in both academia and industry due to the lack of labeled data in many fields.

Contrastive Learning Few-Shot Text Classification +4

Mathematical Language Models: A Survey

no code implementations 12 Dec 2023 Wentao Liu, Hanglei Hu, Jie zhou, Yuyang Ding, Junsong Li, Jiayi Zeng, Mengliang He, Qin Chen, Bo Jiang, Aimin Zhou, Liang He

In recent years, there has been remarkable progress in leveraging Language Models (LMs), encompassing Pre-trained Language Models (PLMs) and Large-scale Language Models (LLMs), within the domain of mathematics.

Survey

LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation

no code implementations 11 Dec 2023 Zhiyu Pan, Zhicheng Zhong, Wenxuan Guo, Yifan Chen, Jianjiang Feng, Jie zhou

Several methods have been proposed to estimate 3D human pose from multi-view images, achieving satisfactory performance on public datasets collected under relatively simple conditions.

3D Human Pose Estimation Unsupervised Domain Adaptation

HumanReg: Self-supervised Non-rigid Registration of Human Point Cloud

1 code implementation 9 Dec 2023 Yifan Chen, Zhiyu Pan, Zhicheng Zhong, Wenxuan Guo, Jianjiang Feng, Jie zhou

In this paper, we present a novel registration framework, HumanReg, that learns a non-rigid transformation between two human point clouds end-to-end.

Fixed-length Dense Descriptor for Efficient Fingerprint Matching

no code implementations 30 Nov 2023 Zhiyu Pan, Yongjie Duan, Jianjiang Feng, Jie zhou

In fingerprint matching, fixed-length descriptors generally offer greater efficiency compared to minutiae sets, but their recognition accuracy is not as good as that of the latter.

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

1 code implementation CVPR 2024 Yuanhui Huang, Wenzhao Zheng, Borui Zhang, Jie zhou, Jiwen Lu

Our SelfOcc outperforms the previous best method SceneRF by 58.7% using a single frame as input on SemanticKITTI and is the first self-supervised work that produces reasonable 3D occupancy for surround cameras on nuScenes.

Autonomous Driving Monocular Depth Estimation +1

LDConv: Linear deformable convolution for improving convolutional neural networks

2 code implementations 20 Nov 2023 Xin Zhang, Yingze Song, Tingting Song, Degang Yang, Yichen Ye, Jie zhou, Liming Zhang

In response to the above questions, the Linear Deformable Convolution (LDConv) is explored in this work, which gives the convolution kernel an arbitrary number of parameters and arbitrary sampled shapes to provide richer options for the trade-off between network overhead and performance.

object-detection Object Detection

MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation

1 code implementation 15 Nov 2023 Xiaozhi Wang, Hao Peng, Yong Guan, Kaisheng Zeng, Jianhui Chen, Lei Hou, Xu Han, Yankai Lin, Zhiyuan Liu, Ruobing Xie, Jie zhou, Juanzi Li

Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships.

All Event Argument Extraction +4

Distilling Rule-based Knowledge into Large Language Models

1 code implementation 15 Nov 2023 Wenkai Yang, Yankai Lin, Jie zhou, Ji-Rong Wen

The current paradigm of knowledge learning for LLMs is mainly based on learning from examples, in which LLMs learn the internal rule implicitly from a certain number of supervised examples.

RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge

no code implementations 14 Nov 2023 Yi Liu, Lianzhe Huang, Shicheng Li, Sishuo Chen, Hao Zhou, Fandong Meng, Jie zhou, Xu sun

Therefore, to evaluate the ability of LLMs to discern the reliability of external knowledge, we create a benchmark from existing knowledge bases.

counterfactual Knowledge Graphs +2

Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese Spelling Correction

1 code implementation 14 Nov 2023 Kunting Li, Yong Hu, Shaolei Wang, Hanhan Ma, Liang He, Fandong Meng, Jie zhou

However, in the Chinese Spelling Correction (CSC) task, we observe a discrepancy: while ChatGPT performs well under human evaluation, it scores poorly according to traditional metrics.

Semantic Similarity Semantic Textual Similarity +1

TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

no code implementations 8 Nov 2023 Zhen Yang, Yingxue Zhang, Fandong Meng, Jie zhou

Specifically, for the input from any modality, TEAL first discretizes it into a token sequence with the off-the-shelf tokenizer and embeds the token sequence into a joint embedding space with a learnable embedding matrix.

All

Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding

1 code implementation 6 Nov 2023 Jiali Zeng, Fandong Meng, Yongjing Yin, Jie zhou

Contemporary translation engines based on the encoder-decoder framework have made significant strides in development.

Decoder Machine Translation +2

Plot Retrieval as an Assessment of Abstract Semantic Association

no code implementations 3 Nov 2023 Shicheng Xu, Liang Pang, Jiangnan Li, Mo Yu, Fandong Meng, HuaWei Shen, Xueqi Cheng, Jie zhou

Readers usually only give an abstract and vague description as the query based on their own understanding, summaries, or speculations of the plot, which requires the retrieval model to have a strong ability to estimate the abstract semantic associations between the query and candidate plots.

Information Retrieval Retrieval

Universal Multi-modal Multi-domain Pre-trained Recommendation

no code implementations 3 Nov 2023 Wenqi Sun, Ruobing Xie, Shuqing Bian, Wayne Xin Zhao, Jie zhou

There is a rapidly-growing research interest in modeling user preferences via pre-training multi-domain interactions for recommender systems.

Recommendation Systems

Fast Shapley Value Estimation: A Unified Approach

1 code implementation 2 Nov 2023 Borui Zhang, Baotong Tian, Wenzhao Zheng, Jie zhou, Jiwen Lu

Shapley values have emerged as a widely accepted and trustworthy tool, grounded in theoretical axioms, for addressing challenges posed by black-box models like deep neural networks.
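For intuition, here is a minimal brute-force Shapley computation on a toy cooperative game. The exponential enumeration over coalitions below is exactly what makes fast estimation (the topic of the paper above) necessary; this sketch is a generic illustration, not the paper's method.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions.

    `players` is a list of ids; `value` maps a frozenset of players
    to the coalition's worth. Cost is exponential in len(players).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                s = frozenset(s)
                # Shapley weight for a coalition of size k.
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy additive game: a coalition's worth is the sum of its members'
# weights, so each player's Shapley value equals its own weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}
phi = shapley_values(list(weights), lambda s: sum(weights[p] for p in s))
```

The additive toy game makes the efficiency axiom easy to verify by hand: the attributions sum to the grand coalition's worth.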

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

1 code implementation NeurIPS 2023 Yinan Liang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie zhou, Jiwen Lu

Due to the high price and heavy energy consumption of GPUs, deploying deep models on IoT devices such as microcontrollers makes significant contributions to ecological AI.

Image Classification

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

1 code implementation 24 Oct 2023 Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou

Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs.

Computational Efficiency

Thoroughly Modeling Multi-domain Pre-trained Recommendation as Language

no code implementations 20 Oct 2023 Zekai Qu, Ruobing Xie, Chaojun Xiao, Yuan YAO, Zhiyuan Liu, Fengzong Lian, Zhanhui Kang, Jie zhou

With the success of pre-trained language models (PLMs) widely verified in various NLP tasks, pioneering efforts attempt to explore the possible cooperation of the general textual information in PLMs with the personalized behavioral information in user historical behavior sequences to enhance sequential recommendation (SR).

Informativeness Language Modeling +2

Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models

no code implementations 19 Oct 2023 Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou

Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise.

DCRNN: A Deep Cross approach based on RNN for Partial Parameter Sharing in Multi-task Learning

no code implementations 18 Oct 2023 Jie zhou, Qian Yu

The model has three innovations: 1) it adopts the idea of a cross network and uses an RNN to cross-process the features, thereby effectively improving the expressive ability of the model; 2) it innovatively proposes a partial parameter-sharing structure; 3) it can effectively capture the potential correlations between different tasks to optimize the efficiency and methods for learning different tasks.
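The partial parameter-sharing idea can be illustrated with a minimal sketch. This is a generic multi-task illustration under assumed task names ("ctr", "cvr") and scalar "parameters", not the DCRNN architecture itself: some parameters are shared by every task, while each task also keeps a private head.

```python
class PartialSharingModel:
    """Toy multi-task model: a shared part plus per-task parameters."""

    def __init__(self, tasks):
        # Parameters updated by gradients from all tasks.
        self.shared = {"w": 0.5}
        # Parameters private to each task (partial sharing).
        self.private = {t: {"w": 1.0} for t in tasks}

    def forward(self, task, x):
        # Input passes through the shared part, then the task head.
        h = self.shared["w"] * x
        return self.private[task]["w"] * h

model = PartialSharingModel(["ctr", "cvr"])
y_ctr = model.forward("ctr", 2.0)  # shared params + ctr-only head
y_cvr = model.forward("cvr", 2.0)  # same shared params, cvr-only head
```

The design trade-off is the usual one in multi-task learning: the shared block lets related tasks transfer signal to each other, while the private heads keep one task's updates from overwriting another's.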

Multi-Task Learning Recommendation Systems

XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners

1 code implementation 9 Oct 2023 Yun Luo, Zhen Yang, Fandong Meng, Yingjie Li, Fang Guo, Qinglin Qi, Jie zhou, Yue Zhang

Active learning (AL), which aims to construct an effective training set by iteratively curating the most informative unlabeled data for annotation, has been widely used in low-resource tasks.
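The selection step of that AL loop can be sketched with plain uncertainty sampling. This is a generic illustration of "curate the most informative unlabeled data", not the XAL method; the pool of predicted class distributions below is hypothetical.

```python
def least_confident(probs_by_example, budget):
    """Pick `budget` pool indices whose top-class probability is lowest.

    A low maximum probability means the current classifier is unsure
    about the example, so labeling it is expected to be informative.
    """
    scored = sorted(enumerate(probs_by_example),
                    key=lambda ip: max(ip[1]))
    return [i for i, _ in scored[:budget]]

# Hypothetical predicted distributions for 4 unlabeled examples.
pool_probs = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain
    [0.90, 0.10],
    [0.51, 0.49],  # most uncertain
]
picked = least_confident(pool_probs, budget=2)  # -> [3, 1]
```

In a full AL loop this selection, annotation of the picked examples, and retraining of the classifier would repeat until the labeling budget is exhausted.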

Active Learning Decoder +2

C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network

no code implementations 9 Oct 2023 Ruizhi Wang, Xiangtao Wang, Jie zhou, Thomas Lukasiewicz, Zhenghua Xu

In addition, word-level optimization based on numbers ignores the semantics of reports and medical images, and the generated reports often cannot achieve good performance.

Contrastive Learning Medical Report Generation

Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information

1 code implementation 8 Oct 2023 Yun Luo, Zhen Yang, Fandong Meng, Yingjie Li, Jie zhou, Yue Zhang

However, we observe that merely concatenating sentences in a contextual window does not fully utilize contextual information and can sometimes lead to excessive attention on less informative sentences.

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning

1 code implementation ICCV 2023 Zhiheng Li, Wenjia Geng, Muheng Li, Lei Chen, Yansong Tang, Jiwen Lu, Jie zhou

By this means, our model explores all sorts of reliable sub-relations within an action sequence in the condensed action space.
