Search Results for author: Jing Liu

Found 324 papers, 116 papers with code

DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering

1 code implementation Findings (ACL) 2022 Le Qi, Shangwen Lv, Hongyu Li, Jing Liu, Yu Zhang, Qiaoqiao She, Hua Wu, Haifeng Wang, Ting Liu

Open-domain question answering has been used in a wide range of applications, such as web search and enterprise search, which usually takes clean texts extracted from various formats of documents (e.g., web pages, PDFs, or Word documents) as the information source.

document understanding Open-Domain Question Answering +1

Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations ECCV 2020 Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction +1

Machine Reading Comprehension Data Augmentation for Sentence Selection Based on Similarity

no code implementations CCL 2022 Shuang Nie, Zheng Ye, Jun Qin, Jing Liu

“Existing data augmentation methods for machine reading comprehension, such as back-translation, augment the passage or the question in isolation and do not consider the relationship among the passage, question, and option triple. This paper therefore explores a data augmentation method that uses this triple relationship to select sentences from the passage: by comparing the similarity of the passage with both the question and the options, it selects the passage sentences most closely related to the two. To further enlarge the distinction between the triples of different options, we adopt a regularized Dropout strategy. Experimental results show that accuracy on the RACE dataset can be improved by 3.8%.”

Data Augmentation Machine Reading Comprehension +1
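
As a rough illustration of the similarity-based sentence selection described above (not the paper's implementation, which uses its own similarity measure and a regularized Dropout strategy), the following Python sketch scores passage sentences by bag-of-words cosine similarity against the question plus a candidate option; the passage, question, and option below are invented.

```python
import numpy as np
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between two texts using simple bag-of-words counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    va = np.array([ca[w] for w in vocab], dtype=float)
    vb = np.array([cb[w] for w in vocab], dtype=float)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0

def select_sentences(sentences, question, option, k=2):
    """Keep the k passage sentences most similar to the question plus candidate option."""
    query = question + " " + option
    scores = [bow_cosine(s, query) for s in sentences]
    keep = np.argsort(scores)[::-1][:k]
    return [sentences[i] for i in sorted(keep)]

passage = ["Tom went to the market.", "He bought apples and pears.", "It rained all day."]
print(select_sentences(passage, "What did Tom buy?", "apples", k=2))
```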

Deep Transferring Quantization

1 code implementation ECCV 2020 Zheng Xie, Zhiquan Wen, Jing Liu, Zhi-Qiang Liu, Xixian Wu, Mingkui Tan

Specifically, we propose a method named deep transferring quantization (DTQ) to effectively exploit the knowledge in a pre-trained full-precision model.

Face Recognition image-classification +3

LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation

no code implementations20 Jun 2025 Tongtian Yue, Longteng Guo, Yepeng Tang, Zijia Zhao, Xinxin Zhu, Hua Huang, Jing Liu

Despite the impressive advancements of Large Vision-Language Models (LVLMs), existing approaches suffer from a fundamental bottleneck: inefficient visual-language integration.

Abstract Sound Fusion with Unconditioned Inversion Model

no code implementations13 Jun 2025 Jing Liu, EnQi Lian

An abstract sound is defined as a sound that does not disclose identifiable real-world sound events to a listener.

model

AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent

no code implementations11 Jun 2025 Jing Liu, Toshiaki Koike-Akino, Ye Wang, Hassan Mansour, Matthew Brand

To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices.

Model Compression Quantization

Uniqueness of phase retrieval from offset linear canonical transform

no code implementations4 Jun 2025 Jing Liu, Haiye Huo

The classical phase retrieval refers to the recovery of an unknown signal from its Fourier magnitudes, which is widely used in fields such as quantum mechanics, signal processing, optics, etc.

Retrieval

Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models

no code implementations2 Jun 2025 Youze Wang, WenBo Hu, Yinpeng Dong, Jing Liu, Hanwang Zhang, Richang Hong

Large Language Models (LLMs) have evolved into Multimodal Large Language Models (MLLMs), significantly enhancing their capabilities by integrating visual information and other data types, thus aligning more closely with the nature of human intelligence, which processes a variety of data forms beyond just text.

Safety Alignment

FT-Boosted SV: Towards Noise Robust Speaker Verification for English Speaking Classroom Environments

no code implementations26 May 2025 Saba Tabatabaee, Jing Liu, Carol Espy-Wilson

Creating Speaker Verification (SV) systems for classroom settings that are robust to classroom noises such as babble noise is crucial for the development of AI tools that assist educational environments.

Speaker Verification

Context-Driven Dynamic Pruning for Large Speech Foundation Models

no code implementations24 May 2025 Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, Markus Müller, Nathan Susanj, Jing Liu, Shinji Watanabe

Speech foundation models achieve strong generalization across languages and acoustic conditions, but require significant computational resources for inference.

$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts

no code implementations24 May 2025 Toshiaki Koike-Akino, Jing Liu, Ye Wang

To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced.

Mixture-of-Experts

LatentLLM: Attention-Aware Joint Tensor Compression

no code implementations 23 May 2025 Toshiaki Koike-Akino, Xiangyu Chen, Jing Liu, Ye Wang, Pu Wang, Matthew Brand

Modern foundation models such as large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources.

Model Compression Tensor Decomposition

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

no code implementations20 May 2025 Ahmed Adel Attia, Dorottya Demszky, Jing Liu, Carol Espy-Wilson

However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small amount of accurate, gold-standard data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Scaling Law for Quantization-Aware Training

3 code implementations20 May 2025 Mengzhao Chen, Chaoyi Zhang, Jing Liu, Yutao Zeng, Zeyue Xue, Zhiheng Liu, Yunshui Li, Jin Ma, Jie Huang, Xun Zhou, Ping Luo

Through 268 QAT experiments, we show that quantization error decreases as model size increases, but rises with more training tokens and coarser quantization granularity.

Quantization
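
The finding above relates quantization error to quantization granularity. The sketch below is a generic group-wise symmetric quantizer (post-training rounding, not the paper's QAT setup) showing how coarser groups, i.e., fewer scales, typically increase the mean squared quantization error; the bit-width and group sizes are arbitrary choices.

```python
import numpy as np

def groupwise_quantize(w, bits=4, group_size=64):
    """Symmetric uniform quantization with one scale per group of `group_size` weights;
    larger groups mean coarser granularity (fewer scales)."""
    qmax = 2 ** (bits - 1) - 1
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax + 1e-12
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)

w = np.random.randn(4096)
for g in (32, 128, 512):  # coarser granularity typically gives a larger error
    err = float(np.mean((w - groupwise_quantize(w, group_size=g)) ** 2))
    print(f"group_size={g}  mse={err:.6f}")
```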

Emergent Specialization: Rare Token Neurons in Language Models

no code implementations19 May 2025 Jing Liu, Haozheng Wang, Yueheng Li

Large language models struggle with representing and generating rare tokens despite their importance in specialized domains.

Model Merging in Pre-training of Large Language Models

no code implementations17 May 2025 Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, Lingjun Liu, Bole Ma, Xiaoying Jia, Zhou Xun, Siyuan Qiao, Liang Xiang, Yonghui Wu

Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored.

Mixture-of-Experts

Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation

no code implementations17 May 2025 Yuhao Wang, Ruiyang Ren, Yucheng Wang, Wayne Xin Zhao, Jing Liu, Hua Wu, Haifeng Wang

In this paper, we present a systematic investigation of the intrinsic mechanisms by which LLMs integrate internal (parametric) and external (retrieved) knowledge in RAG scenarios.

Open-Domain Question Answering RAG +2

QVGen: Pushing the Limit of Quantized Video Generative Models

no code implementations16 May 2025 Yushi Huang, Ruihao Gong, Jing Liu, Yifu Ding, Chengtao Lv, Haotong Qin, Jun Zhang

We begin with a theoretical analysis demonstrating that reducing the gradient norm is essential to facilitate convergence for QAT.

Quantization

End-to-End Vision Tokenizer Tuning

no code implementations15 May 2025 Wenxuan Wang, Fan Zhang, Yufeng Cui, Haiwen Diao, Zhuoyan Luo, Huchuan Lu, Jing Liu, Xinlong Wang

To address this, we propose ETT, an end-to-end vision tokenizer tuning approach that enables joint optimization between vision tokenization and target autoregressive tasks.

Image Generation Question Answering +1

RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

no code implementations5 May 2025 Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang

The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints.

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey

no code implementations3 May 2025 Jing Liu, Yao Du, Kun Yang, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C. M. Leung

Furthermore, the review identifies critical research directions including LLMs deployment, 6G integration, neuromorphic computing, and quantum computing, offering a roadmap for addressing persistent challenges in heterogeneity management, real-time processing, and scalability.

Autonomous Driving Benchmarking +5

ZipR1: Reinforcing Token Sparsity in MLLMs

no code implementations23 Apr 2025 Feng Chen, Yefei He, Lequan Lin, Jing Liu, Bohan Zhuang, Qi Wu

Sparse attention mechanisms aim to reduce computational overhead by selectively processing a subset of salient tokens while preserving model performance.

Token Reduction

Image Difference Grounding with Natural Language

no code implementations2 Apr 2025 Wenxuan Wang, Zijia Zhao, Yisi Zhang, Yepeng Tang, Erdong Hu, Xinlong Wang, Jing Liu

We introduce DiffGround, a large-scale and high-quality dataset for IDG, containing image pairs with diverse visual variations along with instructions querying fine-grained differences.

Visual Grounding

COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation

no code implementations31 Mar 2025 Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Zike Yan, Qi Wu, Zhihua Wei, Jing Liu

RSS facilitates comprehensive inter-modal interactions within a single scan, while the CS3 module adapts the selective state space module into a dual-stream architecture, thereby enhancing the acquisition of cross-modal interactions.

Memorization Vision and Language Navigation

AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis

no code implementations27 Mar 2025 Zhiwei Yang, Chen Gao, Jing Liu, Peng Wu, Guansong Pang, Mike Zheng Shou

To bridge this gap and facilitate the practical deployment of LLM-based VAD, we introduce AssistPDA, the first online video anomaly surveillance assistant that unifies video anomaly prediction, detection, and analysis (VAPDA) within a single framework.

Anomaly Detection Anomaly Forecasting +2

Adaptive Weighted Parameter Fusion with CLIP for Class-Incremental Learning

no code implementations25 Mar 2025 Juncen Guo, Xiaoguang Zhu, Liangyu Teng, Hao Yang, Jing Liu, Yang Liu, Liang Song

Class-incremental Learning (CIL) enables the model to incrementally absorb knowledge from new classes and build a generic classifier across all previously encountered classes.

class-incremental learning Class Incremental Learning +1

CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos

no code implementations24 Mar 2025 Yang Liu, Hongjin Wang, Zepu Wang, Xiaoguang Zhu, Jing Liu, Peng Sun, Rui Tang, Jianwei Du, Victor C. M. Leung, Liang Song

Video Anomaly Detection (VAD) remains a fundamental yet formidable task in the video understanding community, with promising applications in areas such as information forensics and public safety protection.

Anomaly Detection In Surveillance Videos Representation Learning +1

Breaking the Encoder Barrier for Seamless Video-Language Understanding

no code implementations24 Mar 2025 Handong Li, Yiyuan Zhang, Longteng Guo, Xiangyu Yue, Jing Liu

Most Video-Large Language Models (Video-LLMs) adopt an encoder-decoder framework, where a vision encoder extracts frame-wise features for processing by a language model.

Decoder Language Modeling +3

FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks

no code implementations18 Mar 2025 Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Longteng Guo, Zhihua Wei, Jing Liu

In this paper, we propose FlexVLN, an innovative hierarchical approach to VLN that integrates the fundamental navigation ability of a supervised-learning-based Instruction Follower with the robust generalization ability of the LLM Planner, enabling effective generalization across diverse VLN datasets.

Vision and Language Navigation

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

1 code implementation CVPR 2025 Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, Siyu Zhou, Qian He, Jing Liu

To address these issues, we introduce Auto-Regressive Diffusion (AR-Diffusion), a novel model that combines the strengths of auto-regressive and diffusion models for flexible, asynchronous video generation.

Video Generation

Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents

no code implementations26 Feb 2025 Ashley Lewis, Michael White, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models.

Hallucination Knowledge Distillation +2

Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation

no code implementations21 Feb 2025 Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, XiaoYu Zhang, Jing Liu

Building on this concept, we introduce a series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations.

Time Series Time Series Analysis

VRoPE: Rotary Position Embedding for Video Large Language Models

1 code implementation17 Feb 2025 Zikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu

Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate spatiotemporal structure of video frames.

Position Video Understanding
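
For reference, the snippet below sketches standard 1-D RoPE as used in text LLMs (one common dimension-pairing convention), which is the mechanism VRoPE extends to video; it is not the paper's video-specific variant.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply standard 1-D Rotary Position Embedding.

    x: (seq_len, dim) array with even dim; positions: (seq_len,) integer positions.
    Each pair of dimensions is rotated by an angle proportional to the position.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Frequencies follow the original RoPE formulation: theta_i = base^(-2i/dim)
    freqs = base ** (-np.arange(half) * 2.0 / dim)      # (half,)
    angles = positions[:, None] * freqs[None, :]        # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(4, 8)          # 4 query vectors of dimension 8
print(rope(q, np.arange(4)).shape)  # (4, 8)
```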

Smoothed Embeddings for Robust Language Models

no code implementations27 Jan 2025 Ryo Hase, Md Rafi Ur Rashid, Ashley Lewis, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

Improving the safety and reliability of large language models (LLMs) is a crucial aspect of realizing trustworthy AI systems.

A Survey on Diffusion Models for Anomaly Detection

1 code implementation20 Jan 2025 Jing Liu, Zhenchao Ma, Zepu Wang, Yang Liu, Zehua Wang, Peng Sun, Liang Song, Bo Hu, Azzedine Boukerche, Victor C. M. Leung

Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing.

Anomaly Detection Computational Efficiency +2

Can LLM Generate Regression Tests for Software Commits?

no code implementations19 Jan 2025 Jing Liu, Seongmin Lee, Eleonora Losiouk, Marcel Böhme

For programs with more compact file formats, like PDF, as expected, it struggled to generate effective test cases.

regression

Few-Shot Learner Generalizes Across AI-Generated Image Detection

no code implementations15 Jan 2025 Shiyu Wu, Jing Liu, Jing Li, Yequan Wang

Current fake image detectors trained on large synthetic image datasets perform satisfactorily on limited studied generative models.

Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets

no code implementations7 Jan 2025 Jing Liu, Duanchu Wang, Haoran Gong, Chongyu Wang, Jihua Zhu, Di Wang

The Boreal3D dataset, and more broadly, the synthetic data augmentation framework, is poised to become a critical resource for advancing research in large-scale 3D forest scene understanding and structural parameter estimation.

Data Augmentation parameter estimation +2

Efficient Motion-Aware Video MLLM

no code implementations CVPR 2025 Zijia Zhao, Yuqi Huo, Tongtian Yue, Longteng Guo, Haoyu Lu, Bingning Wang, WeiPeng Chen, Jing Liu

Most current video MLLMs rely on uniform frame sampling and image-level encoders, resulting in inefficient data processing and limited motion awareness.

Question Answering Video Question Answering +1

FedCross: Intertemporal Federated Learning Under Evolutionary Games

no code implementations22 Dec 2024 Jianfeng Lu, Ying Zhang, Riheng Jia, Shuqin Cao, Jing Liu, Hao Fu

Federated Learning (FL) mitigates privacy leakage in decentralized machine learning by allowing multiple clients to train collaboratively while keeping their data local.

Federated Learning
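
As background for the FL setting described above, here is a minimal FedAvg-style round on a toy least-squares problem: each client runs a few local gradient steps on its private data and the server averages the returned models weighted by dataset size. This is the textbook baseline, not FedCross's evolutionary-game mechanism; all data and hyperparameters are synthetic.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """One client's local training: a few epochs of gradient descent on least squares."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients, lr=0.1):
    """One FedAvg round: raw data never leaves the clients; only models are shared."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_sgd(w_global.copy(), X, y, lr))
        sizes.append(len(y))
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(w * a for w, a in zip(updates, weights))

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for n in (50, 80, 120):
    X = rng.standard_normal((n, 2))
    clients.append((X, X @ true_w + 0.1 * rng.standard_normal(n)))

w = np.zeros(2)
for _ in range(10):
    w = fedavg_round(w, clients)
print(np.round(w, 2))  # approaches [1.0, -2.0]
```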

Channel Merging: Preserving Specialization for Merged Experts

no code implementations18 Dec 2024 Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang

To address the inefficiency, model merging strategies have emerged, merging all LLMs into one model to reduce the memory footprint during inference.

Code Generation Mathematical Reasoning

Numerical Pruning for Efficient Autoregressive Models

no code implementations17 Dec 2024 Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing.

Decoder Image Generation

AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks

1 code implementation17 Dec 2024 Shibing Mo, Kai Wu, Qixuan Gao, Xiangyi Teng, Jing Liu

This challenge has led to the manual design of GNNs tailored to specific graph types, but these approaches are limited by the high cost of labor and the constraints of expert knowledge, which cannot keep up with the rapid growth of graph data.

Neural Architecture Search

Two Layer Walk: A Community-Aware Graph Embedding

1 code implementation17 Dec 2024 He Yu, Jing Liu

Community structures are critical for understanding the mesoscopic organization of networks, bridging local and global patterns.

Evolutionary Algorithms Graph Embedding +1

AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era

1 code implementation13 Dec 2024 Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Bingwen Zhu, Xinwen Zhang, Xingyu Zheng, Jixuan Xu, Yue Zhang, Jinlong Hou, Huyang Sun

In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation benchmark.

Image to Video Generation

Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction

no code implementations12 Dec 2024 Jing Liu, Abdellah Fourtassi

LLMs can generate human-like dialogues, yet their ability to simulate early child-adult interactions remains largely unexplored.

Benchmarking Diversity

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

no code implementations5 Dec 2024 Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks.

Self-Supervised Learning

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

no code implementations CVPR 2025 Jinqi Xiao, Shen Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Yuqian Zhang, Linjie Luo, Bo Yuan

While LoRA, a popular parameter-efficient method, reduces memory usage, it often suffers from suboptimal performance due to the constraints of low-rank updates.

Quantization

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

no code implementations22 Nov 2024 Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu

To address this, we introduce AbilityLens, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics.

Privacy-Preserving Video Anomaly Detection: A Survey

no code implementations21 Nov 2024 Jing Liu, Yang Liu, Xiaoguang Zhu

Recently, researchers have focused on privacy concerns in VAD by conducting systematic studies from various perspectives including data, features, and systems, making Privacy-Preserving Video Anomaly Detection (P2VAD) a hotspot in the AI community.

Anomaly Detection Ethics +3

ID-Patch: Robust ID Association for Group Photo Personalization

no code implementations CVPR 2025 Yimeng Zhang, Tiancheng Zhi, Jing Liu, Shen Sang, Liming Jiang, Qing Yan, Sijia Liu, Linjie Luo

Existing methods suffer from limitations such as the reliance on segmentation models, increased runtime, or a high probability of ID leakage.

Self-Calibrated Listwise Reranking with Large Language Models

no code implementations7 Nov 2024 Ruiyang Ren, Yuhao Wang, Kun Zhou, Wayne Xin Zhao, Wenjie Wang, Jing Liu, Ji-Rong Wen, Tat-Seng Chua

Large language models (LLMs), with advanced linguistic capabilities, have been employed in reranking tasks through a sequence-to-sequence approach.

Reranking

Quantum Diffusion Models for Few-Shot Learning

no code implementations6 Nov 2024 Ruhan Wang, Ye Wang, Jing Liu, Toshiaki Koike-Akino

Modern quantum machine learning (QML) methods involve the variational optimization of parameterized quantum circuits on training datasets, followed by predictions on testing datasets.

Denoising Few-Shot Learning +1

Deep Insights into Automated Optimization with Large Language Models and Evolutionary Algorithms

no code implementations28 Oct 2024 He Yu, Jing Liu

Since this synergy enables a more efficient and creative search process, we first conduct an extensive review of recent research on the application of LLMs in optimization.

Evolutionary Algorithms

AutoRNet: Automatically Optimizing Heuristics for Robust Network Design via Large Language Models

no code implementations23 Oct 2024 He Yu, Jing Liu

Achieving robust networks is a challenging problem due to its NP-hard nature and complex solution space.

Diversity Evolutionary Algorithms

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

no code implementations14 Oct 2024 Tongtian Yue, Longteng Guo, Jie Cheng, Xuange Gao, Jing Liu

In this paper, we propose a novel Ada-K routing strategy that dynamically adjusts the number of activated experts for each token, thereby improving the balance between computational efficiency and model performance.

Computational Efficiency Mixture-of-Experts

EEGPT: Unleashing the Potential of EEG Generalist Foundation Model by Autoregressive Pre-training

no code implementations14 Oct 2024 Tongtian Yue, Shuning Xue, Xuange Gao, Yepeng Tang, Longteng Guo, Jie Jiang, Jing Liu

First, we propose an electrode-wise modeling strategy that treats each electrode as a fundamental unit, enabling the integration of diverse EEG datasets collected from up to 138 electrodes, amassing 37.5M pre-training samples.

EEG Transfer Learning

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification

no code implementations11 Oct 2024 Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang

The efficiency of large vision-language models (LVLMs) is constrained by the computational bottleneck of the attention mechanism during the prefill phase and the memory bottleneck of fetching the key-value (KV) cache in the decoding phase, particularly in scenarios involving high-resolution images or videos.

MME Quantization +1
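
The decoding-phase KV-cache bottleneck mentioned above can be seen in a toy single-head attention decoder: the cache grows by one key/value pair per generated token, so memory and bandwidth scale with sequence length. This is a generic illustration, not ZipVL's sparsification scheme.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(q, k_new, v_new, k_cache, v_cache):
    """One decoding step with a KV cache: append the new key/value, attend over all of them."""
    k_cache = np.vstack([k_cache, k_new[None, :]])
    v_cache = np.vstack([v_cache, v_new[None, :]])
    attn = softmax(k_cache @ q / np.sqrt(q.shape[0]))
    return attn @ v_cache, k_cache, v_cache

d = 8
k_cache, v_cache = np.empty((0, d)), np.empty((0, d))
for _ in range(5):
    q, k, v = (np.random.randn(d) for _ in range(3))
    out, k_cache, v_cache = decode_step(q, k, v, k_cache, v_cache)
print(k_cache.shape)  # (5, 8): cache length equals the number of decoded tokens
```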

Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher

no code implementations5 Oct 2024 Yong Guo, Shulian Zhang, Haolin Pan, Jing Liu, Yulun Zhang, Jian Chen

To address this, we propose a Gap Preserving Distillation (GPD) method that trains an additional dynamic teacher model from scratch along with training the student to bridge this gap.

Knowledge Distillation
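
For context, the sketch below computes the standard knowledge-distillation loss (temperature-softened KL divergence in the Hinton et al. formulation) that distillation methods such as GPD build on; it is not GPD's dynamic-teacher objective, and the logits are random placeholders.

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student distributions,
    scaled by T^2 as in the standard knowledge-distillation loss."""
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(T * T * kl.mean())

print(kd_loss(np.random.randn(4, 10), np.random.randn(4, 10)))
```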

COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation

no code implementations2 Oct 2024 Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu

To prevent redundant modeling of common video signals, we propose a novel diffusion-based framework, named COMUNI, which decomposes the COMmon and UNIque video signals to enable efficient video generation.

Decoder Position +1

MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation

1 code implementation2 Oct 2024 Mingzhen Sun, Weining Wang, Yanyuan Qiao, Jiahui Sun, Zihan Qin, Longteng Guo, Xinxin Zhu, Jing Liu

Sounding Video Generation (SVG) is an audio-video joint generation task challenged by high-dimensional signal spaces, distinct data formats, and different patterns of content information.

Video Generation

Design and validation of a fuzzy logic controller for multi-section continuum robots

no code implementations30 Sep 2024 Jing Liu, Tianyi Zeng, Abdelkhalick Mohammad, Xin Dong, Dragos Axinte

This paper introduces a simple-structured, model-less fuzzy logic controller for the closed-loop control of continuum robots.

Navigate

MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation

no code implementations27 Sep 2024 Junyou Zhu, Yanyuan Qiao, Siqi Zhang, Xingjian He, Qi Wu, Jing Liu

In recent years, Embodied Artificial Intelligence (Embodied AI) has advanced rapidly, yet the increasing size of models conflicts with the limited computational capabilities of Embodied AI platforms.

Knowledge Distillation Vision and Language Navigation

RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems

no code implementations25 Sep 2024 Yihong Tang, Bo wang, Xu Wang, Dongming Zhao, Jing Liu, Jijun Zhang, Ruifang He, Yuexian Hou

Role-playing systems powered by large language models (LLMs) have become increasingly influential in emotional communication applications.

Hallucination

M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images

1 code implementation23 Sep 2024 Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images.

regression

End-Cloud Collaboration Framework for Advanced AI Customer Service in E-commerce

no code implementations20 Sep 2024 Liangyu Teng, Yang Liu, Jing Liu, Liang Song

Specifically, the large cloud model acts as a teacher that guides and promotes the learning of the end model. This significantly reduces the end model's reliance on large-scale, high-quality data, addressing the data bottleneck in traditional end-model training and offering a new paradigm for the rapid deployment of industry applications.

CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments

no code implementations13 Sep 2024 Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson

Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cross Fusion RGB-T Tracking with Bi-directional Adapter

no code implementations30 Aug 2024 Zhirong Zeng, Xiaotao Liu, Meng Sun, Hongyu Wang, Jing Liu

To address this issue, we propose a novel Cross Fusion RGB-T Tracking architecture (CFBT) that ensures the full participation of multiple modalities in tracking while dynamically fusing temporal information.

Rgb-T Tracking

Analyzing Inference Privacy Risks Through Gradients in Machine Learning

no code implementations29 Aug 2024 Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Bradley Malin, Ye Wang

While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients.

Attribute

A Survey on Facial Expression Recognition of Static and Dynamic Emotions

1 code implementation28 Aug 2024 Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan

Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies.

cross-modal alignment Facial Expression Recognition +1

Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection

no code implementations23 Aug 2024 Niklas Risse, Jing Liu, Marcel Böhme

We call a function "vulnerable" if it was involved in a patch of an actual security flaw and confirmed to cause the program's vulnerability.

Benchmarking Binary Classification +1

The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution

no code implementations20 Aug 2024 Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu

Referring Video Object Segmentation is an emerging multi-modal task that aims to segment objects in the video given a natural language expression.

Referring Video Object Segmentation Retrieval +2

Learning Feature-Preserving Portrait Editing from Generated Pairs

no code implementations29 Jul 2024 Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo

Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity.

Diffusion Feedback Helps CLIP See Better

1 code implementation29 Jul 2024 Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang

We demonstrate that DIVA improves CLIP's performance on the challenging MMVP-VLM benchmark which assesses fine-grained visual abilities to a large extent (e.g., 3-7%), and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks.

image-classification Image Classification

Temporal Feature Matters: A Framework for Diffusion Model Quantization

1 code implementation28 Jul 2024 Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, DaCheng Tao

However, unlike traditional models, diffusion models critically rely on the time-step for the multi-round denoising.

Denoising Image Generation +1

Variational Randomized Smoothing for Sample-Wise Adversarial Robustness

no code implementations16 Jul 2024 Ryo Hase, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons

Randomized smoothing is a defensive technique to achieve enhanced robustness against adversarial examples which are small input perturbations that degrade the performance of neural network models.

Adversarial Robustness
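
As a baseline for the variational method above, here is plain randomized smoothing: classify many Gaussian-noised copies of the input and take a majority vote. The classifier, noise level, and sample count are illustrative assumptions, and no certification bound is computed.

```python
import numpy as np

def smoothed_predict(classifier, x, sigma=0.25, n=1000, seed=0):
    """Randomized smoothing: majority vote over predictions on noisy copies of x.

    classifier: callable mapping a batch (n, d) to integer class labels (n,).
    sigma controls the Gaussian noise level (and hence the achievable robustness radius).
    """
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.standard_normal((n, x.shape[0]))
    votes = np.bincount(classifier(noisy))
    return int(np.argmax(votes))

# Toy example: a linear 2-class classifier on 2-D inputs
clf = lambda batch: (batch @ np.array([1.0, 1.0]) > 0).astype(int)
print(smoothed_predict(clf, np.array([0.3, -0.1])))
```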

GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

no code implementations15 Jul 2024 Keshav Bimbraw, Ye Wang, Jing Liu, Toshiaki Koike-Akino

Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors.

In-Context Learning

Random Channel Ablation for Robust Hand Gesture Classification with Multimodal Biosignals

no code implementations15 Jul 2024 Keshav Bimbraw, Jing Liu, Ye Wang, Toshiaki Koike-Akino

Notably, the proposed method is also robust to an increase in the number of missing channels compared to other methods.

Classification Imputation

Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

no code implementations12 Jul 2024 Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu

The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources.

Transfer Learning

OneDiff: A Generalist Model for Image Difference Captioning

no code implementations8 Jul 2024 Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu

This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese image encoder with a Visual Delta Module.

Language Modelling model +1

ShapG: new feature importance method based on the Shapley value

1 code implementation29 Jun 2024 Chi Zhao, Jing Liu, Elena Parilina

In this paper, we proposed a new Explainable Artificial Intelligence (XAI) method called ShapG (Explanations based on Shapley value for Graphs) for measuring feature importance.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI) +1
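
ShapG builds on Shapley values for feature importance. The sketch below is a generic Monte Carlo (permutation-sampling) Shapley estimator for a single input, not ShapG's graph-based computation; the toy linear model and zero baseline are assumptions, chosen because the estimate can be checked against the exact values w_j * x_j.

```python
import numpy as np

def shapley_importance(model_fn, x, baseline, n_samples=200, seed=0):
    """Monte Carlo estimate of per-feature Shapley values for one input.

    model_fn: callable mapping a (n_features,) vector to a scalar prediction.
    x: input to explain; baseline: reference values used for "absent" features.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        current = baseline.copy()
        prev_out = model_fn(current)
        for j in perm:
            current[j] = x[j]            # add feature j to the coalition
            out = model_fn(current)
            phi[j] += out - prev_out     # marginal contribution of j
            prev_out = out
    return phi / n_samples

# Toy linear model: Shapley values equal w_j * x_j exactly
w = np.array([1.0, -2.0, 0.5])
f = lambda v: float(v @ w)
print(shapley_importance(f, np.array([3.0, 1.0, 2.0]), np.zeros(3)))  # ~[3., -2., 1.]
```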

2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

no code implementations20 Jun 2024 Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu

Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions.

Instance Segmentation Referring Video Object Segmentation +5

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs

1 code implementation13 Jun 2024 Zijia Zhao, Haoyu Lu, Yuqi Huo, Yifan Du, Tongtian Yue, Longteng Guo, Bingning Wang, WeiPeng Chen, Jing Liu

In this paper, we propose VideoNIAH (Video Needle In A Haystack), a benchmark construction framework through synthetic video generation.

Benchmarking Video Generation +2

ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

no code implementations13 Jun 2024 Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang

LLM development involves pre-training a foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts.

Code Generation domain classification +3

Explore the Limits of Omni-modal Pretraining at Scale

1 code implementation13 Jun 2024 Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

We propose to build omni-modal intelligence, which is capable of understanding any modality and learning universal representations.

Language Modeling Language Modelling +6

Efficient Differentially Private Fine-Tuning of Diffusion Models

no code implementations7 Jun 2024 Jing Liu, Andrew Lowy, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

The recent developments of Diffusion Models (DMs) enable generation of astonishingly high-quality synthetic samples.

parameter-efficient fine-tuning

OUS: Scene-Guided Dynamic Facial Expression Recognition

no code implementations29 May 2024 Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, YaTing Li, Wenqiang Zhang

By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.

Dynamic Facial Expression Recognition Facial Expression Recognition

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

no code implementations23 May 2024 Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang

In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference.

Quantization

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

1 code implementation23 May 2024 Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang

In terms of efficiency, ZipCache also showcases a $37.3\%$ reduction in prefill-phase latency, a $56.9\%$ reduction in decoding-phase latency, and a $19.8\%$ reduction in GPU memory usage when evaluating the LLaMA3-8B model with an input length of $4096$.

GSM8K Quantization

Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation

no code implementations18 May 2024 Yichen Yan, Xingjian He, Sihan Chen, Shichen Lu, Jing Liu

Referring Image Segmentation (RIS) aims to segment an object described in natural language from an image, with the main challenge being a text-to-pixel correlation.

Decoder Image Segmentation +2
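
The "text-to-pixel correlation" mentioned above can be illustrated with a cosine-similarity map between a sentence embedding and a visual feature map; thresholding the map gives a crude referring mask. The feature shapes and threshold below are arbitrary placeholders, and this is not the paper's bi-directional fusion design.

```python
import numpy as np

def text_to_pixel_map(pixel_feats, text_feat):
    """Coarse text-to-pixel correlation map via cosine similarity.

    pixel_feats: (H, W, C) visual feature map; text_feat: (C,) sentence embedding.
    Returns an (H, W) similarity map.
    """
    p = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_feat / (np.linalg.norm(text_feat) + 1e-8)
    return p @ t

feats = np.random.randn(16, 16, 64)   # stand-in for image features
sent = np.random.randn(64)            # stand-in for a sentence embedding
mask = text_to_pixel_map(feats, sent) > 0.2
print(mask.shape, mask.dtype)         # (16, 16) bool
```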

Networking Systems for Video Anomaly Detection: A Tutorial and Survey

1 code implementation16 May 2024 Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Liang Cao, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C. M. Leung

The increasing utilization of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community.

Anomaly Detection Edge-computing +2

Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings

no code implementations15 May 2024 Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson

Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Pretrained Optimization Model for Zero-Shot Black Box Optimization

1 code implementation6 May 2024 XiaoBin Li, Kai Wu, Yujian Betterest Li, XiaoYu Zhang, Handing Wang, Jing Liu

Zero-shot optimization involves optimizing a target task that was not seen during training, aiming to provide the optimal solution without or with minimal adjustments to the optimizer.

Evolutionary Algorithms

Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering

1 code implementation22 Apr 2024 Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu

Motivated by the research of retrieval-augmented generation in the field of natural language processing, we use Dense Passage Retrieval (DPR) to retrieve related knowledge to help the model answer questions.

Language Modeling Language Modelling +7
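
Dense Passage Retrieval, referenced above, ranks passages by the inner product between query and passage embeddings. The toy sketch below uses random vectors as stand-ins for learned DPR embeddings; it shows only the scoring and top-k step, not the dual-encoder training.

```python
import numpy as np

def dense_retrieve(query_vec, passage_vecs, k=3):
    """Rank passages by inner product with the query embedding and return the top k."""
    scores = passage_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]

rng = np.random.default_rng(0)
passages = rng.standard_normal((100, 64))   # stand-in for passage embeddings
query = rng.standard_normal(64)             # stand-in for a question embedding
idx, s = dense_retrieve(query, passages)
print(idx, np.round(s, 2))
```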

Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation

no code implementations12 Apr 2024 Yichen Yan, Xingjian He, Sihan Chen, Jing Liu

In this paper, we introduce CRFormer, a model that iteratively calibrates multi-modal features in the transformer decoder.

Decoder Image Segmentation +1

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

no code implementations CVPR 2024 Zhiwei Yang, Jing Liu, Peng Wu

Further, we propose a learnable text prompt mechanism with the assistance of a normality visual prompt to further improve the matching accuracy of video event description text and video frames.

Anomaly Detection Domain Adaptation +2

The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education

no code implementations3 Apr 2024 Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai

Assessing instruction quality is a fundamental component of any improvement efforts in the education system.

SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface

no code implementations27 Mar 2024 Jiahao Luo, Jing Liu, James Davis

Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions.

3D Reconstruction Face Reconstruction +1

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

1 code implementation CVPR 2024 Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu

In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects and subsequently utilize these captions to accurately re-identify the objects in a closed-loop process.

VL-Mamba: Exploring State Space Models for Multimodal Learning

no code implementations20 Mar 2024 Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu

The extensive experiments on diverse multimodal benchmarks with competitive performance show the effectiveness of our proposed VL-Mamba and demonstrate the great potential of applying state space models for multimodal learning tasks.

Language Modeling Language Modelling +5

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

no code implementations 18 Mar 2024 Xiangyu Chen, Jing Liu, Ye Wang, Pu Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino

Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision.

Transfer Learning
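
For readers unfamiliar with LoRA, the snippet below is a minimal low-rank adapter around a frozen linear weight: the effective weight is W + (alpha/r) * B A, with only A and B trainable and B zero-initialized so training starts from the pretrained behavior. Dimensions, rank, and scaling are illustrative; SuperLoRA's unified multi-layer grouping is not shown.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA-style adapter around a frozen weight matrix W (out_dim x in_dim)."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                      # frozen pretrained weight
        self.A = rng.standard_normal((r, in_dim)) * 0.01
        self.B = np.zeros((out_dim, r))                 # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        # x: (..., in_dim) -> (..., out_dim)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(np.random.randn(16, 32))
print(layer(np.random.randn(5, 32)).shape)  # (5, 16)
```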

Self-Evaluation of Large Language Model based on Glass-box Features

1 code implementation7 Mar 2024 Hui Huang, Yingqi Qu, Jing Liu, Muyun Yang, Bing Xu, Tiejun Zhao, Wenpeng Lu

The proliferation of open-source Large Language Models (LLMs) underscores the pressing need for evaluation methods.

Language Modeling Language Modelling +1

Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts

no code implementations6 Mar 2024 Zewei Tian, Min Sun, Alex Liu, Shawon Sarkar, Jing Liu

This paper explores the transformative potential of computer-assisted textual analysis in enhancing instructional quality through in-depth insights from educational artifacts.

SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model

1 code implementation28 Feb 2024 Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao

To alleviate artifacts and improve quality of synthetic images, we fine-tune Vision-Language Model (VLM) as artifact classifier to automatically identify and classify a wide range of artifacts and provide supervision for further optimizing generative models.

Image Generation Language Modeling +1

REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering

1 code implementation27 Feb 2024 Yuhao Wang, Ruiyang Ren, Junyi Li, Wayne Xin Zhao, Jing Liu, Ji-Rong Wen

By combining the improvements in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents.

Open-Domain Question Answering RAG +2

BASES: Large-scale Web Search User Simulation with Large Language Model based Agents

no code implementations27 Feb 2024 Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang

Due to the excellent capacities of large language models (LLMs), it becomes feasible to develop LLM-based agents for reliable user simulation.

Information Retrieval Language Modeling +4

CCFC++: Enhancing Federated Clustering through Feature Decorrelation

no code implementations20 Feb 2024 Jie Yan, Jing Liu, Yi-Zi Ning, Zhong-Yuan Zhang

In federated clustering, multiple data-holding clients collaboratively group data without exchanging raw data.

Clustering Contrastive Learning

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

1 code implementation17 Feb 2024 Wenxuan Wang, Yisi Zhang, Xingjian He, Yichen Yan, Zijia Zhao, Xinlong Wang, Jing Liu

To promote classic VG towards human intention interpretation, we propose a new intention-driven visual grounding (IVG) task and build a large-scale IVG dataset termed IntentionVG with free-form intention expressions.

Visual Grounding

Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?

no code implementations14 Feb 2024 Andrew Lowy, Zhuohang Li, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set.

Inference Attack Membership Inference Attack

M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

1 code implementation19 Jan 2024 Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

To address this limit, we propose M2ORT, a many-to-one regression Transformer that can accommodate the hierarchical structure of the pathology images through a decoupled multi-scale feature extractor.

regression

CCFC: Bridging Federated Clustering and Contrastive Learning

1 code implementation12 Jan 2024 Jie Yan, Jing Liu, Zhong-Yuan Zhang

Benefiting from representation learning, the clustering performance of CCFC even doubles that of the best baseline methods in some cases.

Clustering Contrastive Learning +1

Temporal Adaptive RGBT Tracking with Modality Prompt

no code implementations2 Jan 2024 Hongyu Wang, Xiaotao Liu, YiFan Li, Meng Sun, Dian Yuan, Jing Liu

RGBT tracking has been widely used in various fields such as robotics, surveillance processing, and autonomous driving.

Autonomous Driving Rgb-T Tracking

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

1 code implementation CVPR 2024 Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset, and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.

Descriptive Object +3

Signed Graph Neural Ordinary Differential Equation for Modeling Continuous-time Dynamics

1 code implementation18 Dec 2023 Lanlan Chen, Kai Wu, Jian Lou, Jing Liu

Modeling continuous-time dynamics constitutes a foundational challenge, and uncovering inter-component correlations within complex systems holds promise for enhancing the efficacy of dynamic modeling.

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

1 code implementation13 Dec 2023 Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.

Descriptive Object +3

Efficient Stitchable Task Adaptation

1 code implementation CVPR 2024 Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang

In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.

Chatbot parameter-efficient fine-tuning

BAND-2k: Banding Artifact Noticeable Database for Banding Detection and Quality Assessment

1 code implementation29 Nov 2023 Zijian Chen, Wei Sun, Jun Jia, Fangfang Lu, ZiCheng Zhang, Jing Liu, Ru Huang, Xiongkuo Min, Guangtao Zhai

The quality score of a banding image is generated by pooling the banding detection maps masked by the spatial frequency filters.

2k Image Quality Assessment +1

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

1 code implementation CVPR 2024 Yushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu

Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization.

Denoising Image Generation +1

Open-Vocabulary Video Anomaly Detection

no code implementations CVPR 2024 Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang

Particularly, we devise a semantic knowledge injection module to introduce semantic knowledge from large language models for the detection task, and design a novel anomaly synthesis module to generate pseudo unseen anomaly videos with the help of large vision generation models for the classification task.

Anomaly Detection Weakly-supervised Video Anomaly Detection

An Interdisciplinary Outlook on Large Language Models for Scientific Research

no code implementations3 Nov 2023 James Boyko, Joseph Cohen, Nathan Fox, Maria Han Veiga, Jennifer I-Hsiu Li, Jing Liu, Bernardo Modenesi, Andreas H. Rauch, Kenneth N. Reid, Soumi Tribedi, Anastasia Visheratina, Xin Xie

In this paper, we describe the capabilities and constraints of Large Language Models (LLMs) within disparate academic disciplines, aiming to delineate their strengths and limitations with precision.

Med-DANet V2: A Flexible Dynamic Architecture for Efficient Medical Volumetric Segmentation

no code implementations28 Oct 2023 Haoran Shen, Yifu Zhang, Wenxuan Wang, Chen Chen, Jing Liu, Shanshan Song, Jiangyun Li

As a pioneering work, a dynamic architecture network for medical volumetric segmentation (i.e., Med-DANet) has achieved a favorable accuracy and efficiency trade-off by dynamically selecting a suitable 2D candidate model from the pre-defined model bank for different slices.

Computational Efficiency MRI segmentation +2

Stabilizing Subject Transfer in EEG Classification with Divergence Estimation

no code implementations12 Oct 2023 Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons, Yunus Bicer, Deniz Erdogmus

Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test subjects.

EEG Subject Transfer

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

2 code implementations12 Oct 2023 Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.

Quantization

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

1 code implementation5 Oct 2023 Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.

Denoising Image Generation +2

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

1 code implementation NeurIPS 2023 Mingzhen Sun, Weining Wang, Zihan Qin, Jiahui Sun, Sihan Chen, Jing Liu

Specifically, we propose a video auto-encoder, where a video encoder encodes videos into global features, and a video decoder, built on a diffusion model, decodes the global features and synthesizes video frames in a non-autoregressive manner.

Decoder Video Generation

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

no code implementations12 Sep 2023 Ahmed Adel Attia, Jing Liu, Wei Ai, Dorottya Demszky, Carol Espy-Wilson

Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

1 code implementation11 Sep 2023 Li Chen, Mengyi Zhao, Yiheng Liu, Mingxu Ding, Yangyang Song, Shizun Wang, Xu Wang, Hao Yang, Jing Liu, Kang Du, Min Zheng

Personalized text-to-image generation has emerged as a powerful and sought-after tool, empowering users to create customized images based on their specific concepts and prompts.

Text to Image Generation Text-to-Image Generation

Machine learning of network inference enhancement from noisy measurements

1 code implementation5 Sep 2023 Kai Wu, Yuanyuan Li, Jing Liu

Inferring networks from observed time series data presents a clear glimpse into the interconnections among nodes.

Time Series

FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

1 code implementation23 Aug 2023 Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani

The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model.

Action Unit Detection Cross-corpus +1

March in Chat: Interactive Prompting for Remote Embodied Referring Expression

1 code implementation ICCV 2023 Yanyuan Qiao, Yuankai Qi, Zheng Yu, Jing Liu, Qi Wu

Nevertheless, this poses more challenges than other VLN tasks since it requires agents to infer a navigation plan only based on a short instruction.

Referring Expression Vision and Language Navigation

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

no code implementations18 Aug 2023 Yichen Yan, Xingjian He, Wenxuan Wang, Sihan Chen, Jing Liu

Our method harnesses the potential of the multi-modal features in the segmentation stage and aligns language features of different emphases with image features to achieve fine-grained text-to-pixel correlation.

Image Segmentation Referring Expression Segmentation +2

Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception

1 code implementation ICCV 2023 Kun Yang, Dingkang Yang, Jingyu Zhang, Mingcheng Li, Yang Liu, Jing Liu, Hanqi Wang, Peng Sun, Liang Song

In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner.

3D Object Detection Autonomous Vehicles +1

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

1 code implementation24 Jul 2023 Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang

In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos by cross-modalities, e.g., language descriptions and synchronous audios.

Anomaly Detection Retrieval +2

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation

1 code implementation20 Jul 2023 Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang

In this study, we present the first analysis on the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain question answering (QA), with a bunch of important findings.

Open-Domain Question Answering Retrieval +1

Perceptual Quality Assessment of Omnidirectional Audio-visual Signals

1 code implementation20 Jul 2023 Xilei Zhu, Huiyu Duan, Yuqin Cao, Yuxin Zhu, Yucheng Zhu, Jing Liu, Li Chen, Xiongkuo Min, Guangtao Zhai

Omnidirectional videos (ODVs) play an increasingly important role in the application fields of medical, education, advertising, tourism, etc.

AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence

1 code implementation1 Jul 2023 Jiarui Wang, Huiyu Duan, Jing Liu, Shi Chen, Xiongkuo Min, Guangtao Zhai

In this paper, in order to get a better understanding of the human visual preferences for AIGIs, a large-scale IQA database for AIGC is established, which is named AIGCIQA2023.

Image Quality Assessment Text to Image Generation +1

Stitched ViTs are Flexible Vision Backbones

1 code implementation30 Jun 2023 Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang

With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

1 code implementation15 Jun 2023 Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu

Due to the limited scale and quality of video-text training corpus, most vision-language foundation models employ image-text datasets for pretraining and primarily focus on modeling visually semantic representations while disregarding temporal semantic representations and correlations.

 Ranked #1 on TGIF-Frame on TGIF-QA (using extra training data)

Form model +8

Description-Enhanced Label Embedding Contrastive Learning for Text Classification

1 code implementation15 Jun 2023 Kun Zhang, Le Wu, Guangyi Lv, Enhong Chen, Shulan Ruan, Jing Liu, Zhiqiang Zhang, Jun Zhou, Meng Wang

Then, we propose a novel Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.

Contrastive Learning Relation +4

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

2 code implementations NeurIPS 2023 Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu

Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundational model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better support various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning and QA).

 Ranked #1 on Image Captioning on COCO Captions (SPICE metric, using extra training data)

Audio captioning Audio-Visual Captioning +14

Rapid Plug-in Defenders

no code implementations27 May 2023 Kai Wu, Yujian Betterest Li, Jian Lou, XiaoYu Zhang, Handing Wang, Jing Liu

To address this challenge, this paper focuses on the Rapid Plug-in Defender (RaPiD) problem, aiming to rapidly counter adversarial perturbations without altering the deployed model.

Adversarial Purification

MMNet: Multi-Mask Network for Referring Image Segmentation

no code implementations24 May 2023 Yichen Yan, Xingjian He, Wenxuan Wang, Jing Liu

However, this task is challenging due to the distinct data properties between text and image, and the randomness introduced by diverse objects and unrestricted language expression.

Image Segmentation Segmentation +1

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

no code implementations22 May 2023 Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng

Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.

 Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)

Question Answering Text Retrieval +5

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation

no code implementations19 May 2023 Wenxuan Wang, Jing Liu, Xingjian He, Yisi Zhang, Chen Chen, Jiachen Shen, Yan Zhang, Jiangyun Li

Referring image segmentation (RIS) is a fundamental vision-language task that intends to segment a desired object from an image based on a given natural language expression.

Image Segmentation Segmentation +1

Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner

1 code implementation19 May 2023 Zikang Liu, Sihan Chen, Longteng Guo, Handong Li, Xingjian He, Jing Liu

In this paper, we propose a novel method called Joint QA and DC GEneration (JADE), which utilizes a pre-trained multimodal model and easily-crawled image-text pairs to automatically generate and filter large-scale VQA and dense captioning datasets.

Dense Captioning Image Captioning +4

TOME: A Two-stage Approach for Model-based Retrieval

no code implementations18 May 2023 Ruiyang Ren, Wayne Xin Zhao, Jing Liu, Hua Wu, Ji-Rong Wen, Haifeng Wang

Recently, model-based retrieval has emerged as a new paradigm in text retrieval that discards the index in the traditional retrieval model and instead memorizes the candidate corpora using model parameters.

Natural Questions Text Retrieval

PTQD: Accurate Post-Training Quantization for Diffusion Models

1 code implementation NeurIPS 2023 Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process.
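
Stated only schematically (the paper's exact formulation may differ), such a unified treatment can model the quantized network's noise prediction as the full-precision prediction plus a quantization error that splits into a correlated part and an uncorrelated residual:

```latex
\hat{\epsilon}_{\theta}(x_t) = \epsilon_{\theta}(x_t) + \Delta_q,
\qquad
\Delta_q \approx k\,\epsilon_{\theta}(x_t) + \delta,
\qquad
\delta \sim \mathcal{N}(\mu_{\delta}, \sigma_{\delta}^{2}),
```

where the coefficient k and the residual statistics would be estimated on calibration data and folded into the denoising step together with the diffusion perturbation noise; the symbols here are illustrative assumptions rather than the paper's notation.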

Denoising Image Generation +1

Configurable Spatial-Temporal Hierarchical Analysis for Flexible Video Anomaly Detection

no code implementations12 May 2023 Kai Cheng, Xinhua Zeng, Yang Liu, Tian Wang, Chengxin Pang, Jing Teng, Zhaoyang Xia, Jing Liu

Since the set of anomalies is complicated and unbounded, our STHA can adjust its detection ability to match human detection demands and the complexity of the anomalies that have occurred in a scene's history.

Anomaly Detection Human Detection +2

SLSG: Industrial Image Anomaly Detection by Learning Better Feature Embeddings and One-Class Classification

no code implementations30 Apr 2023 Minghui Yang, Jing Liu, Zhiwei Yang, Zhaoyang Wu

Focusing on more effective and comprehensive anomaly detection, we propose a network based on self-supervised learning and self-attentive graph convolution (SLSG) for anomaly detection.

Classification One-Class Classification +1

B2Opt: Learning to Optimize Black-box Optimization with Little Budget

no code implementations24 Apr 2023 XiaoBin Li, Kai Wu, XiaoYu Zhang, Handing Wang, Jing Liu

To achieve this, 1) drawing on the mechanism of genetic algorithms, we propose a deep neural network framework called B2Opt, which has a stronger representation of optimization strategies based on survival of the fittest; 2) B2Opt can utilize the cheap surrogate functions of the target task to guide the design of efficient optimization strategies.
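
The genetic-algorithm mechanism invoked here, fitness-based selection, crossover, and mutation evaluated on a cheap surrogate, can be sketched generically as follows; this is a plain GA loop for illustration, not the learned B2Opt network, and the surrogate function and hyperparameters are assumptions.

```python
import numpy as np

def surrogate(x):
    """Cheap stand-in for the expensive black-box objective (assumption)."""
    return np.sum(x ** 2, axis=1)  # minimize a simple sphere function

def ga_step(pop, rng, mut_std=0.1):
    """One 'survival of the fittest' generation evaluated on the surrogate."""
    fitness = surrogate(pop)
    order = np.argsort(fitness)            # lower fitness is better
    parents = pop[order[: len(pop) // 2]]  # keep the fitter half for breeding
    # Crossover: average random parent pairs; mutation: add Gaussian noise.
    idx = rng.integers(0, len(parents), size=(len(pop), 2))
    return parents[idx].mean(axis=1) + rng.normal(0, mut_std, pop.shape)

rng = np.random.default_rng(0)
pop = rng.normal(size=(32, 8))             # population of 32 candidate solutions
for _ in range(50):
    pop = ga_step(pop, rng)
print(surrogate(pop).min())                # best surrogate value found
```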

Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation

no code implementations21 Apr 2023 Jiachen Shen, Wenxuan Wang, Chen Chen, Jianbo Jiao, Jing Liu, Yan Zhang, Shanshan Song, Jiangyun Li

Thus, it is increasingly important to fine-tune pre-trained models for medical volumetric segmentation tasks in a manner that is both effective and parameter-efficient.
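
Parameter-efficient tuning of this kind is commonly realized by freezing the pre-trained backbone and training only small bottleneck adapters inserted between its layers; the module below is a generic adapter sketch with illustrative names and dimensions, not Med-Tuning's actual blocks.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck inserted after a frozen backbone layer."""

    def __init__(self, dim, reduction=8):
        super().__init__()
        hidden = max(dim // reduction, 1)
        self.down = nn.Linear(dim, hidden)  # project to a small hidden size
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)    # project back to the original size

    def forward(self, x):
        # Residual connection keeps the frozen backbone path intact.
        return x + self.up(self.act(self.down(x)))

# Only the adapter parameters are trained; the backbone stays frozen.
backbone = nn.Linear(256, 256)              # stand-in for a pre-trained layer
for p in backbone.parameters():
    p.requires_grad = False
adapter = BottleneckAdapter(256)
out = adapter(backbone(torch.randn(4, 256)))
```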

Segmentation Transfer Learning

DECN: Evolution Inspired Deep Convolution Network for Black-box Optimization

no code implementations19 Apr 2023 Kai Wu, XiaoBin Li, Penghui Liu, Jing Liu

We design a deep evolutionary convolution network (DECN) to realize the move from hand-designed EAs to automated EAs without manual interventions.

Evolutionary Algorithms Meta-Learning

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

1 code implementation17 Apr 2023 Jing Liu, Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang

Different from widely-studied vision-language pretraining models, VALOR jointly models relationships of vision, audio and language in an end-to-end manner.

 Ranked #1 on Video Captioning on VATEX (using extra training data)

Audio captioning Audio-Video Question Answering (AVQA) +17

SCMM: Calibrating Cross-modal Representations for Text-Based Person Search

no code implementations5 Apr 2023 Jing Liu, Donglai Wei, Yang Liu, Sipeng Zhang, Tong Yang, Victor C. M. Leung

This dual-pronged strategy enhances feature alignment and cross-modal correspondences, enabling accurate distinction of similar individuals while maintaining a streamlined dual-encoder architecture for real-time inference, which is essential for resource-limited sensors and IoT systems.

Person Search Text based Person Search

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

no code implementations30 Mar 2023 Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

1 code implementation29 Mar 2023 Jiawei Liu, Weining Wang, Sihan Chen, Xinxin Zhu, Jing Liu

In this work, we concentrate on the rarely investigated problem of text-guided sounding video generation and propose the Sounding Video Generator (SVG), a unified framework for generating realistic videos along with audio signals.

Audio Generation Contrastive Learning +2

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

no code implementations CVPR 2023 Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, WanChun Ma, Jiashi Feng, Linjie Luo

We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under full disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses.

AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

no code implementations24 Mar 2023 Guoxian Song, Hongyi Xu, Jing Liu, Tiancheng Zhi, Yichun Shi, Jianfeng Zhang, Zihang Jiang, Jiashi Feng, Shen Sang, Linjie Luo

Capitalizing on the recent advancement of 3D-aware GAN models, we perform \emph{guided transfer learning} on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings.

Transfer Learning

Boosting Verified Training for Robust Image Classifications via Abstraction

1 code implementation CVPR 2023 Zhaodi Zhang, Zhiyi Xue, Yang Chen, Si Liu, Yueling Zhang, Jing Liu, Min Zhang

Via abstraction, all perturbed images are mapped into intervals before being fed into neural networks for training.
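
Propagating such an interval abstraction through a network follows standard interval-bound arithmetic; the sketch below pushes an L-infinity box around an image through a single linear layer, as a minimal illustration rather than the paper's full verified-training procedure.

```python
import numpy as np

def interval_linear(lower, upper, W, b):
    """Propagate an elementwise interval [lower, upper] through y = W x + b.

    Uses the center/radius form: the output center is W @ center + b and the
    output radius is |W| @ radius, which soundly bounds every x in the box.
    """
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius
    return out_center - out_radius, out_center + out_radius

# Abstract an image perturbed within an L-infinity ball of radius eps.
x, eps = np.random.rand(784), 0.03
W, b = np.random.randn(10, 784), np.zeros(10)
lo, hi = interval_linear(np.clip(x - eps, 0, 1), np.clip(x + eps, 0, 1), W, b)
```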

Subjective and Objective Quality Assessment for in-the-Wild Computer Graphics Images

1 code implementation14 Mar 2023 ZiCheng Zhang, Wei Sun, Yingjie Zhou, Jun Jia, Zhichao Zhang, Jing Liu, Xiongkuo Min, Guangtao Zhai

Computer graphics images (CGIs) are artificially generated by means of computer programs and are widely perceived under various scenarios, such as games, streaming media, etc.

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

2 code implementations CVPR 2023 Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu

Experimental results demonstrate that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101.

Object Unconditional Video Generation +2

SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases

no code implementations28 Feb 2023 Yanchen Liu, Jing Yan, Yan Chen, Jing Liu, Hua Wu

Recent studies reveal that various biases exist in different NLP tasks, and over-reliance on biases results in models' poor generalization ability and low adversarial robustness.

Adversarial Robustness Natural Language Inference +1

Graph-based Knowledge Distillation: A survey and experimental evaluation

1 code implementation27 Feb 2023 Jing Liu, Tongya Zheng, Guanzheng Zhang, Qinfen Hao

It then provides a comprehensive summary of three types of Graph-based Knowledge Distillation methods, namely Graph-based Knowledge Distillation for deep neural networks (DKD), Graph-based Knowledge Distillation for GNNs (GKD), and Self-Knowledge Distillation based Graph-based Knowledge Distillation (SKD).

Self-Knowledge Distillation Survey

A novel efficient Multi-view traffic-related object detection framework

no code implementations23 Feb 2023 Kun Yang, Jing Liu, Dingkang Yang, Hanqi Wang, Peng Sun, Yanni Zhang, Yan Liu, Liang Song

With the rapid development of intelligent transportation system applications, a tremendous amount of multi-view video data has emerged to enhance vehicle perception.

input filtering Model Selection +2

Tag-based annotation creates better avatars

no code implementations14 Feb 2023 Minghao Liu, Zeyu Cheng, Shen Sang, Jing Liu, James Davis

Compared to direct annotation of labels, the proposed method produces higher annotator agreement, causes machine learning to generate more consistent predictions, and only requires a marginal cost to add new rendering systems.

TAG

Fast Learnings of Coupled Nonnegative Tensor Decomposition Using Optimal Gradient and Low-rank Approximation

no code implementations10 Feb 2023 XiuLin Wang, Jing Liu, FengYu Cong

Tensor decomposition is a fundamental technique widely applied in signal processing, machine learning, and various other fields.

EEG Tensor Decomposition

Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models

1 code implementation10 Feb 2023 Yang Liu, Dingkang Yang, Yan Wang, Jing Liu, Jun Liu, Azzedine Boukerche, Peng Sun, Liang Song

Video Anomaly Detection (VAD) serves as a pivotal technology in intelligent surveillance systems, enabling the temporal or spatial identification of anomalous events within videos.

Anomaly Detection Event Detection +2

A Survey on Efficient Training of Transformers

no code implementations2 Feb 2023 Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.

Survey

Discover governing differential equations from evolving systems

no code implementations19 Jan 2023 Yuanyuan Li, Kai Wu, Jing Liu

Our proposal is competitive in identifying the change points and discovering governing differential equations in three hybrid systems and two switching linear systems.
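
A common baseline for discovering governing differential equations from data, shown here only as a generic sketch and not necessarily the authors' evolving-system method, is sparse regression of estimated derivatives onto a library of candidate terms:

```python
import numpy as np

def discover_ode(x, dt, n_iters=10, threshold=0.1):
    """Fit dx/dt ≈ Theta(x) @ xi via sequentially thresholded least squares.

    x: (T,) samples of a scalar state on a uniform time grid with step dt.
    Returns coefficients for the candidate library [1, x, x^2, x^3].
    """
    dxdt = np.gradient(x, dt)                        # numerical derivative
    theta = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iters):                         # prune small terms, refit the rest
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Toy example: logistic growth dx/dt = x - x^2, with analytic solution below.
t = np.linspace(0, 5, 500)
x = 1.0 / (1.0 + 9.0 * np.exp(-t))
print(discover_ode(x, t[1] - t[0]))                  # approximately [0, 1, -1, 0]
```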

LoTE-Animal: A Long Time-span Dataset for Endangered Animal Behavior Understanding

no code implementations ICCV 2023 Dan Liu, Jin Hou, Shaoli Huang, Jing Liu, Yuxin He, Bochuan Zheng, Jifeng Ning, Jingdong Zhang

To break the deadlock, we present LoTE-Animal, a large-scale endangered animal dataset collected over 12 years, to foster the application of deep learning in rare species conservation.

Action Recognition Domain Adaptation +5

BiViT: Extremely Compressed Binary Vision Transformers

no code implementations ICCV 2023 Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
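
The weight binarization being compressed here conventionally replaces real-valued weights with their sign times a per-tensor scale; the sketch below shows that classic XNOR-style baseline, not BiViT's softmax-aware scheme.

```python
import torch

def binarize_weights(w):
    """Classic 1-bit weight quantization: w ≈ alpha * sign(w).

    alpha is the mean absolute value, which minimizes the L2 error of the
    approximation for a single per-tensor scale.
    """
    alpha = w.abs().mean()
    return alpha * torch.sign(w), alpha

w = torch.randn(64, 64)
w_bin, alpha = binarize_weights(w)
print(alpha, (w - w_bin).pow(2).mean())  # reconstruction error of the 1-bit weights
```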

Binarization object-detection +1
