1 code implementation • Findings (ACL) 2022 • Le Qi, Shangwen Lv, Hongyu Li, Jing Liu, Yu Zhang, Qiaoqiao She, Hua Wu, Haifeng Wang, Ting Liu
Open-domain question answering has been used in a wide range of applications, such as web search and enterprise search, and usually takes clean text extracted from documents in various formats (e.g., web pages, PDFs, or Word documents) as its information source.
no code implementations • ECCV 2020 • Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann
Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.
no code implementations • CCL 2022 • Shuang Nie, Zheng Ye, Jun Qin, Jing Liu
Common data augmentation methods for machine reading comprehension, such as back-translation, augment the passage or the question in isolation and ignore the relations among the passage, question, and options. We therefore explore a data augmentation method that selects passage sentences using these triplet relations: by comparing the similarity of the passage with both the question and the options, we pick the sentences most closely tied to the two. To further enlarge the differences among the triplets of different options, we adopt a regularized Dropout (R-Drop) strategy. Experiments show that accuracy on the RACE dataset improves by 3.8%.
1 code implementation • ECCV 2020 • Zheng Xie, Zhiquan Wen, Jing Liu, Zhi-Qiang Liu, Xixian Wu, Mingkui Tan
Specifically, we propose a method named deep transferring quantization (DTQ) to effectively exploit the knowledge in a pre-trained full-precision model.
no code implementations • 20 Jun 2025 • Tongtian Yue, Longteng Guo, Yepeng Tang, Zijia Zhao, Xinxin Zhu, Hua Huang, Jing Liu
Despite the impressive advancements of Large Vision-Language Models (LVLMs), existing approaches suffer from a fundamental bottleneck: inefficient visual-language integration.
no code implementations • 13 Jun 2025 • Jing Liu, EnQi Lian
An abstract sound is defined as a sound that does not disclose identifiable real-world sound events to a listener.
no code implementations • 11 Jun 2025 • Jing Liu, Toshiaki Koike-Akino, Ye Wang, Hassan Mansour, Matthew Brand
To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices.
no code implementations • 10 Jun 2025 • Ahmed Adel Attia, Jing Liu, Carl Espy-Wilson
The scarcity of large-scale classroom speech data has hindered the development of AI-driven speech models for education.
no code implementations • 4 Jun 2025 • Jing Liu, Haiye Huo
The classical phase retrieval refers to the recovery of an unknown signal from its Fourier magnitudes, which is widely used in fields such as quantum mechanics, signal processing, optics, etc.
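As a toy illustration of why this recovery problem is hard (not taken from the paper): Fourier magnitudes alone never determine a signal uniquely, since a circular shift and a time reversal both leave them unchanged, which is why phase retrieval needs extra structure or measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

shifted = np.roll(x, 3)              # circular time shift
flipped = x[(-np.arange(8)) % 8]     # time reversal x[(-n) mod N]

mag = lambda v: np.abs(np.fft.fft(v))

assert np.allclose(mag(x), mag(shifted))   # identical Fourier magnitudes ...
assert np.allclose(mag(x), mag(flipped))
assert not np.allclose(x, shifted)         # ... from distinct signals
```

These trivial ambiguities are present for any signal; practical phase retrieval methods resolve them with priors such as sparsity or with additional structured measurements.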
no code implementations • 2 Jun 2025 • Youze Wang, WenBo Hu, Yinpeng Dong, Jing Liu, Hanwang Zhang, Richang Hong
Large Language Models (LLMs) have evolved into Multimodal Large Language Models (MLLMs), which significantly enhance their capabilities by integrating visual and other modalities, aligning them more closely with human intelligence, which processes many forms of data beyond text.
no code implementations • 26 May 2025 • Saba Tabatabaee, Jing Liu, Carol Espy-Wilson
Creating Speaker Verification (SV) systems for classroom settings that are robust to classroom noises such as babble noise is crucial for the development of AI tools that assist educational environments.
no code implementations • 24 May 2025 • Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, Markus Müller, Nathan Susanj, Jing Liu, Shinji Watanabe
Speech foundation models achieve strong generalization across languages and acoustic conditions, but require significant computational resources for inference.
no code implementations • 24 May 2025 • Toshiaki Koike-Akino, Jing Liu, Ye Wang
To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced.
no code implementations • 23 May 2025 • Toshiaki Koike-Akino, Xiangyu Chen, Jing Liu, Ye Wang, Pu Wang, Matthew Brand
Modern foundation models such as large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources.
no code implementations • 20 May 2025 • Ahmed Adel Attia, Dorottya Demszky, Jing Liu, Carol Espy-Wilson
However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small amount of accurate, gold-standard data.
Automatic Speech Recognition
3 code implementations • 20 May 2025 • Mengzhao Chen, Chaoyi Zhang, Jing Liu, Yutao Zeng, Zeyue Xue, Zhiheng Liu, Yunshui Li, Jin Ma, Jie Huang, Xun Zhou, Ping Luo
Through 268 QAT experiments, we show that quantization error decreases as model size increases, but rises with more training tokens and coarser quantization granularity.
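The granularity effect can be illustrated with a toy sketch (an assumption for illustration, using simple symmetric uniform quantization rather than the paper's QAT setup): a single per-tensor scale fits channels with heterogeneous dynamic ranges worse than one scale per channel, so coarser granularity yields larger error.

```python
import numpy as np

def quantize(w, scale, bits=4):
    # symmetric uniform quantizer with the given scale(s)
    q = np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

rng = np.random.default_rng(0)
# four channels with very different dynamic ranges
w = rng.standard_normal((4, 64)) * np.array([[0.1], [0.5], [1.0], [5.0]])

levels = 2 ** 3 - 1  # max positive level for 4-bit symmetric quantization
per_tensor_scale = np.abs(w).max() / levels
per_channel_scale = np.abs(w).max(axis=1, keepdims=True) / levels

err_tensor = np.mean((w - quantize(w, per_tensor_scale)) ** 2)
err_channel = np.mean((w - quantize(w, per_channel_scale)) ** 2)
assert err_channel < err_tensor  # finer granularity, lower error
```

Here the small-range channels dominate the per-tensor error because they are forced onto a step size sized for the largest channel.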
no code implementations • 19 May 2025 • Jing Liu, Haozheng Wang, Yueheng Li
Large language models struggle with representing and generating rare tokens despite their importance in specialized domains.
no code implementations • 17 May 2025 • Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, Lingjun Liu, Bole Ma, Xiaoying Jia, Zhou Xun, Siyuan Qiao, Liang Xiang, Yonghui Wu
Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored.
no code implementations • 17 May 2025 • Yuhao Wang, Ruiyang Ren, Yucheng Wang, Wayne Xin Zhao, Jing Liu, Hua Wu, Haifeng Wang
In this paper, we present a systematic investigation of the intrinsic mechanisms by which LLMs integrate internal (parametric) and external (retrieved) knowledge in RAG scenarios.
no code implementations • 16 May 2025 • Yushi Huang, Ruihao Gong, Jing Liu, Yifu Ding, Chengtao Lv, Haotong Qin, Jun Zhang
We begin with a theoretical analysis demonstrating that reducing the gradient norm is essential to facilitate convergence for QAT.
no code implementations • 15 May 2025 • Wenxuan Wang, Fan Zhang, Yufeng Cui, Haiwen Diao, Zhuoyan Luo, Huchuan Lu, Jing Liu, Xinlong Wang
To address this, we propose ETT, an end-to-end vision tokenizer tuning approach that enables joint optimization between vision tokenization and target autoregressive tasks.
no code implementations • 5 May 2025 • Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang
The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints.
no code implementations • 3 May 2025 • Jing Liu, Yao Du, Kun Yang, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C. M. Leung
Furthermore, the review identifies critical research directions including LLM deployment, 6G integration, neuromorphic computing, and quantum computing, offering a roadmap for addressing persistent challenges in heterogeneity management, real-time processing, and scalability.
no code implementations • 23 Apr 2025 • Feng Chen, Yefei He, Lequan Lin, Jing Liu, Bohan Zhuang, Qi Wu
Sparse attention mechanisms aim to reduce computational overhead by selectively processing a subset of salient tokens while preserving model performance.
no code implementations • 2 Apr 2025 • Wenxuan Wang, Zijia Zhao, Yisi Zhang, Yepeng Tang, Erdong Hu, Xinlong Wang, Jing Liu
We introduce DiffGround, a large-scale and high-quality dataset for IDG, containing image pairs with diverse visual variations along with instructions querying fine-grained differences.
no code implementations • 2 Apr 2025 • Jing Liu, Wenxuan Wang, Yisi Zhang, Yepeng Tang, Xingjian He, Longteng Guo, Tongtian Yue, Xinlong Wang
Referring expression segmentation (RES) aims at segmenting the masks of entities that match a descriptive language expression.
no code implementations • 31 Mar 2025 • Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Zike Yan, Qi Wu, Zhihua Wei, Jing Liu
RSS facilitates comprehensive inter-modal interactions within a single scan, while the CS3 module adapts the selective state space module into a dual-stream architecture, thereby enhancing the acquisition of cross-modal interactions.
no code implementations • 27 Mar 2025 • Zhiwei Yang, Chen Gao, Jing Liu, Peng Wu, Guansong Pang, Mike Zheng Shou
To bridge this gap and facilitate the practical deployment of LLM-based VAD, we introduce AssistPDA, the first online video anomaly surveillance assistant that unifies video anomaly prediction, detection, and analysis (VAPDA) within a single framework.
no code implementations • 25 Mar 2025 • Juncen Guo, Xiaoguang Zhu, Liangyu Teng, Hao Yang, Jing Liu, Yang Liu, Liang Song
Class-incremental Learning (CIL) enables the model to incrementally absorb knowledge from new classes and build a generic classifier across all previously encountered classes.
no code implementations • 24 Mar 2025 • Yang Liu, Hongjin Wang, Zepu Wang, Xiaoguang Zhu, Jing Liu, Peng Sun, Rui Tang, Jianwei Du, Victor C. M. Leung, Liang Song
Video Anomaly Detection (VAD) remains a fundamental yet formidable task in the video understanding community, with promising applications in areas such as information forensics and public safety protection.
Anomaly Detection In Surveillance Videos
Representation Learning
no code implementations • 24 Mar 2025 • Handong Li, Yiyuan Zhang, Longteng Guo, Xiangyu Yue, Jing Liu
Most Video-Large Language Models (Video-LLMs) adopt an encoder-decoder framework, where a vision encoder extracts frame-wise features for processing by a language model.
1 code implementation • CVPR 2025 • Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, Quanyi Wang, Henghui Ding, Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu
Monocular Depth Estimation (MDE) has emerged as a pivotal task in computer vision, supporting numerous real-world applications.
no code implementations • 18 Mar 2025 • Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Longteng Guo, Zhihua Wei, Jing Liu
In this paper, we propose FlexVLN, an innovative hierarchical approach to VLN that integrates the fundamental navigation ability of a supervised-learning-based Instruction Follower with the robust generalization ability of the LLM Planner, enabling effective generalization across diverse VLN datasets.
1 code implementation • CVPR 2025 • Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, Siyu Zhou, Qian He, Jing Liu
To address these issues, we introduce Auto-Regressive Diffusion (AR-Diffusion), a novel model that combines the strengths of auto-regressive and diffusion models for flexible, asynchronous video generation.
no code implementations • 26 Feb 2025 • Ashley Lewis, Michael White, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang
Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models.
no code implementations • 21 Feb 2025 • Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, XiaoYu Zhang, Jing Liu
Building on this concept, we introduce a series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations.
1 code implementation • 17 Feb 2025 • Zikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu
Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate spatiotemporal structure of video frames.
no code implementations • 15 Feb 2025 • Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Hu, Jaap Jumelet, Tal Linzen, Jing Liu, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Wilcox, Adina Williams
We also call for papers outside the competition in any relevant areas.
no code implementations • 27 Jan 2025 • Ryo Hase, Md Rafi Ur Rashid, Ashley Lewis, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang
Improving the safety and reliability of large language models (LLMs) is a crucial aspect of realizing trustworthy AI systems.
1 code implementation • 21 Jan 2025 • He Yu, Jing Liu
Dynamic graph representation learning plays a crucial role in understanding evolving behaviors.
1 code implementation • 20 Jan 2025 • Jing Liu, Zhenchao Ma, Zepu Wang, Yang Liu, Zehua Wang, Peng Sun, Liang Song, Bo Hu, Azzedine Boukerche, Victor C. M. Leung
Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing.
no code implementations • 19 Jan 2025 • Jing Liu, Seongmin Lee, Eleonora Losiouk, Marcel Böhme
For programs with more compact file formats, like PDF, as expected, it struggled to generate effective test cases.
no code implementations • 15 Jan 2025 • Shiyu Wu, Jing Liu, Jing Li, Yequan Wang
Current fake image detectors trained on large synthetic image datasets perform satisfactorily only on the limited set of generative models they were studied on.
no code implementations • 7 Jan 2025 • Jing Liu, Duanchu Wang, Haoran Gong, Chongyu Wang, Jihua Zhu, Di Wang
The Boreal3D dataset, and more broadly, the synthetic data augmentation framework, is poised to become a critical resource for advancing research in large-scale 3D forest scene understanding and structural parameter estimation.
no code implementations • CVPR 2025 • Zijia Zhao, Yuqi Huo, Tongtian Yue, Longteng Guo, Haoyu Lu, Bingning Wang, WeiPeng Chen, Jing Liu
Most current video MLLMs rely on uniform frame sampling and image-level encoders, resulting in inefficient data processing and limited motion awareness.
no code implementations • 22 Dec 2024 • Jianfeng Lu, Ying Zhang, Riheng Jia, Shuqin Cao, Jing Liu, Hao Fu
Federated Learning (FL) mitigates privacy leakage in decentralized machine learning by allowing multiple clients to train collaboratively on their local data.
no code implementations • 18 Dec 2024 • Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang
To address the inefficiency, model merging strategies have emerged, merging all LLMs into one model to reduce the memory footprint during inference.
no code implementations • 17 Dec 2024 • Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing.
1 code implementation • 17 Dec 2024 • Shibing Mo, Kai Wu, Qixuan Gao, Xiangyi Teng, Jing Liu
This challenge has led to the manual design of GNNs tailored to specific graph types, but these approaches are limited by the high cost of labor and the constraints of expert knowledge, which cannot keep up with the rapid growth of graph data.
1 code implementation • 17 Dec 2024 • He Yu, Jing Liu
Community structures are critical for understanding the mesoscopic organization of networks, bridging local and global patterns.
no code implementations • 16 Dec 2024 • Gangqiang Hu, Jianfeng Lu, Jianmin Han, Shuqin Cao, Jing Liu, Hao Fu
However, in the context of semi-decentralized FL, clients' communication and training states are dynamic.
1 code implementation • 13 Dec 2024 • Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Bingwen Zhu, Xinwen Zhang, Xingyu Zheng, Jixuan Xu, Yue Zhang, Jinlong Hou, Huyang Sun
In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation benchmark.
no code implementations • 12 Dec 2024 • Jing Liu, Abdellah Fourtassi
LLMs can generate human-like dialogues, yet their ability to simulate early child-adult interactions remains largely unexplored.
no code implementations • 5 Dec 2024 • Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba
We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks.
no code implementations • CVPR 2025 • Jinqi Xiao, Shen Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Yuqian Zhang, Linjie Luo, Bo Yuan
While LoRA, a popular parameter-efficient method, reduces memory usage, it often suffers from suboptimal performance due to the constraints of low-rank updates.
no code implementations • 22 Nov 2024 • Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu
To address this, we introduce \textbf{AbilityLens}, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics.
no code implementations • 21 Nov 2024 • Jing Liu, Yang Liu, Xiaoguang Zhu
Recently, researchers have focused on privacy concerns in VAD by conducting systematic studies from various perspectives including data, features, and systems, making Privacy-Preserving Video Anomaly Detection (P2VAD) a hotspot in the AI community.
no code implementations • 21 Nov 2024 • Duanchu Wang, Jing Liu, Haoran Gong, Yinghui Quan, Di Wang
Transformer-based methods have become the dominant approach for 3D instance segmentation.
no code implementations • CVPR 2025 • Yimeng Zhang, Tiancheng Zhi, Jing Liu, Shen Sang, Liming Jiang, Qing Yan, Sijia Liu, Linjie Luo
Existing methods suffer from limitations such as the reliance on segmentation models, increased runtime, or a high probability of ID leakage.
no code implementations • 7 Nov 2024 • Ruiyang Ren, Yuhao Wang, Kun Zhou, Wayne Xin Zhao, Wenjie Wang, Jing Liu, Ji-Rong Wen, Tat-Seng Chua
Large language models (LLMs), with advanced linguistic capabilities, have been employed in reranking tasks through a sequence-to-sequence approach.
no code implementations • 6 Nov 2024 • Ruhan Wang, Ye Wang, Jing Liu, Toshiaki Koike-Akino
Modern quantum machine learning (QML) methods involve the variational optimization of parameterized quantum circuits on training datasets, followed by predictions on testing datasets.
no code implementations • 28 Oct 2024 • He Yu, Jing Liu
Since this synergy enables a more efficient and creative search process, we first conduct an extensive review of recent research on the application of LLMs in optimization.
1 code implementation • 24 Oct 2024 • Zijia Zhao, Longteng Guo, Tongtian Yue, Erdong Hu, Shuai Shao, Zehuan Yuan, Hua Huang, Jing Liu
In this paper, we investigate the task of general conversational image retrieval on open-domain images.
no code implementations • 23 Oct 2024 • He Yu, Jing Liu
Achieving robust networks is a challenging problem due to its NP-hard nature and complex solution space.
no code implementations • 14 Oct 2024 • Tongtian Yue, Longteng Guo, Jie Cheng, Xuange Gao, Jing Liu
In this paper, we propose a novel Ada-K routing strategy that dynamically adjusts the number of activated experts for each token, thereby improving the balance between computational efficiency and model performance.
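One plausible way to realize such dynamic-k routing can be sketched as follows (a hypothetical illustration: the paper learns the per-token allocation, whereas this toy simply thresholds cumulative router probability):

```python
import numpy as np

def dynamic_topk(router_logits, threshold=0.8):
    # activate the smallest set of experts whose cumulative
    # router probability exceeds the threshold
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, threshold) + 1)
    return order[:k]  # expert ids activated for this token

confident = np.array([4.0, 0.0, 0.0, 0.0])  # one dominant expert suffices
uncertain = np.array([1.0, 1.0, 1.0, 0.9])  # spread-out router scores
assert len(dynamic_topk(confident)) < len(dynamic_topk(uncertain))
```

"Easy" tokens with a confident router thus consume fewer experts than ambiguous ones, which is the efficiency/performance trade-off the abstract describes.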
no code implementations • 14 Oct 2024 • Tongtian Yue, Shuning Xue, Xuange Gao, Yepeng Tang, Longteng Guo, Jie Jiang, Jing Liu
First, we propose an electrode-wise modeling strategy that treats each electrode as a fundamental unit, enabling the integration of diverse EEG datasets collected from up to 138 electrodes, amassing 37.5M pre-training samples.
no code implementations • 11 Oct 2024 • Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
The efficiency of large vision-language models (LVLMs) is constrained by the computational bottleneck of the attention mechanism during the prefill phase and the memory bottleneck of fetching the key-value (KV) cache in the decoding phase, particularly in scenarios involving high-resolution images or videos.
no code implementations • 5 Oct 2024 • Yong Guo, Shulian Zhang, Haolin Pan, Jing Liu, Yulun Zhang, Jian Chen
To address this, we propose a Gap Preserving Distillation (GPD) method that trains an additional dynamic teacher model from scratch along with training the student to bridge this gap.
no code implementations • 2 Oct 2024 • Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu
To prevent redundant modeling of common video signals, we propose a novel diffusion-based framework, named COMUNI, which decomposes the COMmon and UNIque video signals to enable efficient video generation.
1 code implementation • 2 Oct 2024 • Mingzhen Sun, Weining Wang, Yanyuan Qiao, Jiahui Sun, Zihan Qin, Longteng Guo, Xinxin Zhu, Jing Liu
Sounding Video Generation (SVG) is an audio-video joint generation task challenged by high-dimensional signal spaces, distinct data formats, and different patterns of content information.
1 code implementation • 2 Oct 2024 • Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang
Diffusion Transformers (DiTs) excel in generative tasks but face practical deployment challenges due to high inference costs.
no code implementations • 30 Sep 2024 • Jing Liu, Tianyi Zeng, Abdelkhalick Mohammad, Xin Dong, Dragos Axinte
This paper introduces a simple-structured, model-less fuzzy logic controller for the closed-loop control of continuum robots.
no code implementations • 27 Sep 2024 • Junyou Zhu, Yanyuan Qiao, Siqi Zhang, Xingjian He, Qi Wu, Jing Liu
In recent years, Embodied Artificial Intelligence (Embodied AI) has advanced rapidly, yet the increasing size of models conflicts with the limited computational capabilities of Embodied AI platforms.
no code implementations • 25 Sep 2024 • Yihong Tang, Bo Wang, Xu Wang, Dongming Zhao, Jing Liu, Jijun Zhang, Ruifang He, Yuexian Hou
Role-playing systems powered by large language models (LLMs) have become increasingly influential in emotional communication applications.
1 code implementation • 23 Sep 2024 • Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin
The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images.
no code implementations • 20 Sep 2024 • Liangyu Teng, Yang Liu, Jing Liu, Liang Song
Specifically, the large cloud model acts as a teacher that guides and promotes the learning of the end model. This significantly reduces the end model's reliance on large-scale, high-quality data, addresses the data bottleneck in traditional end-model training, and offers a new paradigm for the rapid deployment of industry applications.
no code implementations • 13 Sep 2024 • Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students.
Automatic Speech Recognition
no code implementations • 11 Sep 2024 • Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Bradley Malin, Kieran Parsons, Ye Wang
We explore user-level gradient inversion as a new attack surface in distributed learning.
no code implementations • 9 Sep 2024 • Henghui Ding, Lingyi Hong, Chang Liu, Ning Xu, Linjie Yang, Yuchen Fan, Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, YaoWei Wang, Ming-Hsuan Yang, Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, Lingling Li, Hao Fang, Feiyu Pan, Xiankai Lu, Wei Zhang, Runmin Cong, Tuyen Tran, Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu
Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes.
no code implementations • 30 Aug 2024 • Zhirong Zeng, Xiaotao Liu, Meng Sun, Hongyu Wang, Jing Liu
To address this issue, we propose a novel Cross Fusion RGB-T Tracking architecture (CFBT) that ensures the full participation of multiple modalities in tracking while dynamically fusing temporal information.
Ranked #6 on RGB-T Tracking on RGBT210
no code implementations • 30 Aug 2024 • Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Shagufta Mehnaz, Ye Wang
This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process.
no code implementations • 29 Aug 2024 • Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Bradley Malin, Ye Wang
While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients.
1 code implementation • 28 Aug 2024 • Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan
Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies.
no code implementations • 23 Aug 2024 • Niklas Risse, Jing Liu, Marcel Böhme
We call a function "vulnerable" if it was involved in a patch of an actual security flaw and confirmed to cause the program's vulnerability.
no code implementations • 20 Aug 2024 • Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu
Referring Video Object Segmentation is an emerging multi-modal task that aims to segment objects in the video given a natural language expression.
no code implementations • 19 Aug 2024 • Hang Zou, Chenxi Du, Ajian Liu, Yuan Zhang, Jing Liu, MingChuan Yang, Jun Wan, Hui Zhang
Specifically, we utilize the Mixture of Experts (MoE) to fit complex data distributions using multiple sub-neural networks.
no code implementations • 29 Jul 2024 • Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo
Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity.
1 code implementation • 29 Jul 2024 • Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang
We demonstrate that DIVA improves CLIP's performance on the challenging MMVP-VLM benchmark which assesses fine-grained visual abilities to a large extent (e.g., 3-7%), and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks.
1 code implementation • 28 Jul 2024 • Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, DaCheng Tao
However, unlike traditional models, diffusion models critically rely on the time-step for the multi-round denoising.
no code implementations • 16 Jul 2024 • Ryo Hase, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons
Randomized smoothing is a defensive technique to achieve enhanced robustness against adversarial examples which are small input perturbations that degrade the performance of neural network models.
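A minimal sketch of the generic randomized-smoothing construction referenced here (not this paper's specific variant): the smoothed classifier returns the majority vote of the base classifier over Gaussian-perturbed copies of the input, which makes the decision stable under small input perturbations.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.5, n=1000, seed=0):
    # majority vote of the base classifier under Gaussian input noise
    rng = np.random.default_rng(seed)
    noisy = x + sigma * rng.standard_normal((n, x.size))
    votes = np.bincount([base_classifier(z) for z in noisy])
    return int(np.argmax(votes))

# toy base classifier: sign of the first coordinate
base = lambda z: int(z[0] > 0)

x = np.array([0.3, -1.0])
label = smoothed_predict(base, x)  # majority decision under noise
```

In the certified-robustness literature, the margin of this vote further yields a provable radius within which the smoothed prediction cannot change.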
no code implementations • 15 Jul 2024 • Keshav Bimbraw, Ye Wang, Jing Liu, Toshiaki Koike-Akino
Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors.
no code implementations • 15 Jul 2024 • Keshav Bimbraw, Jing Liu, Ye Wang, Toshiaki Koike-Akino
Notably, the proposed method is also robust to an increase in the number of missing channels compared to other methods.
no code implementations • 12 Jul 2024 • Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu
The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources.
no code implementations • 8 Jul 2024 • Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu
This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese image encoder with a Visual Delta Module.
1 code implementation • 29 Jun 2024 • Chi Zhao, Jing Liu, Elena Parilina
In this paper, we proposed a new Explainable Artificial Intelligence (XAI) method called ShapG (Explanations based on Shapley value for Graphs) for measuring feature importance.
Explainable Artificial Intelligence (XAI)
2 code implementations • 24 Jun 2024 • Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, YaoWei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng, Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu, Feiyu Pan, Hao Fang, Xiankai Lu
Moreover, we provide a new motion expression guided video segmentation dataset MeViS to study the natural language-guided video understanding in complex environments.
no code implementations • 20 Jun 2024 • Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu
Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions.
Instance Segmentation
Referring Video Object Segmentation
1 code implementation • 13 Jun 2024 • Zijia Zhao, Haoyu Lu, Yuqi Huo, Yifan Du, Tongtian Yue, Longteng Guo, Bingning Wang, WeiPeng Chen, Jing Liu
In this paper, we propose VideoNIAH (Video Needle In A Haystack), a benchmark construction framework through synthetic video generation.
no code implementations • 13 Jun 2024 • Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang
LLM development involves pre-training a foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts.
1 code implementation • 13 Jun 2024 • Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue
We propose to build omni-modal intelligence, which is capable of understanding any modality and learning universal representations.
Ranked #194 on Visual Question Answering on MM-Vet
no code implementations • 7 Jun 2024 • Jing Liu, Andrew Lowy, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang
The recent developments of Diffusion Models (DMs) enable generation of astonishingly high-quality synthetic samples.
no code implementations • 29 May 2024 • Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, YaTing Li, Wenqiang Zhang
By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.
Ranked #5 on Dynamic Facial Expression Recognition on FERV39k
Dynamic Facial Expression Recognition
Facial Expression Recognition
no code implementations • 23 May 2024 • Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference.
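The depth-wise idea can be caricatured in a few lines (a crude sketch under strong assumptions: the paper merges adjacent layers' KV states far more carefully, e.g. with interpolation and retention of outlier tokens, while this toy simply averages layer pairs):

```python
import numpy as np

def merge_kv_across_depth(kv_per_layer):
    # share one cache entry between each pair of adjacent layers
    merged = []
    for i in range(0, len(kv_per_layer) - 1, 2):
        merged.append(0.5 * (kv_per_layer[i] + kv_per_layer[i + 1]))
    if len(kv_per_layer) % 2:
        merged.append(kv_per_layer[-1])  # odd leftover layer kept as-is
    return merged  # roughly halves the KV-cache memory footprint

layers = [np.ones((2, 4)) * i for i in range(8)]
merged = merge_kv_across_depth(layers)
assert len(merged) == 4  # 8 layers' caches stored as 4 merged entries
```

The viability of any such scheme rests on the empirical observation that adjacent layers' KV states are highly similar, so a merged representation loses little.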
no code implementations • 23 May 2024 • Xinyu Guo, Kai Wu, XiaoYu Zhang, Jing Liu
Class-imbalanced node classification tasks are prevalent in real-world scenarios.
1 code implementation • 23 May 2024 • Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang
In terms of efficiency, ZipCache also showcases a $37.3\%$ reduction in prefill-phase latency, a $56.9\%$ reduction in decoding-phase latency, and a $19.8\%$ reduction in GPU memory usage when evaluating the LLaMA3-8B model with an input length of $4096$.
no code implementations • 18 May 2024 • Yichen Yan, Xingjian He, Sihan Chen, Shichen Lu, Jing Liu
Referring Image Segmentation (RIS) aims to segment an object described in natural language from an image, with the main challenge being text-to-pixel correlation.
no code implementations • 16 May 2024 • Xinyu Zhang, Yijin Xiong, Qianxin Qu, RenJie Wang, Xin Gao, Jing Liu, Shichun Guo, Jun Li
Camera and LiDAR, the bedrock sensors in autonomous driving, exhibit expansive applicability.
1 code implementation • 16 May 2024 • Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Liang Cao, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C. M. Leung
The increasing utilization of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community.
no code implementations • 15 May 2024 • Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students.
Automatic Speech Recognition (ASR)
1 code implementation • 6 May 2024 • XiaoBin Li, Kai Wu, Yujian Betterest Li, XiaoYu Zhang, Handing Wang, Jing Liu
Zero-shot optimization involves optimizing a target task that was not seen during training, aiming to provide the optimal solution without or with minimal adjustments to the optimizer.
1 code implementation • 22 Apr 2024 • Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu
Motivated by the research of retrieval-augmented generation in the field of natural language processing, we use Dense Passage Retrieval (DPR) to retrieve related knowledge to help the model answer questions.
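As a rough illustration of the retrieval step (the vectors below are toy stand-ins; real DPR embeds questions and passages with learned BERT encoders), dense retrieval reduces to an inner-product search over passage embeddings:

```python
import numpy as np

def top_k_passages(query_vec, passage_vecs, k=2):
    """Toy dense retrieval in the spirit of DPR: score every passage by
    its inner product with the query embedding and return the indices
    of the k highest-scoring passages."""
    scores = passage_vecs @ query_vec       # one score per passage
    return np.argsort(scores)[::-1][:k]     # highest scores first

# three 3-d "passage embeddings" and one "query embedding"
passages = np.array([[1.0, 0.0, 0.0],
                     [0.9, 0.1, 0.0],
                     [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.0, 0.0])
print(top_k_passages(query, passages))  # -> [0 1]
```

The retrieved top-k passages would then be appended to the model's input as extra context when answering the question.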
no code implementations • 12 Apr 2024 • Yichen Yan, Xingjian He, Sihan Chen, Jing Liu
In this paper, we introduce CRFormer, a model that iteratively calibrates multi-modal features in the transformer decoder.
no code implementations • CVPR 2024 • Zhiwei Yang, Jing Liu, Peng Wu
Further, we propose a learnable text prompt mechanism with the assist of a normality visual prompt to further improve the matching accuracy of video event description text and video frames.
no code implementations • 3 Apr 2024 • Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai
Assessing instruction quality is a fundamental component of any improvement efforts in the education system.
no code implementations • 27 Mar 2024 • Jiahao Luo, Jing Liu, James Davis
Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions.
1 code implementation • CVPR 2024 • Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu
In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects and subsequently utilize these captions to accurately re-identify the objects in a closed-loop process.
no code implementations • 20 Mar 2024 • Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu
The extensive experiments on diverse multimodal benchmarks with competitive performance show the effectiveness of our proposed VL-Mamba and demonstrate the great potential of applying state space models for multimodal learning tasks.
Ranked #179 on Visual Question Answering on MM-Vet
no code implementations • 18 Mar 2024 • Xiangyu Chen, Jing Liu, Ye Wang, Pu Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino
Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision.
no code implementations • 15 Mar 2024 • Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo wang, Quan Chen, Han Li, Jing Liu
We condense the retrieved knowledge passages from two perspectives.
1 code implementation • 7 Mar 2024 • Hui Huang, Yingqi Qu, Jing Liu, Muyun Yang, Bing Xu, Tiejun Zhao, Wenpeng Lu
The proliferation of open-source Large Language Models (LLMs) underscores the pressing need for evaluation methods.
no code implementations • 6 Mar 2024 • Zewei Tian, Min Sun, Alex Liu, Shawon Sarkar, Jing Liu
This paper explores the transformative potential of computer-assisted textual analysis in enhancing instructional quality through in-depth insights from educational artifacts.
1 code implementation • 5 Mar 2024 • Hui Huang, Yingqi Qu, Xingyuan Bu, Hongli Zhou, Jing Liu, Muyun Yang, Bing Xu, Tiejun Zhao
Alternatively, other works have fine-tuned judge models based on open-source LLMs as the evaluator.
1 code implementation • 28 Feb 2024 • Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao
To alleviate artifacts and improve quality of synthetic images, we fine-tune Vision-Language Model (VLM) as artifact classifier to automatically identify and classify a wide range of artifacts and provide supervision for further optimizing generative models.
1 code implementation • 27 Feb 2024 • Yuhao Wang, Ruiyang Ren, Junyi Li, Wayne Xin Zhao, Jing Liu, Ji-Rong Wen
By combining the improvements in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents.
no code implementations • 27 Feb 2024 • Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang
Due to the excellent capacities of large language models (LLMs), it becomes feasible to develop LLM-based agents for reliable user simulation.
no code implementations • 20 Feb 2024 • Jie Yan, Jing Liu, Yi-Zi Ning, Zhong-Yuan Zhang
In federated clustering, multiple data-holding clients collaboratively group data without exchanging raw data.
1 code implementation • 17 Feb 2024 • Wenxuan Wang, Yisi Zhang, Xingjian He, Yichen Yan, Zijia Zhao, Xinlong Wang, Jing Liu
To promote classic VG towards human intention interpretation, we propose a new intention-driven visual grounding (IVG) task and build a large-scale IVG dataset termed IntentionVG with free-form intention expressions.
no code implementations • 14 Feb 2024 • Andrew Lowy, Zhuohang Li, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang
In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set.
1 code implementation • 19 Jan 2024 • Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin
To address this limitation, we propose M2ORT, a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images through a decoupled multi-scale feature extractor.
no code implementations • 16 Jan 2024 • Yang Feng, Zhaohui Sun, Chengcheng Wang, Xinyi Guo, Junyao Mei, Yueran Qi, Jing Liu, Junyu Zhang, Jixuan Wu, Xuepeng Zhan, Jiezhi Chen
Flash memory has been widely adopted as stand-alone memory and embedded memory due to its robust reliability.
1 code implementation • 12 Jan 2024 • Jie Yan, Jing Liu, Zhong-Yuan Zhang
Benefiting from representation learning, the clustering performance of CCFC even doubles that of the best baseline methods in some cases.
no code implementations • 2 Jan 2024 • Hongyu Wang, Xiaotao Liu, YiFan Li, Meng Sun, Dian Yuan, Jing Liu
RGBT tracking has been widely used in various fields such as robotics, surveillance processing, and autonomous driving.
Ranked #12 on RGB-T Tracking on RGBT210
1 code implementation • CVPR 2024 • Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu
To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset, and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.
1 code implementation • 18 Dec 2023 • Lanlan Chen, Kai Wu, Jian Lou, Jing Liu
Modeling continuous-time dynamics constitutes a foundational challenge, and uncovering inter-component correlations within complex systems holds promise for enhancing the efficacy of dynamic modeling.
1 code implementation • 13 Dec 2023 • Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu
To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset, and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.
no code implementations • 13 Dec 2023 • Jie Yan, Jing Liu, Zhong-Yuan Zhang
In the E-step, we aim to derive a mixture of Gaussian priors for the subsequent M-step.
1 code implementation • CVPR 2024 • Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang
In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.
1 code implementation • 29 Nov 2023 • Zijian Chen, Wei Sun, Jun Jia, Fangfang Lu, ZiCheng Zhang, Jing Liu, Ru Huang, Xiongkuo Min, Guangtao Zhai
The quality score of a banding image is generated by pooling the banding detection maps masked by the spatial frequency filters.
1 code implementation • CVPR 2024 • Yushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu
Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization.
no code implementations • CVPR 2024 • Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang
Particularly, we devise a semantic knowledge injection module to introduce semantic knowledge from large language models for the detection task, and design a novel anomaly synthesis module to generate pseudo unseen anomaly videos with the help of large vision generation models for the classification task.
no code implementations • 3 Nov 2023 • James Boyko, Joseph Cohen, Nathan Fox, Maria Han Veiga, Jennifer I-Hsiu Li, Jing Liu, Bernardo Modenesi, Andreas H. Rauch, Kenneth N. Reid, Soumi Tribedi, Anastasia Visheratina, Xin Xie
In this paper, we describe the capabilities and constraints of Large Language Models (LLMs) within disparate academic disciplines, aiming to delineate their strengths and limitations with precision.
no code implementations • 28 Oct 2023 • Haoran Shen, Yifu Zhang, Wenxuan Wang, Chen Chen, Jing Liu, Shanshan Song, Jiangyun Li
As a pioneering work, a dynamic architecture network for medical volumetric segmentation (i.e., Med-DANet) has achieved a favorable accuracy-efficiency trade-off by dynamically selecting a suitable 2D candidate model from a pre-defined model bank for different slices.
no code implementations • 12 Oct 2023 • Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons, Yunus Bicer, Deniz Erdogmus
Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test subjects.
2 code implementations • 12 Oct 2023 • Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.
1 code implementation • 5 Oct 2023 • Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
1 code implementation • NeurIPS 2023 • Mingzhen Sun, Weining Wang, Zihan Qin, Jiahui Sun, Sihan Chen, Jing Liu
Specifically, we propose a video auto-encoder, where a video encoder encodes videos into global features, and a video decoder, built on a diffusion model, decodes the global features and synthesizes video frames in a non-autoregressive manner.
no code implementations • 12 Sep 2023 • Ahmed Adel Attia, Jing Liu, Wei Ai, Dorottya Demszky, Carol Espy-Wilson
Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data.
Automatic Speech Recognition (ASR)
1 code implementation • 11 Sep 2023 • Li Chen, Mengyi Zhao, Yiheng Liu, Mingxu Ding, Yangyang Song, Shizun Wang, Xu Wang, Hao Yang, Jing Liu, Kang Du, Min Zheng
Personalized text-to-image generation has emerged as a powerful and sought-after tool, empowering users to create customized images based on their specific concepts and prompts.
no code implementations • 8 Sep 2023 • Yanrui Du, Sendong Zhao, Yuhan Chen, Rai Bai, Jing Liu, Hua Wu, Haifeng Wang, Bing Qin
To address this issue, it is crucial to analyze and mitigate the influence of superficial clues on STM models.
1 code implementation • 5 Sep 2023 • Kai Wu, Yuanyuan Li, Jing Liu
Inferring networks from observed time series data presents a clear glimpse into the interconnections among nodes.
1 code implementation • 23 Aug 2023 • Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani
The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model.
1 code implementation • ICCV 2023 • Yanyuan Qiao, Yuankai Qi, Zheng Yu, Jing Liu, Qi Wu
Nevertheless, this poses more challenges than other VLN tasks since it requires agents to infer a navigation plan only based on a short instruction.
no code implementations • 18 Aug 2023 • Yichen Yan, Xingjian He, Wenxuan Wang, Sihan Chen, Jing Liu
Our method harnesses the potential of the multi-modal features in the segmentation stage and aligns language features of different emphases with image features to achieve fine-grained text-to-pixel correlation.
1 code implementation • ICCV 2023 • Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang, Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang
Driver distraction has become a significant cause of severe traffic accidents over the past decade.
1 code implementation • ICCV 2023 • Kun Yang, Dingkang Yang, Jingyu Zhang, Mingcheng Li, Yang Liu, Jing Liu, Hanqi Wang, Peng Sun, Liang Song
In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner.
1 code implementation • 24 Jul 2023 • Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang
In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos across modalities, e.g., language descriptions and synchronous audio.
1 code implementation • 20 Jul 2023 • Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang
In this study, we present the first analysis on the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain question answering (QA), with a bunch of important findings.
1 code implementation • 20 Jul 2023 • Xilei Zhu, Huiyu Duan, Yuqin Cao, Yuxin Zhu, Yucheng Zhu, Jing Liu, Li Chen, Xiongkuo Min, Guangtao Zhai
Omnidirectional videos (ODVs) play an increasingly important role in application fields such as medicine, education, advertising, and tourism.
1 code implementation • 1 Jul 2023 • Jiarui Wang, Huiyu Duan, Jing Liu, Shi Chen, Xiongkuo Min, Guangtao Zhai
In this paper, to better understand human visual preferences for AIGIs, we establish a large-scale IQA database for AIGC, named AIGCIQA2023.
1 code implementation • 30 Jun 2023 • Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.
1 code implementation • 15 Jun 2023 • Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu
Due to the limited scale and quality of video-text training corpus, most vision-language foundation models employ image-text datasets for pretraining and primarily focus on modeling visually semantic representations while disregarding temporal semantic representations and correlations.
Ranked #1 on TGIF-Frame on TGIF-QA (using extra training data)
1 code implementation • 15 Jun 2023 • Kun Zhang, Le Wu, Guangyi Lv, Enhong Chen, Shulan Ruan, Jing Liu, Zhiqiang Zhang, Jun Zhou, Meng Wang
Then, we propose a novel Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
2 code implementations • NeurIPS 2023 • Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu
Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundational model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better support various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning and QA).
Ranked #1 on Image Captioning on COCO Captions (SPICE metric, using extra training data)
no code implementations • 27 May 2023 • Kai Wu, Yujian Betterest Li, Jian Lou, XiaoYu Zhang, Handing Wang, Jing Liu
To address this challenge, this paper focuses on the Rapid Plug-in Defender (RaPiD) problem, aiming to rapidly counter adversarial perturbations without altering the deployed model.
1 code implementation • 25 May 2023 • Zijia Zhao, Longteng Guo, Tongtian Yue, Sihan Chen, Shuai Shao, Xinxin Zhu, Zehuan Yuan, Jing Liu
We show that only language-paired two-modality data is sufficient to connect all modalities.
no code implementations • 24 May 2023 • Yichen Yan, Xingjian He, Wenxuan Wang, Jing Liu
However, this task is challenging due to the distinct data properties between text and image, and the randomness introduced by diverse objects and unrestricted language expression.
no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)
no code implementations • 19 May 2023 • Wenxuan Wang, Jing Liu, Xingjian He, Yisi Zhang, Chen Chen, Jiachen Shen, Yan Zhang, Jiangyun Li
Referring image segmentation (RIS) is a fundamental vision-language task that intends to segment a desired object from an image based on a given natural language expression.
1 code implementation • 19 May 2023 • Zikang Liu, Sihan Chen, Longteng Guo, Handong Li, Xingjian He, Jing Liu
In this paper, we propose a novel method called Joint QA and DC GEneration (JADE), which utilizes a pre-trained multimodal model and easily-crawled image-text pairs to automatically generate and filter large-scale VQA and dense captioning datasets.
no code implementations • 18 May 2023 • Ruiyang Ren, Wayne Xin Zhao, Jing Liu, Hua Wu, Ji-Rong Wen, Haifeng Wang
Recently, model-based retrieval has emerged as a new paradigm in text retrieval that discards the index in the traditional retrieval model and instead memorizes the candidate corpora using model parameters.
1 code implementation • NeurIPS 2023 • Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process.
no code implementations • 12 May 2023 • Kai Cheng, Xinhua Zeng, Yang Liu, Tian Wang, Chengxin Pang, Jing Teng, Zhaoyang Xia, Jing Liu
Since the anomaly set is complicated and unbounded, our STHA can adjust its detection ability to match human detection demands and the complexity of anomalies that have occurred in a scene's history.
no code implementations • 9 May 2023 • Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant P. Strimel, Ross McGowan, Athanasios Mouchtaris
Prior approaches typically relied on subword encoders for encoding the bias phrases.
no code implementations • 30 Apr 2023 • Minghui Yang, Jing Liu, Zhiwei Yang, Zhaoyang Wu
Focusing on more effective and comprehensive anomaly detection, we propose a network based on self-supervised learning and self-attentive graph convolution (SLSG) for anomaly detection.
Ranked #8 on Anomaly Detection on MVTec LOCO AD
no code implementations • 24 Apr 2023 • XiaoBin Li, Kai Wu, XiaoYu Zhang, Handing Wang, Jing Liu
To achieve this, (1) drawing on the mechanisms of genetic algorithms, we propose a deep neural network framework called B2Opt, which has a stronger representation of optimization strategies based on survival of the fittest; and (2) B2Opt can utilize cheap surrogate functions of the target task to guide the design of efficient optimization strategies.
no code implementations • 21 Apr 2023 • Jiachen Shen, Wenxuan Wang, Chen Chen, Jianbo Jiao, Jing Liu, Yan Zhang, Shanshan Song, Jiangyun Li
Thus, it is of increasing importance to fine-tune pre-trained models for medical volumetric segmentation tasks in a both effective and parameter-efficient manner.
no code implementations • 19 Apr 2023 • Kai Wu, XiaoBin Li, Penghui Liu, Jing Liu
We design a deep evolutionary convolution network (DECN) to realize the move from hand-designed EAs to automated EAs without manual intervention.
1 code implementation • 17 Apr 2023 • Jing Liu, Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang
Different from widely-studied vision-language pretraining models, VALOR jointly models relationships of vision, audio and language in an end-to-end manner.
Ranked #1 on Video Captioning on VATEX (using extra training data)
no code implementations • CVPR 2023 • Zhiwei Yang, Jing Liu, Zhaoyang Wu, Peng Wu, Xiaotao Liu
Video anomaly detection (VAD) is a significant computer vision problem.
no code implementations • 5 Apr 2023 • Jing Liu, Donglai Wei, Yang Liu, Sipeng Zhang, Tong Yang, Victor C. M. Leung
This dual-pronged strategy enhances feature alignment and cross-modal correspondences, enabling accurate distinction of similar individuals while maintaining a streamlined dual-encoder architecture for real-time inference, which is essential for resource-limited sensors and IoT systems.
no code implementations • 3 Apr 2023 • Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann
We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks.
no code implementations • 30 Mar 2023 • Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko
End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places.
Automatic Speech Recognition (ASR)
1 code implementation • 29 Mar 2023 • Jiawei Liu, Weining Wang, Sihan Chen, Xinxin Zhu, Jing Liu
In this work, we concentrate on a rarely investigated problem of text guided sounding video generation and propose the Sounding Video Generator (SVG), a unified framework for generating realistic videos along with audio signals.
no code implementations • CVPR 2023 • Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, WanChun Ma, Jiashi Feng, Linjie Luo
We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under full disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses.
no code implementations • 24 Mar 2023 • Guoxian Song, Hongyi Xu, Jing Liu, Tiancheng Zhi, Yichun Shi, Jianfeng Zhang, Zihang Jiang, Jiashi Feng, Shen Sang, Linjie Luo
Capitalizing on the recent advancement of 3D-aware GAN models, we perform \emph{guided transfer learning} on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings.
1 code implementation • CVPR 2023 • Zhaodi Zhang, Zhiyi Xue, Yang Chen, Si Liu, Yueling Zhang, Jing Liu, Min Zhang
Via abstraction, all perturbed images are mapped into intervals before feeding into neural networks for training.
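A generic sketch of such an interval abstraction (the L-infinity perturbation budget and the [0, 1] pixel range are assumptions for illustration; the paper's abstraction may differ): each pixel of a perturbed image is replaced by a lower/upper bound pair before training.

```python
import numpy as np

def to_interval(image, eps):
    """Map an image and an L-infinity perturbation budget eps to a
    per-pixel interval [lower, upper], clipped to the valid pixel
    range [0, 1]. A generic interval-abstraction sketch."""
    lower = np.clip(image - eps, 0.0, 1.0)
    upper = np.clip(image + eps, 0.0, 1.0)
    return lower, upper

img = np.array([0.0, 0.5, 1.0])   # three toy pixel intensities
lo, hi = to_interval(img, eps=0.1)
print(lo)   # lower bounds, clipped at 0
print(hi)   # upper bounds, clipped at 1
```

Every image the adversary could produce within the eps ball then lies inside these bounds, so a network trained on the intervals covers all such perturbations at once.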
1 code implementation • 14 Mar 2023 • ZiCheng Zhang, Wei Sun, Yingjie Zhou, Jun Jia, Zhichao Zhang, Jing Liu, Xiongkuo Min, Guangtao Zhai
Computer graphics images (CGIs) are artificially generated by means of computer programs and are widely perceived under various scenarios, such as games, streaming media, etc.
2 code implementations • CVPR 2023 • Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu
Experimental results demonstrate that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101.
no code implementations • 28 Feb 2023 • Yanchen Liu, Jing Yan, Yan Chen, Jing Liu, Hua Wu
Recent studies reveal that various biases exist in different NLP tasks, and over-reliance on biases results in models' poor generalization ability and low adversarial robustness.
1 code implementation • 27 Feb 2023 • Jing Liu, Tongya Zheng, Guanzheng Zhang, Qinfen Hao
It then provides a comprehensive summary of three types of Graph-based Knowledge Distillation methods, namely Graph-based Knowledge Distillation for deep neural networks (DKD), Graph-based Knowledge Distillation for GNNs (GKD), and Self-Knowledge Distillation based Graph-based Knowledge Distillation (SKD).
no code implementations • 23 Feb 2023 • Kun Yang, Jing Liu, Dingkang Yang, Hanqi Wang, Peng Sun, Yanni Zhang, Yan Liu, Liang Song
With the rapid development of intelligent transportation system applications, a tremendous amount of multi-view video data has emerged to enhance vehicle perception.
no code implementations • 14 Feb 2023 • Minghao Liu, Zeyu Cheng, Shen Sang, Jing Liu, James Davis
Compared to direct annotation of labels, the proposed method produces higher annotator agreement, causes machine learning models to generate more consistent predictions, and requires only a marginal cost to add new rendering systems.
no code implementations • 10 Feb 2023 • XiuLin Wang, Jing Liu, FengYu Cong
Tensor decomposition is a fundamental technique widely applied in signal processing, machine learning, and various other fields.
1 code implementation • 10 Feb 2023 • Yang Liu, Dingkang Yang, Yan Wang, Jing Liu, Jun Liu, Azzedine Boukerche, Peng Sun, Liang Song
Video Anomaly Detection (VAD) serves as a pivotal technology in the intelligent surveillance systems, enabling the temporal or spatial identification of anomalous events within videos.
no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.
no code implementations • 25 Jan 2023 • Jing Liu, Hemant Singh, Saber Elsayed, Robert Hunjet, Hussein Abbass
Robotic shepherding is a bio-inspired approach to autonomously guiding a swarm of agents towards a desired location.
no code implementations • 19 Jan 2023 • Yuanyuan Li, Kai Wu, Jing Liu
Our proposal is competitive in identifying the change points and discovering governing differential equations in three hybrid systems and two switching linear systems.
no code implementations • ICCV 2023 • Dan Liu, Jin Hou, Shaoli Huang, Jing Liu, Yuxin He, Bochuan Zheng, Jifeng Ning, Jingdong Zhang
To break the deadlock, we present LoTE-Animal, a large-scale endangered animal dataset collected over 12 years, to foster the application of deep learning in rare species conservation.
Ranked #1 on Action Recognition on LoTE-Animal
no code implementations • ICCV 2023 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.