Search Results for author: Jing Liu

Found 212 papers, 83 papers with code

Machine Reading Comprehension Data Augmentation via Similarity-Based Sentence Selection

no code implementations CCL 2022 Shuang Nie, Zheng Ye, Jun Qin, Jing Liu

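Common data augmentation methods for machine reading comprehension, such as back-translation, augment the passage or the question in isolation and do not consider the relationships within the passage-question-option triple. This paper therefore explores a data augmentation method that uses these relationships to filter passage sentences: it compares the similarity of the passage to the question and to the options, and selects the passage sentences most closely related to both. In addition, to increase the difference between the triples of different options, we adopt a regularized Dropout strategy. Experimental results show that accuracy on the RACE dataset can be improved by 3.8%.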

Data Augmentation Machine Reading Comprehension +1

DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering

1 code implementation Findings (ACL) 2022 Le Qi, Shangwen Lv, Hongyu Li, Jing Liu, Yu Zhang, Qiaoqiao She, Hua Wu, Haifeng Wang, Ting Liu

Open-domain question answering has been used in a wide range of applications, such as web search and enterprise search, which usually takes clean texts extracted from various formats of documents (e.g., web pages, PDFs, or Word documents) as the information source.

document understanding Open-Domain Question Answering +1

Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations ECCV 2020 Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction

Deep Transferring Quantization

1 code implementation ECCV 2020 Zheng Xie, Zhiquan Wen, Jing Liu, Zhi-Qiang Liu, Xixian Wu, Mingkui Tan

Specifically, we propose a method named deep transferring quantization (DTQ) to effectively exploit the knowledge in a pre-trained full-precision model.

Face Recognition Image Classification +2

The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education

no code implementations 3 Apr 2024 Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai

Assessing instruction quality is a fundamental component of any improvement efforts in the education system.

SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface

no code implementations 27 Mar 2024 Jiahao Luo, Jing Liu, James Davis

Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions.

3D Reconstruction Face Reconstruction +1

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

1 code implementation 20 Mar 2024 Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu

In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects and subsequently utilize these captions to accurately re-identify the objects in a closed-loop process.

VL-Mamba: Exploring State Space Models for Multimodal Learning

no code implementations 20 Mar 2024 Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu

The extensive experiments on diverse multimodal benchmarks with competitive performance show the effectiveness of our proposed VL-Mamba and demonstrate the great potential of applying state space models for multimodal learning tasks.

Language Modelling Large Language Model +1

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

no code implementations 18 Mar 2024 Xiangyu Chen, Jing Liu, Ye Wang, Pu Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino

Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision.

Transfer Learning

Self-Evaluation of Large Language Model based on Glass-box Features

no code implementations 7 Mar 2024 Hui Huang, Yingqi Qu, Jing Liu, Muyun Yang, Tiejun Zhao

The proliferation of open-source Large Language Models (LLMs) underscores the pressing need for evaluation methods.

Language Modelling Large Language Model

Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts

no code implementations 6 Mar 2024 Zewei Tian, Min Sun, Alex Liu, Shawon Sarkar, Jing Liu

This paper explores the transformative potential of computer-assisted textual analysis in enhancing instructional quality through in-depth insights from educational artifacts.

SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model

no code implementations 28 Feb 2024 Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao

To alleviate artifacts and improve quality of synthetic images, we fine-tune Vision-Language Model (VLM) as artifact classifier to automatically identify and classify a wide range of artifacts and provide supervision for further optimizing generative models.

Image Generation Language Modelling

BASES: Large-scale Web Search User Simulation with Large Language Model based Agents

no code implementations 27 Feb 2024 Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang

Due to the excellent capacities of large language models (LLMs), it becomes feasible to develop LLM-based agents for reliable user simulation.

Information Retrieval Language Modelling +3

REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering

1 code implementation 27 Feb 2024 Yuhao Wang, Ruiyang Ren, Junyi Li, Wayne Xin Zhao, Jing Liu, Ji-Rong Wen

By combining the improvements in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents.

Open-Domain Question Answering Retrieval

CCFC++: Enhancing Federated Clustering through Feature Decorrelation

no code implementations 20 Feb 2024 Jie Yan, Jing Liu, Yi-Zi Ning, Zhong-Yuan Zhang

In federated clustering, multiple data-holding clients collaboratively group data without exchanging raw data.

Clustering Contrastive Learning

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

1 code implementation 17 Feb 2024 Wenxuan Wang, Yisi Zhang, Xingjian He, Yichen Yan, Zijia Zhao, Xinlong Wang, Jing Liu

Previous datasets and methods for classic VG task mainly rely on the prior assumption that the given expression must literally refer to the target object, which greatly impedes the practical deployment of agents in real-world scenarios.

Visual Grounding

Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?

no code implementations 14 Feb 2024 Andrew Lowy, Zhuohang Li, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set.

Inference Attack Membership Inference Attack

M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

no code implementations 19 Jan 2024 Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

To address this limit, we propose M2ORT, a many-to-one regression Transformer that can accommodate the hierarchical structure of the pathology images through a decoupled multi-scale feature extractor.

regression

CCFC: Bridging Federated Clustering and Contrastive Learning

1 code implementation 12 Jan 2024 Jie Yan, Jing Liu, Zhong-Yuan Zhang

Benefiting from representation learning, the clustering performance of CCFC even doubles that of the best baseline methods in some cases.

Clustering Contrastive Learning +1

Temporal Adaptive RGBT Tracking with Modality Prompt

no code implementations 2 Jan 2024 Hongyu Wang, Xiaotao Liu, YiFan Li, Meng Sun, Dian Yuan, Jing Liu

RGBT tracking has been widely used in various fields such as robotics, surveillance processing, and autonomous driving.

Autonomous Driving Rgb-T Tracking

Signed Graph Neural Ordinary Differential Equation for Modeling Continuous-time Dynamics

1 code implementation 18 Dec 2023 Lanlan Chen, Kai Wu, Jian Lou, Jing Liu

Modeling continuous-time dynamics constitutes a foundational challenge, and uncovering inter-component correlations within complex systems holds promise for enhancing the efficacy of dynamic modeling.

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

1 code implementation 13 Dec 2023 Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES

Descriptive Object +3

Efficient Stitchable Task Adaptation

1 code implementation 29 Nov 2023 Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang

In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.

Chatbot

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

1 code implementation 27 Nov 2023 Yushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu

Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization.

Denoising Image Generation +1

Open-Vocabulary Video Anomaly Detection

no code implementations 13 Nov 2023 Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang

Particularly, we devise a semantic knowledge injection module to introduce semantic knowledge from large language models for the detection task, and design a novel anomaly synthesis module to generate pseudo unseen anomaly videos with the help of large vision generation models for the classification task.

Anomaly Detection Video Anomaly Detection

An Interdisciplinary Outlook on Large Language Models for Scientific Research

no code implementations 3 Nov 2023 James Boyko, Joseph Cohen, Nathan Fox, Maria Han Veiga, Jennifer I-Hsiu Li, Jing Liu, Bernardo Modenesi, Andreas H. Rauch, Kenneth N. Reid, Soumi Tribedi, Anastasia Visheratina, Xin Xie

In this paper, we describe the capabilities and constraints of Large Language Models (LLMs) within disparate academic disciplines, aiming to delineate their strengths and limitations with precision.

Med-DANet V2: A Flexible Dynamic Architecture for Efficient Medical Volumetric Segmentation

no code implementations 28 Oct 2023 Haoran Shen, Yifu Zhang, Wenxuan Wang, Chen Chen, Jing Liu, Shanshan Song, Jiangyun Li

As a pioneering work, a dynamic architecture network for medical volumetric segmentation (i.e., Med-DANet) has achieved a favorable accuracy and efficiency trade-off by dynamically selecting a suitable 2D candidate model from the pre-defined model bank for different slices.

Computational Efficiency MRI segmentation +2

Stabilizing Subject Transfer in EEG Classification with Divergence Estimation

no code implementations 12 Oct 2023 Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons, Yunus Bicer, Deniz Erdogmus

Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test subjects.

EEG Subject Transfer

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

2 code implementations 12 Oct 2023 Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.

Quantization

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

no code implementations 5 Oct 2023 Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

While PTQ exhibits efficiency in terms of both time and data usage, it may lead to diminished performance in low bit-width.

Denoising Image Generation +1

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

1 code implementation NeurIPS 2023 Mingzhen Sun, Weining Wang, Zihan Qin, Jiahui Sun, Sihan Chen, Jing Liu

Specifically, we propose a video auto-encoder, where a video encoder encodes videos into global features, and a video decoder, built on a diffusion model, decodes the global features and synthesizes video frames in a non-autoregressive manner.

Video Generation

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

no code implementations 12 Sep 2023 Ahmed Adel Attia, Jing Liu, Wei Ai, Dorottya Demszky, Carol Espy-Wilson

Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

no code implementations 11 Sep 2023 Li Chen, Mengyi Zhao, Yiheng Liu, Mingxu Ding, Yangyang Song, Shizun Wang, Xu Wang, Hao Yang, Jing Liu, Kang Du, Min Zheng

Personalized text-to-image generation has emerged as a powerful and sought-after tool, empowering users to create customized images based on their specific concepts and prompts.

Text-to-Image Generation

Model-agnostic network inference enhancement from noisy measurements via curriculum learning

1 code implementation 5 Sep 2023 Kai Wu, Yuanyuan Li, Jing Liu

Noise is a pervasive element within real-world measurement data, significantly undermining the performance of network inference models.

FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

1 code implementation 23 Aug 2023 Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani

The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model.

Action Unit Detection Cross-corpus +1

March in Chat: Interactive Prompting for Remote Embodied Referring Expression

1 code implementation ICCV 2023 Yanyuan Qiao, Yuankai Qi, Zheng Yu, Jing Liu, Qi Wu

Nevertheless, this poses more challenges than other VLN tasks since it requires agents to infer a navigation plan only based on a short instruction.

Referring Expression Vision and Language Navigation

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

no code implementations 18 Aug 2023 Yichen Yan, Xingjian He, Wenxuan Wang, Sihan Chen, Jing Liu

In previous approaches, fused vision-language features are directly fed into a decoder and pass through a convolution with a fixed kernel to obtain the result, which follows a similar pattern as traditional image segmentation.

Image Segmentation Referring Expression Segmentation +2

Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception

no code implementations ICCV 2023 Kun Yang, Dingkang Yang, Jingyu Zhang, Mingcheng Li, Yang Liu, Jing Liu, Hanqi Wang, Peng Sun, Liang Song

In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner.

3D Object Detection Autonomous Vehicles +1

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

1 code implementation 24 Jul 2023 Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang

In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos by cross-modalities, e.g., language descriptions and synchronous audios.

Anomaly Detection Retrieval +2

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation

1 code implementation 20 Jul 2023 Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang

In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA.

Open-Domain Question Answering Retrieval +1

Perceptual Quality Assessment of Omnidirectional Audio-visual Signals

1 code implementation 20 Jul 2023 Xilei Zhu, Huiyu Duan, Yuqin Cao, Yuxin Zhu, Yucheng Zhu, Jing Liu, Li Chen, Xiongkuo Min, Guangtao Zhai

Omnidirectional videos (ODVs) play an increasingly important role in the application fields of medical, education, advertising, tourism, etc.

AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence

1 code implementation 1 Jul 2023 Jiarui Wang, Huiyu Duan, Jing Liu, Shi Chen, Xiongkuo Min, Guangtao Zhai

In this paper, in order to get a better understanding of the human visual preferences for AIGIs, a large-scale IQA database for AIGC is established, which is named as AIGCIQA2023.

Image Quality Assessment Text-to-Image Generation

Stitched ViTs are Flexible Vision Backbones

1 code implementation 30 Jun 2023 Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang

With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.

Description-Enhanced Label Embedding Contrastive Learning for Text Classification

1 code implementation 15 Jun 2023 Kun Zhang, Le Wu, Guangyi Lv, Enhong Chen, Shulan Ruan, Jing Liu, Zhiqiang Zhang, Jun Zhou, Meng Wang

Then, we propose a novel Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.

Contrastive Learning Relation +3

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

1 code implementation 15 Jun 2023 Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu

Due to the limited scale and quality of video-text training corpus, most vision-language foundation models employ image-text datasets for pretraining and primarily focus on modeling visually semantic representations while disregarding temporal semantic representations and correlations.

 Ranked #1 on TGIF-Frame on TGIF-QA (using extra training data)

Question Answering Retrieval +6

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

1 code implementation NeurIPS 2023 Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu

Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundational model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better support various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning and QA).

 Ranked #1 on Image Captioning on COCO Captions (SPICE metric, using extra training data)

Audio captioning Audio-Visual Captioning +14

Pre-trained transformer for adversarial purification

no code implementations 27 May 2023 Kai Wu, Yujian Betterest Li, Jian Lou, XiaoYu Zhang, Handing Wang, Jing Liu

It is frightening that deep neural networks are vulnerable and sensitive to adversarial attacks, the most common one of which for the services is evasion-based.

MMNet: Multi-Mask Network for Referring Image Segmentation

no code implementations 24 May 2023 Yichen Yan, Xingjian He, Wenxuan Wan, Jing Liu

However, this task is challenging due to the distinct data properties between text and image, and the randomness introduced by diverse objects and unrestricted language expression.

Image Segmentation Segmentation +1

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

no code implementations 22 May 2023 Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng

Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.

 Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)

Question Answering Retrieval +6

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation

no code implementations 19 May 2023 Wenxuan Wang, Jing Liu, Xingjian He, Yisi Zhang, Chen Chen, Jiachen Shen, Yan Zhang, Jiangyun Li

Referring image segmentation (RIS) is a fundamental vision-language task that intends to segment a desired object from an image based on a given natural language expression.

Image Segmentation Segmentation +1

Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner

1 code implementation 19 May 2023 Zikang Liu, Sihan Chen, Longteng Guo, Handong Li, Xingjian He, Jing Liu

In this paper, we propose a novel method called Joint QA and DC GEneration (JADE), which utilizes a pre-trained multimodal model and easily-crawled image-text pairs to automatically generate and filter large-scale VQA and dense captioning datasets.

Dense Captioning Image Captioning +4

TOME: A Two-stage Approach for Model-based Retrieval

no code implementations 18 May 2023 Ruiyang Ren, Wayne Xin Zhao, Jing Liu, Hua Wu, Ji-Rong Wen, Haifeng Wang

Recently, model-based retrieval has emerged as a new paradigm in text retrieval that discards the index in the traditional retrieval model and instead memorizes the candidate corpora using model parameters.

Natural Questions Retrieval +1

Configurable Spatial-Temporal Hierarchical Analysis for Flexible Video Anomaly Detection

no code implementations 12 May 2023 Kai Cheng, Xinhua Zeng, Yang Liu, Tian Wang, Chengxin Pang, Jing Teng, Zhaoyang Xia, Jing Liu

Since the anomaly set is complicated and unbounded, our STHA can adjust its detection ability to adapt to the human detection demands and the complexity degree of anomaly that happened in the history of a scene.

Anomaly Detection Human Detection +2

SLSG: Industrial Image Anomaly Detection by Learning Better Feature Embeddings and One-Class Classification

no code implementations 30 Apr 2023 Minghui Yang, Jing Liu, Zhiwei Yang, Zhaoyang Wu

Focusing on more effective and comprehensive anomaly detection, we propose a network based on self-supervised learning and self-attentive graph convolution (SLSG) for anomaly detection.

Classification One-Class Classification +1

B2Opt: Learning to Optimize Black-box Optimization with Little Budget

no code implementations 24 Apr 2023 XiaoBin Li, Kai Wu, XiaoYu Zhang, Handing Wang, Jing Liu

To achieve this, 1) drawing on the mechanism of genetic algorithm, we propose a deep neural network framework called B2Opt, which has a stronger representation of optimization strategies based on survival of the fittest; 2) B2Opt can utilize the cheap surrogate functions of the target task to guide the design of the efficient optimization strategies.

Med-Tuning: Parameter-Efficient Transfer Learning with Fine-Grained Feature Enhancement for Medical Volumetric Segmentation

no code implementations 21 Apr 2023 Wenxuan Wang, Jiachen Shen, Chen Chen, Jianbo Jiao, Jing Liu, Yan Zhang, Shanshan Song, Jiangyun Li

In this paper, we present the study on parameter-efficient transfer learning for medical volumetric segmentation and propose a new framework named Med-Tuning based on intra-stage feature enhancement and inter-stage feature interaction.

Segmentation Transfer Learning

DECN: Automated Evolutionary Algorithms via Evolution Inspired Deep Convolution Network

no code implementations 19 Apr 2023 Kai Wu, Penghui Liu, Jing Liu

Evolutionary algorithms (EAs) have emerged as a powerful framework for optimization, especially for black-box optimization.

Evolutionary Algorithms Meta-Learning

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

1 code implementation 17 Apr 2023 Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu

Different from widely-studied vision-language pretraining models, VALOR jointly models relationships of vision, audio and language in an end-to-end manner.

 Ranked #1 on Video Captioning on VATEX (using extra training data)

Audio captioning Audio-Video Question Answering (AVQA) +16

Calibrating Cross-modal Features for Text-Based Person Searching

no code implementations 5 Apr 2023 Donglai Wei, Sipeng Zhang, Tong Yang, Yang Liu, Jing Liu

On the other hand, the Masking Caption Modeling (MCM) loss leverages a masked captions prediction task to establish detailed and generic relationships between textual and visual parts.

Person Search Text based Person Search

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

no code implementations 30 Mar 2023 Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

1 code implementation 29 Mar 2023 Jiawei Liu, Weining Wang, Sihan Chen, Xinxin Zhu, Jing Liu

In this work, we concentrate on a rarely investigated problem of text guided sounding video generation and propose the Sounding Video Generator (SVG), a unified framework for generating realistic videos along with audio signals.

Audio Generation Contrastive Learning +1

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

no code implementations CVPR 2023 Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, WanChun Ma, Jiashi Feng, Linjie Luo

We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under full disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses.

AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

no code implementations 24 Mar 2023 Guoxian Song, Hongyi Xu, Jing Liu, Tiancheng Zhi, Yichun Shi, Jianfeng Zhang, Zihang Jiang, Jiashi Feng, Shen Sang, Linjie Luo

Capitalizing on the recent advancement of 3D-aware GAN models, we perform guided transfer learning on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings.

Transfer Learning

Boosting Verified Training for Robust Image Classifications via Abstraction

1 code implementation CVPR 2023 Zhaodi Zhang, Zhiyi Xue, Yang Chen, Si Liu, Yueling Zhang, Jing Liu, Min Zhang

Via abstraction, all perturbed images are mapped into intervals before feeding into neural networks for training.

Subjective and Objective Quality Assessment for in-the-Wild Computer Graphics Images

1 code implementation 14 Mar 2023 ZiCheng Zhang, Wei Sun, Yingjie Zhou, Jun Jia, Zhichao Zhang, Jing Liu, Xiongkuo Min, Guangtao Zhai

Computer graphics images (CGIs) are artificially generated by means of computer programs and are widely perceived under various scenarios, such as games, streaming media, etc.

Image Quality Assessment NR-IQA

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

2 code implementations CVPR 2023 Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu

Experimental results demonstrate that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101.

Object Unconditional Video Generation +2

SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases

no code implementations 28 Feb 2023 Yanchen Liu, Jing Yan, Yan Chen, Jing Liu, Hua Wu

Recent studies reveal that various biases exist in different NLP tasks, and over-reliance on biases results in models' poor generalization ability and low adversarial robustness.

Adversarial Robustness Natural Language Inference +1

Graph-based Knowledge Distillation: A survey and experimental evaluation

1 code implementation 27 Feb 2023 Jing Liu, Tongya Zheng, Guanzheng Zhang, Qinfen Hao

It then provides a comprehensive summary of three types of Graph-based Knowledge Distillation methods, namely Graph-based Knowledge Distillation for deep neural networks (DKD), Graph-based Knowledge Distillation for GNNs (GKD), and Self-Knowledge Distillation based Graph-based Knowledge Distillation (SKD).

Self-Knowledge Distillation

A novel efficient Multi-view traffic-related object detection framework

no code implementations 23 Feb 2023 Kun Yang, Jing Liu, Dingkang Yang, Hanqi Wang, Peng Sun, Yanni Zhang, Yan Liu, Liang Song

With the rapid development of intelligent transportation system applications, a tremendous amount of multi-view video data has emerged to enhance vehicle perception.

Model Selection object-detection +1

Tag-based annotation creates better avatars

no code implementations 14 Feb 2023 Minghao Liu, Zeyu Cheng, Shen Sang, Jing Liu, James Davis

Compared to direct annotation of labels, the proposed method produces higher annotator agreement, leads machine learning models to generate more consistent predictions, and requires only a marginal cost to add new rendering systems.

TAG

Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models

1 code implementation 10 Feb 2023 Yang Liu, Dingkang Yang, Yan Wang, Jing Liu, Jun Liu, Azzedine Boukerche, Peng Sun, Liang Song

Video Anomaly Detection (VAD) serves as a pivotal technology in intelligent surveillance systems, enabling the temporal or spatial identification of anomalous events within videos.

Anomaly Detection Event Detection +1

A Survey on Efficient Training of Transformers

no code implementations 2 Feb 2023 Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.

Discover governing differential equations from evolving systems

no code implementations 19 Jan 2023 Yuanyuan Li, Kai Wu, Jing Liu

Our proposal is competitive in identifying the change points and discovering governing differential equations in three hybrid systems and two switching linear systems.

BiViT: Extremely Compressed Binary Vision Transformers

no code implementations ICCV 2023 Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization object-detection +1

LoTE-Animal: A Long Time-span Dataset for Endangered Animal Behavior Understanding

no code implementations ICCV 2023 Dan Liu, Jin Hou, Shaoli Huang, Jing Liu, Yuxin He, Bochuan Zheng, Jifeng Ning, Jingdong Zhang

To break the deadlock, we present LoTE-Animal, a large-scale endangered animal dataset collected over 12 years, to foster the application of deep learning in rare species conservation.

Action Recognition Domain Adaptation +5

Enhanced-rate Iterative Beamformers for Active IRS-assisted Wireless Communications

no code implementations 16 Dec 2022 Yeqing Lin, Feng Shu, Rongen Dong, Riqing Chen, Siling Feng, Weiping Shi, Jing Liu, Jiangzhou Wang

In this paper, to boost the achievable rate of the user in such a wireless network, three enhanced-rate iterative beamforming methods are proposed by designing the amplifying factors and the corresponding phases at the active IRS.

Three High-rate Beamforming Methods for Active IRS-aided Wireless Network

no code implementations 5 Dec 2022 Feng Shu, Jing Liu, Yeqing Lin, Yang Liu, Zhilin Chen, Xuehui Wang, Rongen Dong, Jiangzhou Wang

To fully exploit the amplifying gain achieved by active IRS, two high-rate methods, maximum ratio reflecting (MRR) and selective ratio reflecting (SRR), are presented, which are motivated by maximum ratio combining and selective ratio combining.

Vocal Bursts Intensity Prediction

Privacy-Preserving Federated Deep Clustering based on GAN

no code implementations 30 Nov 2022 Jie Yan, Jing Liu, Ji Qi, Zhong-Yuan Zhang

Federated clustering (FC) is an essential extension of centralized clustering designed for the federated setting, wherein the challenge lies in constructing a global similarity measure without the need to share private data.

Clustering Deep Clustering +4

Higher-order Knowledge Transfer for Dynamic Community Detection with Great Changes

no code implementations 28 Nov 2022 Huixin Ma, Kai Wu, Handing Wang, Jing Liu

In this way, our proposal can better keep the advantages of previous community detection results and transfer them to the next task.

Community Detection Dynamic Community Detection +1

Dense Text Retrieval based on Pretrained Language Models: A Survey

2 code implementations 27 Nov 2022 Wayne Xin Zhao, Jing Liu, Ruiyang Ren, Ji-Rong Wen

With powerful PLMs, we can effectively learn the representations of queries and texts in the latent representation space, and further construct the semantic matching function between the dense vectors for relevance modeling.

Retrieval Text Retrieval

AgileAvatar: Stylized 3D Avatar Creation via Cascaded Domain Bridging

no code implementations 15 Nov 2022 Shen Sang, Tiancheng Zhi, Guoxian Song, Minghao Liu, Chunpong Lai, Jing Liu, Xiang Wen, James Davis, Linjie Luo

We propose a novel self-supervised learning framework to create high-quality stylized 3D avatars with a mix of continuous and discrete parameters.

Self-Supervised Learning

BiViT: Extremely Compressed Binary Vision Transformer

no code implementations 14 Nov 2022 Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization object-detection +1

LGN-Net: Local-Global Normality Network for Video Anomaly Detection

1 code implementation 14 Nov 2022 Mengyang Zhao, Xinhua Zeng, Yang Liu, Jing Liu, Di Li, Xing Hu, Chengxin Pang

Existing unsupervised VAD methods tend to learn normality from training sets consisting of only normal videos and regard instances deviating from such normality as anomalies.

Anomaly Detection Video Anomaly Detection

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

2 code implementations 7 Nov 2022 Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, Dan Zhu, Mengdi Sun, Ran Duan, Yan Gao, Lingshun Kong, Long Sun, Xiang Li, Xingdong Zhang, Jiawei Zhang, Yaqi Wu, Jinshan Pan, Gaocheng Yu, Jin Zhang, Feng Zhang, Zhe Ma, Hongbin Wang, Hojin Cho, Steve Kim, Huaen Li, Yanbo Ma, Ziwei Luo, Youwei Li, Lei Yu, Zhihong Wen, Qi Wu, Haoqiang Fan, Shuaicheng Liu, Lize Zhang, Zhikai Zong, Jeremy Kwon, Junxi Zhang, Mengyuan Li, Nianxiang Fu, Guanchen Ding, Han Zhu, Zhenzhong Chen, Gen Li, Yuanfan Zhang, Lei Sun, Dafeng Zhang, Neo Yang, Fitz Liu, Jerry Zhao, Mustafa Ayazoglu, Bahri Batuhan Bilecen, Shota Hirose, Kasidis Arunruangsirilert, Luo Ao, Ho Chun Leung, Andrew Wei, Jie Liu, Qiang Liu, Dahai Yu, Ao Li, Lei Luo, Ce Zhu, Seongmin Hong, Dongwon Park, Joonhee Lee, Byeong Hyun Lee, Seunggyu Lee, Se Young Chun, Ruiyuan He, Xuhao Jiang, Haihang Ruan, Xinjian Zhang, Jing Liu, Garas Gendy, Nabil Sabor, Jingchao Hou, Guanghui He

While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints.

Image Super-Resolution

Federated clustering with GAN-based data synthesis

1 code implementation 29 Oct 2022 Jie Yan, Jing Liu, Ji Qi, Zhong-Yuan Zhang

Federated clustering (FC) is an extension of centralized clustering in federated settings.

Clustering Federated Learning +1

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

no code implementations 9 Oct 2022 Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao, Zehuan Yuan, Jing Liu

Our method performs joint masking on image-text input and integrates both implicit and explicit targets for the masked signals to recover.

Question Answering Representation Learning +5

EcoFormer: Energy-Saving Attention with Linear Complexity

1 code implementation 19 Sep 2022 Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang

To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.

Binarization

FocusFormer: Focusing on What We Need via Architecture Sampler

no code implementations 23 Aug 2022 Jing Liu, Jianfei Cai, Bohan Zhuang

During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.

Neural Architecture Search

Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks

no code implementations 21 Aug 2022 Zhaodi Zhang, Yiting Wu, Si Liu, Jing Liu, Min Zhang

Considerable efforts have been devoted to finding the so-called tighter approximations to obtain more precise verification results.

An Interpretability Evaluation Benchmark for Pre-trained Language Models

no code implementations 28 Jul 2022 Yaozong Shen, Lijie Wang, Ying Chen, Xinyan Xiao, Jing Liu, Hua Wu

To fill in the gap, we propose a novel evaluation benchmark providing both English and Chinese annotated data.

HIRE: Distilling High-order Relational Knowledge From Heterogeneous Graph Neural Networks

no code implementations 25 Jul 2022 Jing Liu, Tongya Zheng, Qinfen Hao

To the best of our knowledge, we are the first to propose a HIgh-order RElational (HIRE) knowledge distillation framework on heterogeneous graphs, which can significantly boost the prediction performance regardless of model architectures of HGNNs.

Knowledge Distillation Vocal Bursts Intensity Prediction

Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection

1 code implementation 22 Jul 2022 Zhiwei Yang, Peng Wu, Jing Liu, Xiaotao Liu

Existing methods for anomaly detection based on memory-augmented autoencoder (AE) have the following drawbacks: (1) Establishing a memory bank requires additional memory space.

Anomaly Detection

Reducing US Biofuels Requirements Mitigates Short-term Impacts of Global Population and Income Growth on Agricultural Environmental Outcomes

no code implementations 28 Jun 2022 David R. Johnson, Nathan B. Geldner, Jing Liu, Uris Lantz Baldos, Thomas Hertel

Biobased energy, particularly corn starch-based ethanol and other liquid renewable fuels, is a major element of federal and state energy policies in the United States.

Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

1 code implementation 25 May 2022 Yanrui Du, Jing Yan, Yan Chen, Jing Liu, Sendong Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Bing Qin

In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data.

Natural Language Inference Sentiment Analysis

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations 11 May 2022 Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that improved efficiency, measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation set.

Image Super-Resolution

MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities

4 code implementations 2 May 2022 Minghui Yang, Peng Wu, Jing Liu, Hui Feng

By comparing the similarities and differences between input samples and memory samples in the memory pool, MemSeg gives effective guesses about abnormal regions; in the inference phase, MemSeg directly determines the abnormal regions of the input image in an end-to-end manner.

Anomaly Detection Defect Detection +1

A Thorough Examination on Zero-shot Dense Retrieval

no code implementations 27 Apr 2022 Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qifei Wu, Yuchen Ding, Hua Wu, Haifeng Wang, Ji-Rong Wen

Recent years have witnessed significant advances in dense retrieval (DR) based on powerful pre-trained language models (PLMs).

Retrieval

A Multi-Transformation Evolutionary Framework for Influence Maximization in Social Networks

1 code implementation 7 Apr 2022 Chao Wang, Jiaxuan Zhao, Lingling Li, Licheng Jiao, Jing Liu, Kai Wu

Influence maximization is a crucial issue for mining the deep information of social networks, which aims to select a seed set from the network to maximize the number of influenced nodes.

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations CVPR 2023 Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.

Semantic Segmentation

Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding

no code implementations 1 Apr 2022 Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra

In addition, the NLU model in the two-stage system is not streamable, as it must wait for the audio segments to complete processing, which ultimately impacts the latency of the SLU system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Network Collaborator: Knowledge Transfer Between Network Reconstruction and Community Detection

1 code implementation 4 Jan 2022 Kai Wu, Chao Wang, Junyuan Chen, Jing Liu

Community detection (CD) from dynamics and network reconstruction (NR) from dynamics are natural synergistic tasks that motivate the proposed evolutionary multitasking NR and CD framework, called network collaborator (NC).

Community Detection Transfer Learning

Evolutionary Multitasking AUC Optimization

1 code implementation 4 Jan 2022 Chao Wang, Kai Wu, Jing Liu

Inspired by the characteristic of pairwise learning, the cheap AUC optimization task with a small-scale dataset sampled from the large-scale dataset is constructed to promote the AUC accuracy of the original, large-scale, and expensive AUC optimization task.

Binary Classification

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

1 code implementation 16 Dec 2021 Hongyu Zhu, Yan Chen, Jing Yan, Jing Liu, Yu Hong, Ying Chen, Hua Wu, Haifeng Wang

For this purpose, we create a Chinese dataset, namely DuQM, which contains natural questions with linguistic perturbations to evaluate the robustness of question matching models.

Natural Questions

Sharpness-aware Quantization for Deep Neural Networks

3 code implementations 24 Nov 2021 Jing Liu, Jianfei Cai, Bohan Zhuang

However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.

Image Classification Model Compression +1

Mesa: A Memory-saving Training Framework for Transformers

3 code implementations 22 Nov 2021 Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.

Quantization

RVFR: Robust Vertical Federated Learning via Feature Subspace Recovery

no code implementations 29 Sep 2021 Jing Liu, Chulin Xie, Krishnaram Kenthapadi, Oluwasanmi O Koyejo, Bo Li

Vertical Federated Learning (VFL) is a distributed learning paradigm that allows multiple agents to jointly train a global model when each agent holds a different subset of features for the same sample(s).

Vertical Federated Learning

Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

no code implementations 6 Sep 2021 Xingjian He, Weining Wang, Zhiyong Xu, Hao Wang, Jie Jiang, Jing Liu

Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of prediction.

Scene Parsing

Resisting Out-of-Distribution Data Problem in Perturbation of XAI

no code implementations 27 Jul 2021 Luyu Qiu, Yi Yang, Caleb Chen Cao, Jing Liu, Yueyuan Zheng, Hilary Hei Ting Ngai, Janet Hsiao, Lei Chen

Besides, our solution also resolves a fundamental problem with the faithfulness indicator, a commonly used evaluation metric of XAI algorithms that appears to be sensitive to the OoD issue.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI)

HANet: Hierarchical Alignment Networks for Video-Text Retrieval

1 code implementation 26 Jul 2021 Peng Wu, Xiangteng He, Mingqian Tang, Yiliang Lv, Jing Liu

Based on these, we naturally construct hierarchical representations in the individual-local-global manner, where the individual level focuses on the alignment between frame and word, local level focuses on the alignment between video clip and textual context, and global level focuses on the alignment between the whole video and text.

Retrieval Text Matching +3

AgileGAN: stylizing portraits by inversion-consistent transfer learning

1 code implementation ACM Transactions on Graphics 2021 Guoxian Song, Linjie Luo, Jing Liu, Wan-Chun Ma, Chun-Pong Lai, Chuanxia Zheng, Tat-Jen Cham

While substantial progress has been made in automated stylization, generating high-quality stylistic portraits is still a challenge, and even the recent popular Toonify suffers from several artifacts when used on real input images.

Attribute motion retargeting +1

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

2 code implementations 1 Jul 2021 Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, Mingzhen Sun, Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, by jointly modeling visual, text and audio resources.

Audio to Text Retrieval Cross-Modal Retrieval +3

Tensor networks for unsupervised machine learning

1 code implementation 24 Jun 2021 Jing Liu, Sujie Li, Jiang Zhang, Pan Zhang

Despite the great potential, however, existing tensor network models for unsupervised machine learning only work as a proof of principle, as their performance is much worse than the standard models such as restricted Boltzmann machines and neural networks.

BIG-bench Machine Learning Tensor Networks

Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models

no code implementations 11 Jun 2021 Jing Liu, Rupak Vignesh Swaminathan, Sree Hari Krishnan Parthasarathi, Chunchuan Lyu, Athanasios Mouchtaris, Siegfried Kunzmann

We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM) with experiments spanning over 3000 hours of GPU time, making our study one of the largest of its kind.

Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions

1 code implementation ACL 2021 Dorottya Demszky, Jing Liu, Zid Mancenido, Julie Cohen, Heather Hill, Dan Jurafsky, Tatsunori Hashimoto

In conversation, uptake happens when a speaker builds on the contribution of their interlocutor by, for example, acknowledging, repeating or reformulating what they have said.

Math Question Answering

Large-Scale Data-Driven Airline Market Influence Maximization

no code implementations 31 May 2021 Duanshun Li, Jing Liu, Jinsung Jeon, Seoyoung Hong, Thai Le, Dongwon Lee, Noseong Park

On top of the prediction models, we define a budget-constrained flight frequency optimization problem to maximize the market influence over 2,262 routes.

Boosting the Performance of Video Compression Artifact Reduction with Reference Frame Proposals and Frequency Domain Information

no code implementations 31 May 2021 Yi Xu, Minyi Zhao, Jing Liu, Xinjian Zhang, Longwen Gao, Shuigeng Zhou, Huyang Sun

Many deep learning based video compression artifact removal algorithms have been proposed to recover high-quality videos from low-quality compressed videos.

Video Compression

Less is More: Pay Less Attention in Vision Transformers

2 code implementations 29 May 2021 Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.

Image Classification Instance Segmentation +3

AAformer: Auto-Aligned Transformer for Person Re-Identification

no code implementations 2 Apr 2021 Kuan Zhu, Haiyun Guo, Shiliang Zhang, YaoWei Wang, Gaopan Huang, Honglin Qiao, Jing Liu, Jinqiao Wang, Ming Tang

In this paper, we introduce an alignment scheme in Transformer architecture for the first time and propose the Auto-Aligned Transformer (AAformer) to automatically locate both the human parts and non-human ones at patch-level.

Human Parsing Image Classification +3

Scalable Vision Transformers with Hierarchical Pooling

2 code implementations ICCV 2021 Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.

Efficient ViTs

Temporal Memory Attention for Video Semantic Segmentation

1 code implementation 17 Feb 2021 Hao Wang, Weining Wang, Jing Liu

Video semantic segmentation requires utilizing the complex temporal relations between frames of the video sequence.

Segmentation Semantic Segmentation +1

CPTR: Full Transformer Network for Image Captioning

no code implementations 26 Jan 2021 Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu

Besides, we provide detailed visualizations of the self-attention between patches in the encoder and the "words-to-patches" attention in the decoder thanks to the full Transformer architecture.

Image Captioning

Global-Local Propagation Network for RGB-D Semantic Segmentation

no code implementations 26 Jan 2021 Sihan Chen, Xinxin Zhu, Wei Liu, Xingjian He, Jing Liu

Depth information matters in the RGB-D semantic segmentation task because it provides additional geometric information to color images.

Scene Segmentation Segmentation

Fast Sequence Generation with Multi-Agent Reinforcement Learning

no code implementations 24 Jan 2021 Longteng Guo, Jing Liu, Xinxin Zhu, Hanqing Lu

These models are autoregressive in that they generate each word by conditioning on previously generated words, which leads to heavy latency during inference.

Image Captioning Machine Translation +5

Single-path Bit Sharing for Automatic Loss-aware Model Compression

no code implementations 13 Jan 2021 Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.

Model Compression Network Pruning