Search Results for author: Zhongyuan Wang

Found 86 papers, 41 papers with code

Table Fact Verification with Structure-Aware Transformer

no code implementations EMNLP 2020 Hongzhi Zhang, Yingyao Wang, Sirui Wang, Xuezhi Cao, Fuzheng Zhang, Zhongyuan Wang

Verifying facts on semi-structured evidence like tables requires the ability to encode structural information and perform symbolic reasoning.

Fact Verification

Emu3: Next-Token Prediction is All You Need

2 code implementations 27 Sep 2024 Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Zhen Li, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li, Boya Wu, Bo Zhao, BoWen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang

While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs).

Visual Question Answering

Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios

no code implementations 30 Aug 2024 Zhongyuan Wang, Richong Zhang, Zhijie Nie, Jaein Kim

To address these challenges, we propose a tool-assisted agent framework for SQL inspection and refinement, equipping the LLM-based agent with two specialized tools: a retriever and a detector, designed to diagnose and correct SQL queries with database mismatches.

Management Text-To-SQL

Can We Leave Deepfake Data Behind in Training Deepfake Detector?

no code implementations 30 Aug 2024 Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, Chen Li

The accumulation of forgery information should be oriented and progressively increasing during this transition process.

Face Swapping

DePatch: Towards Robust Adversarial Patch for Evading Person Detectors in the Real World

no code implementations 13 Aug 2024 Jikang Cheng, Ying Zhang, Zhongyuan Wang, Zou Qin, Chen Li

Recent years have seen an increasing interest in physical adversarial attacks, which aim to craft deployable patterns for deceiving deep neural networks, especially for person detectors.

IDRetracor: Towards Visual Forensics Against Malicious Face Swapping

no code implementations 13 Aug 2024 Jikang Cheng, Jiaxin Ai, Zhen Han, Chao Liang, Qin Zou, Zhongyuan Wang, Qian Wang

To achieve visual forensics and target face attribution, we propose a novel task named face retracing, which considers retracing the original target face from the given fake one via inverse mapping.

DeepFake Detection Face Swapping

ED$^4$: Explicit Data-level Debiasing for Deepfake Detection

no code implementations 13 Aug 2024 Jikang Cheng, Ying Zhang, Qin Zou, Zhiyuan Yan, Chao Liang, Zhongyuan Wang, Chen Li

Learning intrinsic bias from limited data has been considered the main reason for the failure of deepfake detection with generalizability.

DeepFake Detection Disentanglement +1

GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

no code implementations 26 Jun 2024 Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

Our proposed benchmark consists of three sub-tasks to evaluate comprehension ability of models: (1) Step Captioning: models have to generate captions for specific steps from videos.

Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

no code implementations 24 May 2024 Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

The core of our approach is the observation that a pre-trained language model can confidently predict multiple contiguous tokens, forming the basis for a lexical unit, in which these contiguous tokens could be decoded in parallel.

Code Generation Language Modelling +3

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance

no code implementations 24 May 2024 Guibao Shen, Luozhou Wang, Jiantao Lin, Wenhang Ge, Chaozhe Zhang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Guangyong Chen, Yijun Li, Ying-Cong Chen

In this paper, we introduce the Scene Graph Adapter (SG-Adapter), leveraging the structured representation of scene graphs to rectify inaccuracies in the original text embeddings.

Text-to-Image Generation

Learning Multi-dimensional Human Preference for Text-to-Image Generation

1 code implementation CVPR 2024 Sixian Zhang, Bohan Wang, Junqiang Wu, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

Current metrics for text-to-image models typically rely on statistical metrics which inadequately represent the real preference of humans.

Text-to-Image Generation

Tele-FLM Technical Report

no code implementations 25 Apr 2024 Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications.

Language Modelling Large Language Model

End-to-end training of Multimodal Model and ranking Model

1 code implementation 9 Apr 2024 Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang song, Na Mou, Shen Jiang, Han Li

In this paper, we propose an industrial multimodal recommendation framework named EM3: End-to-end training of Multimodal Model and ranking Model, which sufficiently utilizes multimodal information and allows personalized ranking tasks to directly train the core modules in the multimodal model to obtain more task-oriented content features, without overburdening resource consumption.

Contrastive Learning Multimodal Recommendation

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

1 code implementation 20 Dec 2023 Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS).

Contrastive Learning Denoising +6

KwaiAgents: Generalized Information-seeking Agent System with Large Language Models

1 code implementation 8 Dec 2023 Haojie Pan, Zepeng Zhai, Hao Yuan, Yaojia LV, Ruiji Fu, Ming Liu, Zhongyuan Wang, Bing Qin

Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this inquisitiveness.

Stable Segment Anything Model

1 code implementation 27 Nov 2023 Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0.08 M) and fast adaptation (by 1 training epoch).

Segmentation

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation 24 Nov 2023 Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.

Image Generation Language Modelling +1

Temporal-Aware Refinement for Video-based Human Pose and Shape Recovery

no code implementations 16 Nov 2023 Ming Chen, Yan Zhou, Weihua Jian, Pengfei Wan, Zhongyuan Wang

Though significant progress in human pose and shape recovery from monocular RGB images has been made in recent years, obtaining 3D human motion with high accuracy and temporal consistency from videos remains challenging.

3D Human Pose Estimation TAR

Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios

no code implementations 14 Nov 2023 Lei Lin, Jiayi Fu, Pengli Liu, Qingyang Li, Yan Gong, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes repetitiveness and local optimality.

Decoder Language Modelling

Improving Vision-and-Language Reasoning via Spatial Relations Modeling

no code implementations 9 Nov 2023 Cheng Yang, Rui Xu, Ye Guo, Peixiang Huang, Yiru Chen, Wenkui Ding, Zhongyuan Wang, Hong Zhou

Further, we design two pre-training tasks named object position regression (OPR) and spatial relation classification (SRC) to learn to reconstruct the spatial relation graph respectively.

Position regression Relation +3

Graph Ranking Contrastive Learning: A Extremely Simple yet Efficient Method

no code implementations 23 Oct 2023 Yulan Hu, Sheng Ouyang, Jingyu Liu, Ge Chen, Zhirui Yang, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Yong liu

Thus, we propose GraphRank, a simple yet efficient graph contrastive learning method that avoids the problem of false negative samples by redefining, to a certain extent, the concept of negative samples.

Contrastive Learning Graph Learning +1

KwaiYiiMath: Technical Report

no code implementations 11 Oct 2023 Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, ShengNan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning.

Ranked #93 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +1

Exploring Sentence Type Effects on the Lombard Effect and Intelligibility Enhancement: A Comparative Study of Natural and Grid Sentences

no code implementations 19 Sep 2023 Hongyang Chen, Yuhong Yang, Zhongyuan Wang, Weiping tu, Haojun Ai, Song Lin

This study explores how sentence types affect the Lombard effect and intelligibility enhancement, focusing on comparisons between natural and grid sentences.

Sentence

Code-Style In-Context Learning for Knowledge-Based Question Answering

1 code implementation 9 Sep 2023 Zhijie Nie, Richong Zhang, Zhongyuan Wang, Xudong Liu

Current methods for Knowledge-Based Question Answering (KBQA) usually rely on complex training techniques and model frameworks, leading to many limitations in practical applications.

Code Generation In-Context Learning +2

Towards Practical Capture of High-Fidelity Relightable Avatars

no code implementations 8 Sep 2023 Haotian Yang, Mingwu Zheng, Wanquan Feng, Haibin Huang, Yu-Kun Lai, Pengfei Wan, Zhongyuan Wang, Chongyang Ma

Specifically, TRAvatar is trained with dynamic image sequences captured in a Light Stage under varying lighting conditions, enabling realistic relighting and real-time animation for avatars in diverse scenes.

Implicit Identity Driven Deepfake Face Swapping Detection

no code implementations CVPR 2023 Baojin Huang, Zhongyuan Wang, Jifan Yang, Jiaxin Ai, Qin Zou, Qian Wang, Dengpan Ye

Face swapping aims to replace the target face with the source face and generate a fake face that humans cannot distinguish from a real one.

Face Swapping

LSTFE-Net: Long Short-Term Feature Enhancement Network for Video Small Object Detection

1 code implementation CVPR 2023 Jinsheng Xiao, Yuanxu Wu, Yunhua Chen, Shurui Wang, Zhongyuan Wang, Jiayi Ma

We find that context information from the long-term frame and temporal information from the short-term frame are two useful cues for video small object detection.

Object object-detection +1

Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis

no code implementations 13 Dec 2022 Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang

In order to improve the style extraction ability of the reference encoder, a style invariant and contrastive data augmentation method is proposed.

Data Augmentation Speech Synthesis +1

A Scale-Arbitrary Image Super-Resolution Network Using Frequency-domain Information

no code implementations 8 Dec 2022 Jing Fang, Yinbo Yu, Zhongyuan Wang, Xin Ding, Ruimin Hu

Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images.

Image Super-Resolution valid

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset

no code implementations 19 Nov 2022 Jiaxin Deng, Dong Shen, Haojie Pan, Xiangyu Wu, Ximan Liu, Gaofeng Meng, Fan Yang, Size Li, Ruiji Fu, Zhongyuan Wang

Furthermore, based on this dataset, we propose an end-to-end model that jointly optimizes the video understanding objective with knowledge graph embedding, which can not only better inject factual knowledge into video understanding but also generate effective multi-modal entity embedding for KG.

Common Sense Reasoning Knowledge Graph Embedding +4

Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

no code implementations 17 Nov 2022 Chunyu Qiang, Peng Yang, Hao Che, Jinba Xiao, Xiaorui Wang, Zhongyuan Wang

In this paper, we propose a simple back-translation-style data augmentation method for Mandarin Chinese polyphone disambiguation, utilizing a large amount of unlabeled text data.

Data Augmentation Machine Translation +4

Kuaipedia: a Large-scale Multi-modal Short-video Encyclopedia

1 code implementation 28 Oct 2022 Haojie Pan, Zepeng Zhai, Yuzhou Zhang, Ruiji Fu, Ming Liu, Yangqiu Song, Zhongyuan Wang, Bing Qin

In this paper, we propose Kuaipedia, a large-scale multi-modal encyclopedia consisting of items, aspects, and short videos linked to them, which was extracted from billions of videos of Kuaishou (Kwai), a well-known short-video platform in China.

Entity Linking Entity Typing

RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval

1 code implementation 13 Oct 2022 Xing Wu, Chaochen Gao, Zijia Lin, Zhongyuan Wang, Jizhong Han, Songlin Hu

Sparse sampling is also likely to miss important frames corresponding to some text portions, resulting in textual redundancy.

Contrastive Learning Retrieval +1

Bridging CLIP and StyleGAN through Latent Alignment for Image Editing

no code implementations 10 Oct 2022 Wanfeng Zheng, Qiang Li, Xiaoyan Guo, Pengfei Wan, Zhongyuan Wang

More specifically, our efforts consist of three parts: 1) a data-free training strategy to train latent mappers to bridge the latent space of CLIP and StyleGAN; 2) for more precise mapping, temporal relative consistency is proposed to address the knowledge distribution bias problem among different latent spaces; 3) to refine the mapped latent in s space, adaptive style mixing is also proposed.

Image Manipulation Language Modelling +1

InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings

2 code implementations 8 Oct 2022 Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, Songlin Hu

Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer.

Contrastive Learning Language Modelling +5

TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval

no code implementations 28 Sep 2022 Xiaohan Zou, Changqiao Wu, Lele Cheng, Zhongyuan Wang

Most existing methods in vision-language retrieval match two modalities by either comparing their global feature vectors which misses sufficient information and lacks interpretability, detecting objects in images or videos and aligning the text with fine-grained features which relies on complicated model designs, or modeling fine-grained interaction via cross-attention upon visual and textual tokens which suffers from inferior efficiency.

cross-modal alignment Text Retrieval +1

ConTextual Masked Auto-Encoder for Dense Passage Retrieval

2 code implementations 16 Aug 2022 Xing Wu, Guangyuan Ma, Meng Lin, Zijia Lin, Zhongyuan Wang, Songlin Hu

Dense passage retrieval aims to retrieve the relevant passages of a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages.

Decoder Passage Retrieval +2

Magic ELF: Image Deraining Meets Association Learning and Transformer

1 code implementation 21 Jul 2022 Kui Jiang, Zhongyuan Wang, Chen Chen, Zheng Wang, Laizhong Cui, Chia-Wen Lin

Convolutional neural network (CNN) and Transformer have achieved great success in multimedia applications.

Rain Removal

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

1 code implementation 18 Jul 2022 Wejia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo

Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (e.g., text detection, tracking, recognition) in a real-time end-to-end trainable framework.

Contrastive Learning Representation Learning +2

Deepfake Face Traceability with Disentangling Reversing Network

no code implementations 8 Jul 2022 Jiaxin Ai, Zhongyuan Wang, Baojin Huang, Zhen Han

Deepfake face not only violates the privacy of personal identity, but also confuses the public and causes huge social harm.

DeepFake Detection Face Swapping

Diagnosing Ensemble Few-Shot Classifiers

no code implementations 9 Jun 2022 Weikai Yang, Xi Ye, Xingxing Zhang, Lanxi Xiao, Jiazhi Xia, Zhongyuan Wang, Jun Zhu, Hanspeter Pfister, Shixia Liu

The base learners and labeled samples (shots) in an ensemble few-shot classifier greatly affect the model performance.

ITTR: Unpaired Image-to-Image Translation with Transformers

no code implementations 30 Mar 2022 Wanfeng Zheng, Qiang Li, Guoxin Zhang, Pengfei Wan, Zhongyuan Wang

Unpaired image-to-image translation is to translate an image from a source domain to a target domain without paired training data.

Image-to-Image Translation Translation

Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing

1 code implementation CVPR 2022 Zhuo Wang, Zezheng Wang, Zitong Yu, Weihong Deng, Jiahong Li, Tingting Gao, Zhongyuan Wang

A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features for a stylized feature space.

Contrastive Learning Domain Generalization +1

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

1 code implementation ACL 2022 Xing Wu, Chaochen Gao, Meng Lin, Liangjun Zang, Zhongyuan Wang, Songlin Hu

Before entering the neural network, a token is generally converted to the corresponding one-hot representation, which is a discrete distribution of the vocabulary.

Data Augmentation Language Modelling +3

Contrastive Learning of Semantic and Visual Representations for Text Tracking

1 code implementation 30 Dec 2021 Zhuang Li, Weijia Wu, Mike Zheng Shou, Jiahong Li, Size Li, Zhongyuan Wang, Hong Zhou

Semantic representation is of great benefit to the video text tracking (VTT) task that requires simultaneously classifying, detecting, and tracking texts in the video.

Contrastive Learning

DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings

1 code implementation 10 Dec 2021 Chaochen Gao, Xing Wu, Peng Wang, Jue Wang, Liangjun Zang, Zhongyuan Wang, Songlin Hu

To tackle that, we propose an effective knowledge distillation framework for contrastive sentence embeddings, termed DistilCSE.

Contrastive Learning Knowledge Distillation +5

WHU-NERCMS at TRECVID2021: Instance Search Task

no code implementations 30 Oct 2021 Yanrui Niu, Jingyao Yang, Ankang Lu, Baojin Huang, Yue Zhang, Ji Huang, Shishi Wen, Dongshu Xu, Chao Liang, Zhongyuan Wang, Jun Chen

In this paper, we briefly introduce the experimental methods and results of WHU-NERCMS at TRECVID2021.

Action Detection Face Detection +5

TANet: A new Paradigm for Global Face Super-resolution via Transformer-CNN Aggregation Network

no code implementations 16 Sep 2021 Yuanzhi Wang, Tao Lu, Yanduo Zhang, Junjun Jiang, JiaMing Wang, Zhongyuan Wang, Jiayi Ma

Recently, face super-resolution (FSR) methods either feed the whole face image into convolutional neural networks (CNNs) or utilize extra facial priors (e.g., facial parsing maps, facial landmarks) to focus on facial structure, thereby maintaining the consistency of the facial structure while restoring facial details.

Face Reconstruction Super-Resolution

ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

2 code implementations COLING 2022 Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang, Songlin Hu

Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings to build a positive pair.

Contrastive Learning Data Augmentation +5

CAT: Cross Attention in Vision Transformer

1 code implementation 10 Jun 2021 Hezheng Lin, Xing Cheng, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Qing Song, Wei Yuan

In this paper, we propose a new attention mechanism in Transformer termed Cross Attention, which alternates between attention within each image patch, instead of over the whole image, to capture local information, and attention between image patches divided from single-channel feature maps to capture global information.

Omniscient Video Super-Resolution

no code implementations ICCV 2021 Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Tao Lu, Xin Tian, Jiayi Ma

Most recent video super-resolution (SR) methods either adopt an iterative manner to deal with low-resolution (LR) frames from a temporally sliding window, or leverage the previously estimated SR output to help reconstruct the current frame recurrently.

Video Super-Resolution

Degrade is Upgrade: Learning Degradation for Low-light Image Enhancement

1 code implementation 19 Mar 2021 Kui Jiang, Zhongyuan Wang, Zheng Wang, Chen Chen, Peng Yi, Tao Lu, Chia-Wen Lin

Different from existing methods tending to accomplish the relighting task directly by ignoring the fidelity and naturalness recovery, we investigate the intrinsic degradation and relight the low-light image while refining the details and color in two steps.

Low-Light Image Enhancement

Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection

no code implementations CVPR 2021 Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, Yongdong Zhang

Face forgery detection is raising ever-increasing interest in computer vision since facial manipulation technologies cause serious worries.

Metric Learning for Anti-Compression Facial Forgery Detection

no code implementations 15 Mar 2021 Shenhao Cao, Qin Zou, Xiuqing Mao, Zhongyuan Wang

Detecting facial forgery images and videos is an increasingly important topic in multimedia forensics.

Metric Learning

When Face Recognition Meets Occlusion: A New Benchmark

1 code implementation 4 Mar 2021 Baojin Huang, Zhongyuan Wang, Guangcheng Wang, Kui Jiang, Kangli Zeng, Zhen Han, Xin Tian, Yuhong Yang

In particular, we first collect a variety of glasses and masks as occlusion, and randomly combine the occlusion attributes (occlusion objects, textures, and colors) to achieve a large number of more realistic occlusion types.

Diversity Face Recognition

Converse, Focus and Guess -- Towards Multi-Document Driven Dialogue

1 code implementation 4 Feb 2021 Han Liu, Caixia Yuan, Xiaojie Wang, Yushu Yang, Huixing Jiang, Zhongyuan Wang

We propose a novel task, Multi-Document Driven Dialogue (MD3), in which an agent can guess the target document that the user is interested in by leading a dialogue.

Attribute

Learn with Noisy Data via Unsupervised Loss Correction for Weakly Supervised Reading Comprehension

no code implementations COLING 2020 Xuemiao Zhang, Kun Zhou, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, Junfei Liu

The weakly supervised machine reading comprehension (MRC) task is practical and promising for its easily available and massive training data, but it inevitably introduces noise.

Machine Reading Comprehension

Face Hallucination via Split-Attention in Split-Attention Network

1 code implementation 22 Oct 2020 Tao Lu, Yuanzhi Wang, Yanduo Zhang, Yu Wang, Wei Liu, Zhongyuan Wang, Junjun Jiang

However, most of them fail to take into account the overall facial profile and fine texture details simultaneously, resulting in reduced naturalness and fidelity of the reconstructed face, and further impairing the performance of downstream tasks (e.g., face detection, facial recognition).

Face Detection Face Hallucination +4

Query-aware Tip Generation for Vertical Search

no code implementations 19 Oct 2020 Yang Yang, Junmei Hao, Canjia Li, Zili Wang, Jingang Wang, Fuzheng Zhang, Rao Fu, Peixu Hou, Gong Zhang, Zhongyuan Wang

Existing work on tip generation does not take query into consideration, which limits the impact of tips in search scenarios.

Decision Making Decoder

Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue

1 code implementation 1 Oct 2020 Zipeng Xu, Fangxiang Feng, Xiaojie Wang, Yushu Yang, Huixing Jiang, Zhongyuan Wang

In this paper, we propose an Answer-Driven Visual State Estimator (ADVSE) to impose the effects of different answers on visual states.

Question Generation Question-Generation +1

Leveraging Historical Interaction Data for Improving Conversational Recommender System

no code implementations 19 Aug 2020 Kun Zhou, Wayne Xin Zhao, Hui Wang, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, Ji-Rong Wen

Most of the existing CRS methods focus on learning effective preference representations for users from conversation data alone.

Attribute Recommendation Systems

S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization

2 code implementations 18 Aug 2020 Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, Ji-Rong Wen

To tackle this problem, we propose the model S^3-Rec, which stands for Self-Supervised learning for Sequential Recommendation, based on the self-attentive neural architecture.

Attribute Self-Supervised Learning +1

Learning Inverse Rendering of Faces from Real-world Videos

1 code implementation 26 Mar 2020 Yuda Qiu, Zhangyang Xiong, Kai Han, Zhongyuan Wang, Zixiang Xiong, Xiaoguang Han

To alleviate this problem, we propose a weakly supervised training approach to train our model on real face videos, based on the assumption of consistency of albedo and normal across different frames, thus bridging the gap between real and synthetic face images.

Inverse Rendering

Multi-Scale Progressive Fusion Network for Single Image Deraining

3 code implementations CVPR 2020 Kui Jiang, Zhongyuan Wang, Peng Yi, Chen Chen, Baojin Huang, Yimin Luo, Jiayi Ma, Junjun Jiang

In this work, we explore the multi-scale collaborative representation for rain streaks from the perspective of input image scales and hierarchical deep features in a unified framework, termed multi-scale progressive fusion network (MSPFN) for single image rain streak removal.

Single Image Deraining

Masked Face Recognition Dataset and Application

3 code implementations 20 Mar 2020 Zhongyuan Wang, Guangcheng Wang, Baojin Huang, Zhangyang Xiong, Qi Hong, Hao Wu, Peng Yi, Kui Jiang, Nanxi Wang, Yingjiao Pei, Heling Chen, Yu Miao, Zhibing Huang, Jinbi Liang

These datasets are freely available to industry and academia, based on which various applications on masked faces can be developed.

Face Detection Face Recognition

An End-to-End Network for Co-Saliency Detection in One Single Image

no code implementations 25 Oct 2019 Yuanhao Yue, Qin Zou, Hongkai Yu, Qian Wang, Zhongyuan Wang, Song Wang

Co-saliency detection within a single image is a common vision problem that has received little attention and has not yet been well addressed.

Clustering Co-Salient Object Detection +2

Earlier Attention? Aspect-Aware LSTM for Aspect-Based Sentiment Analysis

no code implementations 19 May 2019 Bowen Xing, Lejian Liao, Dandan song, Jingang Wang, Fuzheng Zhang, Zhongyuan Wang, He-Yan Huang

This paper proposes a novel variant of LSTM, termed aspect-aware LSTM (AA-LSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA)

SuperPCA: A Superpixelwise PCA Approach for Unsupervised Feature Extraction of Hyperspectral Imagery

1 code implementation 26 Jun 2018 Junjun Jiang, Jiayi Ma, Chen Chen, Zhongyuan Wang, Zhihua Cai, Lizhe Wang

(1) Unlike the traditional PCA method based on a whole image, SuperPCA takes into account the diversity in different homogeneous regions, that is, different regions should have different projections.

Dimensionality Reduction General Classification
