Search Results for author: Yu Cheng

Found 158 papers, 76 papers with code

Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing

2 code implementations • 10 Apr 2018 • Jian Zhao, Jianshu Li, Yu Cheng, Li Zhou, Terence Sim, Shuicheng Yan, Jiashi Feng

Despite the noticeable progress in perceptual tasks like detection, instance segmentation and human parsing, computers still perform unsatisfactorily on visually understanding humans in crowded scenes, such as group behavior analysis, person re-identification and autonomous driving, etc.

Ranked #1 on Multi-Human Parsing on PASCAL-Part

Autonomous Driving Clustering +6

4,982

EnlightenGAN: Deep Light Enhancement without Paired Supervision

8 code implementations • 17 Jun 2019 • Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang

Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?

Ranked #1 on Low-Light Image Enhancement on AFLW (Zhang CVPR 2018 crops)

Generative Adversarial Network Image Restoration +1

1,361

UNITER: UNiversal Image-TExt Representation Learning

7 code implementations • ECCV 2020 • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).

Ranked #3 on Visual Question Answering (VQA) on VCR (Q-A) test

Image-text matching Language Modelling +12

761

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

2 code implementations • ICLR 2021 • Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.

Ranked #3 on Natural Language Inference on ANLI test (using extra training data)

Natural Language Inference Question Answering +1

381

Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition

1 code implementation • 2 Sep 2018 • Jian Zhao, Yu Cheng, Yi Cheng, Yang Yang, Haochong Lan, Fang Zhao, Lin Xiong, Yan Xu, Jianshu Li, Sugiri Pranata, ShengMei Shen, Junliang Xing, Hengzhu Liu, Shuicheng Yan, Jiashi Feng

Benchmarking our model on one of the most popular unconstrained face recognition datasets IJB-C additionally verifies the promising generalizability of AIM in recognizing faces in the wild.

Ranked #1 on Age-Invariant Face Recognition on MORPH Album2

Age-Invariant Face Recognition Benchmarking +4

360

A Survey of Reasoning with Foundation Models

1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.

Medical Diagnosis

343

Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond

2 code implementations • 5 Apr 2018 • Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, Pan Zhou

The results show that our framework can smoothly synthesize pedestrians on background images of variations and different levels of details.

Ranked #2 on Scene Text Recognition on MSDA

Generative Adversarial Network Pedestrian Detection +1

324

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

2 code implementations • ICLR 2020 • Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models.
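
As a rough illustration, the min-max objective that sentence describes is commonly written as below. The symbols ($\theta$ for model parameters, $\delta$ for the perturbation, $\epsilon$ for its norm budget, $L$ for the loss, $\mathcal{D}$ for the data distribution) are assumed notation for this sketch, not taken from the listing, and the paper's exact formulation may differ.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\| \le \epsilon} L\big(f_{\theta}(x + \delta),\, y\big) \right]
```

The inner maximization searches for the worst-case perturbation within the budget; the outer minimization trains the model against it.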

Natural Language Understanding Overall - Test +1

250

StoryGAN: A Sequential Conditional GAN for Story Visualization

1 code implementation • CVPR 2019 • Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson, Jianfeng Gao

We therefore propose a new story-to-image-sequence generation model, StoryGAN, based on the sequential conditional GAN framework.

Sentence Story Visualization +1

231

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training

3 code implementations • EMNLP 2020 • Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu

We present HERO, a novel framework for large-scale video+language omni-representation learning.

Ranked #1 on Video Retrieval on TVR

Language Modelling Masked Language Modeling +8

226

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments.

Question Answering Text Generation

206

Patient Knowledge Distillation for BERT Model Compression

3 code implementations • IJCNLP 2019 • Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks.

Knowledge Distillation Model Compression

194

MMD GAN: Towards Deeper Understanding of Moment Matching Network

2 code implementations • NeurIPS 2017 • Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, Barnabás Póczos

In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN.

Computational Efficiency Generative Adversarial Network

189

Relation-Aware Graph Attention Network for Visual Question Answering

1 code implementation • ICCV 2019 • Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu

In order to answer semantically-complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects.

Graph Attention Implicit Relations +3

175

Discourse-Aware Neural Extractive Text Summarization

1 code implementation • ACL 2020 • Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu

Recently BERT has been adopted for document encoding in state-of-the-art text summarization models.

Extractive Text Summarization Sentence

164

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

1 code implementation • CVPR 2020 • Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu

We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.

154

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

1 code implementation • CVPR 2021 • Yu Cheng, Bo Wang, Bo Yang, Robby T. Tan

Besides the integration of top-down and bottom-up networks, unlike existing pose discriminators that are designed solely for single person, and consequently cannot assess natural inter-person interactions, we propose a two-person pose discriminator that enforces natural two-person interactions.

Ranked #1 on 3D Multi-Person Pose Estimation (root-relative) on MuPoTS-3D

3D Multi-Person Pose Estimation (absolute) 3D Multi-Person Pose Estimation (root-relative) +2

154

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

1 code implementation • 2 May 2022 • Yu Cheng, Bo Wang, Robby T. Tan

Most of the methods focus on single persons, which estimate the poses in the person-centric coordinates, i.e., the coordinates based on the center of the target person.

Ranked #1 on 3D Human Pose Estimation on JTA

3D Multi-Person Pose Estimation (absolute) 3D Multi-Person Pose Estimation (root-relative) +4

154

Graph Optimal Transport for Cross-Domain Alignment

1 code implementation • ICML 2020 • Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu

In GOT, cross-domain alignment is formulated as a graph matching problem, by representing entities into a dynamically-constructed graph.

Graph Matching Image Captioning +8

149

Distilling Knowledge Learned in BERT for Text Generation

1 code implementation • ACL 2020 • Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Experiments show that the proposed approach significantly outperforms strong Transformer baselines on multiple language generation tasks such as machine translation and text summarization.

Language Modelling Machine Translation +5

130

Diverse Few-Shot Text Classification with Multiple Metrics

2 code implementations • NAACL 2018 • Mo Yu, Xiaoxiao Guo, Jin-Feng Yi, Shiyu Chang, Saloni Potdar, Yu Cheng, Gerald Tesauro, Haoyu Wang, Bo-Wen Zhou

We study few-shot learning in natural language domains.

Few-Shot Learning Few-Shot Text Classification +6

120

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

2 code implementations • NeurIPS 2020 • Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.

Ranked #7 on Visual Entailment on SNLI-VE val (using extra training data)

Question Answering Referring Expression +7

118

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

1 code implementation • 29 Aug 2022 • Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang

Diffusion models (DMs) have shown great potential for high-quality image synthesis.

Conditional Image Generation Denoising +1

111

Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos

1 code implementation • 22 Dec 2020 • Yu Cheng, Bo Wang, Bo Yang, Robby T. Tan

To tackle this problem, we propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses that do not require camera parameters.

Ranked #1 on Root Joint Localization on Human3.6M

3D Absolute Human Pose Estimation 3D Multi-Person Pose Estimation (absolute) +5

A Content-Driven Micro-Video Recommendation Dataset at Scale

1 code implementation • 27 Sep 2023 • Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, Fajie Yuan

Micro-videos have recently gained immense popularity, sparking critical research in micro-video recommendation with significant implications for the entertainment, advertising, and e-commerce industries.

Benchmarking Recommendation Systems +1

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

1 code implementation • NeurIPS 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0.28% top-1 accuracy, and meanwhile enjoys 49.32% FLOPs and 4.40% running time savings.

Ranked #20 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs

Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification

1 code implementation • ICCV 2017 • Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou

Person Re-Identification (person re-id) is a crucial task as its applications in visual surveillance and human-computer interaction.

Video-Based Person Re-Identification

Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning

1 code implementation • CVPR 2020 • Tianlong Chen, Sijia Liu, Shiyu Chang, Yu Cheng, Lisa Amini, Zhangyang Wang

We conduct extensive experiments to demonstrate that the proposed framework achieves large performance margins (e.g., 3.83% on robust accuracy and 1.3% on standard accuracy, on the CIFAR-10 dataset), compared with the conventional end-to-end adversarial training baseline.

Adversarial Robustness

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

1 code implementation • NeurIPS 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang

Training generative adversarial networks (GANs) with limited real image data generally results in deteriorated performance and collapsed models.

Data Augmentation

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

1 code implementation • 26 Jul 2022 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multi-modal contrastive pre-training has demonstrated great utility to learn transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space.

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

1 code implementation • 8 Jun 2021 • Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu

Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task.

Multi-Task Learning Question Answering +5

An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

1 code implementation • 13 Sep 2023 • Yu Cheng, Yunzhu Pan, JiaQi Zhang, Yongxin Ni, Aixin Sun, Fajie Yuan

Then, to show the effectiveness of the dataset's image features, we substitute the itemID embeddings (from IDNet) with a powerful vision encoder that represents items using their raw image pixels.

Ranked #1 on Recommendation Systems on PixelRec

Benchmarking Recommendation Systems

Domain Adaptive Text Style Transfer

1 code implementation • IJCNLP 2019 • Dianqi Li, Yizhe Zhang, Zhe Gan, Yu Cheng, Chris Brockett, Ming-Ting Sun, Bill Dolan

These data may demonstrate domain shift, which impedes the benefits of utilizing such data for training.

Domain Adaptation Style Transfer +1

M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

1 code implementation • 26 Oct 2022 • Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang

However, when deploying MTL onto those real-world systems that are often resource-constrained or latency-sensitive, two prominent challenges arise: (i) during training, simultaneously optimizing all tasks is often difficult due to gradient conflicts across tasks; (ii) at inference, current MTL regimes have to activate nearly the entire model even to just execute a single task.

Multi-Task Learning

Dialog-based Interactive Image Retrieval

1 code implementation • NeurIPS 2018 • Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, Rogerio Schmidt Feris

Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.

Image Retrieval reinforcement-learning +3

BachGAN: High-Resolution Image Synthesis from Salient Object Layout

1 code implementation • CVPR 2020 • Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu

We propose a new task towards more practical application for image generation - high-quality image synthesis from salient object layout.

Generative Adversarial Network Hallucination +4

RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL

1 code implementation • 14 May 2022 • Jiexing Qi, Jingyao Tang, Ziwei He, Xiangpeng Wan, Yu Cheng, Chenghu Zhou, Xinbing Wang, Quanshi Zhang, Zhouhan Lin

Our model can incorporate almost all types of existing relations in the literature, and in addition, we propose introducing co-reference relations for the multi-turn scenario.

Ranked #1 on Dialogue State Tracking on CoSQL

Dialogue State Tracking Semantic Parsing +1

Efficient Robust Training via Backward Smoothing

1 code implementation • 3 Oct 2020 • Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu

In this work, we develop a new understanding towards Fast Adversarial Training, by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem.

DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment

1 code implementation • CVPR 2023 • Heyuan Li, Bo Wang, Yu Cheng, Mohan Kankanhalli, Robby T. Tan

Thanks to the proposed fusion module, our method is robust not only to occlusion and large pitch and roll view angles, which is the benefit of our image space approach, but also to noise and large yaw angles, which is the benefit of our model space method.

Ranked #1 on 3D Face Reconstruction on AFLW2000-3D (Mean NME metric)

3D Face Reconstruction Face Alignment +1

Cross-Thought for Sentence Encoder Pre-training

1 code implementation • EMNLP 2020 • Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu

In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering.

Information Retrieval Language Modelling +5

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

1 code implementation • 2 Oct 2023 • Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen

Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks, however, they have issues like (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts; and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse.

NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation

1 code implementation • 14 Sep 2023 • JiaQi Zhang, Yu Cheng, Yongxin Ni, Yunzhu Pan, Zheng Yuan, Junchen Fu, Youhua Li, Jie Wang, Fajie Yuan

The development of TransRec has encountered multiple challenges, among which the lack of large-scale, high-quality transfer learning recommendation dataset and benchmark suites is one of the biggest obstacles.

Descriptive Recommendation Systems +1

Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling

1 code implementation • 19 Nov 2018 • Haoran You, Yu Cheng, Tianheng Cheng, Chunliang Li, Pan Zhou

We evaluate the proposed Bayesian CycleGAN on multiple benchmark datasets, including Cityscapes, Maps, and Monet2photo.

Image-to-Image Translation Semantic Segmentation +1

Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification

1 code implementation • CVPR 2017 • Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, Rogerio Feris

Multi-task learning aims to improve generalization performance of multiple prediction tasks by appropriately sharing relevant information across them.

Attribute General Classification +1

Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning

1 code implementation • NAACL 2021 • Jason Wei, Chengyu Huang, Soroush Vosoughi, Yu Cheng, Shiqi Xu

Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category.

Data Augmentation Few-Shot Text Classification +2

Meta Module Network for Compositional Visual Reasoning

1 code implementation • 8 Oct 2019 • Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang, Jingjing Liu

To design a more powerful NMN architecture for practical use, we propose Meta Module Network (MMN) centered on a novel meta module, which can take in function recipes and morph into diverse instance modules dynamically.

MORPH Visual Reasoning

Contrastive Distillation on Intermediate Representations for Language Model Compression

1 code implementation • EMNLP 2020 • Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu

Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one.

Knowledge Distillation Language Modelling +1

A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

1 code implementation • ACL 2022 • Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren

Large pre-trained vision-language (VL) models can learn a new task with a handful of examples and generalize to a new task without fine-tuning.

Ranked #4 on Image Captioning on Flickr30k Captions test (SPICE metric)

Image Captioning Language Modelling +2

Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding

1 code implementation • 9 Feb 2019 • Zihao Zhu, Changchang Yin, Buyue Qian, Yu Cheng, Jishang Wei, Fei Wang

One major carrier for conducting patient similarity research is Electronic Health Records (EHRs), which are usually heterogeneous, longitudinal, and sparse.

INSET: Sentence Infilling with INter-SEntential Transformer

1 code implementation • ACL 2020 • Yichen Huang, Yizhe Zhang, Oussama Elachqar, Yu Cheng

Missing sentence generation (or sentence infilling) fosters a wide range of applications in natural language generation, such as document auto-completion and meeting note expansion.

Natural Language Understanding Sentence +1

What Makes A Good Story? Designing Composite Rewards for Visual Storytelling

1 code implementation • 11 Sep 2019 • Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, Graham Neubig

Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr.

Ranked #10 on Visual Storytelling on VIST

Visual Storytelling

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy

1 code implementation • CVPR 2022 • Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

However, a "head-to-toe assessment" regarding the extent of redundancy in ViTs, and how much we could gain by thoroughly mitigating such, has been absent for this field.

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation • CVPR 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.

Sentence Temporal Sentence Grounding

Adversarial Feature Augmentation and Normalization for Visual Recognition

1 code implementation • 22 Mar 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, JianFeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu

Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.

Classification Data Augmentation +2

SemAttack: Natural Textual Attacks via Different Semantic Spaces

1 code implementation • Findings (NAACL) 2022 • Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, Bo Li

In particular, SemAttack optimizes the generated perturbations constrained on generic semantic spaces, including typo space, knowledge space (e.g., WordNet), contextualized semantic space (e.g., the embedding space of BERT clusterings), or the combination of these spaces.

Adversarial Text

Generative Adversarial Networks as Variational Training of Energy Based Models

1 code implementation • 6 Nov 2016 • Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang

We propose VGAN, which works by minimizing a variational lower bound of the negative log likelihood (NLL) of an energy based model (EBM), where the model density $p(\mathbf{x})$ is approximated by a variational distribution $q(\mathbf{x})$ that is easy to sample from.

Backdoor Attacks on Crowd Counting

1 code implementation • 12 Jul 2022 • Yuhua Sun, Tailai Zhang, Xingjun Ma, Pan Zhou, Jian Lou, Zichuan Xu, Xing Di, Yu Cheng, Lichao

In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) to attack the model to produce arbitrarily large or small density estimations.

Backdoor Attack Crowd Counting +3

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

1 code implementation • ACL 2021 • Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks.

Model Compression

S3Pool: Pooling with Stochastic Spatial Sampling

4 code implementations • CVPR 2017 • Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris

We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner.
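
The two-step view of pooling described above can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the border clamping used to keep the resolution intact, and the optional per-window random selection are all assumptions made for the example.

```python
import random


def max_pool_stride1(fmap, k=2):
    """Step 1: slide a k x k max window over the map with stride one.

    Indices past the border are clamped (an assumed padding choice),
    so the output has the same spatial resolution as the input.
    """
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(
                fmap[min(i + di, h - 1)][min(j + dj, w - 1)]
                for di in range(k)
                for dj in range(k)
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def downsample(fmap, k=2, rng=None):
    """Step 2: keep one pixel from each non-overlapping k x k window.

    With rng=None the top-left pixel is kept (deterministic). Passing a
    random.Random instance keeps a uniformly random pixel per window,
    a simplified, hypothetical stand-in for stochastic sampling.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            di = rng.randrange(k) if rng is not None else 0
            dj = rng.randrange(k) if rng is not None else 0
            row.append(fmap[i + di][j + dj])
        out.append(row)
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 2, 1],
]
dense = max_pool_stride1(feature_map)   # same 4 x 4 resolution
pooled = downsample(dense)              # deterministic top-left selection
print(pooled)  # prints [[4, 5], [6, 7]]
```

With the deterministic top-left choice, the result matches ordinary 2 x 2 max pooling with stride two, which is the equivalence the two-step view relies on.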

Data Augmentation Image Classification

The Elastic Lottery Ticket Hypothesis

1 code implementation • NeurIPS 2021 • Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang

Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly the same competitive as the latter's winning ticket directly found by IMP.

Deep Structured Energy Based Models for Anomaly Detection

2 code implementations • 25 May 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang

In this paper, we attack the anomaly detection problem by directly modeling the data distribution with deep architectures.

Anomaly Detection

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

1 code implementation • 30 Oct 2021 • Xuxi Chen, Tianlong Chen, Weizhu Chen, Ahmed Hassan Awadallah, Zhangyang Wang, Yu Cheng

To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.

Sobolev GAN

2 code implementations • ICLR 2018 • Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, Yu Cheng

We show that the Sobolev IPM compares two distributions in high dimensions based on weighted conditional Cumulative Distribution Functions (CDF) of each coordinate on a leave one out basis.

Text Generation

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

1 code implementation • 4 Nov 2021 • Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li

In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.

Ranked #1 on Adversarial Robustness on AdvGLUE

Adversarial Attack Adversarial Robustness +1

Deep Multimodality Model for Multi-task Multi-view Learning

1 code implementation • 25 Jan 2019 • Lecheng Zheng, Yu Cheng, Jingrui He

However, there is no existing deep learning algorithm that jointly models task and view dual heterogeneity, particularly for a data set with multiple modalities (text and image mixed data set or text and video mixed data set, etc.).

General Classification Image Classification +1

Improving Low-resource Prompt-based Relation Representation with Multi-view Decoupling Learning

1 code implementation • 26 Dec 2023 • Chenghao Fan, Wei Wei, Xiaoye Qu, Zhenyi Lu, Wenfeng Xie, Yu Cheng, Dangyang Chen

Recently, prompt-tuning with pre-trained language models (PLMs) has demonstrated the significantly enhancing ability of relation extraction (RE) tasks.

Relation Relation Extraction +1

Deep Co-Attention Network for Multi-View Subspace Learning

1 code implementation • 15 Feb 2021 • Lecheng Zheng, Yu Cheng, Hongxia Yang, Nan Cao, Jingrui He

For example, given the diagnostic result that a model provided based on the X-ray images of a patient at different poses, the doctor needs to know why the model made such a prediction.

Reinforcement Learning with Token-level Feedback for Controllable Text Generation

1 code implementation • 18 Mar 2024 • Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs).

Attribute reinforcement-learning +3

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients

1 code implementation • 21 Jun 2020 • Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu, Tom Goldstein

Adaptive gradient methods such as RMSProp and Adam use exponential moving estimate of the squared gradient to compute adaptive step sizes, achieving better convergence than SGD in face of noisy objectives.
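
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of an RMSProp-style update on a single scalar parameter. It illustrates only the exponential moving estimate of the squared gradient, not the MaxVA method itself; the function name, hyperparameter values, and the toy objective are assumptions chosen for the example.

```python
import math


def rmsprop_step(param, grad, sq_avg, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp-style update for a single scalar parameter."""
    # Exponential moving estimate of the squared gradient.
    sq_avg = beta * sq_avg + (1.0 - beta) * grad * grad
    # Adaptive step: dividing by the root of the estimate normalizes
    # the update, so large recent gradients do not produce large steps.
    param = param - lr * grad / (math.sqrt(sq_avg) + eps)
    return param, sq_avg


# Toy objective f(x) = x^2 with gradient 2x, starting from x = 5.0.
x, v = 5.0, 0.0
for _ in range(2000):
    x, v = rmsprop_step(x, 2.0 * x, v)
print(abs(x) < 0.1)  # prints True
```

Adam adds a moving average of the gradient itself and bias correction on top of this estimate; those parts are omitted here to keep the sketch short.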

Image Classification Machine Translation +3

Local Byte Fusion for Neural Machine Translation

1 code implementation • 23 May 2022 • Makesh Narsimhan Sreedhar, Xiangpeng Wan, Yu Cheng, Junjie Hu

Subword tokenization schemes are the dominant technique used in current NLP models.

Domain Adaptation Machine Translation +2

Outlier-Robust Sparse Estimation via Non-Convex Optimization

1 code implementation • 23 Sep 2021 • Yu Cheng, Ilias Diakonikolas, Rong Ge, Shivam Gupta, Daniel M. Kane, Mahdi Soltanolkotabi

We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA.

Hiding Data Helps: On the Benefits of Masking for Sparse Coding

1 code implementation • 24 Feb 2023 • Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective for which recovering the ground-truth dictionary is in fact optimal as the signal increases for a large class of data-generating processes.

Dictionary Learning Self-Supervised Learning

Paper
Code

Non-Convex Matrix Completion Against a Semi-Random Adversary

no code implementations • 28 Mar 2018 • Yu Cheng, Rong Ge

Matrix completion is a well-studied problem with many machine learning applications.

Matrix Completion

Paper
Add Code

Deep Nearest Class Mean Model for Incremental Odor Classification

no code implementations • 8 Jan 2018 • Yu Cheng, Angus Wong, Kevin Hung, Zhizhong Li, Weitong Li, Jun Zhang

That is, the odor datasets are dynamically growing while both training samples and number of classes are increasing over time.

Classification General Classification

Paper
Add Code

A Survey of Model Compression and Acceleration for Deep Neural Networks

no code implementations • 23 Oct 2017 • Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang

Methods of parameter pruning and quantization are described first; after that, the other techniques are introduced.

Benchmarking Knowledge Distillation +2

Paper
Add Code

On the Distortion of Voting with Multiple Representative Candidates

no code implementations • 21 Nov 2017 • Yu Cheng, Shaddin Dughmi, David Kempe

Our main result is a clean and tight characterization of positional voting rules that have constant expected distortion (independent of the number of candidates and the metric space).

Paper
Add Code

Catching Anomalous Distributed Photovoltaics: An Edge-based Multi-modal Anomaly Detection

no code implementations • 26 Sep 2017 • Devu Manikantan Shila, Kin Gwn Lore, Tianshu Wei, Teems Lovett, Yu Cheng

A significant challenge in energy system cyber security is the current inability to detect cyber-physical attacks targeting and originating from distributed grid-edge devices such as photovoltaics (PV) panels, smart flexible loads, and electric vehicles.

Anomaly Detection Time Series Analysis

Paper
Add Code

Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records

no code implementations • 6 Sep 2017 • Zhengping Che, Yu Cheng, Shuangfei Zhai, Zhaonan Sun, Yan Liu

We use this generative model together with a convolutional neural network (CNN) based prediction model to improve the onset prediction performance.

Generative Adversarial Network

Paper
Add Code

Of the People: Voting Is More Effective with Representative Candidates

no code implementations • 4 May 2017 • Yu Cheng, Shaddin Dughmi, David Kempe

However, we show that independence alone is not enough to achieve the upper bound: even when candidates are drawn independently, if the population of candidates can be different from the voters, then an upper bound of $2$ on the approximation is tight.

Paper
Add Code

Exploiting Convolutional Neural Network for Risk Prediction with Medical Feature Embedding

no code implementations • 25 Jan 2017 • Zhengping Che, Yu Cheng, Zhaonan Sun, Yan Liu

To account for high dimensionality, we use embedded medical features in the CNN model, which capture the natural medical concepts.

Paper
Add Code

Doubly Convolutional Neural Networks

no code implementations • NeurIPS 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang

Building large models with parameter sharing accounts for most of the success of deep convolutional neural networks (CNNs).

Image Classification

Paper
Add Code

Robust Learning of Fixed-Structure Bayesian Networks

1 code implementation • NeurIPS 2018 • Yu Cheng, Ilias Diakonikolas, Daniel Kane, Alistair Stewart

We investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted.

Paper
Code

Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data

no code implementations • CVPR 2016 • Jing Wang, Yu Cheng, Rogerio Schmidt Feris

These image pairs are then fed into a deep network that preserves similarity of images connected by the same track, in order to capture identity-related attribute features, and optimizes for location and weather prediction to capture additional facial attribute features.

Attribute Facial Attribute Classification +1

Paper
Add Code

An exploration of parameter redundancy in deep networks with circulant projections

no code implementations • ICCV 2015 • Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang

We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection.

Paper
Add Code
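
To make the circulant projection named in the entry above concrete, here is a minimal pure-Python sketch. It assumes the common convention that a circulant matrix is defined by its first column `c`, so that entry (i, j) equals `c[(i - j) % n]`; the paper may use a different convention, and the helper names are hypothetical.

```python
def circulant_matrix(c):
    """Build the full n x n circulant matrix whose first column is c."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def matvec(matrix, x):
    """Ordinary dense matrix-vector product."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in matrix]


def circulant_project(c, x):
    """Apply the circulant projection without materializing the matrix.

    Only the n entries of c are stored, instead of the n * n entries
    of an unstructured projection matrix.
    """
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


c = [1.0, 2.0, 3.0]
x = [0.5, -1.0, 2.0]
dense_result = matvec(circulant_matrix(c), x)
structured_result = circulant_project(c, x)
```

Both results agree, since the product is a circular convolution of `c` and `x`. The loop above costs O(n^2) time; the same product can be computed in O(n log n) with a fast Fourier transform, which is what makes the structure attractive for large layers.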

Spectral Sparsification of Random-Walk Matrix Polynomials

no code implementations • 12 Feb 2015 • Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, Shang-Hua Teng

Our work is particularly motivated by the algorithmic problems for speeding up the classic Newton's method in applications such as computing the inverse square-root of the precision matrix of a Gaussian random field, as well as computing the $q$th-root transition (for $q\geq1$) in a time-reversible Markov model.

Paper
Add Code

Scalable Parallel Factorizations of SDD Matrices and Efficient Sampling for Gaussian Graphical Models

no code implementations • 20 Oct 2014 • Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, Shang-Hua Teng

random samples for $n$-dimensional Gaussian random fields with SDDM precision matrices.

Paper
Add Code

Spatial-Temporal Synergic Residual Learning for Video Person Re-Identification

no code implementations • 16 Jul 2018 • Xinxing Su, Yingtian Zou, Yu Cheng, Shuangjie Xu, Mo Yu, Pan Zhou

We present a novel method, the Spatial-Temporal Synergic Residual Network (STSRN), for this problem.

Video-Based Person Re-Identification

Paper
Add Code

High-Dimensional Robust Mean Estimation in Nearly-Linear Time

no code implementations • 23 Nov 2018 • Yu Cheng, Ilias Diakonikolas, Rong Ge

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted.

Vocal Bursts Intensity Prediction

Paper
Add Code

Sequential Attention GAN for Interactive Image Editing

no code implementations • 20 Dec 2018 • Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, Jianfeng Gao

The main challenges in this sequential and interactive image generation task are two-fold: 1) contextual consistency between a generated image and the provided textual description; 2) step-by-step region-level modification to maintain visual consistency across the generated image sequence in each session.

Text-to-Image Generation

Paper
Add Code

On the Recursive Teaching Dimension of VC Classes

no code implementations • NeurIPS 2016 • Xi Chen, Yu Cheng, Bo Tang

This is the first upper bound for $RTD(C)$ that depends only on $VCD(C)$, independent of the size of the concept class $|C|$ and its domain size $n$.

Paper
Add Code

Towards Pose Invariant Face Recognition in the Wild

no code implementations • CVPR 2018 • Jian Zhao, Yu Cheng, Yan Xu, Lin Xiong, Jianshu Li, Fang Zhao, Karlekar Jayashree, Sugiri Pranata, ShengMei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng

To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties.

Face Recognition Generative Adversarial Network +1

Paper
Add Code

Temporal Sequence Modeling for Video Event Detection

no code implementations • CVPR 2014 • Yu Cheng, Quanfu Fan, Sharath Pankanti, Alok Choudhary

Based on this idea, we represent a video by a sequence of visual words learnt from the video, and apply the Sequence Memoizer [21] to capture long-range dependencies in a temporal context in the visual sequence.

Event Detection General Classification

Paper
Add Code

Few-shot Learning with Meta Metric Learners

no code implementations • 26 Jan 2019 • Yu Cheng, Mo Yu, Xiaoxiao Guo, Bo-Wen Zhou

Our meta metric learning approach consists of task-specific learners that exploit metric learning to handle flexible labels, and a meta learner that discovers good parameters and gradient descent to specify the metrics in task-specific learners.

Few-Shot Learning Metric Learning

Paper
Add Code

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

no code implementations • ACL 2019 • Zhe Gan, Yu Cheng, Ahmed El Kholy, Linjie Li, Jingjing Liu, Jianfeng Gao

This paper presents a new model for visual dialog, Recurrent Dual Attention Network (ReDAN), using multi-step reasoning to answer a series of questions about an image.

Question Answering Visual Dialog

Paper
Add Code

Reducing infrequent-token perplexity via variational corpora

no code implementations • IJCNLP 2015 • Yusheng Xie, Pranjal Daga, Yu Cheng, Kunpeng Zhang, Ankit Agrawal, Alok Choudhary

Language Modelling

Paper
Add Code

Back to the Blocks World: Learning New Actions through Situated Human-Robot Dialogue

no code implementations • WS 2014 • Lanbo She, Shaohua Yang, Yu Cheng, Yunyi Jia, Joyce Chai, Ning Xi

Paper
Add Code

POP-CNN: Predicting Odor's Pleasantness with Convolutional Neural Network

no code implementations • 19 Mar 2019 • Danli Wu, Yu Cheng, Dehan Luo, Kin-Yeung Wong, Kevin Hung, Zhijing Yang

Predicting odor's pleasantness simplifies the evaluation of odors and has the potential to be applied in the perfume and environmental monitoring industries.

Paper
Add Code

A Hybrid Approach with Optimization and Metric-based Meta-Learner for Few-Shot Learning

no code implementations • 4 Apr 2019 • Duo Wang, Yu Cheng, Mo Yu, Xiaoxiao Guo, Tao Zhang

The task-specific classifiers are required to be homogeneous-structured to ease the parameter prediction, so the meta-learning approaches could only handle few-shot learning problems where the tasks share a uniform number of classes.

Few-Shot Learning General Classification +3

Paper
Add Code

Adversarial Category Alignment Network for Cross-domain Sentiment Classification

no code implementations • NAACL 2019 • Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou

Cross-domain sentiment classification aims to predict sentiment polarity on a target domain utilizing a classifier learned from a source domain.

Classification General Classification +2

Paper
Add Code

Faster Algorithms for High-Dimensional Robust Covariance Estimation

no code implementations • 11 Jun 2019 • Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.

Vocal Bursts Intensity Prediction

Paper
Add Code

Mixed-Supervised Dual-Network for Medical Image Segmentation

no code implementations • 24 Jul 2019 • Duo Wang, Ming Li, Nir Ben-Shlomo, C. Eduardo Corrales, Yu Cheng, Tao Zhang, Jagadeesan Jayender

The model is trained jointly in a multi-task learning setting.

Image Segmentation Medical Image Segmentation +3

Paper
Add Code

Attend To Count: Crowd Counting with Adaptive Capacity Multi-scale CNNs

no code implementations • 7 Aug 2019 • Zhikang Zou, Yu Cheng, Xiaoye Qu, Shouling Ji, Xiaoxiao Guo, Pan Zhou

ACM-CNN consists of three types of modules: a coarse network, a fine network, and a smooth network.

Crowd Counting Density Estimation

Paper
Add Code

Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation

no code implementations • 11 Sep 2019 • Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin

Recent unsupervised approaches to domain adaptation primarily focus on minimizing the gap between the source and the target domains through refining the feature generator, in order to learn a better alignment between the two domains.

domain classification Unsupervised Domain Adaptation

Paper
Add Code

Tell-the-difference: Fine-grained Visual Descriptor via a Discriminating Referee

no code implementations • 14 Oct 2019 • Shuangjie Xu, Feng Xu, Yu Cheng, Pan Zhou

In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.

Image Captioning

Paper
Add Code

Occlusion-Aware Networks for 3D Human Pose Estimation in Video

no code implementations • ICCV 2019 • Yu Cheng, Bo Yang, Bo Wang, Wending Yan, Robby T. Tan

In addition, we use this model to create a pose regularization constraint, preferring the 2D estimations of unreliable keypoints to be occluded.

Ranked #4 on 3D Human Pose Estimation on HumanEva-I

Monocular 3D Human Pose Estimation Optical Flow Estimation

Paper
Add Code

Distinguishing Distributions When Samples Are Strategically Transformed

no code implementations • NeurIPS 2019 • Hanrui Zhang, Yu Cheng, Vincent Conitzer

In other settings, the principal may not even be able to observe samples directly; instead, she must rely on signals that the agent is able to send based on the samples that he obtains, and he will choose these signals strategically.

Paper
Add Code

Towards Better Understanding of Disentangled Representations via Mutual Information

no code implementations • 25 Nov 2019 • Xiaojiang Yang, Wendong Bi, Yitong Sun, Yu Cheng, Junchi Yan

Most existing works on disentangled representation learning are solely built upon a marginal independence assumption: all factors in disentangled representations should be statistically independent.

Disentanglement Inductive Bias +1

Paper
Add Code

Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV based Random Access IoT Networks with NOMA

no code implementations • 31 Jan 2020 • Sami Khairy, Prasanna Balaprakash, Lin X. Cai, Yu Cheng

In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique to improve the massive channel access of a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers.

Management

Paper
Add Code

3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training

no code implementations • AAAI 2020 • Yu Cheng, Bo Yang, Bo Wang, Robby T. Tan

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in recent years.

Ranked #3 on 3D Human Pose Estimation on HumanEva-I

Monocular 3D Human Pose Estimation valid

Paper
Add Code

Contextual Text Style Transfer

no code implementations • Findings of the Association for Computational Linguistics 2020 • Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li, Jingjing Liu

To realize high-quality style transfer with natural context preservation, we propose a Context-Aware Style Transfer (CAST) model, which uses two separate encoders for each input sentence and its surrounding context.

Sentence Style Transfer +2

Paper
Add Code

APo-VAE: Text Generation in Hyperbolic Space

no code implementations • NAACL 2021 • Shuyang Dai, Zhe Gan, Yu Cheng, Chenyang Tao, Lawrence Carin, Jingjing Liu

In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.

Language Modelling Response Generation +1

Paper
Add Code

High-Dimensional Robust Mean Estimation via Gradient Descent

no code implementations • ICML 2020 • Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi

We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.

LEMMA Vocal Bursts Intensity Prediction

Paper
Add Code

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

no code implementations • ECCV 2020 • Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu

To reveal the secrets behind the scene of these powerful models, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., Visual Coreference Resolution, Visual Relation Detection, Linguistic Probing Tasks) generalizable to standard pre-trained V+L models, aiming to decipher the inner workings of multimodal pre-training (e.g., the implicit knowledge garnered in individual attention heads, the inherent cross-modal alignment learned through contextualized multimodal embeddings).

coreference-resolution

Paper
Add Code

Fine-grained Iterative Attention Network for Temporal Language Localization in Videos

no code implementations • 6 Aug 2020 • Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou

In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.

Sentence

Paper
Add Code

Object Tracking using Spatio-Temporal Networks for Future Prediction Location

no code implementations • ECCV 2020 • Yuan Liu, Ruoteng Li, Yu Cheng, Robby T. Tan, Xiubao Sui

To facilitate the future prediction ability, we follow three key observations: 1) object motion trajectory is affected significantly by camera motion; 2) the past trajectory of an object can act as a salient cue to estimate the object motion in the spatial domain; 3) previous frames contain the surroundings and appearance of the target object, which is useful for predicting the target object’s future locations.

Future prediction Object +1

Paper
Add Code

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

no code implementations • 13 Sep 2020 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

The Transformer has become ubiquitous in the deep learning field.

Ranked #1 on Open-Domain Question Answering on SearchQA

Clustering Language Modelling +1

Paper
Add Code

Cluster-Former: Clustering-based Sparse Transformer for Question Answering

no code implementations • Findings (ACL) 2021 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

The Transformer has become ubiquitous in the deep learning field.

Clustering Question Answering

Paper
Add Code

Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization

no code implementations • 1 Jan 2021 • Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh, Jingjing Liu

By incorporating different feature maps after the masking, we can distill better features to help model generalization.

Paper
Add Code

ALFA: Adversarial Feature Augmentation for Enhanced Image Recognition

no code implementations • 1 Jan 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, Yu Hu, Zhangyang Wang, Jingjing Liu

Adversarial training is an effective method to combat adversarial attacks in order to create robust neural networks.

Paper
Add Code

Multi-Fact Correction in Abstractive Text Summarization

no code implementations • EMNLP 2020 • Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu

Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE.

Abstractive Text Summarization News Summarization +1

Paper
Add Code

Object Tracking Using Spatio-Temporal Future Prediction

no code implementations • 15 Oct 2020 • Yuan Liu, Ruoteng Li, Robby T. Tan, Yu Cheng, Xiubao Sui

Our trajectory prediction module predicts the target object's locations in the current and future frames based on the object's past trajectory.

Future prediction Object +2

Paper
Add Code

DSAM: A Distance Shrinking with Angular Marginalizing Loss for High Performance Vehicle Re-identification

no code implementations • 12 Nov 2020 • Jiangtao Kong, Yu Cheng, Benjia Zhou, Kai Li, Junliang Xing

To obtain a high-performance vehicle ReID model, we present a novel Distance Shrinking with Angular Marginalizing (DSAM) loss function to perform hybrid learning in both the Original Feature Space (OFS) and the Feature Angular Space (FAS) using the local verification and the global identification information.

Person Re-Identification Vehicle Re-Identification

Paper
Add Code

Light dark matter from dark sector decay

no code implementations • 3 Dec 2020 • Yu Cheng, Wei Liao

We find that the mass of the dark sector singlet fermion can be GeV scale or MeV scale and the interaction of the dark sector singlet fermion is very weak.

High Energy Physics - Phenomenology

Paper
Add Code

Fair for All: Best-effort Fairness Guarantees for Classification

no code implementations • 18 Dec 2020 • Anilesh K. Krishnaswamy, Zhihao Jiang, Kangning Wang, Yu Cheng, Kamesh Munagala

Instead, we propose a fairness notion whose guarantee, on each group $g$ in a class $\mathcal{G}$, is relative to the performance of the best classifier on $g$.

Classification Fairness +1

Paper
Add Code

Structure Of Flavor Changing Goldstone Boson Interactions

no code implementations • 15 Jan 2021 • Jin Sun, Yu Cheng, Xiao-Gang He

Or it may be the Majoron in models from lepton number violation in producing seesaw Majorana neutrino masses if the symmetry breaking scale is much higher than the electroweak scale.

High Energy Physics - Phenomenology

Paper
Add Code

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

no code implementations • CVPR 2021 • Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu

Vision-and-language pre-training has achieved impressive success in learning multimodal representations between vision and language.

Image-text matching Language Modelling +9

Paper
Add Code

CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning

no code implementations • 1 Apr 2021 • Luowei Zhou, Jingjing Liu, Yu Cheng, Zhe Gan, Lei Zhang

This work concerns video-language pre-training and representation learning.

Question Answering Representation Learning +5

Paper
Add Code

Automated Mechanism Design for Classification with Partial Verification

no code implementations • 12 Apr 2021 • Hanrui Zhang, Yu Cheng, Vincent Conitzer

We study the problem of automated mechanism design with partial verification, where each type can (mis)report only a restricted set of types (rather than any other type), induced by the principal's limited verification power.

Classification General Classification

Paper
Add Code

Playing Lottery Tickets with Vision and Language

no code implementations • 23 Apr 2021 • Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu

However, we can find "relaxed" winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy.

Question Answering Referring Expression +6

Paper
Add Code

Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time

1 code implementation • ICLR 2021 • Yu Cheng, Honghao Lin

We achieve this by establishing a direct connection between robust learning of Bayesian networks and robust mean estimation.

Paper
Code

Few-Shot Object Detection via Classification Refinement and Distractor Retreatment

no code implementations • CVPR 2021 • Yiting Li, Haiyue Zhu, Yu Cheng, Wenxin Wang, Chek Sing Teo, Cheng Xiang, Prahlad Vadakkepat, Tong Heng Lee

An investigation of the failure modes of FSOD shows that the performance degradation is mainly due to classification incapability (false positives), which motivates us to address it from the novel aspect of hard example mining.

Classification Few-Shot Object Detection +1

Paper
Add Code

MA-CLIP: Towards Modality-Agnostic Contrastive Language-Image Pre-training

no code implementations • 29 Sep 2021 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Paper
Add Code

Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding

no code implementations • 16 Oct 2021 • Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Hassan Awadallah

Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks.

Knowledge Distillation Model Compression +1

Paper
Add Code

UNITER: Learning UNiversal Image-TExt Representations

no code implementations • 25 Sep 2019 • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding.

Image-text matching Language Modelling +10

Paper
Add Code

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou

To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes rarely appearing content in TSG tasks.

Sentence Temporal Sentence Grounding

Paper
Add Code

Unsupervised Temporal Video Grounding with Deep Semantic Clustering

no code implementations • 14 Jan 2022 • Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.

Clustering Sentence +1

Paper
Add Code

ZOOMER: Boosting Retrieval on Web-scale Graphs by Regions of Interest

1 code implementation • 20 Mar 2022 • Yuezihan Jiang, Yu Cheng, Hanyu Zhao, Wentao Zhang, Xupeng Miao, Yu He, Liang Wang, Zhi Yang, Bin Cui

We introduce ZOOMER, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs.

Retrieval

Paper
Code

Efficient Algorithms for Planning with Participation Constraints

no code implementations • 16 May 2022 • Hanrui Zhang, Yu Cheng, Vincent Conitzer

Our approach can also be extended to the (discounted) infinite-horizon case, for which we give an algorithm that runs in time polynomial in the size of the input and $\log(1/\varepsilon)$, and returns a policy that is optimal up to an additive error of $\varepsilon$.

Paper
Add Code

Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons

no code implementations • 25 Aug 2022 • Yu Cheng, Yihao Ai, Bo Wang, Xinchao Wang, Robby T. Tan

In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons, and unlike the top-down methods, do not rely on human detection.

2D Pose Estimation Human Detection +1

Paper
Add Code

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

no code implementations • 2 Jan 2023 • Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong

All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning.

Sentence Temporal Sentence Grounding

Paper
Add Code

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

no code implementations • 5 Jan 2023 • Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng

Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.

Sentence

Paper
Add Code

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

no code implementations • 6 May 2023 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Zichuan Xu, Haozhao Wang, Xing Di, Weining Lu, Yu Cheng

This paper addresses temporal sentence grounding (TSG).

Sentence Temporal Sentence Grounding

Paper
Add Code

A Theory of General Difference in Continuous and Discrete Domain

no code implementations • 14 May 2023 • Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

This stems from a key disconnect between the infinitesimal quantities in continuous differentiation and the finite intervals in its discrete counterpart.

Paper
Add Code

You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

no code implementations • CVPR 2023 • Zenghui Yuan, Pan Zhou, Kai Zou, Yu Cheng

Vision Transformers (ViTs), which made a splash in the field of computer vision (CV), have shaken the dominance of convolutional neural networks (CNNs).

Backdoor Attack

Paper
Add Code

Exploring the Upper Limits of Text-Based Collaborative Filtering Using Large Language Models: Discoveries and Insights

no code implementations • 19 May 2023 • Ruyu Li, Wenhao Deng, Yu Cheng, Zheng Yuan, JiaQi Zhang, Fajie Yuan

Furthermore, we compare the performance of the TCF paradigm utilizing the most powerful LMs to the currently dominant ID embedding-based paradigm and investigate the transferability of this TCF paradigm.

Collaborative Filtering News Recommendation +1

Paper
Add Code

GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions

no code implementations • 24 May 2023 • Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren

Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.

Object Question Answering +2

Paper
Add Code

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

no code implementations • 15 Jun 2023 • Yunfan Li, Yiran Wang, Yu Cheng, Lin Yang

We show that our algorithm obtains an $\varepsilon$-optimal policy with only $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^3})$ samples, where $\varepsilon$ is the suboptimality gap and $d$ is a complexity measure of the function class approximating the policy.

Reinforcement Learning (RL)

Paper
Add Code

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

no code implementations • NeurIPS 2023 • Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li

Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly.

Adversarial Robustness Ethics +1

Paper
Add Code

Unified Single-Stage Transformer Network for Efficient RGB-T Tracking

1 code implementation • 26 Aug 2023 • Jianqiang Xia, Dianxi Shi, Ke Song, Linna Song, Xiaolei Wang, Songchang Jin, Li Zhou, Yu Cheng, Lei Jin, Zheng Zhu, Jianan Li, Gang Wang, Junliang Xing, Jian Zhao

With this structure, the network can extract fusion features of the template and search region under the mutual interaction of modalities.

Ranked #1 on Rgb-T Tracking on GTOT

feature selection Rgb-T Tracking

Paper
Code

ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding

no code implementations • 21 Sep 2023 • Yu Cheng, Bo Wang, Robby T. Tan

In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion, which is common in in-the-wild videos.

Neural Rendering Novel View Synthesis +1

Paper
Add Code

ProS: Facial Omni-Representation Learning via Prototype-based Self-Distillation

no code implementations • 3 Nov 2023 • Xing Di, Yiyu Zheng, Xiaoming Liu, Yu Cheng

This paper presents a novel approach, called Prototype-based Self-Distillation (ProS), for unsupervised face representation learning.

Attribute Representation Learning

Paper
Add Code

Applications of Tao General Difference in Discrete Domain

no code implementations • 27 Jan 2024 • Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space.

Edge Detection

Paper
Add Code

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

no code implementations • 19 Feb 2024 • Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng, Bryan Hooi

Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data.

Contrastive Learning

Paper
Add Code

Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning

no code implementations • 18 Feb 2024 • Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang

Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data.

Hallucination Visual Question Answering

Paper
Add Code

Multimodal Instruction Tuning with Conditional Mixture of LoRA

no code implementations • 24 Feb 2024 • Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks.

Zero-shot Generalization

Paper
Add Code

Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

no code implementations • NeurIPS 2023 • Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

We introduce a general framework for efficiently finding an approximate second-order stationary point (SOSP) with dimension-independent accuracy guarantees, using $\widetilde{O}(D^2/\epsilon)$ samples, where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints.

Paper
Add Code

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

no code implementations • 26 Mar 2024 • Wei Tao, Yucheng Zhou, Wenqiang Zhang, Yu Cheng

Motivated by the empirical findings, we propose a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four kinds of agents customized for the software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer agents.

Code Generation

Paper
Add Code

Research on Detection of Floating Objects in River and Lake Based on AI Intelligent Image Recognition

no code implementations • 10 Apr 2024 • Jingyu Zhang, Ao Xiang, Yu Cheng, Qin Yang, Liyang Wang

With the rapid advancement of artificial intelligence technology, AI-enabled image recognition has emerged as a potent tool for addressing challenges in traditional environmental monitoring.

Paper
Add Code
