Search Results for author: Yu Cheng

Found 196 papers, 97 papers with code

Object Tracking using Spatio-Temporal Networks for Future Prediction Location

no code implementations ECCV 2020 Yuan Liu, Ruoteng Li, Yu Cheng, Robby T. Tan, Xiubao Sui

To facilitate the future prediction ability, we follow three key observations: 1) object motion trajectory is affected significantly by camera motion; 2) the past trajectory of an object can act as a salient cue to estimate the object motion in the spatial domain; 3) previous frames contain the surroundings and appearance of the target object, which is useful for predicting the target object’s future locations.

Future prediction Object +1

Continuous Speech Tokenizer in Text To Speech

no code implementations 22 Oct 2024 Yixing Li, Ruobing Xie, Xingwu Sun, Yu Cheng, Zhanhui Kang

Our results show that the speech language model based on the continuous speech tokenizer has better continuity and higher estimated Mean Opinion Scores (MOS).

Language Modelling Text to Speech

RoRA-VLM: Robust Retrieval-Augmented Vision Language Models

no code implementations 11 Oct 2024 Jingyuan Qi, Zhiyang Xu, Rulin Shao, Yang Chen, Jin Di, Yu Cheng, Qifan Wang, Lifu Huang

In this work, we introduce RORA-VLM, a novel and robust retrieval augmentation framework specifically tailored for VLMs, with two key innovations: (1) a 2-stage retrieval process with image-anchored textual-query expansion to synergistically combine the visual and textual information in the query and retrieve the most relevant multimodal knowledge snippets; and (2) a robust retrieval augmentation method that strengthens the resilience of VLMs against irrelevant information in the retrieved multimodal knowledge by injecting adversarial noises into the retrieval-augmented training process, and filters out extraneous visual information, such as unrelated entities presented in images, via a query-oriented visual token refinement strategy.

Retrieval
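
The two-stage flow described above lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch of image-anchored query expansion followed by snippet retrieval; the function names, the tag-based stand-in for visual entities, and the lexical-overlap scorer are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a 2-stage retrieval flow in the spirit of RoRA-VLM's
# image-anchored textual-query expansion; names and data are illustrative.
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    score: float

def stage1_expand_query(query_text: str, query_image_tags: list[str]) -> str:
    # Stage 1: anchor on the image -- entities/captions retrieved for the query
    # image (approximated here by precomputed tags) are appended to the textual
    # query so both modalities inform retrieval.
    return query_text + " " + " ".join(query_image_tags)

def stage2_retrieve(expanded_query: str, corpus: list[str], k: int = 2) -> list[Snippet]:
    # Stage 2: retrieve the most relevant knowledge snippets; a toy
    # lexical-overlap scorer stands in for a learned multimodal retriever.
    q = set(expanded_query.lower().split())
    scored = [Snippet(d, len(q & set(d.lower().split())) / (len(q) or 1)) for d in corpus]
    return sorted(scored, key=lambda s: s.score, reverse=True)[:k]

corpus = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "Golden retrievers are a breed of dog.",
    "Paris is the capital of France.",
]
expanded = stage1_expand_query("How tall is this landmark?", ["eiffel", "tower", "paris"])
print([s.text for s in stage2_retrieve(expanded, corpus)])
```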

What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs

no code implementations 7 Oct 2024 Shu Yang, Shenzhe Zhu, Ruoxuan Bao, Liang Liu, Yu Cheng, Lijie Hu, Mengdi Li, Di Wang

Large language models (LLMs) have demonstrated remarkable capabilities in generating human-like text and exhibiting personality traits similar to those in humans.

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

1 code implementation 4 Oct 2024 Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang

Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding; however, they struggle with fine-grained temporal grounding.

Dense Video Captioning Sentence +1

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

1 code implementation 28 Sep 2024 Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng

In recent years, Contrastive Language-Image Pre-training (CLIP) has become a cornerstone in multimodal intelligence.

Image Classification Large Language Model +2

Research on Dynamic Data Flow Anomaly Detection based on Machine Learning

no code implementations 23 Sep 2024 Liyang Wang, Yu Cheng, Hao Gong, Jiacheng Hu, Xirui Tang, Iris Li

The sophistication and diversity of contemporary cyberattacks have rendered the use of proxies, gateways, firewalls, and encrypted tunnels as a standalone defensive strategy inadequate.

Anomaly Detection Clustering +1

SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information

1 code implementation 21 Sep 2024 Jiashuo Sun, Jihai Zhang, Yucheng Zhou, Zhaochen Su, Xiaoye Qu, Yu Cheng

To address these challenges, we propose a self-refinement framework designed to teach LVLMs to Selectively Utilize Retrieved Information (SURf).

RAG

Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning

1 code implementation 30 Aug 2024 Xiaoye Qu, Jiashuo Sun, Wei Wei, Yu Cheng

By fully grasping the information in the image and carefully considering the certainty of the potential answers when decoding, our MVP can effectively reduce hallucinations in LVLMs. Extensive experiments verify that the proposed MVP significantly mitigates the hallucination problem across four well-known LVLMs.

Hallucination

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

1 code implementation 22 Aug 2024 Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng

Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge.

Misinformation

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10

no code implementations 9 Aug 2024 Yiqi Liu, Yuqi Xue, Yu Cheng, Lingxiao Ma, Ziming Miao, Jilong Xue, Jian Huang

As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing, inter-core communication has recently been enabled by employing high-bandwidth and low-latency interconnect links on the chip (e.g., Graphcore IPU).

Mitigating Multilingual Hallucination in Large Vision-Language Models

1 code implementation 1 Aug 2024 Xiaoye Qu, Mingyang Song, Wei Wei, Jianfeng Dong, Yu Cheng

In this paper, we make the first attempt to mitigate this important multilingual hallucination in LVLMs.

Hallucination

Design and Optimization of Big Data and Machine Learning-Based Risk Monitoring System in Financial Markets

no code implementations 28 Jul 2024 Liyang Wang, Yu Cheng, Xingxin Gu, Zhizhong Wu

With the increasing complexity of financial markets and rapid growth in data volume, traditional risk monitoring methods no longer suffice for modern financial institutions.

Management

Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

1 code implementation 22 Jul 2024 Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan

Our primary novelty lies in leveraging two complementary-teacher networks to generate more reliable pseudo labels, enabling our model to achieve competitive performance on extremely low-light images without the need for training with low-light ground truths.

2D Human Pose Estimation Pose Estimation

On the Universal Truthfulness Hyperplane Inside LLMs

no code implementations 11 Jul 2024 Junteng Liu, Shiqi Chen, Yu Cheng, Junxian He

In this work, we investigate whether a universal truthfulness hyperplane that distinguishes the model's factually correct and incorrect outputs exists within the model.

Diversity Domain Generalization +1

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

1 code implementation 10 Jul 2024 Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to multi-resource real-world applications and the complexity of multi-modal processing.

Data Poisoning

Advanced Financial Fraud Detection Using GNN-CL Model

no code implementations 9 Jul 2024 Yu Cheng, Junjie Guo, Shiqing Long, You Wu, Mengfang Sun, Rong Zhang

The innovative GNN-CL model proposed in this paper marks a breakthrough in the field of financial fraud detection by synergistically combining the advantages of graph neural networks (GNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks.

Fraud Detection

Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations

no code implementations 4 Jul 2024 Zhiyang Xu, Minqian Liu, Ying Shen, Joy Rimchala, Jiaxin Zhang, Qifan Wang, Yu Cheng, Lifu Huang

Lateralization LoRA employs a hybrid approach, combining the traditional linear LoRA and a Convolutional LoRA for generating text and images, enabling the generation of high-quality text and images by leveraging modality-specific structures and parameter sets.

Attribute Image Generation
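
Since the abstract names the two adaptation branches explicitly, a small sketch may help. The PyTorch code below shows a generic linear LoRA module next to a convolutional LoRA module; the ranks, kernel sizes, and zero-initialized up-projection are common LoRA conventions and are assumptions here, not the paper's exact configuration.

```python
# A minimal PyTorch sketch of the hybrid idea: a linear LoRA branch for text
# features and a convolutional LoRA branch for image features.
import torch
import torch.nn as nn

class LinearLoRA(nn.Module):
    def __init__(self, dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # delta starts at zero
        self.scale = alpha / rank

    def forward(self, x):  # x: (batch, seq, dim)
        return self.scale * self.up(self.down(x))

class ConvLoRA(nn.Module):
    def __init__(self, channels: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Conv2d(channels, rank, kernel_size=3, padding=1, bias=False)
        self.up = nn.Conv2d(rank, channels, kernel_size=3, padding=1, bias=False)
        nn.init.zeros_(self.up.weight)
        self.scale = alpha / rank

    def forward(self, x):  # x: (batch, channels, h, w)
        return self.scale * self.up(self.down(x))

# Modality-specialized low-rank deltas on top of a frozen base layer.
text = torch.randn(2, 10, 64)
image = torch.randn(2, 64, 8, 8)
print(LinearLoRA(64)(text).shape, ConvLoRA(64)(image).shape)
```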

Direct Preference Knowledge Distillation for Large Language Models

no code implementations 28 Jun 2024 Yixing Li, Yuxian Gu, Li Dong, Dequan Wang, Yu Cheng, Furu Wei

Meanwhile, we prove the value and effectiveness of the introduced implicit reward and output preference in KD through experiments and theoretical analysis.

Knowledge Distillation

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

2 code implementations 24 Jun 2024 Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng

Motivated by this limit, we investigate building MoE models from existing dense large language models.
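
One common way to build MoE models from a dense checkpoint is to partition the FFN's intermediate neurons into disjoint expert slices and then continue pre-training. The sketch below illustrates only that general idea; it is not the LLaMA-MoE repository's construction code, and the SiLU activation and equal-size slicing are assumptions.

```python
# A minimal sketch of carving a trained dense FFN into experts by slicing its
# intermediate neurons; illustrative only, not the paper's actual code.
import torch
import torch.nn as nn

def split_ffn_into_experts(w_in: torch.Tensor, w_out: torch.Tensor, n_experts: int):
    """w_in: (hidden, inter), w_out: (inter, hidden) from a dense FFN."""
    inter = w_in.shape[1]
    assert inter % n_experts == 0
    size = inter // n_experts
    experts = []
    for i in range(n_experts):
        sl = slice(i * size, (i + 1) * size)
        up = nn.Linear(w_in.shape[0], size, bias=False)
        down = nn.Linear(size, w_out.shape[1], bias=False)
        with torch.no_grad():  # copy the dense weights into the expert slice
            up.weight.copy_(w_in[:, sl].T)
            down.weight.copy_(w_out[sl, :].T)
        experts.append(nn.Sequential(up, nn.SiLU(), down))
    return nn.ModuleList(experts)

experts = split_ffn_into_experts(torch.randn(64, 256), torch.randn(256, 64), n_experts=4)
print(len(experts), experts[0](torch.randn(2, 64)).shape)
```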

Timo: Towards Better Temporal Reasoning for Language Models

1 code implementation 20 Jun 2024 Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng

Therefore, we propose a crucial question: Can we build a universal framework to handle a variety of temporal reasoning tasks?

Question Answering

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

1 code implementation 17 Jun 2024 Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng

In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
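
The two stages translate naturally into weight arithmetic. Below is a toy sketch on plain tensors, assuming the shared component is the mean of the task models and the router weights are given; the paper's actual compression of exclusive components and its learned router are more involved.

```python
# Toy sketch of the two stages: (1) modularize into a shared component plus
# per-task exclusive residuals; (2) input-dependent merging of the residuals.
import torch

def modularize(task_weights: list[torch.Tensor]):
    shared = torch.stack(task_weights).mean(dim=0)   # shared knowledge
    exclusive = [w - shared for w in task_weights]   # task-specific deltas
    return shared, exclusive

def dynamic_merge(shared, exclusive, router_probs):
    # Router probabilities (a stand-in here) weight the exclusive deltas.
    delta = sum(p * e for p, e in zip(router_probs, exclusive))
    return shared + delta

w_tasks = [torch.randn(4, 4) for _ in range(3)]
shared, exclusive = modularize(w_tasks)
merged = dynamic_merge(shared, exclusive, router_probs=[0.7, 0.2, 0.1])
print(merged.shape)
```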

$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

1 code implementation 17 Jun 2024 Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng

Motivated by the research gap and counter-intuitive phenomenon, we propose $\texttt{MoE-RBench}$, the first comprehensive assessment of SMoE reliability from three aspects: $\textit{(i)}$ safety and hallucination, $\textit{(ii)}$ resilience to adversarial attacks, and $\textit{(iii)}$ out-of-distribution robustness.

Hallucination

On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

no code implementations 17 Jun 2024 Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training?

In-Context Learning Task Arithmetic +1
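
One simple instantiation of decoding-time weak-to-strong transfer is logit arithmetic: add the logit shift that task-specific fine-tuning induced in a small model onto the large model's logits. The sketch below uses a fixed fusion weight `w` as a stand-in for the paper's dynamic fusion.

```python
# Toy decoding-time sketch of weak-to-strong logits fusion.
import torch

def fused_next_token_logits(large_logits, small_tuned_logits, small_base_logits, w=1.0):
    # The small model's "task knowledge" is approximated by its logit offset
    # over its own base model; the offset is added to the large model's logits.
    return large_logits + w * (small_tuned_logits - small_base_logits)

vocab = 100
fused = fused_next_token_logits(torch.randn(vocab), torch.randn(vocab), torch.randn(vocab))
print(int(fused.argmax()))  # next token chosen from the fused distribution
```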

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

1 code implementation 17 Jun 2024 Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng

Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales.

Bayesian Networks and Machine Learning for COVID-19 Severity Explanation and Demographic Symptom Classification

1 code implementation 16 Jun 2024 Oluwaseun T. Ajayi, Yu Cheng

With the prevailing efforts to combat the coronavirus disease 2019 (COVID-19) pandemic, there are still uncertainties that are yet to be discovered about its spread, future impact, and resurgence.

Clustering

Application of Natural Language Processing in Financial Risk Detection

no code implementations 14 Jun 2024 Liyang Wang, Yu Cheng, Ao Xiang, Jingyu Zhang, Haowei Yang

This study offers valuable references for the field of financial risk management, utilizing advanced NLP techniques to improve the accuracy and efficiency of financial risk detection.

Management

Research on Edge Detection of LiDAR Images Based on Artificial Intelligence Technology

no code implementations 14 Jun 2024 Haowei Yang, Liyang Wang, Jingyu Zhang, Yu Cheng, Ao Xiang

With the widespread application of Light Detection and Ranging (LiDAR) technology in fields such as autonomous driving, robot navigation, and terrain mapping, the importance of edge detection in LiDAR images has become increasingly prominent.

Autonomous Driving Computational Efficiency +2

Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark

1 code implementation 12 Jun 2024 Pingzhi Li, Xiaolong Jin, Yu Cheng, Tianlong Chen

Large Language Models~(LLMs) have become foundational in the realm of natural language processing, demonstrating performance improvements as model sizes increase.

Benchmarking Model Compression +1

Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

1 code implementation 11 Jun 2024 Zhenyi Lu, Jie Tian, Wei Wei, Xiaoye Qu, Yu Cheng, Wenfeng Xie, Dangyang Chen

Our approach is grounded in the empirical observation that pairwise comparisons can effectively alleviate boundary ambiguity and inherent bias.

text-classification Text Classification

A Survey of Fragile Model Watermarking

no code implementations 7 Jun 2024 Zhenzhe Gao, Yu Cheng, Zhaoxia Yin

Model fragile watermarking, inspired by both the field of adversarial attacks on neural networks and traditional multimedia fragile watermarking, has gradually emerged as a potent tool for detecting tampering, and has witnessed rapid development in recent years.

Autonomous Driving Survey

Efficient User Sequence Learning for Online Services via Compressed Graph Neural Networks

1 code implementation 5 Jun 2024 Yucheng Wu, Liyue Chen, Yu Cheng, Shuai Chen, Jinyu Xu, Leye Wang

Learning representations of user behavior sequences is crucial for various online services, such as online fraudulent transaction detection mechanisms.

Representation Learning

Optimization of Worker Scheduling at Logistics Depots Using Genetic Algorithms and Simulated Annealing

no code implementations 20 May 2024 Jinxin Xu, Haixin Wu, Yu Cheng, Liyang Wang, Xin Yang, Xintong Fu, Yuelong Su

This paper addresses the optimization of scheduling for workers at a logistics depot using a combination of genetic algorithm and simulated annealing algorithm.

Scheduling

Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network Algorithm

no code implementations 17 May 2024 Yu Cheng, Qin Yang, Liyang Wang, Ao Xiang, Jingyu Zhang

In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability.

Management

Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics

no code implementations 25 Apr 2024 Ao Xiang, Jingyu Zhang, Qin Yang, Liyang Wang, Yu Cheng

With the development and widespread application of digital image processing technology, image splicing has become a common method of image manipulation, raising numerous security and legal issues.

Image Manipulation

Research on Detection of Floating Objects in River and Lake Based on AI Intelligent Image Recognition

no code implementations 10 Apr 2024 Jingyu Zhang, Ao Xiang, Yu Cheng, Qin Yang, Liyang Wang

With the rapid advancement of artificial intelligence technology, AI-enabled image recognition has emerged as a potent tool for addressing challenges in traditional environmental monitoring.

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

no code implementations 26 Mar 2024 Wei Tao, Yucheng Zhou, Yanlin Wang, Wenqiang Zhang, Hongyu Zhang, Yu Cheng

To overcome this challenge, we empirically study the reason why LLMs fail to resolve GitHub issues and analyze the major factors.

GitHub issue resolution

Reinforcement Learning with Token-level Feedback for Controllable Text Generation

1 code implementation 18 Mar 2024 Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs).

Attribute reinforcement-learning +4

Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

no code implementations NeurIPS 2023 Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

We introduce a general framework for efficiently finding an approximate SOSP with \emph{dimension-independent} accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints.

Multimodal Instruction Tuning with Conditional Mixture of LoRA

no code implementations 24 Feb 2024 Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks.

parameter-efficient fine-tuning Zero-shot Generalization

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

1 code implementation 19 Feb 2024 Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng, Bryan Hooi

Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data.

Contrastive Learning

Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning

no code implementations 18 Feb 2024 Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang

Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data.

Hallucination Visual Question Answering

Applications of Tao General Difference in Discrete Domain

no code implementations 27 Jan 2024 Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space.

Edge Detection

SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement

1 code implementation CVPR 2024 Tao Wang, Lei Jin, Zheng Wang, Jianshu Li, Liang Li, Fang Zhao, Yu Cheng, Li Yuan, Li Zhou, Junliang Xing, Jian Zhao

To leverage this quality information, we propose a motion refinement network, termed SynSP, to achieve a Synergy of Smoothness and Precision in the sequence refinement tasks.

Enhancing Low-Resource Relation Representations through Multi-View Decoupling

1 code implementation 26 Dec 2023 Chenghao Fan, Wei Wei, Xiaoye Qu, Zhenyi Lu, Wenfeng Xie, Yu Cheng, Dangyang Chen

Recently, prompt-tuning with pre-trained language models (PLMs) has been shown to significantly enhance relation extraction (RE) tasks.

Relation Relation Extraction +1

ProS: Facial Omni-Representation Learning via Prototype-based Self-Distillation

no code implementations 3 Nov 2023 Xing Di, Yiyu Zheng, Xiaoming Liu, Yu Cheng

This paper presents a novel approach, called Prototype-based Self-Distillation (ProS), for unsupervised face representation learning.

Attribute Representation Learning

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

1 code implementation 2 Oct 2023 Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen

Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks; however, it has issues such as (a) high memory usage, due to duplication of the network layers into multiple copies as experts, and (b) redundancy in experts, as common learning-based routing policies suffer from representational collapse.

A Content-Driven Micro-Video Recommendation Dataset at Scale

1 code implementation 27 Sep 2023 Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, Fajie Yuan

Micro-videos have recently gained immense popularity, sparking critical research in micro-video recommendation with significant implications for the entertainment, advertising, and e-commerce industries.

Benchmarking Recommendation Systems +1

ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding

no code implementations 21 Sep 2023 Yu Cheng, Bo Wang, Robby T. Tan

In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion, which is common in in-the-wild videos.

Neural Rendering Novel View Synthesis +1

NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation

2 code implementations 14 Sep 2023 JiaQi Zhang, Yu Cheng, Yongxin Ni, Yunzhu Pan, Zheng Yuan, Junchen Fu, Youhua Li, Jie Wang, Fajie Yuan

The development of TransRec has encountered multiple challenges, among which the lack of large-scale, high-quality transfer-learning recommendation datasets and benchmark suites is one of the biggest obstacles.

Descriptive Recommendation Systems +1

An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

1 code implementation 13 Sep 2023 Yu Cheng, Yunzhu Pan, JiaQi Zhang, Yongxin Ni, Aixin Sun, Fajie Yuan

Then, to show the effectiveness of the dataset's image features, we substitute the itemID embeddings (from IDNet) with a powerful vision encoder that represents items using their raw image pixels.

Benchmarking Recommendation Systems

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

no code implementations NeurIPS 2023 Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li

Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly.

Adversarial Robustness Ethics +1

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

no code implementations 15 Jun 2023 Yunfan Li, Yiran Wang, Yu Cheng, Lin Yang

We show that, our algorithm obtains an $\varepsilon$-optimal policy with only $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^3})$ samples, where $\varepsilon$ is the suboptimality gap and $d$ is a complexity measure of the function class approximating the policy.

Reinforcement Learning (RL)

GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions

no code implementations 24 May 2023 Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren

Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.

Object Question Answering +2

DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment

1 code implementation CVPR 2023 Heyuan Li, Bo Wang, Yu Cheng, Mohan Kankanhalli, Robby T. Tan

Thanks to the proposed fusion module, our method is robust not only to occlusion and large pitch and roll view angles, which is the benefit of our image space approach, but also to noise and large yaw angles, which is the benefit of our model space method.

 Ranked #1 on 3D Face Reconstruction on AFLW2000-3D (Mean NME metric)

3D Face Reconstruction Face Alignment +1

Exploring the Upper Limits of Text-Based Collaborative Filtering Using Large Language Models: Discoveries and Insights

no code implementations 19 May 2023 Ruyu Li, Wenhao Deng, Yu Cheng, Zheng Yuan, JiaQi Zhang, Fajie Yuan

Furthermore, we compare the performance of the TCF paradigm utilizing the most powerful LMs to the currently dominant ID embedding-based paradigm and investigate the transferability of this TCF paradigm.

Collaborative Filtering News Recommendation +1

A Theory of General Difference in Continuous and Discrete Domain

no code implementations 14 May 2023 Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

This stems from a key disconnect between the infinitesimal quantities in continuous differentiation and the finite intervals in its discrete counterpart.

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations 18 Mar 2023 Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., low-rank increments.

parameter-efficient fine-tuning Question Answering +1
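
The "low-rank increments" the abstract mentions can be made concrete with a small sketch. The module below parameterizes the weight delta in an SVD-like form, as AdaLoRA-style methods do, so the rank budget can be adapted by zeroing small components; the initialization, pruning rule, and scaling here are illustrative assumptions rather than the paper's exact algorithm.

```python
# Minimal sketch of an SVD-parameterized low-rank increment: the frozen weight
# W0 gets a trainable delta P diag(lam) Q whose effective rank can be pruned.
import torch
import torch.nn as nn

class SVDDelta(nn.Module):
    def __init__(self, d_out: int, d_in: int, rank: int = 8):
        super().__init__()
        self.P = nn.Parameter(torch.randn(d_out, rank) * 0.01)
        self.lam = nn.Parameter(torch.zeros(rank))   # per-rank "singular values"
        self.Q = nn.Parameter(torch.randn(rank, d_in) * 0.01)

    def prune(self, keep: int):
        # Budget allocation: keep only the largest-magnitude components.
        with torch.no_grad():
            idx = self.lam.abs().argsort(descending=True)[keep:]
            self.lam[idx] = 0.0

    def forward(self, x, w0):
        delta = self.P @ torch.diag(self.lam) @ self.Q
        return x @ (w0 + delta).T

layer = SVDDelta(32, 64)
layer.prune(keep=4)
print(layer(torch.randn(2, 64), torch.zeros(32, 64)).shape)
```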

Hiding Data Helps: On the Benefits of Masking for Sparse Coding

1 code implementation 24 Feb 2023 Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective for which recovering the ground-truth dictionary is in fact optimal as the signal increases for a large class of data-generating processes.

Dictionary Learning Self-Supervised Learning

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

no code implementations 5 Jan 2023 Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng

Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.

Sentence

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

no code implementations 2 Jan 2023 Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong

All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning.

Sentence Temporal Sentence Grounding

You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

no code implementations CVPR 2023 Zenghui Yuan, Pan Zhou, Kai Zou, Yu Cheng

Vision Transformers (ViTs), which made a splash in the field of computer vision (CV), have shaken the dominance of convolutional neural networks (CNNs).

Backdoor Attack

M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

1 code implementation 26 Oct 2022 Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang

However, when deploying MTL onto those real-world systems that are often resource-constrained or latency-sensitive, two prominent challenges arise: (i) during training, simultaneously optimizing all tasks is often difficult due to gradient conflicts across tasks; (ii) at inference, current MTL regimes have to activate nearly the entire model even to just execute a single task.

Multi-Task Learning

Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons

no code implementations 25 Aug 2022 Yu Cheng, Yihao Ai, Bo Wang, Xinchao Wang, Robby T. Tan

In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons, and unlike the top-down methods, do not rely on human detection.

2D Pose Estimation Human Detection +1

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

1 code implementation 26 Jul 2022 Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multi-modal contrastive pre-training has demonstrated great utility to learn transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Backdoor Attacks on Crowd Counting

1 code implementation 12 Jul 2022 Yuhua Sun, Tailai Zhang, Xingjun Ma, Pan Zhou, Jian Lou, Zichuan Xu, Xing Di, Yu Cheng, Lichao Sun

In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) to attack the model to produce arbitrarily large or small density estimations.

Backdoor Attack Crowd Counting +3

Efficient Algorithms for Planning with Participation Constraints

no code implementations 16 May 2022 Hanrui Zhang, Yu Cheng, Vincent Conitzer

Our approach can also be extended to the (discounted) infinite-horizon case, for which we give an algorithm that runs in time polynomial in the size of the input and $\log(1/\varepsilon)$, and returns a policy that is optimal up to an additive error of $\varepsilon$.

RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL

1 code implementation 14 May 2022 Jiexing Qi, Jingyao Tang, Ziwei He, Xiangpeng Wan, Yu Cheng, Chenghu Zhou, Xinbing Wang, Quanshi Zhang, Zhouhan Lin

Our model can incorporate almost all types of existing relations in the literature, and in addition, we propose introducing co-reference relations for the multi-turn scenario.

Dialogue State Tracking Text-To-SQL

SemAttack: Natural Textual Attacks via Different Semantic Spaces

1 code implementation Findings (NAACL) 2022 Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, Bo Li

In particular, SemAttack optimizes the generated perturbations constrained on generic semantic spaces, including typo space, knowledge space (e.g., WordNet), contextualized semantic space (e.g., the embedding space of BERT clusterings), or the combination of these spaces.

Adversarial Text
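
To make the "knowledge space" concrete, the snippet below enumerates WordNet-based substitution candidates for a word using NLTK. This only generates the candidate space; SemAttack additionally optimizes which perturbation to apply under semantic-preservation constraints.

```python
# Enumerating WordNet "knowledge space" substitution candidates for a word.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def knowledge_space_candidates(word: str) -> set[str]:
    # Collect lemma names across all synsets of the word as substitution
    # candidates, excluding the word itself.
    cands = set()
    for syn in wn.synsets(word):
        for lemma in syn.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower():
                cands.add(name)
    return cands

print(sorted(knowledge_space_candidates("quick"))[:5])
```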

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

1 code implementation 2 May 2022 Yu Cheng, Bo Wang, Robby T. Tan

Most of the methods focus on single persons, which estimate the poses in the person-centric coordinates, i.e., the coordinates based on the center of the target person.

3D Multi-Person Pose Estimation (absolute) 3D Multi-Person Pose Estimation (root-relative) +4

ZOOMER: Boosting Retrieval on Web-scale Graphs by Regions of Interest

1 code implementation 20 Mar 2022 Yuezihan Jiang, Yu Cheng, Hanyu Zhao, Wentao Zhang, Xupeng Miao, Yu He, Liang Wang, Zhi Yang, Bin Cui

We introduce ZOOMER, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs.

Retrieval

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy

1 code implementation CVPR 2022 Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

However, a "head-to-toe assessment" regarding the extent of redundancy in ViTs, and how much we could gain by thoroughly mitigating it, has been absent from this field.

Diversity

Unsupervised Temporal Video Grounding with Deep Semantic Clustering

no code implementations 14 Jan 2022 Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.

Clustering Sentence +1

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

no code implementations 3 Jan 2022 Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou

To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely appeared content in TSG tasks.

Sentence Temporal Sentence Grounding

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

1 code implementation 4 Nov 2021 Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li

In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.

Adversarial Attack Adversarial Robustness +1

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

1 code implementation 30 Oct 2021 Xuxi Chen, Tianlong Chen, Weizhu Chen, Ahmed Hassan Awadallah, Zhangyang Wang, Yu Cheng

To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.

parameter-efficient fine-tuning

Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding

no code implementations 16 Oct 2021 Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Hassan Awadallah

Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks.

Knowledge Distillation Model Compression +1

MA-CLIP: Towards Modality-Agnostic Contrastive Language-Image Pre-training

no code implementations 29 Sep 2021 Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Outlier-Robust Sparse Estimation via Non-Convex Optimization

1 code implementation 23 Sep 2021 Yu Cheng, Ilias Diakonikolas, Rong Ge, Shivam Gupta, Daniel M. Kane, Mahdi Soltanolkotabi

We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA.

Few-Shot Object Detection via Classification Refinement and Distractor Retreatment

no code implementations CVPR 2021 Yiting Li, Haiyue Zhu, Yu Cheng, Wenxin Wang, Chek Sing Teo, Cheng Xiang, Prahlad Vadakkepat, Tong Heng Lee

We investigate the failure modes of FSOD and find that the performance degradation is mainly due to classification incapability (false positives), which motivates us to address it from the novel perspective of hard example mining.

Classification Few-Shot Object Detection +1

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

1 code implementation NeurIPS 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture) improves top-1 accuracy by 0.28%, and meanwhile enjoys 49.32% FLOPs and 4.40% running time savings.

Efficient ViTs

Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time

1 code implementation ICLR 2021 Yu Cheng, Honghao Lin

We achieve this by establishing a direct connection between robust learning of Bayesian networks and robust mean estimation.

Automated Mechanism Design for Classification with Partial Verification

no code implementations 12 Apr 2021 Hanrui Zhang, Yu Cheng, Vincent Conitzer

We study the problem of automated mechanism design with partial verification, where each type can (mis)report only a restricted set of types (rather than any other type), induced by the principal's limited verification power.

Classification General Classification

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

1 code implementation CVPR 2021 Yu Cheng, Bo Wang, Bo Yang, Robby T. Tan

Besides the integration of top-down and bottom-up networks, unlike existing pose discriminators that are designed solely for a single person and consequently cannot assess natural inter-person interactions, we propose a two-person pose discriminator that enforces natural two-person interactions.

3D Multi-Person Pose Estimation (absolute) 3D Multi-Person Pose Estimation (root-relative) +2

The Elastic Lottery Ticket Hypothesis

1 code implementation NeurIPS 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang

Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket directly found by IMP.

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation CVPR 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.

Sentence Temporal Sentence Grounding

Adversarial Feature Augmentation and Normalization for Visual Recognition

1 code implementation 22 Mar 2021 Tianlong Chen, Yu Cheng, Zhe Gan, JianFeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu

Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.

Classification Data Augmentation +2

Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning

1 code implementation NAACL 2021 Jason Wei, Chengyu Huang, Soroush Vosoughi, Yu Cheng, Shiqi Xu

Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category.

Data Augmentation Few-Shot Text Classification +3

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

1 code implementation NeurIPS 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang

Training generative adversarial networks (GANs) with limited real image data generally results in deteriorated performance and collapsed models.

Data Augmentation

Deep Co-Attention Network for Multi-View Subspace Learning

1 code implementation 15 Feb 2021 Lecheng Zheng, Yu Cheng, Hongxia Yang, Nan Cao, Jingrui He

For example, given the diagnostic result that a model provided based on the X-ray images of a patient at different poses, the doctor needs to know why the model made such a prediction.

Structure Of Flavor Changing Goldstone Boson Interactions

no code implementations 15 Jan 2021 Jin Sun, Yu Cheng, Xiao-Gang He

Alternatively, it may be the Majoron in models where lepton number violation produces seesaw Majorana neutrino masses, if the symmetry-breaking scale is much higher than the electroweak scale.

High Energy Physics - Phenomenology

ALFA: Adversarial Feature Augmentation for Enhanced Image Recognition

no code implementations 1 Jan 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Yu Hu, Zhangyang Wang, Jingjing Liu

Adversarial training is an effective method to combat adversarial attacks in order to create robust neural networks.

Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization

no code implementations 1 Jan 2021 Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh, Jingjing Liu

By incorporating different feature maps after the masking, we can distill better features to help model generalization.

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

1 code implementation ACL 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks.

Model Compression

Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos

1 code implementation 22 Dec 2020 Yu Cheng, Bo Wang, Bo Yang, Robby T. Tan

To tackle this problem, we propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses that do not require camera parameters.

3D Absolute Human Pose Estimation 3D Multi-Person Pose Estimation (absolute) +5

Fair for All: Best-effort Fairness Guarantees for Classification

no code implementations 18 Dec 2020 Anilesh K. Krishnaswamy, Zhihao Jiang, Kangning Wang, Yu Cheng, Kamesh Munagala

Instead, we propose a fairness notion whose guarantee, on each group $g$ in a class $\mathcal{G}$, is relative to the performance of the best classifier on $g$.

Classification Fairness +1

Light dark matter from dark sector decay

no code implementations 3 Dec 2020 Yu Cheng, Wei Liao

We find that the mass of the dark sector singlet fermion can be at the GeV or MeV scale, and that its interactions are very weak.

High Energy Physics - Phenomenology

DSAM: A Distance Shrinking with Angular Marginalizing Loss for High Performance Vehicle Re-identification

no code implementations 12 Nov 2020 Jiangtao Kong, Yu Cheng, Benjia Zhou, Kai Li, Junliang Xing

To obtain a high-performance vehicle ReID model, we present a novel Distance Shrinking with Angular Marginalizing (DSAM) loss function to perform hybrid learning in both the Original Feature Space (OFS) and the Feature Angular Space (FAS) using the local verification and the global identification information.

Person Re-Identification Vehicle Re-Identification

Object Tracking Using Spatio-Temporal Future Prediction

no code implementations 15 Oct 2020 Yuan Liu, Ruoteng Li, Robby T. Tan, Yu Cheng, Xiubao Sui

Our trajectory prediction module predicts the target object's locations in the current and future frames based on the object's past trajectory.

Future prediction Object +2

Cross-Thought for Sentence Encoder Pre-training

1 code implementation EMNLP 2020 Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu

In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering.

Information Retrieval Language Modelling +5

Multi-Fact Correction in Abstractive Text Summarization

no code implementations EMNLP 2020 Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu

Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE.

Abstractive Text Summarization News Summarization +1

Efficient Robust Training via Backward Smoothing

1 code implementation 3 Oct 2020 Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu

In this work, we develop a new understanding towards Fast Adversarial Training, by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem.

Contrastive Distillation on Intermediate Representations for Language Model Compression

1 code implementation EMNLP 2020 Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu

Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one.

Knowledge Distillation Language Modelling +1
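
A minimal sketch of the contrastive alternative to the L2 loss: treat the teacher's representation of the same input as the positive and other batch samples as negatives. The pooling, normalization, and temperature below are generic choices, not necessarily the paper's.

```python
# InfoNCE-style distillation on intermediate representations: the student's
# hidden state should be closer to the teacher's state for the same input
# than to other samples in the batch.
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_h, teacher_h, temperature=0.1):
    # student_h, teacher_h: (batch, dim) pooled intermediate representations.
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.T / temperature       # (batch, batch) similarity matrix
    labels = torch.arange(s.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = contrastive_distill_loss(torch.randn(8, 128), torch.randn(8, 128))
print(float(loss))
```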

Fine-grained Iterative Attention Network for Temporal Language Localization in Videos

no code implementations 6 Aug 2020 Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou

In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.

Sentence

Graph Optimal Transport for Cross-Domain Alignment

1 code implementation ICML 2020 Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu

In GOT, cross-domain alignment is formulated as a graph matching problem, by representing entities into a dynamically-constructed graph.

Graph Matching Image Captioning +8

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients

1 code implementation 21 Jun 2020 Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu, Tom Goldstein

Adaptive gradient methods such as RMSProp and Adam use an exponential moving estimate of the squared gradient to compute adaptive step sizes, achieving better convergence than SGD in the face of noisy objectives.

Image Classification Machine Translation +3
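
For reference, the mechanism the abstract refers to looks like the following vanilla Adam step, where exponential moving averages of the gradient and squared gradient set the per-coordinate step size. This is standard Adam shown for context, not MaxVA itself, which instead adapts the second-moment decay per coordinate to maximize the observed gradient variance.

```python
# Vanilla Adam update: moving averages of the gradient and squared gradient
# produce bias-corrected, per-coordinate adaptive step sizes.
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad * grad   # second moment (mean of squares)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (v_hat.sqrt() + eps)
    return param, m, v

p, m, v = torch.zeros(3), torch.zeros(3), torch.zeros(3)
p, m, v = adam_step(p, torch.tensor([0.1, -0.2, 0.3]), m, v, t=1)
print(p)
```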

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

2 code implementations NeurIPS 2020 Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.

Ranked #7 on Visual Entailment on SNLI-VE val (using extra training data)

Image-text Retrieval Question Answering +7

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

no code implementations ECCV 2020 Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu

To reveal the secrets behind the scene of these powerful models, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., Visual Coreference Resolution, Visual Relation Detection, Linguistic Probing Tasks) generalizable to standard pre-trained V+L models, aiming to decipher the inner workings of multimodal pre-training (e.g., the implicit knowledge garnered in individual attention heads, the inherent cross-modal alignment learned through contextualized multimodal embeddings).

coreference-resolution cross-modal alignment

High-Dimensional Robust Mean Estimation via Gradient Descent

no code implementations ICML 2020 Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi

We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.

LEMMA Vocal Bursts Intensity Prediction

Contextual Text Style Transfer

no code implementations Findings of the Association for Computational Linguistics 2020 Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li, Jingjing Liu

To realize high-quality style transfer with natural context preservation, we propose a Context-Aware Style Transfer (CAST) model, which uses two separate encoders for each input sentence and its surrounding context.

Sentence Style Transfer +2

APo-VAE: Text Generation in Hyperbolic Space

no code implementations NAACL 2021 Shuyang Dai, Zhe Gan, Yu Cheng, Chenyang Tao, Lawrence Carin, Jingjing Liu

In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.

Language Modelling Response Generation +1

Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning

1 code implementation CVPR 2020 Tianlong Chen, Sijia Liu, Shiyu Chang, Yu Cheng, Lisa Amini, Zhangyang Wang

We conduct extensive experiments to demonstrate that the proposed framework achieves large performance margins (e.g., 3.83% on robust accuracy and 1.3% on standard accuracy, on the CIFAR-10 dataset), compared with the conventional end-to-end adversarial training baseline.

Adversarial Robustness

BachGAN: High-Resolution Image Synthesis from Salient Object Layout

1 code implementation CVPR 2020 Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu

We propose a new task towards more practical application for image generation - high-quality image synthesis from salient object layout.

Generative Adversarial Network Hallucination +4

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

1 code implementation CVPR 2020 Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu

We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.

Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV based Random Access IoT Networks with NOMA

no code implementations 31 Jan 2020 Sami Khairy, Prasanna Balaprakash, Lin X. Cai, Yu Cheng

In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique to improve the massive channel access of a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers.

Management Reinforcement Learning

Distinguishing Distributions When Samples Are Strategically Transformed

no code implementations NeurIPS 2019 Hanrui Zhang, Yu Cheng, Vincent Conitzer

In other settings, the principal may not even be able to observe samples directly; instead, she must rely on signals that the agent is able to send based on the samples that he obtains, and he will choose these signals strategically.

Towards Better Understanding of Disentangled Representations via Mutual Information

no code implementations 25 Nov 2019 Xiaojiang Yang, Wendong Bi, Yitong Sun, Yu Cheng, Junchi Yan

Most existing works on disentangled representation learning are solely built upon a marginal independence assumption: all factors in disentangled representations should be statistically independent.

Disentanglement Inductive Bias +1

INSET: Sentence Infilling with INter-SEntential Transformer

1 code implementation ACL 2020 Yichen Huang, Yizhe Zhang, Oussama Elachqar, Yu Cheng

Missing sentence generation (or sentence infilling) fosters a wide range of applications in natural language generation, such as document auto-completion and meeting note expansion.

Natural Language Understanding Sentence +1

Distilling Knowledge Learned in BERT for Text Generation

2 code implementations ACL 2020 Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Experiments show that the proposed approach significantly outperforms strong Transformer baselines on multiple language generation tasks such as machine translation and text summarization.

Language Modelling Machine Translation +5

Tell-the-difference: Fine-grained Visual Descriptor via a Discriminating Referee

no code implementations 14 Oct 2019 Shuangjie Xu, Feng Xu, Yu Cheng, Pan Zhou

In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.

Decoder Image Captioning

Meta Module Network for Compositional Visual Reasoning

1 code implementation 8 Oct 2019 Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang, Jingjing Liu

To design a more powerful NMN architecture for practical use, we propose Meta Module Network (MMN) centered on a novel meta module, which can take in function recipes and morph into diverse instance modules dynamically.

MORPH Visual Reasoning

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

2 code implementations ICLR 2020 Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models.

ARC Natural Language Understanding +2
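
A simplified sketch of the embedding-space adversarial training that FreeLB builds on: take a gradient-ascent step on a perturbation added to the word embeddings, project it to an epsilon-ball, and train on the perturbed input. FreeLB additionally accumulates parameter gradients across multiple ascent steps ("free" training); the tiny model and hyperparameters below are stand-ins.

```python
# One-step embedding-space adversarial training (simplified relative to FreeLB).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()

emb = torch.randn(4, 16)                 # word embeddings for a batch
labels = torch.tensor([0, 1, 0, 1])

delta = torch.zeros_like(emb, requires_grad=True)
loss = loss_fn(model(emb + delta), labels)
loss.backward()                          # gradient w.r.t. the perturbation

eps, step = 0.1, 0.05
with torch.no_grad():                    # ascent step, projected to the eps-ball
    delta = (delta + step * delta.grad.sign()).clamp(-eps, eps)

adv_loss = loss_fn(model(emb + delta), labels)  # train on perturbed embeddings
adv_loss.backward()
print(float(adv_loss))
```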

UNITER: UNiversal Image-TExt Representation Learning

7 code implementations ECCV 2020 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).

Image-text matching Image-text Retrieval +12
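
The conditional-masking idea contrasts with jointly masking both modalities and is easy to illustrate: per training sample, corrupt positions in only one modality while the other stays fully observed. The sketch below is a schematic with assumed mask rates, not UNITER's actual pre-processing code.

```python
# Conditional masking: mask either text tokens or image regions per sample,
# never both, so the unmasked modality is fully observed.
import torch

def conditional_masks(n_text: int, n_regions: int, p: float = 0.15):
    mask_text_side = torch.rand(()) < 0.5    # pick the modality to corrupt
    text_mask = (torch.rand(n_text) < p) & mask_text_side
    region_mask = (torch.rand(n_regions) < p) & ~mask_text_side
    return text_mask, region_mask

t_mask, r_mask = conditional_masks(n_text=12, n_regions=36)
print(t_mask.any().item(), r_mask.any().item())  # at most one side is masked
```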

UNITER: Learning UNiversal Image-TExt Representations

no code implementations 25 Sep 2019 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding.

Image-text matching Image-text Retrieval +10

Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation

no code implementations 11 Sep 2019 Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin

Recent unsupervised approaches to domain adaptation primarily focus on minimizing the gap between the source and the target domains through refining the feature generator, in order to learn a better alignment between the two domains.

domain classification Unsupervised Domain Adaptation

Patient Knowledge Distillation for BERT Model Compression

3 code implementations IJCNLP 2019 Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks.

Knowledge Distillation Model Compression
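
A minimal sketch of the "patient" part of this distillation scheme: besides the usual soft-label KL term on the logits, the student matches normalized hidden states from a subset of teacher layers. The layer mapping and loss weights below are illustrative assumptions; the full objective also includes the ground-truth cross-entropy term.

```python
# Patient-KD-style loss sketch: soft-label distillation plus matching of
# normalized intermediate representations from selected teacher layers.
import torch
import torch.nn.functional as F

def pkd_loss(student_logits, teacher_logits, student_hiddens, teacher_hiddens,
             T=2.0, alpha=0.5, beta=10.0):
    # Soft-label distillation on the output distribution.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    # "Patient" term: match normalized intermediate representations.
    pt = sum(F.mse_loss(F.normalize(s, dim=-1), F.normalize(t, dim=-1))
             for s, t in zip(student_hiddens, teacher_hiddens))
    return alpha * kd + beta * pt

s_h = [torch.randn(4, 128) for _ in range(3)]   # student layers
t_h = [torch.randn(4, 128) for _ in range(3)]   # matched teacher layers
print(float(pkd_loss(torch.randn(4, 10), torch.randn(4, 10), s_h, t_h)))
```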
