Search Results for author: Yuan Gong

Found 31 papers, 22 papers with code

Generic Knowledge Boosted Pre-training For Remote Sensing Images

1 code implementation 9 Jan 2024 Ziyue Huang, Mingming Zhang, Yuan Gong, Qingjie Liu, Yunhong Wang

Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks.

Change Detection General Knowledge +4

Joint Audio and Speech Understanding

1 code implementation 25 Sep 2023 Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

Humans are surrounded by audio signals that include both speech and non-speech sounds.

ToonTalker: Cross-Domain Face Reenactment

no code implementations ICCV 2023 Yuan Gong, Yong Zhang, Xiaodong Cun, Fei Yin, Yanbo Fan, Xuan Wang, Baoyuan Wu, Yujiu Yang

Moreover, since no paired data is provided, we propose a novel cross-domain training scheme using data from two domains with the designed analogy constraint.

Face Reenactment Talking Face Generation

TaleCrafter: Interactive Story Visualization with Multiple Characters

1 code implementation 29 May 2023 Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang

Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images.

Story Visualization Text-to-Image Generation

SAIL: Search-Augmented Instruction Learning

no code implementations 24 May 2023 Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.

Denoising Fact Checking +3

Listen, Think, and Understand

1 code implementation 18 May 2023 Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass

On the other hand, modern large language models (LLMs) exhibit emerging reasoning ability but they lack audio perception capabilities.

Ranked #3 on Music Question Answering on MusicQA (using extra training data)

Language Modelling Large Language Model +1

3D GAN Inversion with Facial Symmetry Prior

no code implementations CVPR 2023 Fei Yin, Yong Zhang, Xuan Wang, Tengfei Wang, Xiaoyu Li, Yuan Gong, Yanbo Fan, Xiaodong Cun, Ying Shan, Cengiz Oztireli, Yujiu Yang

It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion.

Image Reconstruction Neural Rendering

Contrastive Audio-Visual Masked Autoencoder

1 code implementation 2 Oct 2022 Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

Ranked #1 on Audio Tagging on AudioSet (using extra training data)

Audio Classification Audio Tagging +4

Rethinking Knowledge Distillation via Cross-Entropy

1 code implementation 22 Aug 2022 Zhendong Yang, Zhe Li, Yuan Gong, Tianke Zhang, Shanshan Lao, Chun Yuan, Yu Li

Furthermore, we smooth students' target output to treat it as the soft target for training without teachers and propose a teacher-free new KD loss (tf-NKD).

Knowledge Distillation

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

1 code implementation 6 May 2022 Yuan Gong, Jin Yu, James Glass

Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring.

Audio Classification

MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment

2 code implementations 19 Apr 2022 Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, Yujiu Yang

No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception.

No-Reference Image Quality Assessment NR-IQA

Focal and Global Knowledge Distillation for Detectors

1 code implementation CVPR 2022 Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan

Global distillation rebuilds the relation between different pixels and transfers it from teachers to students, compensating for missing global information in focal distillation.

Image Classification Knowledge Distillation +2

SSAST: Self-Supervised Audio Spectrogram Transformer

2 code implementations 19 Oct 2021 Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.

Audio Classification Emotion Recognition +4

AST: Audio Spectrogram Transformer

3 code implementations 5 Apr 2021 Yuan Gong, Yu-An Chung, James Glass

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Audio Classification Audio Tagging +4

Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method

2 code implementations 18 Mar 2020 Yuan Gong, Jian Yang, Christian Poellabauer

With the rapidly growing number of security-sensitive systems that use voice as the primary input, it becomes increasingly important to address these systems' potential vulnerability to replay attacks.

Second-order Non-local Attention Networks for Person Re-identification

no code implementations 31 Aug 2019 Bryan, Xia, Yuan Gong, Yizhe Zhang, Christian Poellabauer

Recent efforts have shown promising results for person re-identification by designing part-based architectures to allow a neural network to learn discriminative representations from semantically coherent parts.

Person Re-Identification

Real-Time Adversarial Attacks

1 code implementation 31 May 2019 Yuan Gong, Boyang Li, Christian Poellabauer, Yiyu Shi

In recent years, many efforts have demonstrated that modern machine learning algorithms are vulnerable to adversarial attacks, where small, but carefully crafted, perturbations on the input can make them fail.

Adversarial Attack BIG-bench Machine Learning

ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems

2 code implementations 6 Apr 2019 Yuan Gong, Jian Yang, Jacob Huber, Mitchell MacKnight, Christian Poellabauer

This paper introduces a new database of voice recordings with the goal of supporting research on vulnerabilities and protection of voice-controlled systems (VCSs).

Voice Anti-spoofing

Towards Learning Fine-Grained Disentangled Representations from Speech

no code implementations 8 Aug 2018 Yuan Gong, Christian Poellabauer

Learning disentangled representations of high-dimensional data is currently an active research area.

Representation Learning

Topic Modeling Based Multi-modal Depression Detection

no code implementations 28 Mar 2018 Yuan Gong, Christian Poellabauer

Major depressive disorder is a common mental disorder that affects almost 7% of the adult U.S. population.

Depression Detection

An Overview of Vulnerabilities of Voice Controlled Systems

no code implementations 24 Mar 2018 Yuan Gong, Christian Poellabauer

These systems have been shown to be vulnerable to various types of voice spoofing attacks.

General Classification

How do deep convolutional neural networks learn from raw audio waveforms?

no code implementations ICLR 2018 Yuan Gong, Christian Poellabauer

Prior work on speech and audio processing has demonstrated the ability to obtain excellent performance when learning directly from raw audio waveforms using convolutional neural networks (CNNs).

Crafting Adversarial Examples For Speech Paralinguistics Applications

no code implementations 9 Nov 2017 Yuan Gong, Christian Poellabauer

Computational paralinguistic analysis is increasingly being used in a wide range of cyber applications, including security-sensitive applications such as speaker verification, deceptive speech detection, and medical diagnostics.

Medical Diagnosis Speaker Verification
