Search Results for author: Liangliang Cao

Found 54 papers, 18 papers with code

Diffusion Model-Based Image Editing: A Survey

1 code implementation • 27 Feb 2024 • Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao

In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field.

Denoising Image Inpainting +1

Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models

no code implementations • 13 Dec 2023 • Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang

In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.

Ferret: Refer and Ground Anything Anywhere at Any Granularity

1 code implementation • 11 Oct 2023 • Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.

Hallucination Language Modelling +1

Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day

no code implementations • 4 Oct 2023 • Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao

Although the fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in a notoriously long time and high computational costs.

Image Generation Novel View Synthesis

Instruction-Following Speech Recognition

no code implementations • 18 Sep 2023 • Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

1 code implementation • 8 May 2023 • Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng

In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.

Adversarial Text Retrieval

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

no code implementations • 7 Oct 2021 • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

no code implementations • 25 Apr 2021 • Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

To improve streaming models, a recent study [1] proposed to distill a non-streaming teacher model on unsupervised utterances, and then train a streaming student using the teachers' predictions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

no code implementations • 6 Apr 2021 • Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao

To attack RNN-T, we find prepending perturbation is more effective than the additive perturbation, and can mislead the models to predict the same short target on utterances of arbitrary length.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Spatial-Temporal Alignment Network for Action Recognition and Detection

no code implementations • 4 Dec 2020 • Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann

The experimental results show that the STAN model can consistently improve on the state of the art in both action detection and action recognition tasks.

Action Detection Action Recognition

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

no code implementations • 22 Oct 2020 • Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Large Scale Speech Sentiment Corpus

no code implementations • LREC 2020 • Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang, James Fan

We present a multimodal corpus for sentiment analysis based on the existing Switchboard-1 Telephone Speech Corpus released by the Linguistic Data Consortium.

Sentiment Analysis

Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models

no code implementations • 21 Nov 2019 • Zhiyun Lu, Liangliang Cao, Yu Zhang, Chung-Cheng Chiu, James Fan

In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a down-stream task.

Sentiment Analysis

Product Image Recognition with Guidance Learning and Noisy Supervision

no code implementations • 26 Jul 2019 • Qing Li, Xiaojiang Peng, Liangliang Cao, Wenbin Du, Hao Xing, Yu Qiao

Instead of collecting product images by labor- and time-intensive image capturing, we take advantage of the web and download images from the reviews of several e-commerce websites where the images are casually captured by consumers.

Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning

no code implementations • 25 Jul 2019 • Jingya Liu, Liangliang Cao, Oguz Akin, YingLi Tian

Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans.

Lung Cancer Diagnosis Self-Supervised Learning +1

3DFPN-HS$^2$: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection

no code implementations • 8 Jun 2019 • Jingya Liu, Liangliang Cao, Oguz Akin, YingLi Tian

Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans.

Lung Cancer Diagnosis Specificity

Automatic adaptation of object detectors to new domains using self-training

1 code implementation • CVPR 2019 • Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, Erik Learned-Miller

Our results demonstrate the usefulness of incorporating hard examples obtained from tracking, the advantage of using soft-labels via distillation loss versus hard-labels, and show promising performance as a simple method for unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters.

Knowledge Distillation Pedestrian Detection +1

Learning Deterministic Policy with Target for Power Control in Wireless Networks

no code implementations • 21 Feb 2019 • Yujiao Lu, Hancheng Lu, Liangliang Cao, Feng Wu, Daren Zhu

DRL-DPT overcomes the main obstacles in applying reinforcement learning and deep learning in wireless networks, i.e., continuous state space, continuous action space and convergence.

reinforcement-learning Reinforcement Learning (RL)

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

1 code implementation • 11 Aug 2018 • Wei Tan, Shiyu Chang, Liana Fong, Cheng Li, Zijun Wang, Liangliang Cao

Current MF implementations are either optimized for a single machine or require a large computer cluster, yet are still insufficient.

Collaborative Filtering Data Compression

Focal Visual-Text Attention for Visual Question Answering

2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.

Memex Question Answering Question Answering +1

Improving Object Detection from Scratch via Gated Feature Reuse

2 code implementations • 4 Dec 2017 • Zhiqiang Shen, Honghui Shi, Jiahui Yu, Hai Phan, Rogerio Feris, Liangliang Cao, Ding Liu, Xinchao Wang, Thomas Huang, Marios Savvides

In this paper, we present a simple and parameter-efficient drop-in module for one-stage object detectors like SSD when learning from scratch (i.e., without pre-trained models).

Object object-detection +1

Lip2AudSpec: Speech reconstruction from silent lip movements video

1 code implementation • 26 Oct 2017 • Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.

Lip Reading

MemexQA: Visual Memex Question Answering

1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.

Memex Question Answering Question Answering +1

Learning from Noisy Labels with Distillation

no code implementations • ICCV 2017 • Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li

The ability to learn from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels is relatively easy to obtain.

Image Based Appraisal of Real Estate Properties

no code implementations • 28 Nov 2016 • Quanzeng You, Ran Pang, Liangliang Cao, Jiebo Luo

Real estate appraisal, which is the process of estimating the price for real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction.

Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data

no code implementations • 10 Aug 2016 • Yuncheng Li, Liangliang Cao, Jiang Zhu, Jiebo Luo

The core of the proposed automatic composition system is to score fashion outfit candidates based on the appearances and meta-data.

Detecting Sarcasm in Multimodal Social Platforms

no code implementations • 8 Aug 2016 • Rossano Schifanella, Paloma de Juan, Joel Tetreault, Liangliang Cao

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment.

Video2GIF: Automatic Generation of Animated GIFs from Video

1 code implementation • CVPR 2016 • Michael Gygli, Yale Song, Liangliang Cao

We introduce the novel problem of automatically generating animated GIFs from video.

GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring

1 code implementation • 12 Apr 2016 • Wenying Ma, Liangliang Cao, Lei Yu, Guoping Long, Yucheng Li

We also applied GPU-FV for realtime video monitoring tasks and found that GPU-FV outperforms a number of previous works.


TGIF: A New Dataset and Benchmark on Animated GIF Description

1 code implementation • CVPR 2016 • Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo

The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips.

Image Captioning Machine Translation +3

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

3 code implementations • 11 Mar 2016 • Wei Tan, Liangliang Cao, Liana Fong

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering.

Distributed, Parallel, and Cluster Computing Performance

Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games

no code implementations • 22 Sep 2015 • Nikolai Yakovenko, Liangliang Cao, Colin Raffel, James Fan

The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a CNN based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained, and our system is competitive against human expert players.

Game of Poker

Medical Synonym Extraction with Concept Space Models

no code implementations • 1 Jun 2015 • Chang Wang, Liangliang Cao, Bo-Wen Zhou

In this paper, we present a novel approach for medical synonym extraction.

Practice in Synonym Extraction at Large Scale

no code implementations • 6 Dec 2014 • Liangliang Cao, Chang Wang

Synonym extraction is an important task in natural language processing and often used as a submodule in query expansion, question answering and other applications.

Question Answering

Efficient Maximum Appearance Search for Large-Scale Object Detection

no code implementations • CVPR 2013 • Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets.

Object object-detection +1

Learning Locally-Adaptive Decision Functions for Person Verification

no code implementations • CVPR 2013 • Zhen Li, Shiyu Chang, Feng Liang, Thomas S. Huang, Liangliang Cao, John R. Smith

This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule.

Face Verification Metric Learning +2

Designing Category-Level Attributes for Discriminative Visual Recognition

no code implementations • CVPR 2013 • Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

In this paper, we propose a novel formulation to automatically design discriminative "category-level attributes", which can be efficiently encoded by a compact category-attribute matrix.

Attribute Transfer Learning +1

Learning to Search Efficiently in High Dimensions

no code implementations • NeurIPS 2011 • Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang

Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees).

Computational Efficiency Vocal Bursts Intensity Prediction
