Search Results for author: Liangliang Cao

Found 54 papers, 18 papers with code

Diffusion Model-Based Image Editing: A Survey

1 code implementation • 27 Feb 2024 • Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao

In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field.

Denoising Image Inpainting +1

301

Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models

no code implementations • 13 Dec 2023 • Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang

In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.

Ferret: Refer and Ground Anything Anywhere at Any Granularity

1 code implementation • 11 Oct 2023 • Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.

Hallucination Language Modelling +1

7,930

Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day

no code implementations • 4 Oct 2023 • Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao

Although the fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in a notoriously long time and high computational costs.

Image Generation Novel View Synthesis

Instruction-Following Speech Recognition

no code implementations • 18 Sep 2023 • Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture

no code implementations • 18 May 2023 • Liangchen Song, Liangliang Cao, Hongyu Xu, Kai Kang, Feng Tang, Junsong Yuan, Yang Zhao

The proposed framework consists of two significant components: Geometry Guided Diffusion and Mesh Optimization.

Image Generation Indoor Scene Synthesis

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

1 code implementation • 8 May 2023 • Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng

In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.

Adversarial Text Retrieval

928

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

no code implementations • 30 Jan 2023 • Chen Chen, BoWen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang

We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.

Information Retrieval Retrieval +1

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

no code implementations • 29 Nov 2022 • Taihong Xiao, ZiRui Wang, Liangliang Cao, Jiahui Yu, Shengyang Dai, Ming-Hsuan Yang

Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.

Classification Few-Shot Image Classification

PriFit: Learning to Fit Primitives Improves Few Shot Point Cloud Segmentation

1 code implementation • 27 Dec 2021 • Gopal Sharma, Bidya Dash, Aruni RoyChowdhury, Matheus Gadelha, Marios Loizou, Liangliang Cao, Rui Wang, Erik Learned-Miller, Subhransu Maji, Evangelos Kalogerakis

We present PriFit, a semi-supervised approach for label-efficient learning of 3D point cloud segmentation networks.

Few-Shot Learning Point Cloud Segmentation +2

Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition

no code implementations • 8 Oct 2021 • Zhiyun Lu, Yanwei Pan, Thibault Doutre, Parisa Haghani, Liangliang Cao, Rohit Prabhavalkar, Chao Zhang, Trevor Strohman

Our experiments show that for both losses, the WER on long-form speech reduces substantially as the training utterance length increases.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

no code implementations • 7 Oct 2021 • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

no code implementations • 27 Sep 2021 • Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio.

Ranked #1 on Speech Recognition on Common Voice

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

no code implementations • 26 Apr 2021 • David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw

Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

no code implementations • 25 Apr 2021 • Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

To improve streaming models, a recent study [1] proposed to distill a non-streaming teacher model on unsupervised utterances, and then train a streaming student using the teachers' predictions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

no code implementations • 6 Apr 2021 • Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao

To attack RNN-T, we find prepending perturbation is more effective than the additive perturbation, and can mislead the models to predict the same short target on utterances of arbitrary length.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Residual Energy-Based Models for End-to-End Speech Recognition

no code implementations • 25 Mar 2021 • Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland

End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Learning Word-Level Confidence For Subword End-to-End ASR

no code implementations • 11 Mar 2021 • David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw

We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Spatial-Temporal Alignment Network for Action Recognition and Detection

no code implementations • 4 Dec 2020 • Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann

The experimental results show that the STAN model consistently improves on the state of the art in both action detection and action recognition tasks.

Action Detection Action Recognition

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

1 code implementation • 22 Oct 2020 • Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman

For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

no code implementations • 22 Oct 2020 • Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zonghai Yao, Liangliang Cao, Huapu Pan

This paper considers the problem of zero-shot entity linking, in which a link at test time may not be present in training.

Entity Linking Position

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Large Scale Speech Sentiment Corpus

no code implementations • LREC 2020 • Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang, James Fan

We present a multimodal corpus for sentiment analysis based on the existing Switchboard-1 Telephone Speech Corpus released by the Linguistic Data Consortium.

Sentiment Analysis

Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions

1 code implementation • ECCV 2020 • Matheus Gadelha, Aruni RoyChowdhury, Gopal Sharma, Evangelos Kalogerakis, Liangliang Cao, Erik Learned-Miller, Rui Wang, Subhransu Maji

The problems of shape classification and part segmentation from 3D point clouds have garnered increasing attention in the last few years.

General Classification Representation Learning +1

Progressive Learning Algorithm for Efficient Person Re-Identification

no code implementations • 16 Dec 2019 • Zhen Li, Hanyang Shao, Nian Xue, Liang Niu, Liangliang Cao

This paper studies the problem of Person Re-Identification (ReID) for large-scale applications.

Bayesian Optimization Person Re-Identification

Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models

no code implementations • 21 Nov 2019 • Zhiyun Lu, Liangliang Cao, Yu Zhang, Chung-Cheng Chiu, James Fan

In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a down-stream task.

Sentiment Analysis

Product Image Recognition with Guidance Learning and Noisy Supervision

no code implementations • 26 Jul 2019 • Qing Li, Xiaojiang Peng, Liangliang Cao, Wenbin Du, Hao Xing, Yu Qiao

Instead of collecting product images by labor- and time-intensive image capturing, we take advantage of the web and download images from the reviews of several e-commerce websites where the images are casually captured by consumers.

Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning

no code implementations • 25 Jul 2019 • Jingya Liu, Liangliang Cao, Oguz Akin, YingLi Tian

Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans.

Lung Cancer Diagnosis Self-Supervised Learning +1

3DFPN-HS$^2$: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection

no code implementations • 8 Jun 2019 • Jingya Liu, Liangliang Cao, Oguz Akin, YingLi Tian

Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans.

Lung Cancer Diagnosis Specificity

Automatic adaptation of object detectors to new domains using self-training

1 code implementation • CVPR 2019 • Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, Erik Learned-Miller

Our results demonstrate the usefulness of incorporating hard examples obtained from tracking, the advantage of using soft-labels via distillation loss versus hard-labels, and show promising performance as a simple method for unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters.

Knowledge Distillation Pedestrian Detection +1

Learning Deterministic Policy with Target for Power Control in Wireless Networks

no code implementations • 21 Feb 2019 • Yujiao Lu, Hancheng Lu, Liangliang Cao, Feng Wu, Daren Zhu

DRL-DPT overcomes the main obstacles in applying reinforcement learning and deep learning in wireless networks, i.e., continuous state space, continuous action space, and convergence.

reinforcement-learning Reinforcement Learning (RL)

Focal Visual-Text Attention for Memex Question Answering

1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Yannis Kalantidis, Li-Jia Li, Alexander Hauptmann

In addition to a text answer, a few grounding photos are also given to justify the answer.

Ranked #1 on Memex Question Answering on MemexQA

Memex Question Answering Question Answering +1

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

1 code implementation • 11 Aug 2018 • Wei Tan, Shiyu Chang, Liana Fong, Cheng Li, Zijun Wang, Liangliang Cao

Current MF implementations are either optimized for a single machine or require a large computer cluster, yet they remain insufficient.

Collaborative Filtering Data Compression

171

Focal Visual-Text Attention for Visual Question Answering

2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.

Ranked #1 on Memex Question Answering on MemexQA

Memex Question Answering Question Answering +1

Improving Object Detection from Scratch via Gated Feature Reuse

2 code implementations • 4 Dec 2017 • Zhiqiang Shen, Honghui Shi, Jiahui Yu, Hai Phan, Rogerio Feris, Liangliang Cao, Ding Liu, Xinchao Wang, Thomas Huang, Marios Savvides

In this paper, we present a simple and parameter-efficient drop-in module for one-stage object detectors like SSD when learning from scratch (i.e., without pre-trained models).

Object object-detection +1

Lip2AudSpec: Speech reconstruction from silent lip movements video

1 code implementation • 26 Oct 2017 • Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.

Lip Reading

MemexQA: Visual Memex Question Answering

1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.

Memex Question Answering Question Answering +1

Learning from Noisy Labels with Distillation

no code implementations • ICCV 2017 • Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li

The ability to learn from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels is relatively easy to obtain.

Image Based Appraisal of Real Estate Properties

no code implementations • 28 Nov 2016 • Quanzeng You, Ran Pang, Liangliang Cao, Jiebo Luo

Real estate appraisal, which is the process of estimating the price for real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction.

Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data

no code implementations • 10 Aug 2016 • Yuncheng Li, Liangliang Cao, Jiang Zhu, Jiebo Luo

The core of the proposed automatic composition system is to score fashion outfit candidates based on the appearances and meta-data.

Detecting Sarcasm in Multimodal Social Platforms

no code implementations • 8 Aug 2016 • Rossano Schifanella, Paloma de Juan, Joel Tetreault, Liangliang Cao

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment.

Video2GIF: Automatic Generation of Animated GIFs from Video

1 code implementation • CVPR 2016 • Michael Gygli, Yale Song, Liangliang Cao

We introduce the novel problem of automatically generating animated GIFs from video.

GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring

1 code implementation • 12 Apr 2016 • Wenying Ma, Liangliang Cao, Lei Yu, Guoping Long, Yucheng Li

We also applied GPU-FV for realtime video monitoring tasks and found that GPU-FV outperforms a number of previous works.

Retrieval

TGIF: A New Dataset and Benchmark on Animated GIF Description

1 code implementation • CVPR 2016 • Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo

The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips.

Image Captioning Machine Translation +3

111

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

3 code implementations • 11 Mar 2016 • Wei Tan, Liangliang Cao, Liana Fong

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering.

Distributed, Parallel, and Cluster Computing Performance

171

Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games

no code implementations • 22 Sep 2015 • Nikolai Yakovenko, Liangliang Cao, Colin Raffel, James Fan

The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a CNN-based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained and is competitive against human expert players.

Game of Poker

Medical Synonym Extraction with Concept Space Models

no code implementations • 1 Jun 2015 • Chang Wang, Liangliang Cao, Bo-Wen Zhou

In this paper, we present a novel approach for medical synonym extraction.

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

no code implementations • 1 Feb 2015 • Xiaodan Liang, Liang Lin, Liangliang Cao

Action recognition is an important problem in multimedia understanding.

Action Recognition Temporal Action Localization +1

Practice in Synonym Extraction at Large Scale

no code implementations • 6 Dec 2014 • Liangliang Cao, Chang Wang

Synonym extraction is an important task in natural language processing and is often used as a submodule in query expansion, question answering, and other applications.

Question Answering

Learning Locally-Adaptive Decision Functions for Person Verification

no code implementations • CVPR 2013 • Zhen Li, Shiyu Chang, Feng Liang, Thomas S. Huang, Liangliang Cao, John R. Smith

This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule.

Face Verification Metric Learning +2

Designing Category-Level Attributes for Discriminative Visual Recognition

no code implementations • CVPR 2013 • Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

In this paper, we propose a novel formulation to automatically design discriminative "category-level attributes", which can be efficiently encoded by a compact category-attribute matrix.

Attribute Transfer Learning +1

Efficient Maximum Appearance Search for Large-Scale Object Detection

no code implementations • CVPR 2013 • Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets.

Object object-detection +1

Learning to Search Efficiently in High Dimensions

no code implementations • NeurIPS 2011 • Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang

Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees).

Computational Efficiency Vocal Bursts Intensity Prediction
