1 code implementation • 27 Feb 2024 • Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao
In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field.
no code implementations • 13 Dec 2023 • Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang
In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.
1 code implementation • 11 Oct 2023 • Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang
We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.
1 code implementation • 4 Oct 2023 • Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao
Although the fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in a notoriously long time and high computational costs.
no code implementations • 18 Sep 2023 • Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang
Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions.
Automatic Speech Recognition (ASR) +2
no code implementations • 18 May 2023 • Liangchen Song, Liangliang Cao, Hongyu Xu, Kai Kang, Feng Tang, Junsong Yuan, Yang Zhao
The proposed framework consists of two significant components: Geometry Guided Diffusion and Mesh Optimization.
1 code implementation • 8 May 2023 • Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng
In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.
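The second approach above is a data filter; a minimal sketch of such a filtering pipeline, where `contains_text_region` is a hypothetical stand-in for a real text/OCR detector (not an API from the paper):

```python
def filter_training_pairs(pairs, contains_text_region):
    """Drop image-text pairs whose image contains rendered text, which
    would let contrastive training 'cheat' by reading the text in the
    image instead of learning visual semantics."""
    return [(img, cap) for img, cap in pairs if not contains_text_region(img)]

# Toy corpus: pretend image IDs containing 'txt' have text regions.
corpus = [("photo1", "a dog"), ("txt_banner", "sale now"), ("photo2", "a cat")]
detector = lambda img: "txt" in img
clean = filter_training_pairs(corpus, detector)
# clean == [("photo1", "a dog"), ("photo2", "a cat")]
```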
no code implementations • 30 Jan 2023 • Chen Chen, BoWen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang
We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.
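A sparse token-space embedding of the kind described above can be sketched by keeping only the top-k activations of a vocabulary-sized output and scoring image-text pairs with a dot product; the projection values and k below are illustrative toys, not the paper's architecture:

```python
def sparsify_topk(logits, k=3):
    """Keep the k largest activations (after ReLU) and zero the rest,
    yielding a sparse, vocabulary-sized embedding."""
    relu = [max(0.0, v) for v in logits]
    thresh = sorted(relu, reverse=True)[k - 1]
    return [v if v >= thresh and v > 0 else 0.0 for v in relu]

def sparse_dot(a, b):
    """Similarity is a plain dot product, cheap because most entries are zero."""
    return sum(x * y for x, y in zip(a, b))

# Toy "encoder" outputs over an 8-token vocabulary.
image_logits = [0.1, 2.0, -0.5, 1.5, 0.0, 0.2, 3.0, -1.0]
text_logits  = [0.0, 1.8, 0.3, -0.2, 0.1, 0.0, 2.5, 0.4]

img = sparsify_topk(image_logits)
txt = sparsify_topk(text_logits)
score = sparse_dot(img, txt)
```

Because the embeddings live in a token space, the nonzero dimensions are human-interpretable vocabulary entries, and sparse inverted-index retrieval becomes possible.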
no code implementations • 29 Nov 2022 • Taihong Xiao, ZiRui Wang, Liangliang Cao, Jiahui Yu, Shengyang Dai, Ming-Hsuan Yang
Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.
1 code implementation • 27 Dec 2021 • Gopal Sharma, Bidya Dash, Aruni RoyChowdhury, Matheus Gadelha, Marios Loizou, Liangliang Cao, Rui Wang, Erik Learned-Miller, Subhransu Maji, Evangelos Kalogerakis
We present PriFit, a semi-supervised approach for label-efficient learning of 3D point cloud segmentation networks.
no code implementations • 8 Oct 2021 • Zhiyun Lu, Yanwei Pan, Thibault Doutre, Parisa Haghani, Liangliang Cao, Rohit Prabhavalkar, Chao Zhang, Trevor Strohman
Our experiments show that for both losses, the WER on long-form speech reduces substantially as the training utterance length increases.
Automatic Speech Recognition (ASR) +1
no code implementations • 7 Oct 2021 • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland
As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 27 Sep 2021 • Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio.
Automatic Speech Recognition (ASR) +1
no code implementations • 26 Apr 2021 • David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw
Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 25 Apr 2021 • Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao
To improve streaming models, a recent study [1] proposed to distill a non-streaming teacher model on unsupervised utterances, and then train a streaming student using the teacher's predictions.
Automatic Speech Recognition (ASR) +1
no code implementations • 6 Apr 2021 • Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao
To attack RNN-T, we find prepending perturbation is more effective than the additive perturbation, and can mislead the models to predict the same short target on utterances of arbitrary length.
Automatic Speech Recognition (ASR) +1
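The finding above contrasts prepending a short adversarial audio snippet with the classic elementwise additive perturbation; a minimal sketch in terms of raw waveform arrays (the sample values are arbitrary toys, and a real attack would optimize the snippet against the model):

```python
def additive_attack(waveform, delta):
    """Classic additive perturbation: same length as the input,
    perturbing every sample elementwise."""
    assert len(delta) == len(waveform)
    return [w + d for w, d in zip(waveform, delta)]

def prepend_attack(waveform, delta):
    """Prepended perturbation: a short fixed snippet placed before the
    utterance, so a single snippet transfers to inputs of any length."""
    return list(delta) + list(waveform)

utterance = [0.0, 0.1, -0.2, 0.05]   # toy waveform samples
snippet   = [0.3, -0.3]              # toy adversarial prefix

adv = prepend_attack(utterance, snippet)
```

The practical difference is that the additive attack must be recomputed per input length, while the prepended snippet is length-independent.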
no code implementations • 25 Mar 2021 • Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland
End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 11 Mar 2021 • David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
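A common way to obtain word-level confidence from a subword-based model is to aggregate the confidences of the pieces that compose each word; a minimal sketch (aggregation by minimum and the `_` word-start marker are illustrative choices, not necessarily the paper's estimator):

```python
def word_confidences(subwords, confidences):
    """Merge subword pieces (a leading '_' marks a word start,
    SentencePiece-style) into words, taking the minimum subword
    confidence as the word-level confidence."""
    words, scores = [], []
    for piece, conf in zip(subwords, confidences):
        if piece.startswith("_") or not words:
            words.append(piece.lstrip("_"))
            scores.append(conf)
        else:
            words[-1] += piece          # continuation piece
            scores[-1] = min(scores[-1], conf)
    return list(zip(words, scores))

pieces = ["_speech", "_recog", "ni", "tion"]
confs  = [0.95, 0.90, 0.60, 0.85]
out = word_confidences(pieces, confs)
# out == [("speech", 0.95), ("recognition", 0.60)]
```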
no code implementations • 4 Dec 2020 • Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann
The experimental results show that the STAN model can consistently improve the state of the art in both action detection and action recognition tasks.
1 code implementation • 22 Oct 2020 • Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
Automatic Speech Recognition (ASR) +2
no code implementations • 22 Oct 2020 • Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao
We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.
Automatic Speech Recognition (ASR) +1
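The recipe above pseudo-labels unlabeled audio with a non-streaming teacher and trains the streaming student on those transcripts as if they were supervised data; a minimal sketch of the data flow, with `DummyModel` standing in for real teacher/student networks (all names are illustrative):

```python
class DummyModel:
    """Stand-in for an ASR model; in the real setup the teacher is a
    non-streaming network and the student a streaming one."""
    def __init__(self):
        self.seen = []
    def transcribe(self, audio):
        return f"transcript-of-{audio}"      # pseudo-label
    def train_step(self, audio, target):
        self.seen.append((audio, target))    # record the training pair

def distill(teacher, student, unlabeled_audio):
    """Generate transcripts with the teacher, then train the student
    on the resulting (audio, pseudo-label) pairs."""
    for audio in unlabeled_audio:
        target = teacher.transcribe(audio)
        student.train_step(audio, target)

teacher, student = DummyModel(), DummyModel()
distill(teacher, student, ["utt1", "utt2"])
```

Because the teacher labels data offline, the unlabeled set can be made arbitrarily large without any human transcription.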
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zonghai Yao, Liangliang Cao, Huapu Pan
This paper considers the problem of zero-shot entity linking, in which an entity to be linked at test time may not be present in the training data.
no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu
On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.
Automatic Speech Recognition (ASR) +1
no code implementations • LREC 2020 • Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang, James Fan
We present a multimodal corpus for sentiment analysis based on the existing Switchboard-1 Telephone Speech Corpus released by the Linguistic Data Consortium.
1 code implementation • ECCV 2020 • Matheus Gadelha, Aruni RoyChowdhury, Gopal Sharma, Evangelos Kalogerakis, Liangliang Cao, Erik Learned-Miller, Rui Wang, Subhransu Maji
The problems of shape classification and part segmentation from 3D point clouds have garnered increasing attention in the last few years.
no code implementations • 16 Dec 2019 • Zhen Li, Hanyang Shao, Nian Xue, Liang Niu, Liangliang Cao
This paper studies the problem of Person Re-Identification (ReID) for large-scale applications.
no code implementations • 21 Nov 2019 • Zhiyun Lu, Liangliang Cao, Yu Zhang, Chung-Cheng Chiu, James Fan
In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a down-stream task.
no code implementations • 26 Jul 2019 • Qing Li, Xiaojiang Peng, Liangliang Cao, Wenbin Du, Hao Xing, Yu Qiao
Instead of collecting product images by labor- and time-intensive image capturing, we take advantage of the web and download images from the reviews of several e-commerce websites where the images are casually captured by consumers.
no code implementations • 25 Jul 2019 • Jingya Liu, Liangliang Cao, Oguz Akin, YingLi Tian
Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans.
no code implementations • 8 Jun 2019 • Jingya Liu, Liangliang Cao, Oguz Akin, YingLi Tian
Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans.
1 code implementation • CVPR 2019 • Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, Erik Learned-Miller
Our results demonstrate the usefulness of incorporating hard examples obtained from tracking, the advantage of using soft-labels via distillation loss versus hard-labels, and show promising performance as a simple method for unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters.
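The soft-label versus hard-label comparison above comes down to the target distribution used in the cross-entropy loss; a minimal sketch with made-up scores (the paper's actual detector outputs and loss weighting differ):

```python
import math

def cross_entropy(target_probs, pred_probs):
    """H(target, pred) = -sum_i t_i * log p_i; terms with t_i = 0 vanish."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, pred_probs) if t > 0)

teacher = [0.7, 0.2, 0.1]   # soft labels: full score distribution
hard    = [1.0, 0.0, 0.0]   # hard labels: argmax of the same scores
student = [0.6, 0.3, 0.1]   # toy student predictions

soft_loss = cross_entropy(teacher, student)
hard_loss = cross_entropy(hard, student)
```

Soft targets keep the teacher's uncertainty over the non-argmax classes, which is exactly the information discarded by hard labels.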
no code implementations • 21 Feb 2019 • Yujiao Lu, Hancheng Lu, Liangliang Cao, Feng Wu, Daren Zhu
DRL-DPT overcomes the main obstacles in applying reinforcement learning and deep learning in wireless networks, i.e., continuous state space, continuous action space, and convergence.
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Yannis Kalantidis, Li-Jia Li, and Alexander Hauptmann
In addition to a text answer, a few grounding photos are also given to justify the answer.
Ranked #1 on Memex Question Answering on MemexQA
1 code implementation • 11 Aug 2018 • Wei Tan, Shiyu Chang, Liana Fong, Cheng Li, Zijun Wang, Liangliang Cao
Current MF implementations are either optimized for a single machine or require a large computer cluster, yet remain insufficient.
2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.
Ranked #1 on Memex Question Answering on MemexQA
2 code implementations • 4 Dec 2017 • Zhiqiang Shen, Honghui Shi, Jiahui Yu, Hai Phan, Rogerio Feris, Liangliang Cao, Ding Liu, Xinchao Wang, Thomas Huang, Marios Savvides
In this paper, we present a simple and parameter-efficient drop-in module for one-stage object detectors like SSD when learning from scratch (i.e., without pre-trained models).
1 code implementation • 26 Oct 2017 • Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani
In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.
1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann
This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.
no code implementations • ICCV 2017 • Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li
The ability to learn from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels is relatively easy to obtain.
no code implementations • 28 Nov 2016 • Quanzeng You, Ran Pang, Liangliang Cao, Jiebo Luo
Real estate appraisal, which is the process of estimating the price for real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction.
no code implementations • 10 Aug 2016 • Yuncheng Li, Liangliang Cao, Jiang Zhu, Jiebo Luo
The core of the proposed automatic composition system is to score fashion outfit candidates based on the appearances and meta-data.
no code implementations • 8 Aug 2016 • Rossano Schifanella, Paloma de Juan, Joel Tetreault, Liangliang Cao
Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment.
1 code implementation • CVPR 2016 • Michael Gygli, Yale Song, Liangliang Cao
We introduce the novel problem of automatically generating animated GIFs from video.
1 code implementation • 12 Apr 2016 • Wenying Ma, Liangliang Cao, Lei Yu, Guoping Long, Yucheng Li
We also applied GPU-FV for realtime video monitoring tasks and found that GPU-FV outperforms a number of previous works.
1 code implementation • CVPR 2016 • Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo
The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips.
3 code implementations • 11 Mar 2016 • Wei Tan, Liangliang Cao, Liana Fong
Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering.
Distributed, Parallel, and Cluster Computing • Performance
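As a reminder of the computation being accelerated, here is a minimal SGD matrix-factorization sketch for rating prediction; the rank, learning rate, and toy data are arbitrary, and the paper's contribution is an efficient large-scale implementation of this idea, not the algorithm itself:

```python
import random

def mf_sgd(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.01, epochs=500):
    """Factor a sparse rating matrix R ~ P @ Q^T by stochastic gradient
    descent over the observed (user, item, rating) triples."""
    rng = random.Random(0)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][k] * Q[i][k] for k in range(rank))
            err = r - pred
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)   # gradient step with
                Q[i][k] += lr * (err * pu - reg * qi)   # L2 regularization
    return P, Q

# Toy data: (user, item, rating) triples.
data = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.5)]
P, Q = mf_sgd(data, n_users=2, n_items=2)
pred = sum(P[0][k] * Q[0][k] for k in range(2))   # should approach 5.0
```

The inner update over observed triples is what GPU implementations parallelize, with care taken around conflicting updates to shared rows of P and Q.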
no code implementations • 22 Sep 2015 • Nikolai Yakovenko, Liangliang Cao, Colin Raffel, James Fan
The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a CNN based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained and is competitive against human expert players.
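A convolution-friendly card representation of the kind described in contribution (1) can be sketched as a suit-by-rank binary grid; the exact tensor layout here is an illustrative guess, not the paper's:

```python
RANKS = "23456789TJQKA"
SUITS = "cdhs"

def encode_hand(cards):
    """Encode a set of cards as a 4x13 binary grid (suit x rank), so a
    CNN can pick up rank runs and suit patterns as spatial structure."""
    grid = [[0] * len(RANKS) for _ in SUITS]
    for card in cards:                       # e.g. "Ah" = ace of hearts
        rank, suit = card[0], card[1]
        grid[SUITS.index(suit)][RANKS.index(rank)] = 1
    return grid

g = encode_hand(["Ah", "Kh", "2c"])
```

Flushes become dense rows and straights become horizontal runs, which is what makes the encoding natural for convolutions.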
no code implementations • 1 Jun 2015 • Chang Wang, Liangliang Cao, Bo-Wen Zhou
In this paper, we present a novel approach for medical synonym extraction.
no code implementations • 1 Feb 2015 • Xiaodan Liang, Liang Lin, Liangliang Cao
Action recognition is an important problem in multimedia understanding.
no code implementations • 6 Dec 2014 • Liangliang Cao, Chang Wang
Synonym extraction is an important task in natural language processing and often used as a submodule in query expansion, question answering and other applications.
no code implementations • CVPR 2013 • Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang
In this paper, we propose a novel formulation to automatically design discriminative "category-level attributes", which can be efficiently encoded by a compact category-attribute matrix.
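A category-attribute matrix turns attribute predictions into category scores with a single matrix product; a minimal sketch with made-up attributes and categories (the paper learns the matrix discriminatively rather than hand-designing it as done here):

```python
def category_scores(attr_probs, A):
    """Score each category as the dot product between its attribute
    signature (a row of the category-attribute matrix A) and the
    predicted attribute probabilities."""
    return [sum(a * p for a, p in zip(row, attr_probs)) for row in A]

# Rows: categories (cat, car); columns: attributes (furry, has-wheels, metallic).
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
probs = [0.1, 0.9, 0.8]        # toy attribute-detector output for one image
scores = category_scores(probs, A)
best = max(range(len(scores)), key=scores.__getitem__)   # index 1 -> "car"
```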
no code implementations • CVPR 2013 • Zhen Li, Shiyu Chang, Feng Liang, Thomas S. Huang, Liangliang Cao, John R. Smith
This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule.
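A verification decision of this kind can be written as comparing a learned Mahalanobis-style distance against a threshold that varies with where the pair lies; a minimal sketch with an identity metric and a toy locally varying threshold (both are stand-ins for the learned quantities):

```python
def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) under metric matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def verify(x, y, M, local_threshold):
    """Accept the pair as 'same' when the metric distance falls below a
    threshold that is itself a function of the pair's location."""
    return mahalanobis_sq(x, y, M) < local_threshold(x, y)

M = [[1.0, 0.0], [0.0, 1.0]]                    # identity metric for the toy
thr = lambda x, y: 0.5 + 0.1 * abs(x[0])        # toy locally adaptive threshold
same = verify([1.0, 1.0], [1.2, 0.9], M, thr)   # distance 0.05 < 0.6 -> True
```

Learning the metric and the thresholding rule jointly is what distinguishes this from metric learning followed by a single global threshold.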
no code implementations • CVPR 2013 • Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan
In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets.
no code implementations • NeurIPS 2011 • Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees).
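The data-independent baseline mentioned above, Locality Sensitive Hashing, can be sketched with hyperplane sign hashes: nearby vectors tend to share hash bits regardless of the data distribution. The planes below are fixed for determinism; real LSH draws them at random (e.g., Gaussian):

```python
def lsh_hash(v, planes):
    """Sign pattern of hyperplane projections: one bit per plane."""
    return tuple(int(sum(a * b for a, b in zip(v, p)) >= 0) for p in planes)

# Fixed planes for a reproducible toy; in practice these are random.
planes = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

a = [1.0, 0.2, -0.5]
b = [1.01, 0.19, -0.52]     # tiny perturbation of a
c = [-1.0, 2.0, 0.5]        # very different direction

# Close vectors agree on more bits than distant ones.
agree = lambda h1, h2: sum(x == y for x, y in zip(h1, h2))
```

Because the planes never look at the data, the scheme is data-independent, which is precisely the property the paper contrasts with learned indexes.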