Search Results for author: Anyi Rao

Found 21 papers, 11 papers with code

Adding Conditional Control to Text-to-Image Diffusion Models

4 code implementations ICCV 2023 Lvmin Zhang, Anyi Rao, Maneesh Agrawala

ControlNet locks production-ready large diffusion models and reuses their deep and robust encoding layers, pretrained with billions of images, as a strong backbone for learning a diverse set of conditional controls.

Image Generation
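The ControlNet snippet describes locking a pretrained model and reusing its encoder layers as a backbone for conditional control. The sketch below illustrates that structure under loud assumptions. A "block" is a toy linear map in plain Python. A trainable copy of the locked block receives the condition, and both the condition input and the copy's output pass through zero-initialized projections, modeled on the zero convolutions the ControlNet paper uses. Because those projections start at zero, the controlled block initially reproduces the locked block exactly. `LinearBlock`, `ZeroProjection`, and `ControlledBlock` are hypothetical names, not ControlNet's code, and training is omitted.

```python
import copy
from typing import List

Vector = List[float]
Matrix = List[Vector]


class LinearBlock:
    """Toy stand-in for one pretrained encoder block: y = W x + b."""

    def __init__(self, weight: Matrix, bias: Vector) -> None:
        self.weight = weight
        self.bias = bias
        self.trainable = True

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class ZeroProjection:
    """Element-wise 1x1 projection whose scale and bias both start at zero."""

    def __init__(self, dim: int) -> None:
        self.scale = [0.0] * dim
        self.bias = [0.0] * dim

    def __call__(self, x: Vector) -> Vector:
        return [s * xi + b for s, xi, b in zip(self.scale, x, self.bias)]


class ControlledBlock:
    """Locked pretrained block plus a trainable copy joined by zero projections."""

    def __init__(self, pretrained: LinearBlock) -> None:
        # Copy first, so the trainable copy keeps trainable=True.
        self.control_copy = copy.deepcopy(pretrained)
        self.locked = pretrained
        self.locked.trainable = False  # frozen: never updated during training
        self.zero_in = ZeroProjection(len(pretrained.weight[0]))  # condition side
        self.zero_out = ZeroProjection(len(pretrained.bias))      # output side

    def __call__(self, x: Vector, condition: Vector) -> Vector:
        base = self.locked(x)
        # The condition enters the trainable copy through a zero projection.
        injected = [xi + ci for xi, ci in zip(x, self.zero_in(condition))]
        control = self.zero_out(self.control_copy(injected))
        # The copy's contribution is added back onto the locked output.
        return [b + c for b, c in zip(base, control)]


if __name__ == "__main__":
    block = LinearBlock([[1.0, 2.0], [0.5, -1.0]], [0.1, 0.0])
    controlled = ControlledBlock(block)
    x, condition = [1.0, 1.0], [3.0, -2.0]
    print(controlled(x, condition) == block(x))  # True before any training step
```

Starting the extra branch at zero means that adding control cannot degrade the locked model's output before training begins. Only the trainable copy and the two projections would receive updates.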

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

4 code implementations 10 Jul 2023 Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai

Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.

Image Animation
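The AnimateDiff snippet says a trained motion module can be inserted into a personalized T2I model. The sketch below shows that plug-in structure under loud assumptions. Each frame is reduced to a single feature vector. The personalized model is reduced to one frozen per-frame function. The motion module is a toy temporal self-attention over frames with identity projections and a zero-initialized output scale, so inserting it leaves per-frame outputs unchanged until it is trained. Training is omitted, and `MotionModule`, `animate`, `style_a`, and `style_b` are hypothetical names, not AnimateDiff's API.

```python
import math
from typing import Callable, List

Vector = List[float]
ImageLayer = Callable[[Vector], Vector]  # frozen per-frame layer of a T2I model


def softmax(xs: List[float]) -> List[float]:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class MotionModule:
    """Toy temporal self-attention across frames (identity Q/K/V projections)."""

    def __init__(self, out_scale: float = 0.0) -> None:
        # Zero-initialized output scale: inserting the module changes nothing at first.
        self.out_scale = out_scale

    def __call__(self, frames: List[Vector]) -> List[Vector]:
        dim = len(frames[0])
        mixed = []
        for q in frames:
            weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                               for k in frames])
            context = [sum(w * f[d] for w, f in zip(weights, frames))
                       for d in range(dim)]
            # Residual: each frame keeps its features plus scaled temporal context.
            mixed.append([x + self.out_scale * c for x, c in zip(q, context)])
        return mixed


def animate(base_layer: ImageLayer, motion: MotionModule,
            frames: List[Vector]) -> List[Vector]:
    """Apply the personalized image layer per frame, then mix across frames."""
    return motion([base_layer(f) for f in frames])


if __name__ == "__main__":
    def style_a(v: Vector) -> Vector:  # hypothetical personalized model A
        return [2.0 * x for x in v]

    def style_b(v: Vector) -> Vector:  # hypothetical personalized model B
        return [x + 1.0 for x in v]

    motion = MotionModule()  # the same module instance is reused with both bases
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(animate(style_a, motion, frames))  # [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    print(animate(style_b, motion, frames))  # [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
```

The motion module only mixes information along the frame axis and never touches the base layer's weights. That separation is what allows one trained module to be paired with different personalized bases.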

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

1 code implementation 28 Nov 2023 Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai

The development of text-to-video (T2V) generation, i.e., generating videos from a given text prompt, has advanced significantly in recent years.

Video Generation

HotFlip: White-Box Adversarial Examples for Text Classification

2 code implementations ACL 2018 Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou

We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier.

General Classification, Text Classification
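HotFlip generates white-box adversarial examples by using the gradient of the loss with respect to the one-hot character input. Flipping the character at position i from a to b is estimated, to first order, to change the loss by ∂L/∂x_{ib} − ∂L/∂x_{ia}. The sketch below picks the single best substitution from a precomputed gradient. The alphabet, text, and gradient values are made up for illustration. Insertions, deletions, and any search over multiple flips are omitted, and `best_flip` and `apply_flip` are hypothetical helper names.

```python
from typing import List, Tuple

OneHot = List[List[int]]      # positions x alphabet, exactly one 1 per row
Gradient = List[List[float]]  # dLoss/dx with the same shape


def best_flip(one_hot: OneHot, grad: Gradient) -> Tuple[int, int, float]:
    """Return (position, new_char_index, estimated_loss_increase) of the best flip."""
    best = (-1, -1, float("-inf"))
    for i, (row, g) in enumerate(zip(one_hot, grad)):
        a = row.index(1)  # index of the character currently at position i
        for b, g_b in enumerate(g):
            if b == a:
                continue
            # First-order estimate of the loss change for flipping a -> b at i.
            gain = g_b - g[a]
            if gain > best[2]:
                best = (i, b, gain)
    return best


def apply_flip(text: str, alphabet: str, position: int, new_char: int) -> str:
    """Replace the character at `position` with alphabet[new_char]."""
    return text[:position] + alphabet[new_char] + text[position + 1:]


if __name__ == "__main__":
    alphabet = "abc"
    text = "ab"
    one_hot = [[1 if c == ch else 0 for c in alphabet] for ch in text]
    grad = [[0.1, 0.5, -0.2],  # made-up gradients for position 0
            [0.3, 0.0, 0.9]]   # made-up gradients for position 1
    pos, new_char, gain = best_flip(one_hot, grad)
    print(pos, new_char, apply_flip(text, alphabet, pos, new_char))  # 1 2 ac
```

A single backward pass yields gradients for every position and candidate character at once. That is why this first-order ranking is much cheaper than re-running the classifier for each possible flip.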

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

4 code implementations CVPR 2020 Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin

Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment.

Action Recognition, Scene Segmentation

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

1 code implementation CVPR 2022 Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi

We propose AutoGPart, a generic method for training generalizable 3D part segmentation networks that takes the task prior into account.

3D Part Segmentation, Domain Generalization

Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences

1 code implementation 17 Feb 2023 Yujie Zhou, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang

Specifically, we construct a negative-sample-free triplet stream structure composed of an anchor stream without any masking, a spatial masking stream with Central Spatial Masking (CSM), and a temporal masking stream with Motion Attention Temporal Masking (MATM).

Action Recognition, Contrastive Learning
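The snippet above builds three input streams from one skeleton sequence: an unmasked anchor, a spatially masked view, and a temporally masked view. The sketch below constructs those three views under a plain substitution that is stated loudly here. Uniform random joint masking stands in for Central Spatial Masking, and uniform random frame masking stands in for Motion Attention Temporal Masking. The snippet does not give their selection rules, so this is not the paper's CSM or MATM. A sequence is a list of frames, each frame a list of (x, y, z) joints, and masked entries are zeroed. The function names are hypothetical.

```python
import random
from typing import List, Tuple

Joint = Tuple[float, float, float]
Skeleton = List[List[Joint]]  # frames x joints, each joint an (x, y, z) coordinate
ZERO: Joint = (0.0, 0.0, 0.0)


def spatial_mask(seq: Skeleton, n_joints: int, rng: random.Random) -> Skeleton:
    """Zero out the same randomly chosen joints in every frame."""
    masked = set(rng.sample(range(len(seq[0])), n_joints))
    return [[ZERO if j in masked else joint for j, joint in enumerate(frame)]
            for frame in seq]


def temporal_mask(seq: Skeleton, n_frames: int, rng: random.Random) -> Skeleton:
    """Zero out every joint in randomly chosen frames."""
    masked = set(rng.sample(range(len(seq)), n_frames))
    return [[ZERO] * len(frame) if t in masked else list(frame)
            for t, frame in enumerate(seq)]


def triplet_views(seq: Skeleton, n_joints: int = 2, n_frames: int = 1,
                  seed: int = 0) -> Tuple[Skeleton, Skeleton, Skeleton]:
    """Return (anchor, spatially masked, temporally masked) views of one sequence."""
    rng = random.Random(seed)
    anchor = [list(frame) for frame in seq]  # anchor stream: no masking
    return anchor, spatial_mask(seq, n_joints, rng), temporal_mask(seq, n_frames, rng)


if __name__ == "__main__":
    seq = [[(t + 1.0, j + 1.0, 1.0) for j in range(5)] for t in range(4)]
    anchor, spatial, temporal = triplet_views(seq)
    print(sum(joint == ZERO for joint in spatial[0]))  # 2 masked joints per frame
    print(sum(all(j == ZERO for j in frame) for frame in temporal))  # 1 masked frame
```

Each view would then be encoded by its own stream. Because the structure is negative-sample-free, training pulls the masked-view representations toward the anchor rather than pushing apart other sequences.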

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language

4 code implementations 12 Sep 2022 Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, Ji-Rong Wen

Although artificial intelligence (AI) has made significant progress in understanding molecules across a wide range of fields, existing models generally acquire only a single cognitive ability from a single molecular modality.

Contrastive Learning, Cross-Modal Retrieval

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers

1 code implementation 27 May 2023 Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, Jiaqi Wang

Although acceleration has been extensively studied for unimodal models, acceleration of multimodal models, especially vision-language Transformers, remains relatively under-explored.

Image Captioning, Image Retrieval

Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization

1 code implementation 7 Aug 2023 Yujie Zhou, Wenwen Qiang, Anyi Rao, Ning Lin, Bing Su, Jiaqi Wang

Specifically, 1) we maximize the mutual information (MI) between the visual and semantic spaces for distribution alignment; 2) we leverage temporal information for estimating the MI by encouraging the MI to increase as more frames are observed.

Action Recognition, Mutual Information Estimation
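The snippet above names two ingredients: maximizing the MI between visual and semantic embeddings, and encouraging the MI estimate to grow as more frames are observed. The snippet does not say which MI estimator the paper uses. The sketch below therefore substitutes the standard InfoNCE lower bound, log N minus the contrastive cross-entropy over N paired embeddings, and adds a hypothetical hinge penalty on any drop in the estimate between consecutive frame counts. Both are illustrative stand-ins, and `infonce_bound` and `monotonic_penalty` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def infonce_bound(visual: List[Vector], semantic: List[Vector],
                  temperature: float = 0.1) -> float:
    """InfoNCE lower bound on I(V; S): log N minus the contrastive loss.

    (visual[i], semantic[i]) is the positive pair; other semantics are negatives.
    """
    n = len(visual)
    loss = 0.0
    for i in range(n):
        logits = [cosine(visual[i], s) / temperature for s in semantic]
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        loss += log_sum_exp - logits[i]  # cross-entropy for the positive pair
    return math.log(n) - loss / n


def monotonic_penalty(mi_by_frames: List[float]) -> float:
    """Hinge penalty whenever the MI estimate drops as more frames are observed."""
    return sum(max(0.0, prev - nxt)
               for prev, nxt in zip(mi_by_frames, mi_by_frames[1:]))


if __name__ == "__main__":
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    shuffled = eye[1:] + eye[:1]  # every visual paired with the wrong semantic
    print(infonce_bound(eye, eye) > infonce_bound(eye, shuffled))  # True
    print(monotonic_penalty([0.1, 0.2, 0.3]))  # 0.0
```

The InfoNCE estimate can never exceed log N, so it saturates for well-aligned pairs. The hinge term adds a cost only when observing more frames lowers the estimate, which is one simple way to express "MI should increase with more frames."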

Automatic Music Accompanist

1 code implementation 24 Mar 2018 Anyi Rao, Francis Lau

The computer musician is able to produce musical accompaniment that relates musically to the human performance.

Sound, Multimedia, Audio and Speech Processing

MovieNet: A Holistic Dataset for Movie Understanding

no code implementations ECCV 2020 Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin

We believe that such a holistic dataset would promote research on story-based long video understanding and beyond.

Video Understanding

Online Multi-modal Person Search in Videos

no code implementations ECCV 2020 Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin

The task of searching for specific people in videos has growing potential in real-world applications, such as video organization and editing.

Person Recognition, Person Search

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering

no code implementations 10 Dec 2021 Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin

The wide span of viewing positions within these scenes yields multi-scale renderings with very different levels of detail, which poses great challenges to neural radiance fields and biases them toward compromised results.

Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows

no code implementations 17 Oct 2022 Anyi Rao, Xuekun Jiang, Sichen Wang, Yuwei Guo, Zihao Liu, Bo Dai, Long Pang, Xiaoyu Wu, Dahua Lin, Libiao Jin

The ability to choose an appropriate camera view among multiple cameras plays a vital role in TV show delivery.

Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production

no code implementations 30 Jan 2023 Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin, Dahua Lin, Bo Dai

Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots.

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE

no code implementations 5 Jun 2023 Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin

The factor model is a fundamental tool in quantitative investment, and deep learning can make it more flexible and efficient in complicated, real-world investing situations.

Open-Ended Question Answering, Stock Prediction

Automated Conversion of Music Videos into Lyric Videos

no code implementations 28 Aug 2023 Jiaju Ma, Anyi Rao, Li-Yi Wei, Rubaiat Habib Kazi, Hijung Valentina Shin, Maneesh Agrawala

Musicians and fans often produce lyric videos, a form of music video that showcases a song's lyrics, for their favorite songs.

Cinematic Behavior Transfer via NeRF-based Differentiable Filming

no code implementations 29 Nov 2023 Xuekun Jiang, Anyi Rao, Jingbo Wang, Dahua Lin, Bo Dai

In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired.

Pose Estimation
