Search Results for author: Anyi Rao

Found 21 papers, 11 papers with code

Adding Conditional Control to Text-to-Image Diffusion Models

4 code implementations • ICCV 2023 • Lvmin Zhang, Anyi Rao, Maneesh Agrawala

ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls.

Image Generation

34,515

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

4 code implementations • 10 Jul 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai

Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.

Image Animation

8,678

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

1 code implementation • 28 Nov 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai

The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years.

Video Generation

8,678

HotFlip: White-Box Adversarial Examples for Text Classification

2 code implementations • ACL 2018 • Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou

We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier.

General Classification text-classification +1

4,291

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

4 code implementations • CVPR 2020 • Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin

Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment.

Action Recognition Scene Segmentation +1

211

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

1 code implementation • CVPR 2022 • Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi

We propose AutoGPart, a generic method for training generalizable 3D part segmentation networks that takes the task prior into account.

3D Part Segmentation Domain Generalization +1

Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences

1 code implementation • 17 Feb 2023 • Yujie Zhou, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang

Specifically, we construct a negative-sample-free triplet stream structure that is composed of an anchor stream without any masking, a spatial masking stream with Central Spatial Masking (CSM), and a temporal masking stream with Motion Attention Temporal Masking (MATM).

Action Recognition Contrastive Learning +4

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language

4 code implementations • 12 Sep 2022 • Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, Ji-Rong Wen

Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality.

Ranked #7 on Molecule Captioning on ChEBI-20

Contrastive Learning Cross-Modal Retrieval +4

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers

1 code implementation • 27 May 2023 • Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, Jiaqi Wang

Although extensively studied for unimodal models, the acceleration for multimodal models, especially the vision-language Transformers, is relatively under-explored.

Image Captioning Image Retrieval +5

Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization

1 code implementation • 7 Aug 2023 • Yujie Zhou, Wenwen Qiang, Anyi Rao, Ning Lin, Bing Su, Jiaqi Wang

Specifically, 1) we maximize the MI between visual and semantic space for distribution alignment; 2) we leverage the temporal information for estimating the MI by encouraging MI to increase as more frames are observed.

Action Recognition Mutual Information Estimation +1

Automatic Music Accompanist

1 code implementation • 24 Mar 2018 • Anyi Rao, Francis Lau

The computer musician is able to produce musical accompaniment that relates musically to the human performance.

Sound Multimedia Audio and Speech Processing

MovieNet: A Holistic Dataset for Movie Understanding

no code implementations • ECCV 2020 • Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin

We believe that such a holistic dataset would promote research on story-based long video understanding and beyond.

Video Understanding

A Unified Framework for Shot Type Classification Based on Subject Centric Lens

no code implementations • ECCV 2020 • Anyi Rao, Jiaze Wang, Linning Xu, Xuekun Jiang, Qingqiu Huang, Bolei Zhou, Dahua Lin

Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated videos that are thriving over the Internet.

General Classification Vocal Bursts Type Prediction

Online Multi-modal Person Search in Videos

no code implementations • ECCV 2020 • Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin

The task of searching for certain people in videos has seen increasing potential in real-world applications, such as video organization and editing.

Person Recognition Person Search

BlockPlanner: City Block Generation With Vectorized Graph Representation

no code implementations • ICCV 2021 • Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, Dahua Lin

City modeling is the foundation for computational urban planning, navigation, and entertainment.


BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering

no code implementations • 10 Dec 2021 • Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin

The wide span of viewing positions within these scenes yields multi-scale renderings with very different levels of detail, which poses great challenges to neural radiance field and biases it towards compromised results.

Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows

no code implementations • 17 Oct 2022 • Anyi Rao, Xuekun Jiang, Sichen Wang, Yuwei Guo, Zihao Liu, Bo Dai, Long Pang, Xiaoyu Wu, Dahua Lin, Libiao Jin

The ability to choose an appropriate camera view among multiple cameras plays a vital role in the delivery of TV shows.

Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production

no code implementations • 30 Jan 2023 • Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin, Dahua Lin, Bo Dai

Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots.

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE

no code implementations • 5 Jun 2023 • Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin

The factor model is a fundamental tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in complicated practical investing situations.

Open-Ended Question Answering Stock Prediction

Automated Conversion of Music Videos into Lyric Videos

no code implementations • 28 Aug 2023 • Jiaju Ma, Anyi Rao, Li-Yi Wei, Rubaiat Habib Kazi, Hijung Valentina Shin, Maneesh Agrawala

Musicians and fans often produce lyric videos, a form of music videos that showcase the song's lyrics, for their favorite songs.

Cinematic Behavior Transfer via NeRF-based Differentiable Filming

no code implementations • 29 Nov 2023 • Xuekun Jiang, Anyi Rao, Jingbo Wang, Dahua Lin, Bo Dai

In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired.

Pose Estimation
