Search Results for author: Chong Luo

Found 40 papers, 15 papers with code

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

no code implementations • 30 May 2024 • Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

In this paper, we propose MotionFollower, a lightweight score-guided diffusion model for video motion editing.

Paper
Add Code

Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

no code implementations • 29 Apr 2024 • Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong

In this study, we explore a universal model that can flexibly adapt to unseen dense label structures with a few examples, enabling it to serve as a data-efficient vision generalist in diverse real-world scenarios.

Meta-Learning

Paper
Add Code

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations • 22 Apr 2024 • Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, Xiyang Dai, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Victor Fragoso, Dan Iter, Mei Gao, Min Gao, Jianfeng Gao, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Ce Liu, Mengchen Liu, Weishung Liu, Eric Lin, Zeqi Lin, Chong Luo, Piyush Madan, Matt Mazzola, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Xin Wang, Lijuan Wang, Chunyu Wang, Yu Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Haiping Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Sonali Yadav, Fan Yang, Jianwei Yang, ZiYi Yang, Yifan Yang, Donghan Yu, Lu Yuan, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3. 8 billion parameter language model trained on 3. 3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3. 5 (e. g., phi-3-mini achieves 69% on MMLU and 8. 38 on MT-bench), despite being small enough to be deployed on a phone.

Language Modelling

Paper
Add Code

OmniVid: A Generative Framework for Universal Video Understanding

1 code implementation • 26 Mar 2024 • Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution.

Action Recognition Decoder +5

Paper
Code

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

no code implementations • 14 Mar 2024 • Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan

Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies.

Text-to-Image Generation

Paper
Add Code

Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs

1 code implementation • 12 Dec 2023 • Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jiaolong Yang, Seungryong Kim, Chong Luo

This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision.

Novel View Synthesis Pose Estimation

Paper
Code

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

no code implementations • 30 Nov 2023 • Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation.

Text-to-Image Generation Text-to-Video Generation +1

Paper
Add Code

ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.

Text-to-Video Generation Video Generation

Paper
Add Code

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

no code implementations • 28 Nov 2023 • Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.

Autonomous Driving Video Generation

Paper
Add Code

CCEdit: Creative and Controllable Video Editing via Diffusion Models

no code implementations • 28 Sep 2023 • Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

The versatility of our framework is demonstrated through a diverse range of choices in both structure representations and personalized T2I models, as well as the option to provide the edited key frame.

Text-to-Image Generation Video Editing

Paper
Add Code

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

no code implementations • 27 Apr 2023 • Junke Wang, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor generalization capabilities, making it difficult to deploy them in real-world scenarios.

Video Understanding

Paper
Add Code

LaMD: Latent Motion Diffusion for Video Generation

no code implementations • 23 Apr 2023 • Yaosi Hu, Zhenzhong Chen, Chong Luo

We present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea.

Video Generation Video Reconstruction

Paper
Add Code

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

no code implementations • 12 Apr 2023 • Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo

Filler words like ``um" or ``uh" are common in spontaneous speech.

Paper
Add Code

Streaming Video Model

1 code implementation • CVPR 2023 • Yucheng Zhao, Chong Luo, Chuanxin Tang, Dongdong Chen, Noel Codella, Zheng-Jun Zha

We believe that the concept of streaming video model and the implementation of S-ViT are solid steps towards a unified deep learning architecture for video understanding.

Action Recognition Decoder +2

Paper
Code

Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

1 code implementation • 27 Mar 2023 • Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong

We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks.

Decoder Few-Shot Learning +1

246

Paper
Code

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

no code implementations • 21 Mar 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang

Object tracking (OT) aims to estimate the positions of target objects in a video sequence.

Object Object Tracking

Paper
Add Code

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

no code implementations • CVPR 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.

Ranked #1 on Semi-Supervised Video Object Segmentation on Long Video Dataset (using extra training data)

Instance Segmentation Segmentation +3

Paper
Add Code

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

no code implementations • 24 Oct 2022 • Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo

In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details.

Speech Enhancement

Paper
Add Code

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan

This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.

Ranked #4 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Action Classification Action Recognition +13

Paper
Add Code

An Anchor-Free Detector for Continuous Speech Keyword Spotting

no code implementations • 9 Aug 2022 • Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Continuous Speech Keyword Spotting (CSKWS) is a task to detect predefined keywords in a continuous speech.

Keyword Spotting object-detection +1

Paper
Add Code

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

no code implementations • 28 Jun 2022 • Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo

In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.

Sentence

Paper
Add Code

Peripheral Vision Transformer

1 code implementation • 14 Jun 2022 • Juhong Min, Yucheng Zhao, Chong Luo, Minsu Cho

We propose to incorporate peripheral position encoding to the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.

Image Classification

Paper
Code

Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

2 code implementations • ICLR 2022 • Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng

Last, an innovative link attention module serves as the decoder to reconstruct data from the decomposed content and style, with the help of the linking keys.

Decoder Quantization +2

Paper
Code

When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism

2 code implementations • 26 Jan 2022 • Guangting Wang, Yucheng Zhao, Chuanxin Tang, Chong Luo, Wenjun Zeng

It can be even replaced by a zero-parameter operation.

Ranked #67 on Object Detection on COCO minival (APM metric)

Image Classification Object Detection +1

2,666

Paper
Code

Make It Move: Controllable Image-to-Video Generation with Text Descriptions

1 code implementation • CVPR 2022 • Yaosi Hu, Chong Luo, Zhenzhong Chen

With both controllable appearance and motion, TI2V aims at generating videos from a static image and a text description.

Image to Video Generation

Paper
Code

Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?

2 code implementations • 12 Sep 2021 • Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng

Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module.

Ranked #395 on Image Classification on ImageNet

Image Classification

192

Paper
Code

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

1 code implementation • 12 Sep 2021 • Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng

Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.

Decoder Voice Conversion

Paper
Code

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

1 code implementation • 30 Aug 2021 • Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.

192

Paper
Code

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

no code implementations • ICCV 2021 • Yucheng Zhao, Guangting Wang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

In this paper, we propose a novel contrastive mask prediction (CMP) task for visual representation learning and design a mask contrast (MaskCo) framework to implement the idea.

Representation Learning Self-Supervised Learning

Paper
Add Code

Unsupervised Visual Representation Learning by Tracking Patches in Video

1 code implementation • CVPR 2021 • Guangting Wang, Yizhou Zhou, Chong Luo, Wenxuan Xie, Wenjun Zeng, Zhiwei Xiong

The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame.

Action Classification Action Recognition +1

Paper
Code

General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

no code implementations • 3 Feb 2021 • Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha

This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.

Classification Emotion Classification +6

Paper
Add Code

VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild

no code implementations • 28 Jan 2021 • Yizhou Zhou, Chong Luo, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng

We believe that VAE$^2$ is also applicable to other stochastic sequence prediction problems where training data are lack of stochasticity.

Video Prediction

Paper
Add Code

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

no code implementations • CVPR 2020 • Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng

Based on the probability space, we further generate new fusion strategies which achieve the state-of-the-art performance on four well-known action recognition datasets.

Action Recognition In Videos Temporal Action Localization

Paper
Add Code

Tracking by Instance Detection: A Meta-Learning Approach

no code implementations • CVPR 2020 • Guangting Wang, Chong Luo, Xiaoyan Sun, Zhiwei Xiong, Wen-Jun Zeng

We propose a principled three-step approach to build a high-performance tracker.

Domain Adaptation Meta-Learning +2

Paper
Add Code

PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

4 code implementations • Applications of Artificial Intelligence Conference 2019 • Dacheng Yin, Chong Luo, Zhiwei Xiong, Wen-Jun Zeng

We discover that the two streams should communicate with each other, and this is crucial to phase prediction.

Sound Audio and Speech Processing

215

Paper
Code

Posterior-Guided Neural Architecture Search

1 code implementation • 23 Jun 2019 • Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng

Accordingly, a hybrid network representation is presented which enables us to leverage the Variational Dropout so that the approximation of the posterior distribution becomes fully gradient-based and highly efficient.

Image Classification Neural Architecture Search