Search Results for author: Zijian He

Found 18 papers, 8 papers with code

AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment

no code implementations • 7 Apr 2024 • Yuanfeng Xu, Yuhao Chen, Zhongzhan Huang, Zijian He, Guangrun Wang, Philip Torr, Liang Lin

In this paper, we present AnimateZoo, a zero-shot diffusion-based video generator that addresses the challenging problem of cross-species animation, aiming to accurately produce animal animations while preserving the background.

Video Editing, Video Generation

APIServe: Efficient API Support for Large-Language Model Inferencing

no code implementations • 2 Feb 2024 • Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang

Large language models are increasingly integrated with external tools and APIs like ChatGPT plugins to extend their capability beyond language-centric tasks.

Language Modelling, Large Language Model

ControlRoom3D: Room Generation using Semantic Proxy Rooms

no code implementations • 8 Dec 2023 • Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou

Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style.

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

no code implementations • 6 Dec 2023 • Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang

However, one of the major drawbacks of diffusion models is that the image generation process is costly.

Denoising, Image Generation

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh

Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.

Image Generation

NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

2 code implementations • ICCV 2023 • Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, RuiLong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.

3D Object Detection, Depth Estimation, +1

4,778 stars

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

no code implementations • CVPR 2023 • Ji Hou, Xiaoliang Dai, Zijian He, Angela Dai, Matthias Nießner

Current popular backbones in computer vision, such as Vision Transformers (ViT) and ResNets, are trained to perceive the world from 2D images.

Contrastive Learning, Instance Segmentation, +6

A Practical Stereo Depth System for Smart Glasses

no code implementations • CVPR 2023 • Jialiang Wang, Daniel Scharstein, Akash Bapat, Kevin Blackburn-Matzen, Matthew Yu, Jonathan Lehman, Suhib Alsisan, Yanghan Wang, Sam Tsai, Jan-Michael Frahm, Zijian He, Peter Vajda, Michael F. Cohen, Matt Uyttendaele

We present the design of a productionized end-to-end stereo depth sensing system that does pre-processing, online stereo rectification, and stereo depth estimation with a fallback to monocular depth estimation when rectification is unreliable.

Monocular Depth Estimation, Stereo Depth Estimation

Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks

no code implementations • 7 Oct 2022 • Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira

Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using ~10% of their trainable parameters.

Open-Set Semi-Supervised Object Detection

no code implementations • 29 Aug 2022 • Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.

Object, object-detection, +3

Cross-Domain Adaptive Teacher for Object Detection

2 code implementations • CVPR 2022 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.

Data Augmentation, Domain Adaptation, +3

170 stars

Adaptive Unbiased Teacher for Cross-Domain Object Detection

no code implementations • 29 Sep 2021 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris M. Kitani, Peter Vajda

This enables the student model to capture domain-invariant features.

Data Augmentation, Domain Adaptation, +3

Unbiased Teacher for Semi-Supervised Object Detection

4 code implementations • ICLR 2021 • Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.

Ranked #2 on Semi-Supervised Person Bounding Box Detection on COCO 1% labeled data

Image Classification, Object, +4

410 stars

One Shot 3D Photography

1 code implementation • 27 Aug 2020 • Johannes Kopf, Kevin Matzen, Suhib Alsisan, Ocean Quigley, Francis Ge, Yangming Chong, Josh Patterson, Jan-Michael Frahm, Shu Wu, Matthew Yu, Peizhao Zhang, Zijian He, Peter Vajda, Ayush Saraf, Michael Cohen

3D photos are static in time, like traditional photos, but are displayed with interactive parallax on mobile or desktop screens, as well as on Virtual Reality devices, where viewing them also includes stereo.

Monocular Depth Estimation

467 stars

FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining

2 code implementations • CVPR 2021 • Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Bichen Wu, Zijian He, Zhen Wei, Kan Chen, Yuandong Tian, Matthew Yu, Peter Vajda, Joseph E. Gonzalez

To address this, we present Neural Architecture-Recipe Search (NARS) to search both (a) architectures and (b) their corresponding training recipes, simultaneously.

Ranked #5 on Neural Architecture Search on ImageNet