Search Results for author: Humphrey Shi

Found 81 papers, 58 papers with code

A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization

no code implementations 23 Nov 2018 Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

While training on samples drawn from independent and identical distribution has been a de facto paradigm for optimizing image classification networks, humans learn new concepts in an easy-to-hard manner and on the selected examples progressively.

General Classification Image Classification +6

CCNet: Criss-Cross Attention for Semantic Segmentation

4 code implementations ICCV 2019 Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang

Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage.

Ranked #7 on Semantic Segmentation on FoodSeg103 (using extra training data)

Computational Efficiency Human Parsing +8

AlignSeg: Feature-Aligned Segmentation Networks

1 code implementation 24 Feb 2020 Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi

Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation.

Segmentation Semantic Segmentation

Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

1 code implementation CVPR 2020 Zhonghao Wang, Mo Yu, Yunchao Wei, Rogerio Feris, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work.

Semantic Segmentation Unsupervised Domain Adaptation

Pyramid Attention Networks for Image Restoration

2 code implementations 28 Apr 2020 Yiqun Mei, Yuchen Fan, Yulun Zhang, Jiahui Yu, Yuqian Zhou, Ding Liu, Yun Fu, Thomas S. Huang, Humphrey Shi

Self-similarity refers to the image prior widely used in image restoration algorithms that small but similar patterns tend to occur at different locations and scales.

Demosaicking Image Denoising +1

Deep Learning-Based Automated Image Segmentation for Concrete Petrographic Analysis

no code implementations 21 May 2020 Yu Song, Zilong Huang, Chuanyue Shen, Humphrey Shi, David A Lange

The standard petrography test method for measuring air voids in concrete (ASTM C457) requires a meticulous and long examination of sample phase composition under a stereomicroscope.

Image Segmentation Segmentation +1

Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

3 code implementations CVPR 2020 Yiqun Mei, Yuchen Fan, Yuqian Zhou, Lichao Huang, Thomas S. Huang, Humphrey Shi

By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution (LR) image.

Feature Correlation Image Super-Resolution

Motion Pyramid Networks for Accurate and Efficient Cardiac Motion Estimation

no code implementations 28 Jun 2020 Hanchao Yu, Xiao Chen, Humphrey Shi, Terrence Chen, Thomas S. Huang, Shanhui Sun

In this paper, we propose Motion Pyramid Networks, a novel deep learning-based approach for accurate and efficient cardiac motion estimation.

Knowledge Distillation Motion Estimation

The 1st Tiny Object Detection Challenge: Methods and Results

1 code implementation 16 Sep 2020 Xuehui Yu, Zhenjun Han, Yuqi Gong, Nan Jiang, Jian Zhao, Qixiang Ye, Jie Chen, Yuan Feng, Bin Zhang, Xiaodi Wang, Ying Xin, Jingwei Liu, Mingyuan Mao, Sheng Xu, Baochang Zhang, Shumin Han, Cheng Gao, Wei Tang, Lizuo Jin, Mingbo Hong, Yuchao Yang, Shuiwang Li, Huan Luo, Qijun Zhao, Humphrey Shi

The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection.

Human Detection Object +2

Deep Learning for 3D Point Cloud Understanding: A Survey

1 code implementation 18 Sep 2020 Haoming Lu, Humphrey Shi

The development of practical applications, such as autonomous driving and robotics, has brought increasing attention to 3D point cloud understanding.

Autonomous Driving

Human-Object Interaction Detection: A Quick Survey and Examination of Methods

1 code implementation 27 Sep 2020 Trevor Bergstrom, Humphrey Shi

In order to provide insight to future researchers, we perform an individualized study that examines the performance of each component of a multi-stream convolutional neural network architecture for human-object interaction detection.

Human-Object Interaction Detection Object

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

1 code implementation CVPR 2021 Xingqian Xu, Zhifei Zhang, Zhaowen Wang, Brian Price, Zhonghao Wang, Humphrey Shi

We also introduce Text Refinement Network (TexRNet), a novel text segmentation approach that adapts to the unique properties of text, e.g., non-convex boundary, diverse texture, etc., which often impose burdens on traditional segmentation models.

Segmentation Style Transfer +2

A Multi-Mode Modulator for Multi-Domain Few-Shot Classification

1 code implementation ICCV 2021 Yanbin Liu, Juho Lee, Linchao Zhu, Ling Chen, Humphrey Shi, Yi Yang

Most existing few-shot classification methods only consider generalization on one dataset (i.e., single-domain), failing to transfer across various seen and unseen domains.

Classification Domain Generalization

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

1 code implementation CVPR 2021 Abulikemu Abuduweili, Xingjian Li, Humphrey Shi, Cheng-Zhong Xu, Dejing Dou

To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization that consists of two complementary components: Adaptive Knowledge Consistency (AKC) on the examples between the source and target model, and Adaptive Representation Consistency (ARC) on the target model between labeled and unlabeled examples.

Pseudo Label Transfer Learning

Study Group Learning: Improving Retinal Vessel Segmentation Trained with Noisy Labels

1 code implementation 5 Mar 2021 Yuqian Zhou, Hanchao Yu, Humphrey Shi

Retinal vessel segmentation from retinal images is an essential task for developing the computer-aided diagnosis system for retinal diseases.

Retinal Vessel Segmentation Segmentation

UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution

1 code implementation 23 Mar 2021 Xingqian Xu, Zhangyang Wang, Humphrey Shi

In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions in which we deeply integrated spatial coordinates and periodic encoding with the implicit neural representation.

Super-Resolution

Learning to Track Instances without Video Annotations

no code implementations CVPR 2021 Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Humphrey Shi, Jan Kautz

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches.

Instance Segmentation Pose Estimation +1

Escaping the Big Data Paradigm with Compact Transformers

8 code implementations 12 Apr 2021 Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, Humphrey Shi

Our models are flexible in terms of model size, and can have as little as 0.28M parameters while achieving competitive results.

Ranked #1 on Image Classification on Flowers-102 (using extra training data)

Fine-Grained Image Classification Superpixel Image Classification

Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection

1 code implementation 29 Apr 2021 Jiachen Li, Bowen Cheng, Rogerio Feris, JinJun Xiong, Thomas S. Huang, Wen-mei Hwu, Humphrey Shi

Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union (IoU) metric.

Object object-detection +1

Is In-Domain Data Really Needed? A Pilot Study on Cross-Domain Calibration for Network Quantization

no code implementations 16 May 2021 Haichao Yu, Linjie Yang, Humphrey Shi

Post-training quantization methods use a set of calibration data to compute quantization ranges for network parameters and activations.

Quantization

RSCA: Real-time Segmentation-based Context-Aware Scene Text Detection

no code implementations 26 May 2021 Jiachen Li, Yuan Lin, Rongrong Liu, Chiu Man Ho, Humphrey Shi

Segmentation-based scene text detection methods have been widely adopted for arbitrary-shaped text detection recently, since they make accurate pixel-level predictions on curved text instances and can facilitate real-time inference without time-consuming processing on anchors.

Scene Text Detection Segmentation +1

MSN: Efficient Online Mask Selection Network for Video Instance Segmentation

1 code implementation 19 Jun 2021 Vidit Goel, Jiachen Li, Shubhika Garg, Harsh Maheshwari, Humphrey Shi

Our method improves the masks from segmentation and propagation branches in an online manner using the Mask Selection Network (MSN) hence limiting the noise accumulation during mask tracking.

Instance Segmentation Segmentation +4

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

1 code implementation 26 Aug 2021 Wuyang Chen, Xinyu Gong, Junru Wu, Yunchao Wei, Humphrey Shi, Zhicheng Yan, Yi Yang, Zhangyang Wang

This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation.

Neural Architecture Search

ConvMLP: Hierarchical Convolutional MLPs for Vision

4 code implementations 9 Sep 2021 Jiachen Li, Ali Hassani, Steven Walton, Humphrey Shi

MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to reach comparable results to convolutional and transformer-based methods.

Ranked #8 on Image Classification on Flowers-102 (using extra training data)

Image Classification Instance Segmentation +3

Feudal Reinforcement Learning by Reading Manuals

no code implementations 13 Oct 2021 Kai Wang, Zhonghao Wang, Mo Yu, Humphrey Shi

The manager agent is a multi-hop plan generator dealing with high-level abstract information and generating a series of sub-goals in a backward manner.

reinforcement-learning Reinforcement Learning (RL)

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

no code implementations 10 Dec 2021 Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.

Continuous Control Denoising +1

SeMask: Semantically Masked Transformers for Semantic Segmentation

1 code implementation arXiv 2021 Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

To achieve this, we propose SeMask, a simple and effective framework that incorporates semantic information into the encoder with the help of a semantic attention operation.

Semantic Segmentation

Object Localization under Single Coarse Point Supervision

2 code implementations CVPR 2022 Xuehui Yu, Pengfei Chen, Di Wu, Najmul Hassan, Guorong Li, Junchi Yan, Humphrey Shi, Qixiang Ye, Zhenjun Han

In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points.

Multiple Instance Learning Object +1

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image

1 code implementation 2 Apr 2022 Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Humphrey Shi, Zhangyang Wang

Despite the rapid development of Neural Radiance Field (NeRF), the necessity of dense covers largely prohibits its wider applications.

Novel View Synthesis

Neighborhood Attention Transformer

5 code implementations CVPR 2023 Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi

We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision.

Image Classification Object Detection +1

Grasping the Arrow of Time from the Singularity: Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN

1 code implementation 27 Apr 2022 Qiucheng Wu, Yifan Jiang, Junru Wu, Kai Wang, Gong Zhang, Humphrey Shi, Zhangyang Wang, Shiyu Chang

To study the motion features in the latent space of StyleGAN, in this paper, we hypothesize and demonstrate that a series of meaningful, natural, and versatile small, local movements (referred to as "micromotion", such as expression, head movement, and aging effect) can be represented in low-rank spaces extracted from the latent space of a conventionally pre-trained StyleGAN-v2 model for face generation, with the guidance of proper "anchors" in the form of either short text or video clips.

Disentanglement Face Generation

DiSparse: Disentangled Sparsification for Multitask Model Compression

1 code implementation CVPR 2022 Xinglong Sun, Ali Hassani, Zhangyang Wang, Gao Huang, Humphrey Shi

We analyzed the pruning masks generated with DiSparse and observed strikingly similar sparse network architecture identified by each task even before the training starts.

Model Compression

Towards Layer-wise Image Vectorization

1 code implementation CVPR 2022 Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, Humphrey Shi

Image rasterization is a mature technique in computer graphics, while image vectorization, the reverse path of rasterization, remains a major challenge.

Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand

1 code implementation 5 Aug 2022 Jitesh Jain, Yuqian Zhou, Ning Yu, Humphrey Shi

We claim that the performance of inpainting algorithms can be better judged by the generated structures and textures.

Image Inpainting Texture Synthesis

VMFormer: End-to-End Video Matting with Transformer

1 code implementation 26 Aug 2022 Jiachen Li, Vidit Goel, Marianna Ohanyan, Shant Navasardyan, Yunchao Wei, Humphrey Shi

In this paper, we propose VMFormer: a transformer-based end-to-end method for video matting.

Video Matting

AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition

no code implementations 27 Sep 2022 Yulin Wang, Yang Yue, Xinhong Xu, Ali Hassani, Victor Kulikov, Nikita Orlov, Shiji Song, Humphrey Shi, Gao Huang

Recent research has revealed that reducing the temporal and spatial redundancy are both effective approaches towards efficient video recognition, e.g., allocating the majority of computation to a task-relevant subset of frames or the most valuable image regions of each frame.

Video Recognition

Dilated Neighborhood Attention Transformer

5 code implementations 29 Sep 2022 Ali Hassani, Humphrey Shi

These models typically employ localized attention mechanisms, such as the sliding-window Neighborhood Attention (NA) or Swin Transformer's Shifted Window Self Attention.

Image Classification Instance Segmentation +3

Image Completion with Heterogeneously Filtered Spectral Hints

1 code implementation 7 Nov 2022 Xingqian Xu, Shant Navasardyan, Vahram Tadevosyan, Andranik Sargsyan, Yadong Mu, Humphrey Shi

We also prove the effectiveness of our design via ablation studies, from which one may notice that the aforementioned challenges, i.e., pattern unawareness, blurry textures, and structure distortion, can be noticeably resolved.

Image Inpainting

StyleNAT: Giving Each Head a New Perspective

2 code implementations 10 Nov 2022 Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi

Image generation has been a long sought-after but challenging task, and performing the generation task in an efficient manner is similarly difficult.

Face Generation

OneFormer: One Transformer to Rule Universal Image Segmentation

2 code implementations CVPR 2023 Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi

However, such panoptic architectures do not truly unify image segmentation because they need to be trained individually on the semantic, instance, or panoptic segmentation to achieve the best performance.

Instance Segmentation Panoptic Segmentation +3

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

3 code implementations ICCV 2023 Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, Humphrey Shi

In this work, we expand the existing single-flow diffusion pipeline into a multi-task multimodal network, dubbed Versatile Diffusion (VD), that handles multiple flows of text-to-image, image-to-text, and variations in one unified model.

Disentanglement Image Captioning +5

Boosted Dynamic Neural Networks

1 code implementation 30 Nov 2022 Haichao Yu, Haoxiang Li, Gang Hua, Gao Huang, Humphrey Shi

To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.

Mask Matching Transformer for Few-Shot Segmentation

1 code implementation 5 Dec 2022 Siyu Jiao, Gengwei Zhang, Shant Navasardyan, Ling Chen, Yao Zhao, Yunchao Wei, Humphrey Shi

Typical methods follow the paradigm to firstly learn prototypical features from support images and then match query features in pixel-level to obtain segmentation results.

Few-Shot Semantic Segmentation Segmentation

Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style

no code implementations CVPR 2023 Haoming Lu, Hazarapet Tunanyan, Kai Wang, Shant Navasardyan, Zhangyang Wang, Humphrey Shi

Diffusion models have demonstrated impressive capability of text-conditioned image synthesis, and broader application horizons are emerging by personalizing those pretrained diffusion models toward generating some specialized target object or style.

Disentanglement Image Generation

MI-GAN: A Simple Baseline for Image Inpainting on Mobile Devices

1 code implementation ICCV 2023 Andranik Sargsyan, Shant Navasardyan, Xingqian Xu, Humphrey Shi

In this paper we present a simple image inpainting baseline, Mobile Inpainting GAN (MI-GAN), which is approximately one order of magnitude computationally cheaper and smaller than existing state-of-the-art inpainting models, and can be efficiently deployed on mobile devices.

Efficient Neural Network Image Inpainting +1

Graph Transformer GANs for Graph-Constrained House Generation

no code implementations CVPR 2023 Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc van Gool

We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.

Generative Adversarial Network House Generation +1

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

1 code implementation 30 Mar 2023 Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi

The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry.

Disentanglement Memorization +1

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

1 code implementation 30 Mar 2023 Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

We propose PAIR Diffusion, a generic framework that can enable a diffusion model to control the structure and appearance properties of each object in the image.

Object

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

1 code implementation CVPR 2023 Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang

Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain.

Image Generation

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models

1 code implementation 25 May 2023 Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, Humphrey Shi

Text-to-image (T2I) research has grown explosively in the past year, owing to the large-scale pre-trained diffusion models and many emerging personalization and editing approaches.

Conditional Text-to-Image Synthesis Image Generation +3

Matting Anything

1 code implementation 8 Jun 2023 Jiachen Li, Jitesh Jain, Humphrey Shi

In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile framework for estimating the alpha matte of any instance in an image with flexible and interactive visual or linguistic user prompt guidance.

Image Matting Referring Image Matting

Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap

no code implementations 20 Jul 2023 Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang

We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks.

Image Inpainting

Interactive Neural Painting

no code implementations 31 Jul 2023 Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP.

Learning Mask-aware CLIP Representations for Zero-Shot Segmentation

1 code implementation NeurIPS 2023 Siyu Jiao, Yunchao Wei, YaoWei Wang, Yao Zhao, Humphrey Shi

However, in the paper, we reveal that CLIP is insensitive to different mask proposals and tends to produce similar predictions for various mask proposals of the same image.

Open Vocabulary Semantic Segmentation Zero Shot Segmentation

Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else

no code implementations 11 Oct 2023 Hazarapet Tunanyan, Dejia Xu, Shant Navasardyan, Zhangyang Wang, Humphrey Shi

To achieve this goal, we identify the limitations in the text embeddings used for the pre-trained text-to-image diffusion models.

Image Manipulation Text-to-Image Generation

Video Instance Matting

1 code implementation 7 Nov 2023 Jiachen Li, Roberto Henschel, Vidit Goel, Marianna Ohanyan, Shant Navasardyan, Humphrey Shi

To remedy this deficiency, we propose Video Instance Matting (VIM), that is, estimating alpha mattes of each instance at each frame of a video sequence.

Binarization Image Matting +4

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

no code implementations 30 Nov 2023 Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou

We further extend our method to a novel image editing task: substituting the subject in an image through textual manipulations.

Denoising Image Generation

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

1 code implementation 7 Dec 2023 Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi

Specifically, we introduce Step-wise Variation Regularization to enforce that the ratio between the variation of an arbitrary input latent and that of the output image is constant at any diffusion training step.

Diffusion for Natural Image Matting

1 code implementation 10 Dec 2023 Yihan Hu, Yiheng Lin, Wei Wang, Yao Zhao, Yunchao Wei, Humphrey Shi

However, the presence of high computational overhead and the inconsistency of noise sampling between the training and inference processes pose significant obstacles to achieving this goal.

Image Matting

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

1 code implementation 21 Dec 2023 Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results.

Image Inpainting Super-Resolution

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

1 code implementation 21 Dec 2023 Jitesh Jain, Jianwei Yang, Humphrey Shi

Secondly, we leverage the images from COCO and outputs from off-the-shelf vision perception models to create our COCO Segmentation Text (COST) dataset for training and evaluating MLLMs on the object perception task.

Image Captioning Image Generation +4

VASE: Object-Centric Appearance and Shape Manipulation of Real Videos

no code implementations 4 Jan 2024 Elia Peruzzo, Vidit Goel, Dejia Xu, Xingqian Xu, Yifan Jiang, Zhangyang Wang, Humphrey Shi, Nicu Sebe

Recently, several works tackled the video editing task fostered by the success of large-scale text-to-image generative models.

Video Editing

Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community

1 code implementation 15 Feb 2024 Arman Isajanyan, Artur Shatveryan, David Kocharyan, Zhangyang Wang, Humphrey Shi

These findings highlight the relevance and effectiveness of Social Reward in assessing community appreciation for AI-generated artworks, establishing a closer alignment with users' creative goals: creating popular visual art.

Image Generation

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

1 code implementation 7 Mar 2024 Ali Hassani, Wen-mei Hwu, Humphrey Shi

We observe that our fused kernels successfully circumvent some of the unavoidable inefficiencies in unfused implementations.

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

1 code implementation 21 Mar 2024 Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation Video Generation

Benchmarking Object Detectors with COCO: A New Path Forward

no code implementations 27 Mar 2024 Shweta Singh, Aayan Yadav, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai

With these findings, we advocate using COCO-ReM for future object detection research.
