Search Results for author: Shufan Li

Found 9 papers, 7 papers with code

Aligning Diffusion Models by Optimizing Human Utility

no code implementations • 6 Apr 2024 • Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka

We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by formulating the alignment objective as the maximization of expected human utility.

xT: Nested Tokenization for Larger Context in Large Images

1 code implementation • 4 Mar 2024 • Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam

Modern computer vision pipelines handle large images in one of two sub-optimal ways: down-sampling or cropping.

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

1 code implementation • 8 Feb 2024 • Shufan Li, Harkanwar Singh, Aditya Grover

A recent architecture, Mamba, based on state space models, has been shown to achieve performance comparable to Transformers for modeling text sequences while scaling linearly with sequence length.

Action Recognition, Weather Forecasting

InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following

1 code implementation • 11 Dec 2023 • Shufan Li, Harkanwar Singh, Aditya Grover

We demonstrate that our system can perform a series of novel instruction-guided editing tasks.

Decoder, Instruction Following

Hierarchical Open-vocabulary Universal Image Segmentation

1 code implementation • NeurIPS 2023 • Xudong Wang, Shufan Li, Konstantinos Kallidromitis, Yusuke Kato, Kazuki Kozuka, Trevor Darrell

Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions.

Ranked #1 on Image Segmentation on Pascal Panoptic Parts

Image Comprehension, Image Segmentation, +8 more

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

1 code implementation • ICCV 2023 • Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell

Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales.

Representation Learning

Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images

no code implementations • 25 Nov 2022 • Shufan Li, Congxi Lu, Linkai Li, Haoshuai Zhou

We collected two datasets consisting of real camera photos for evaluation.

Object Detection, +3 more

Refine and Represent: Region-to-Object Representation Learning

1 code implementation • 25 Aug 2022 • Akash Gokul, Konstantinos Kallidromitis, Shufan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed

Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives.

Object, Representation Learning, +4 more

Interpreting Audiograms with Multi-stage Neural Networks

1 code implementation • 17 Dec 2021 • Shufan Li, Congxi Lu, Linkai Li, Jirong Duan, Xinping Fu, Haoshuai Zhou

Audiograms are a particular type of line chart representing an individual's hearing level at various frequencies.
