Search Results for author: Shu Kong

Found 37 papers, 21 papers with code

Roadside Monocular 3D Detection via 2D Detection Prompting

no code implementations • 1 Apr 2024 • Yechi Ma, Shuoquan Wei, Churun Zhang, Wei Hua, Yanan Li, Shu Kong

Our method builds on the key insight that, compared with 3D detectors, a 2D detector is much easier to train and performs significantly better at detection on the 2D image plane.

Boosting Image Restoration via Priors from Pre-trained Models

no code implementations • 11 Mar 2024 • Xiaogang Xu, Shu Kong, Tao Hu, Zhe Liu, Hujun Bao

Pre-trained models with large-scale training data, such as CLIP and Stable Diffusion, have demonstrated remarkable performance in various high-level computer vision tasks such as image understanding and generation from language descriptions.

Tasks: Deblurring, Denoising (+2 more)

AccessLens: Auto-detecting Inaccessibility of Everyday Objects

no code implementations • 29 Jan 2024 • Nahyun Kwon, Qian Lu, Muhammad Hasham Qazi, Joanne Liu, Changhoon Oh, Shu Kong, Jeeeun Kim

In our increasingly diverse society, everyday physical interfaces often present barriers, impacting individuals across various contexts.

The Neglected Tails of Vision-Language Models

no code implementations • 23 Jan 2024 • Shubham Parashar, Zhiqiu Lin, Tian Liu, Xiangjue Dong, Yanan Li, Deva Ramanan, James Caverlee, Shu Kong

We address this by using large language models (LLMs) to count the number of pretraining texts that contain synonyms of these concepts.

Tasks: Retrieval, Zero-Shot Learning
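The Neglected Tails entry above counts pretraining texts that mention synonyms of a concept. A minimal sketch of just the counting step follows, assuming the synonyms are already known: the hand-written synonym list and the regex word-boundary match stand in for the LLM-based step the paper describes, and `count_concept_mentions` is a hypothetical helper, not code from the paper.

```python
import re


def count_concept_mentions(captions, synonyms):
    """Count captions that mention at least one synonym of a concept.

    Hypothetical helper: the synonyms are passed in by hand here,
    standing in for the LLM-based step described in the paper.
    """
    # One case-insensitive pattern with word boundaries, so "cat"
    # does not match inside "Category".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in synonyms) + r")\b",
        re.IGNORECASE,
    )
    return sum(1 for caption in captions if pattern.search(caption))


captions = [
    "A tabby cat sleeping on a sofa",
    "Category: furniture",
    "Two kittens playing with yarn",
    "A dog in the park",
]
print(count_concept_mentions(captions, ["cat", "kitten", "kittens"]))  # prints 2
```

Exact string matching undercounts paraphrases and overcounts ambiguous words, which is one reason a language model may be preferred for this step.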

Revisiting Few-Shot Object Detection with Vision-Language Models

1 code implementation • 22 Dec 2023 • Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan

Existing benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively.

Tasks: Autonomous Vehicles, Few-Shot Object Detection (+3 more)

Long-Tailed 3D Detection via 2D Late Fusion

no code implementations • 18 Dec 2023 • Yechi Ma, Neehar Peri, Shuoquan Wei, Wei Hua, Deva Ramanan, Yanan Li, Shu Kong

Autonomous vehicles (AVs) must accurately detect objects from both common and rare classes for safe navigation, motivating the problem of Long-Tailed 3D Object Detection (LT3D).

Tasks: 3D Object Detection, Autonomous Vehicles (+2 more)

Instance Tracking in 3D Scenes from Egocentric Videos

1 code implementation • 7 Dec 2023 • Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes

We explore this problem by first introducing a new benchmark dataset, consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates.

Tasks: Human-Object Interaction Detection, Object Tracking

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

1 code implementation • 6 Dec 2023 • Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.

Tasks: 3D Generation

A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture

1 code implementation • 30 Oct 2023 • Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong

Instance detection (InsDet) is a long-standing problem in robotics and computer vision that aims to detect object instances (each predefined by a few visual examples) in a cluttered scene.

Tasks: 8k, Object (+2 more)

Prompting Scientific Names for Zero-Shot Species Recognition

no code implementations • 15 Oct 2023 • Shubham Parashar, Zhiqiu Lin, Yanan Li, Shu Kong

We find that common names are more likely to be included in CLIP's training set, and prompting with them achieves 2 to 5 times higher accuracy on benchmark datasets for fine-grained species recognition.

Tasks: Benchmarking, Zero-Shot Learning

OV-PARTS: Towards Open-Vocabulary Part Segmentation

1 code implementation • NeurIPS 2023 • Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang

Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world.

Tasks: Open Vocabulary Semantic Segmentation, Segmentation (+1 more)

Improving Knowledge Distillation via Regularizing Feature Norm and Direction

1 code implementation • 26 May 2023 • Yuzhu Wang, Lechao Cheng, Manni Duan, Yongheng Wang, Zunlei Feng, Shu Kong

Finally, we propose a rather simple loss term (dubbed ND loss) that simultaneously (1) encourages the student to produce large-norm features and (2) aligns the direction of student features with the teacher class-means.

Tasks: Domain Adaptation, Knowledge Distillation
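The knowledge-distillation entry above describes a loss with a norm objective and a direction objective. The snippet does not give the ND loss formula, so the following is a hypothetical per-sample loss in the same spirit, not the paper's ND loss: a cosine term aligns the student feature with the teacher class-mean, and a hinge term penalizes student features whose norm is below that of the class-mean. The function name `nd_style_loss` and the hinge form are assumptions.

```python
import math


def nd_style_loss(student_feat, teacher_class_mean, eps=1e-12):
    """Hypothetical norm-and-direction loss for one sample.

    Illustrative only; this is NOT the exact ND loss from the paper.
    """
    dot = sum(s * t for s, t in zip(student_feat, teacher_class_mean))
    s_norm = math.sqrt(sum(s * s for s in student_feat))
    t_norm = math.sqrt(sum(t * t for t in teacher_class_mean))
    # Direction term: 1 - cosine similarity; 0 when both vectors point the same way.
    direction = 1.0 - dot / max(s_norm * t_norm, eps)
    # Norm term (assumed hinge): positive only while the student norm is
    # below the teacher class-mean norm, pushing toward larger-norm features.
    norm = max(0.0, t_norm - s_norm) / max(t_norm, eps)
    return direction + norm


print(nd_style_loss([2.0, 0.0], [1.0, 0.0]))  # aligned and long enough: prints 0.0
print(nd_style_loss([0.0, 1.0], [1.0, 0.0]))  # orthogonal: prints 1.0
print(nd_style_loss([0.5, 0.0], [1.0, 0.0]))  # aligned but too short: prints 0.5
```

In a batched setting, the teacher class-mean for each sample would be looked up by its ground-truth label and the per-sample losses averaged.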

Far3Det: Towards Far-Field 3D Detection

no code implementations • 25 Nov 2022 • Shubham Gupta, Jeet Kanjani, Mengtian Li, Francesco Ferroni, James Hays, Deva Ramanan, Shu Kong

We focus on the task of far-field 3D detection (Far3Det) of objects beyond a certain distance from an observer, e.g., >50 m.

Tasks: Autonomous Vehicles, Philosophy

Towards Long-Tailed 3D Detection

1 code implementation • 16 Nov 2022 • Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong

Moreover, semantic classes are often organized within a hierarchy, e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian.

Continual Learning with Evolving Class Ontologies

no code implementations • 10 Oct 2022 • Zhiqiu Lin, Deepak Pathak, Yu-Xiong Wang, Deva Ramanan, Shu Kong

LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous coarse label dog).

Tasks: Class Incremental Learning, Image Classification (+3 more)

Creating a Forensic Database of Shoeprints from Online Shoe Tread Photos

1 code implementation • 4 May 2022 • Samia Shafique, Bailey Kong, Shu Kong, Charless C. Fowlkes

We develop a method termed ShoeRinsics that learns to predict depth by leveraging a mix of fully supervised synthetic data and unsupervised retail image data.

Tasks: Benchmarking, Depth Estimation (+3 more)

Long-Tailed Recognition via Weight Balancing

1 code implementation • CVPR 2022 • Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong

In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius.

Tasks: Classification, Long-tail Learning
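The weight-balancing entry above mentions a MaxNorm constraint that lets small per-class weights grow but caps every weight vector at a radius. A minimal sketch of that projection in plain Python follows, assuming the classifier is stored as a list of per-class weight vectors; the function name `max_norm_` and the radius value are illustrative, not taken from the paper's code.

```python
import math


def max_norm_(weights, radius):
    """Rescale, in place, each per-class weight vector whose L2 norm exceeds `radius`.

    Vectors already inside the norm ball are left untouched, so small
    (tail-class) weights can still grow until they reach the radius.
    """
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        if norm > radius:
            scale = radius / norm
            row[:] = [w * scale for w in row]
    return weights


# Two classes: a "head" class with a large weight vector and a "tail" class with a small one.
W = [[3.0, 4.0], [0.3, 0.4]]
max_norm_(W, radius=1.0)
# The head row (norm 5.0) is rescaled to norm 1.0; the tail row (norm 0.5) is unchanged.
print(W)
```

In training, a projection like this is typically applied after every optimizer step (projected gradient descent), while weight decay is handled by the optimizer itself.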

Multimodal Object Detection via Probabilistic Ensembling

2 code implementations • 7 Apr 2021 • Yi-Ting Chen, Jinghao Shi, Zelin Ye, Christoph Mertz, Deva Ramanan, Shu Kong

Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs).

Tasks: Autonomous Vehicles, Object (+2 more)

OpenGAN: Open-Set Recognition via Open Data Generation

1 code implementation • ICCV 2021 • Shu Kong, Deva Ramanan

However, the former generalizes poorly to diverse open test data due to overfitting to the training outliers, which are unlikely to exhaustively span the open-world.

Tasks: Open Set Learning

An Empirical Exploration of Open-Set Recognition via Lightweight Statistical Pipelines

no code implementations • 1 Jan 2021 • Shu Kong, Deva Ramanan

Machine-learned safety-critical systems need to be self-aware and reliably know their unknowns in the open-world.

Tasks: open-set classification, Open Set Learning (+3 more)

Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias

1 code implementation • CVPR 2021 • Yunhan Zhao, Shu Kong, Charless Fowlkes

We show that jointly applying the two methods improves depth prediction on images captured under uncommon and even never-before-seen camera poses.

Tasks: Data Augmentation, Depth Estimation (+1 more)

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

no code implementations • 21 Jun 2020 • Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang

Referring attention is a mechanism we designed that acts as a scoring function for temporally grounding the given queries over frames.

Celeganser: Automated Analysis of Nematode Morphology and Age

1 code implementation • 11 May 2020 • Linfeng Wang, Shu Kong, Zachary Pincus, Charless Fowlkes

The nematode Caenorhabditis elegans (C. elegans) serves as an important model organism in a wide variety of biological studies.

Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

no code implementations • CVPR 2020 • Yunhan Zhao, Shu Kong, Daeyun Shin, Charless Fowlkes

In this setting, we find that existing domain translation approaches are difficult to train and offer little advantage over simple baselines that use a mix of real and synthetic data.

Tasks: Depth Prediction, Monocular Depth Estimation (+2 more)

Modularized Textual Grounding for Counterfactual Resilience

1 code implementation • CVPR 2019 • Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang

Computer vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.

Tasks: Attribute, counterfactual (+4 more)

Image Reconstruction with Predictive Filter Flow

2 code implementations • 28 Nov 2018 • Shu Kong, Charless Fowlkes

We propose a simple, interpretable framework for solving a wide range of image reconstruction problems such as denoising and deconvolution.

Tasks: Deblurring, Denoising (+3 more)

Pixel-wise Attentional Gating for Parsimonious Pixel Labeling

1 code implementation • 3 May 2018 • Shu Kong, Charless Fowlkes

To achieve parsimonious inference in per-pixel labeling tasks with a limited computational budget, we propose a Pixel-wise Attentional Gating unit (PAG) that learns to selectively process a subset of spatial locations at each layer of a deep convolutional network.

Tasks: Boundary Detection, Semantic Segmentation (+1 more)

Fine-Grained Facial Expression Analysis Using Dimensional Emotion Model

no code implementations • 2 May 2018 • Feng Zhou, Shu Kong, Charless Fowlkes, Tao Chen, Baiying Lei

Specifically, we first map facial expressions into dimensional measures, which turns facial expression analysis from a classification problem into a regression one.

Tasks: General Classification, regression

Weakly Supervised Attention Learning for Textual Phrases Grounding

no code implementations • 1 May 2018 • Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang

Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.

Recurrent Pixel Embedding for Instance Grouping

2 code implementations • CVPR 2018 • Shu Kong, Charless Fowlkes

We introduce a differentiable, end-to-end trainable framework, consisting of two novel components, for solving pixel-level grouping problems such as instance segmentation.

Tasks: Boundary Detection, Clustering (+4 more)

Recurrent Scene Parsing with Perspective Understanding in the Loop

1 code implementation • CVPR 2018 • Shu Kong, Charless Fowlkes

We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby.

Ranked #32 on Semantic Segmentation on SUN-RGBD (using extra training data)

Tasks: Monocular Depth Estimation, Scene Parsing (+2 more)

Low-rank Bilinear Pooling for Fine-Grained Classification

no code implementations • CVPR 2017 • Shu Kong, Charless Fowlkes

To address the computational demands of high feature dimensionality, we propose to represent the covariance features as a matrix and apply a low-rank bilinear classifier.

Tasks: Classification, General Classification
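The low-rank bilinear pooling entry above avoids the cost of high-dimensional covariance features. One way such savings can arise: if the covariance feature is X = F F^T for local features F (d x n) and a class's weight matrix is constrained to W = U U^T with U of size d x r, then the classifier score tr(W X) equals ||U^T F||_F^2, which never forms a d x d matrix. The sketch below checks this identity on small plain-Python matrices; the single symmetric term W = U U^T is an assumption for illustration (a real classifier may combine several such terms), and all function names are hypothetical.

```python
def matmul_t(a, b):
    """Return a^T b for matrices stored as lists of rows (a: d x r, b: d x n)."""
    d, r, n = len(a), len(a[0]), len(b[0])
    return [[sum(a[k][i] * b[k][j] for k in range(d)) for j in range(n)]
            for i in range(r)]


def bilinear_score_full(u, f):
    """tr(U U^T F F^T), explicitly forming the d x d matrices W and X."""
    d, r, n = len(u), len(u[0]), len(f[0])
    w = [[sum(u[i][k] * u[j][k] for k in range(r)) for j in range(d)] for i in range(d)]
    x = [[sum(f[i][k] * f[j][k] for k in range(n)) for j in range(d)] for i in range(d)]
    return sum(w[i][j] * x[j][i] for i in range(d) for j in range(d))


def bilinear_score_lowrank(u, f):
    """Same value computed as ||U^T F||_F^2, never forming a d x d matrix."""
    p = matmul_t(u, f)  # r x n
    return sum(v * v for row in p for v in row)


U = [[1, 0], [0, 1], [1, 1]]  # d = 3, rank r = 2
F = [[1, 2], [0, 1], [1, 0]]  # d = 3, n = 2 local features
print(bilinear_score_full(U, F), bilinear_score_lowrank(U, F))  # prints 10 10
```

The full version costs on the order of d^2 (r + n) operations, while the low-rank version costs on the order of d r n, which matters when d is in the hundreds or thousands.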

Photo Aesthetics Ranking Network with Attributes and Content Adaptation

2 code implementations • 6 Jun 2016 • Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, Charless Fowlkes

In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics, in which the relative ranking of photo aesthetics is directly modeled in the loss function.

Tasks: Aesthetics Quality Assessment
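The photo-aesthetics entry above models relative ranking directly in the loss. A common way to do that is a pairwise hinge (margin) ranking loss; the sketch below shows that generic form in plain Python. Whether it matches the paper's exact formulation is an assumption, and the function name and the `margin` default are illustrative.

```python
def pairwise_ranking_loss(scores, ratings, margin=1.0):
    """Average hinge loss over all pairs whose ground-truth ratings differ.

    scores:  predicted aesthetic scores from the network.
    ratings: ground-truth aesthetic ratings for the same photos.
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            if ratings[i] == ratings[j]:
                continue  # ties carry no ordering information
            sign = 1.0 if ratings[i] > ratings[j] else -1.0
            # Penalize pairs whose predicted gap does not respect the true
            # order by at least `margin`.
            total += max(0.0, margin - sign * (scores[i] - scores[j]))
            pairs += 1
    return total / pairs if pairs else 0.0


print(pairwise_ranking_loss([3.0, 1.0, 2.0], [5, 1, 3]))  # correct order, gaps >= 1: prints 0.0
print(pairwise_ranking_loss([1.0, 3.0], [5, 1]))  # reversed order: prints 3.0
```

Only score differences enter the loss, so the network is free to choose any scale for its outputs; a regression term is often added when absolute ratings also matter.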

Spatially Aware Dictionary Learning and Coding for Fossil Pollen Identification

no code implementations • 3 May 2016 • Shu Kong, Surangi Punyasena, Charless Fowlkes

We propose a robust approach for performing automatic species-level recognition of fossil pollen grains in microscopy images that exploits both global shape and local texture characteristics in a patch-based matching methodology.

Tasks: Dictionary Learning, General Classification

Collaborative Receptive Field Learning

1 code implementation • 2 Feb 2014 • Shu Kong, Zhuolin Jiang, Qiang Yang

However, measuring the pairwise distance between RFs to build the similarity graph is a nontrivial problem.

Tasks: General Classification, Object Categorization

Learning Mid-Level Features and Modeling Neuron Selectivity for Image Classification

no code implementations • 22 Jan 2014 • Shu Kong, Zhuolin Jiang, Qiang Yang

We now know that mid-level features can greatly enhance the performance of image learning, but how to automatically learn the image features efficiently and in an unsupervised manner is still an open question.

Tasks: Age Estimation, Clustering (+7 more)
