Search Results for author: Yin Li

Found 79 papers, 32 papers with code

RegionCLIP: Region-based Language-Image Pretraining

1 code implementation CVPR 2022 Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao

However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans.

Ranked #11 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Image Classification Object +3

ActionFormer: Localizing Moments of Actions with Transformers

1 code implementation 16 Feb 2022 Chenlin Zhang, Jianxin Wu, Yin Li

Self-attention based Transformer models have demonstrated impressive results for image classification and object detection, and more recently for video understanding.

Action Recognition audio-visual event localization +3

SnAG: Scalable and Accurate Video Grounding

1 code implementation 2 Apr 2024 Fangzhou Mu, Sicheng Mo, Yin Li

In this paper, we study the effect of cross-modal fusion on the scalability of video grounding models.

Video Grounding Video Understanding

Learning to Predict the Cosmological Structure Formation

1 code implementation 15 Nov 2018 Siyu He, Yin Li, Yu Feng, Shirley Ho, Siamak Ravanbakhsh, Wei Chen, Barnabás Póczos

We build a deep neural network, the Deep Density Displacement Model (hereafter D$^3$M), to predict the non-linear structure formation of the Universe from simple linear perturbation theory.

Interpretable and Accurate Fine-grained Recognition via Region Grouping

1 code implementation CVPR 2020 Zixuan Huang, Yin Li

Our results compare favorably to state-of-the-art methods on classification tasks, and our method outperforms previous approaches on the localization of object parts.

Fine-Grained Visual Recognition General Classification +1

nbodykit: an open-source, massively parallel toolkit for large-scale structure

2 code implementations 15 Dec 2017 Nick Hand, Yu Feng, Florian Beutler, Yin Li, Chirag Modi, Uros Seljak, Zachary Slepian

The package is extensively documented at http://nbodykit.readthedocs.io, which also includes an interactive set of example recipes for new users to explore.

Instrumentation and Methods for Astrophysics Cosmology and Nongalactic Astrophysics

Learning to Generate Scene Graph from Natural Language Supervision

1 code implementation ICCV 2021 Yiwu Zhong, Jing Shi, Jianwei Yang, Chenliang Xu, Yin Li

To bridge the gap between images and texts, we leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.

Graph Generation Scene Graph Generation +1

Comprehensive Image Captioning via Scene Graph Decomposition

1 code implementation ECCV 2020 Yiwu Zhong, Li-Wei Wang, Jianshu Chen, Dong Yu, Yin Li

We address the challenging problem of image captioning by revisiting the representation of image scene graphs.

Image Captioning Sentence

The Quijote simulations

3 code implementations 11 Sep 2019 Francisco Villaescusa-Navarro, ChangHoon Hahn, Elena Massara, Arka Banerjee, Ana Maria Delgado, Doogesh Kodi Ramanah, Tom Charnock, Elena Giusarma, Yin Li, Erwan Allys, Antoine Brochard, Chi-Ting Chiang, Siyu He, Alice Pisani, Andrej Obuljen, Yu Feng, Emanuele Castorina, Gabriella Contardo, Christina D. Kreisch, Andrina Nicola, Roman Scoccimarro, Licia Verde, Matteo Viel, Shirley Ho, Stephane Mallat, Benjamin Wandelt, David N. Spergel

The Quijote simulations are a set of 44,100 full N-body simulations spanning more than 7,000 cosmological models in the $\{\Omega_{\rm m}, \Omega_{\rm b}, h, n_s, \sigma_8, M_\nu, w \}$ hyperplane.

Cosmology and Nongalactic Astrophysics Instrumentation and Methods for Astrophysics

Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations

1 code implementation CVPR 2023 Yiwu Zhong, Licheng Yu, Yang Bai, Shangwen Li, Xueting Yan, Yin Li

In this work, we propose to learn a video representation that encodes both action steps and their temporal ordering, based on a large-scale dataset of web instructional videos and their narrations, without using human annotations.

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

1 code implementation 11 Apr 2017 Liwei Wang, Yin Li, Jing Huang, Svetlana Lazebnik

Image-language matching tasks have recently attracted a lot of attention in the computer vision field.

Image-text matching Retrieval +4

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

1 code implementation ICCV 2023 Matthew Dutson, Yin Li, Mohit Gupta

In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing.

Action Recognition Video Object Detection +1

Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video

1 code implementation ECCV 2020 Miao Liu, Siyu Tang, Yin Li, James Rehg

Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action.

Action Anticipation Human-Object Interaction Detection

The Secrets of Salient Object Segmentation

1 code implementation CVPR 2014 Yin Li, Xiaodi Hou, Christof Koch, James M. Rehg, Alan L. Yuille

The dataset design bias not only creates a discomforting disconnection between fixations and salient object segmentation, but also misleads algorithm design.

Object Segmentation +1

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

1 code implementation CVPR 2021 Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu

Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed.

Contrastive Learning Knowledge Distillation +6

ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles

1 code implementation 21 Oct 2020 Ran Xu, Chen-Lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra, Somali Chaterji, Yin Li, Saurabh Bagchi

In this paper we introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements in the face of changing content and resource contention scenarios.

Object object-detection +3

A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge

1 code implementation 16 Nov 2022 Sicheng Mo, Fangzhou Mu, Yin Li

This report describes Badgers@UW-Madison, our submission to the Ego4D Natural Language Queries (NLQ) Challenge.

Natural Language Queries Temporal Action Localization +1

Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

1 code implementation 22 Feb 2024 Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples.

Disconnected Covariance of 2-point Functions in Large-Scale Structure

1 code implementation 14 Nov 2018 Yin Li, Sukhdeep Singh, Byeonghee Yu, Yu Feng, Uros Seljak

We verify the analytic covariance against the sample covariance from the galaxy mock simulations in two test cases: (1) the power spectrum multipole covariance, and (2) the joint covariance of the projected correlation function and the correlation function multipoles.

Cosmology and Nongalactic Astrophysics

Rotation method for accelerating multiple-spherical Bessel function integrals against a numerical source function

1 code implementation 29 Nov 2019 Zachary Slepian, Yin Li, Marcel Schmittfull, Zvonimir Vlah

In analysing these datasets, recomputing these integrals a substantial number of times, for instance to update perturbation theory predictions or covariance matrices as the input linear power spectrum is changed, will be one piece of a Markov Chain Monte Carlo cosmological parameter search; the overall savings from our method should therefore be significant.

Cosmology and Nongalactic Astrophysics Instrumentation and Methods for Astrophysics

Super-resolving Dark Matter Halos using Generative Deep Learning

1 code implementation 11 Nov 2021 David Schaurecker, Yin Li, Jeremy Tinker, Shirley Ho, Alexandre Refregier

Generative deep learning methods built upon Convolutional Neural Networks (CNNs) provide a great tool for predicting non-linear structure in cosmology.

Simple lessons from complex learning: what a neural network model learns about cosmic structure formation

1 code implementation 9 Jun 2022 Drew Jamieson, Yin Li, Siyu He, Francisco Villaescusa-Navarro, Shirley Ho, Renan Alves de Oliveira, David N. Spergel

We find our model generalizes well to these well understood scenarios, demonstrating that the networks have inferred general physical principles and learned the nonlinear mode couplings from the complex, random Gaussian training data.

CoLA

Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

1 code implementation 21 Jan 2024 Yin Li, Yu Xiong, Wenxin Fan, Kai Wang, Qingqing Yu, Liping Si, Patrick van der Smagt, Jun Tang, Nutan Chen

Conclusion: We creatively apply sequential models in the long-term management of SCIT with promising accuracy in the prediction of SCIT nonadherence in Allergic Rhinitis (AR) patients.

Management

Learning to Grasp Without Seeing

no code implementations 10 May 2018 Adithyavairavan Murali, Yin Li, Dhiraj Gandhi, Abhinav Gupta

We believe this is the first attempt at learning to grasp with only tactile sensing and without any prior object knowledge.

Object Localization

Deep Crisp Boundaries: From Boundaries to Higher-level Tasks

no code implementations 8 Jan 2018 Yupei Wang, Xin Zhao, Yin Li, Kaiqi Huang

These ConvNet based edge detectors have approached human level performance on standard benchmarks.

Edge Detection Object Proposal Generation +2

Learning Deep Structure-Preserving Image-Text Embeddings

no code implementations CVPR 2016 Liwei Wang, Yin Li, Svetlana Lazebnik

This paper proposes a method for learning joint embeddings of images and text using a two-branch neural network with multiple layers of linear projections followed by nonlinearities.

Image Retrieval Metric Learning +2

Unsupervised Learning of Edges

no code implementations CVPR 2016 Yin Li, Manohar Paluri, James M. Rehg, Piotr Dollár

In this work we present a simple yet effective approach for training edge detectors without human supervision.

Edge Detection Motion Estimation +2

Sense-Aware Neural Models for Pun Location in Texts

no code implementations ACL 2018 Yitao Cai, Yin Li, Xiaojun Wan

In this paper, we focus on the task of pun location, which aims to identify the pun word in a given short text.

Word Sense Disambiguation

Beyond Grids: Learning Graph Representations for Visual Recognition

no code implementations NeurIPS 2018 Yin Li, Abhinav Gupta

Our method further learns to propagate information across all vertices on the graph, and is able to project the learned graph representation back into 2D grids.

Instance Segmentation object-detection +3

3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare

no code implementations CVPR 2018 Abhijit Kundu, Yin Li, James M. Rehg

Our method produces a compact 3D representation of the scene, which can be readily used for applications like autonomous driving.

Ranked #3 on Vehicle Pose Estimation on KITTI Cars Hard (using extra training data)

3D Object Reconstruction Autonomous Driving +2

In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video

no code implementations ECCV 2018 Yin Li, Miao Liu, James M. Rehg

We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a headworn camera.

Action Recognition Gaze Estimation +1

Attention Distillation for Learning Video Representations

no code implementations 5 Apr 2019 Miao Liu, Xin Chen, Yun Zhang, Yin Li, James M. Rehg

To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition.

Action Recognition Video Recognition

Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework

no code implementations 8 Aug 2019 Deepan Das, Noor Mohammed Ghouse, Shashank Verma, Yin Li

To accomplish this task, our architecture makes use of the rich semantic information available in a joint embedding space of multi-modal data.

Image Retrieval Retrieval

Gradients as Features for Deep Representation Learning

no code implementations ICLR 2020 Fangzhou Mu, Yingyu Liang, Yin Li

We address the challenging problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.

Representation Learning

In the Eye of the Beholder: Gaze and Actions in First Person Video

no code implementations 31 May 2020 Yin Li, Miao Liu, James M. Rehg

Moving beyond the dataset, we propose a novel deep model for joint gaze estimation and action recognition in FPV.

Action Recognition Gaze Estimation

An optimal FFT-based anisotropic power spectrum estimator

no code implementations 7 Apr 2017 Nick Hand, Yin Li, Zachary Slepian, Uros Seljak

Here, we present a faster, optimal means of using FFTs for this measurement.

Cosmology and Nongalactic Astrophysics

AI-assisted super-resolution cosmological simulations

no code implementations 13 Oct 2020 Yin Li, Yueying Ni, Rupert A. C. Croft, Tiziana Di Matteo, Simeon Bird, Yu Feng

Cosmological simulations of galaxy formation are limited by finite computational resources.

Super-Resolution

Prism: Private Verifiable Set Computation over Multi-Owner Outsourced Databases

no code implementations 7 Apr 2021 Yin Li, Dhrubajyoti Ghosh, Peeyush Gupta, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma

This paper proposes Prism, a secret sharing based approach to compute private set operations (i.e., intersection and union), as well as aggregates over outsourced databases belonging to multiple owners.

AI-assisted super-resolution cosmological simulations II: Halo substructures, velocities and higher order statistics

no code implementations 3 May 2021 Yueying Ni, Yin Li, Patrick Lachance, Rupert A. C. Croft, Tiziana Di Matteo, Simeon Bird, Yu Feng

In this work, we expand and test the capabilities of our recently developed super-resolution (SR) model to generate high-resolution (HR) realizations of the full phase-space matter distribution, including both displacement and velocity, from computationally cheap low-resolution (LR) cosmological N-body simulations.

Super-Resolution

Learning the Evolution of the Universe in N-body Simulations

no code implementations 10 Dec 2020 Chang Chen, Yin Li, Francisco Villaescusa-Navarro, Shirley Ho, Anthony Pullen

Understanding the physics of large cosmological surveys down to small (nonlinear) scales will significantly improve our knowledge of the Universe.

Egocentric Activity Recognition and Localization on a 3D Map

no code implementations 20 May 2021 Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li

Given a video captured from a first person perspective and the environment context of where the video is recorded, can we recognize what the person is doing and identify where the action occurs in the 3D space?

Action Localization Action Recognition +2

Hyperspectral Remote Sensing Image Classification Based on Multi-scale Cross Graphic Convolution

no code implementations 28 Jun 2021 Yunsong Zhao, Yin Li, Zhihan Chen, Tianchong Qiu, Guojin Liu

Using a multi-scale convolution algorithm, the input dimensionality reduction features were mined to obtain shallow features, which then served as inputs into a multi-scale graph convolution algorithm to construct the internal relationships between eigenvalues at different scales.

Classification Dimensionality Reduction +2

Weakly Supervised Foreground Learning for Weakly Supervised Localization and Detection

no code implementations 3 Aug 2021 Chen-Lin Zhang, Yin Li, Jianxin Wu

Modern deep learning models require large amounts of accurately annotated data, a requirement that is often difficult to satisfy.

Weakly-Supervised Object Localization

Towards Non-Line-of-Sight Photography

no code implementations 16 Sep 2021 Jiayong Peng, Fangzhou Mu, Ji Hyun Nam, Siddeshwar Raghavan, Yin Li, Andreas Velten, Zhiwei Xiong

Non-line-of-sight (NLOS) imaging is based on capturing the multi-bounce indirect reflections from the hidden objects.

Multifield Cosmology with Artificial Intelligence

no code implementations 20 Sep 2021 Francisco Villaescusa-Navarro, Daniel Anglés-Alcázar, Shy Genel, David N. Spergel, Yin Li, Benjamin Wandelt, Andrina Nicola, Leander Thiele, Sultan Hassan, Jose Manuel Zorrilla Matilla, Desika Narayanan, Romeel Dave, Mark Vogelsberger

Although our maps only cover a small area of $(25~h^{-1}{\rm Mpc})^2$, and the different fields are contaminated by astrophysical effects in very different ways, our networks can infer the values of $\Omega_{\rm m}$ and $\sigma_8$ with a few percent level precision for most of the fields.

Robust marginalization of baryonic effects for cosmological inference at the field level

no code implementations 21 Sep 2021 Francisco Villaescusa-Navarro, Shy Genel, Daniel Angles-Alcazar, David N. Spergel, Yin Li, Benjamin Wandelt, Leander Thiele, Andrina Nicola, Jose Manuel Zorrilla Matilla, Helen Shao, Sultan Hassan, Desika Narayanan, Romeel Dave, Mark Vogelsberger

We train neural networks to perform likelihood-free inference from $(25\, h^{-1}{\rm Mpc})^2$ 2D maps containing the total mass surface density from thousands of hydrodynamic simulations of the CAMELS project.

A Simple Baseline for Weakly-Supervised Scene Graph Generation

no code implementations ICCV 2021 Jing Shi, Yiwu Zhong, Ning Xu, Yin Li, Chenliang Xu

We investigate weakly-supervised scene graph generation, which is a challenging task since no correspondence between labels and objects is provided.

Contrastive Learning Graph Generation +2

Event Neural Networks

2 code implementations 2 Dec 2021 Matthew Dutson, Yin Li, Mohit Gupta

Video data is often repetitive; for example, the contents of adjacent frames are usually strongly correlated.

2D Human Pose Estimation Image Enhancement +2

Physics to the Rescue: Deep Non-line-of-sight Reconstruction for High-speed Imaging

no code implementations 3 May 2022 Fangzhou Mu, Sicheng Mo, Jiayong Peng, Xiaochun Liu, Ji Hyun Nam, Siddeshwar Raghavan, Andreas Velten, Yin Li

A computational approach to imaging around the corner, or non-line-of-sight (NLOS) imaging, is becoming a reality thanks to major advances in imaging hardware and reconstruction algorithms.

SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles

no code implementations CVPR 2022 Ran Xu, Fangzhou Mu, Jayoung Lee, Preeti Mukherjee, Somali Chaterji, Saurabh Bagchi, Yin Li

In this paper, we ask, and answer, the wide-ranging question across all MBODFs: How to expose the right set of execution branches and then how to schedule the optimal one at inference time?

object-detection Video Object Detection

EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning

no code implementations 13 Jun 2022 Zhuoran Yu, Yin Li, Yong Jae Lee

However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, and thus, the pseudo-labels for even high-confidence unlabeled samples may still be unreliable.

Out-of-Distribution Detection

Reconstructing the Universe with Variational self-Boosted Sampling

no code implementations 28 Jun 2022 Chirag Modi, Yin Li, David Blei

We show that after a short initial warm-up and training phase, VBS generates higher-quality samples than simple VI approaches and reduces the correlation length in the sampling phase by a factor of 10-50 over using only HMC to explore the posterior of initial conditions in 64$^3$ and 128$^3$ dimensional problems, with larger gains for high signal-to-noise data observations.

Variational Inference

Robust Scene Inference under Noise-Blur Dual Corruptions

no code implementations 24 Jul 2022 Bhavya Goyal, Jean-François Lalonde, Yin Li, Mohit Gupta

This creates a trade-off between these two kinds of image degradations: motion blur (due to long exposure) vs. noise (due to short exposure), also referred to as a dual image corruption pair in this paper.

Image Classification object-detection +1

Particle clustering in turbulence: Prediction of spatial and statistical properties with deep learning

1 code implementation 5 Oct 2022 Yan-Mong Chan, Natascha Manger, Yin Li, Chao-Chin Yang, Zhaohuan Zhu, Philip J. Armitage, Shirley Ho

The simulation data are used to train a U-Net deep learning model to predict gridded three-dimensional representations of the particle density and velocity fields, given as input the corresponding fluid fields.

Clustering

mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors

no code implementations 15 Oct 2022 Sizhe An, Yin Li, Umit Ogras

To bridge the gap, we present mRI, a multi-modal 3D human pose estimation dataset with mmWave, RGB-D, and Inertial Sensors.

3D Human Pose Estimation Action Detection +1

3D Scene Inference from Transient Histograms

no code implementations 9 Nov 2022 Sacha Jungerman, Atul Ingle, Yin Li, Mohit Gupta

Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices.

PICO

InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning

no code implementations 13 Mar 2023 Zhuoran Yu, Yin Li, Yong Jae Lee

Without relying on model confidence, we propose to measure whether an unlabeled sample is likely to be "in-distribution"; i.e., close to the current training data.

Out-of-Distribution Detection

SimHaze: game engine simulated data for real-world dehazing

no code implementations 25 May 2023 Zhengyang Lou, Huan Xu, Fangzhou Mu, Yanli Liu, XiaoYu Zhang, Liang Shang, Jiang Li, Bochen Guan, Yin Li, Yu Hen Hu

Using a modern game engine, our approach renders crisp clean images and their precise depth maps, based on which high-quality hazy images can be synthesized for training dehazing models.

Depth Estimation Image Dehazing +1

A Review of Adversarial Attacks in Computer Vision

no code implementations 15 Aug 2023 Yutong Zhang, Yao Li, Yin Li, Zhichang Guo

Deep neural networks have been widely used in various downstream tasks, especially in safety-critical scenarios such as autonomous driving, but deep networks are often threatened by adversarial samples.

Autonomous Driving

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

no code implementations 12 Dec 2023 Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou

Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models.

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

no code implementations 7 Feb 2024 Xin Zhao, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li

These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAV), where the target is usually far away from the camera and often with significant motion relative to the camera.

Autonomous Driving Object Tracking +1

Towards 3D Vision with Low-Cost Single-Photon Cameras

no code implementations 26 Mar 2024 Fangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li

We present a method for reconstructing the 3D shape of arbitrary Lambertian objects based on measurements by miniature, energy-efficient, low-cost single-photon cameras.

3D Object Reconstruction Neural Rendering

A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

no code implementations 4 Apr 2024 Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC).

Management Tumor Segmentation
