Search Results for author: Shen Yan

Found 32 papers, 14 papers with code

Streaming Dense Video Captioning

1 code implementation CVPR 2024 Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid

An ideal model for dense video captioning -- predicting captions localized temporally in a video -- should be able to handle long input videos, predict rich, detailed textual descriptions, and be able to produce outputs before processing the entire video.

Dense Video Captioning

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

no code implementations 16 Feb 2024 Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan, Boyu Wang

Our research undertakes a thorough exploration of the state-of-the-art perceiver resampler architecture and builds a strong baseline.

Language Modelling Question Answering +1

UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization

1 code implementation 11 Jan 2024 Rouwan Wu, Xiaoya Cheng, Juelin Zhu, Xuxiang Liu, Maojun Zhang, Shen Yan

Despite significant progress in global localization of Unmanned Aerial Vehicles (UAVs) in GPS-denied environments, existing methods remain constrained by the availability of datasets.

Synthetic Data Generation Visual Localization

Pixel-Aligned Language Model

no code implementations CVPR 2024 Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid

When taking locations as inputs, the model performs location-conditioned captioning, which generates captions for the indicated object or region.

Language Modelling

Pixel Aligned Language Models

no code implementations 14 Dec 2023 Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid

When taking locations as inputs, the model performs location-conditioned captioning, which generates captions for the indicated object or region.

Language Modelling

Efficient Large Language Models: A Survey

3 code implementations 6 Dec 2023 Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang

We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient LLMs research and inspire them to contribute to this important and exciting field.

Natural Language Understanding Text Generation

UnLoc: A Unified Framework for Video Localization Tasks

1 code implementation ICCV 2023 Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid

While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task.

Action Segmentation Moment Retrieval +5

Long-term Visual Localization with Mobile Sensors

no code implementations CVPR 2023 Shen Yan, Yu Liu, Long Wang, Zehong Shen, Zhen Peng, Haomin Liu, Maojun Zhang, Guofeng Zhang, Xiaowei Zhou

Despite the remarkable advances in image matching and pose estimation, image-based localization of a camera in a temporally-varying outdoor environment is still a challenging problem due to huge appearance disparity between query and reference images caused by illumination, seasonal and structural changes.

Image-Based Localization Pose Estimation +1

Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks

1 code implementation ICCV 2023 Qingyan Meng, Mingqing Xiao, Shen Yan, Yisen Wang, Zhouchen Lin, Zhi-Quan Luo

In particular, our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.

Render-and-Compare: Cross-View 6 DoF Localization from Noisy Prior

no code implementations 13 Feb 2023 Shen Yan, Xiaoya Cheng, Yuxiang Liu, Juelin Zhu, Rouwan Wu, Yu Liu, Maojun Zhang

Despite the significant progress in 6-DoF visual localization, researchers are mostly driven by ground-level benchmarks.

Pose Estimation Visual Localization

Deep Active Contours for Real-time 6-DoF Object Tracking

no code implementations ICCV 2023 Long Wang, Shen Yan, Jianan Zhen, Yu Liu, Maojun Zhang, Guofeng Zhang, Xiaowei Zhou

Specifically, given an initial pose, we project the object model to the image plane to obtain the initial contour and use a lightweight network to predict how the contour should move to match the true object boundary, which provides the gradients to optimize the object pose.

Computational Efficiency Object +1

Soft Augmentation for Image Classification

1 code implementation CVPR 2023 Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e.g., more aggressive image crop augmentations produce less confident learning targets.

Classification Data Augmentation +1

Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation

1 code implementation CVPR 2022 Qingyan Meng, Mingqing Xiao, Shen Yan, Yisen Wang, Zhouchen Lin, Zhi-Quan Luo

In this paper, we propose the Differentiation on Spike Representation (DSR) method, which achieves high performance competitive with ANNs at low latency.

Deep AutoAugment

1 code implementation 11 Mar 2022 Yu Zheng, Zhi Zhang, Shen Yan, Mi Zhang

In this work, instead of fixing a set of hand-picked default augmentations alongside the searched data augmentations, we propose a fully automated approach for data augmentation search named Deep AutoAugment (DeepAA).

AutoML Data Augmentation +1

Multiview Transformers for Video Recognition

1 code implementation CVPR 2022 Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid

Video understanding requires reasoning at multiple spatiotemporal resolutions -- from short fine-grained motions to events taking place over longer durations.

Ranked #5 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Classification Action Recognition +1

NAS-Bench-x11 and the Power of Learning Curves

1 code implementation NeurIPS 2021 Shen Yan, Colin White, Yash Savani, Frank Hutter

While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research.

Neural Architecture Search

FairFed: Enabling Group Fairness in Federated Learning

no code implementations 2 Oct 2021 Yahya H. Ezzeldin, Shen Yan, Chaoyang He, Emilio Ferrara, Salman Avestimehr

Training ML models which are fair across different demographic groups is of critical importance due to the increased integration of ML in crucial decision-making scenarios such as healthcare and recruitment.

Decision Making Fairness +1

Quantifying the Temperature of Heated Microdevices Using Scanning Thermal Probes

no code implementations 8 Feb 2021 Amin Reihani, Shen Yan, Yuxuan Luan, Rohith Mittapally, Edgar Meyhofer, Pramod Reddy

Quantifying the temperature of microdevices is critical for probing nanoscale energy transport. Such quantification is often accomplished by integrating resistance thermometers into microdevices.

Mesoscale and Nanoscale Physics

Image Retrieval for Structure-from-Motion via Graph Convolutional Network

no code implementations 17 Sep 2020 Shen Yan, Yang Pen, Shiming Lai, Yu Liu, Maojun Zhang

Conventional image retrieval techniques for Structure-from-Motion (SfM) suffer from the limit of effectively recognizing repetitive patterns and cannot guarantee to create just enough match pairs with high precision and high recall.

Binary Classification Image Retrieval +1

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?

1 code implementation NeurIPS 2020 Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang

Existing Neural Architecture Search (NAS) methods either encode neural architectures using discrete encodings that do not scale well, or adopt supervised learning-based methods to jointly learn architecture representations and optimize architecture search on such representations which incurs search bias.

Neural Architecture Search

Improve Unsupervised Domain Adaptation with Mixup Training

1 code implementation 3 Jan 2020 Shen Yan, Huan Song, Nanxiang Li, Lincan Zou, Liu Ren

Unsupervised domain adaptation studies the problem of utilizing a relevant source domain with abundant labels to build predictive modeling for an unannotated target domain.

Domain Generalization Human Activity Recognition +2

Unsupervised Visual Representation Learning with Increasing Object Shape Bias

no code implementations 17 Nov 2019 Zhibo Wang, Shen Yan, XiaoYu Zhang, Niels Lobo

(Very early draft) Traditional supervised learning keeps pushing convolutional neural networks (CNNs) toward state-of-the-art performance.

Object Representation Learning

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

2 code implementations ECCV 2020 Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis

We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime.

Instance Segmentation object-detection +3

HM-NAS: Efficient Neural Architecture Search via Hierarchical Masking

no code implementations 31 Aug 2019 Shen Yan, Biyi Fang, Faen Zhang, Yu Zheng, Xiao Zeng, Hui Xu, Mi Zhang

Without the constraint imposed by the hand-designed heuristics, our searched networks contain more flexible and meaningful architectures that existing weight sharing based NAS approaches are not able to discover.

Neural Architecture Search

Learning Low-shot facial representations via 2D warping

no code implementations 13 Dec 2017 Shen Yan

In this work, we mainly study the influence of the 2D warping module for one-shot face recognition.

Face Recognition
