Search Results for author: Shen Yan

Found 31 papers, 13 papers with code

Multiview Transformers for Video Recognition

1 code implementation • CVPR 2022 • Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid

Video understanding requires reasoning at multiple spatiotemporal resolutions -- from short fine-grained motions to events taking place over longer durations.

Ranked #5 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Classification, Action Recognition, +1 more

UnLoc: A Unified Framework for Video Localization Tasks

1 code implementation • ICCV 2023 • Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid

While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task.

Ranked #1 on Action Segmentation on COIN

Action Segmentation, Moment Retrieval, +5 more

Streaming Dense Video Captioning

1 code implementation • 1 Apr 2024 • Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid

An ideal model for dense video captioning -- predicting captions localized temporally in a video -- should be able to handle long input videos, predict rich, detailed textual descriptions, and be able to produce outputs before processing the entire video.

Dense Video Captioning

Improve Unsupervised Domain Adaptation with Mixup Training

1 code implementation • 3 Jan 2020 • Shen Yan, Huan Song, Nanxiang Li, Lincan Zou, Liu Ren

Unsupervised domain adaptation studies the problem of utilizing a relevant source domain with abundant labels to build a predictive model for an unannotated target domain.

Ranked #49 on Domain Generalization on PACS

Domain Generalization, Human Activity Recognition, +2 more

Efficient Large Language Models: A Survey

3 code implementations • 6 Dec 2023 • Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang

Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society.

Natural Language Understanding, Text Generation

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

2 code implementations • ECCV 2020 • Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis

We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime.

Instance Segmentation, Object Detection, +3 more

Deep AutoAugment

1 code implementation • 11 Mar 2022 • Yu Zheng, Zhi Zhang, Shen Yan, Mi Zhang

In this work, instead of fixing a set of hand-picked default augmentations alongside the searched data augmentations, we propose a fully automated approach for data augmentation search named Deep AutoAugment (DeepAA).

Ranked #2 on Data Augmentation on ImageNet

AutoML, Data Augmentation, +1 more

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?

1 code implementation • NeurIPS 2020 • Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang

Existing Neural Architecture Search (NAS) methods either encode neural architectures using discrete encodings that do not scale well, or adopt supervised learning-based methods to jointly learn architecture representations and optimize architecture search on such representations which incurs search bias.

Ranked #9 on Neural Architecture Search on NAS-Bench-201, CIFAR-100

Neural Architecture Search

Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation

1 code implementation • CVPR 2022 • Qingyan Meng, Mingqing Xiao, Shen Yan, Yisen Wang, Zhouchen Lin, Zhi-Quan Luo

In this paper, we propose the Differentiation on Spike Representation (DSR) method, which achieves high performance competitive with ANNs while keeping latency low.

Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks

1 code implementation • ICCV 2023 • Qingyan Meng, Mingqing Xiao, Shen Yan, Yisen Wang, Zhouchen Lin, Zhi-Quan Luo

In particular, our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.

CATE: Computation-aware Neural Architecture Encoding with Transformers

1 code implementation • 14 Feb 2021 • Shen Yan, Kaiqiang Song, Fei Liu, Mi Zhang

Our experiments show that CATE is beneficial to the downstream search, especially in the large search space.

Ranked #15 on Neural Architecture Search on CIFAR-10

Neural Architecture Search, Representation Learning, +1 more

NAS-Bench-x11 and the Power of Learning Curves

1 code implementation • NeurIPS 2021 • Shen Yan, Colin White, Yash Savani, Frank Hutter

While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research.

Neural Architecture Search

Soft Augmentation for Image Classification

1 code implementation • CVPR 2023 • Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e.g., more aggressive image crop augmentations produce less confident learning targets.

Classification, Data Augmentation, +1 more

Learning Low-shot facial representations via 2D warping

no code implementations • 13 Dec 2017 • Shen Yan

In this work, we mainly study the influence of the 2D warping module for one-shot face recognition.

Face Recognition

Word-based Domain Adaptation for Neural Machine Translation

no code implementations • IWSLT (EMNLP) 2018 • Shen Yan, Leonard Dahlmann, Pavel Petrushkov, Sanjika Hewavitharana, Shahram Khadivi

Pre-training a model with word weights improves fine-tuning up to 1.24% BLEU absolute and 1.64% TER, respectively.

Domain Adaptation, Language Modelling, +3 more

HM-NAS: Efficient Neural Architecture Search via Hierarchical Masking

no code implementations • 31 Aug 2019 • Shen Yan, Biyi Fang, Faen Zhang, Yu Zheng, Xiao Zeng, Hui Xu, Mi Zhang

Without the constraint imposed by the hand-designed heuristics, our searched networks contain more flexible and meaningful architectures that existing weight sharing based NAS approaches are not able to discover.

Neural Architecture Search

Unsupervised Visual Representation Learning with Increasing Object Shape Bias

no code implementations • 17 Nov 2019 • Zhibo Wang, Shen Yan, XiaoYu Zhang, Niels Lobo

(Very early draft) Traditional supervised learning keeps pushing convolutional neural networks (CNNs) to state-of-the-art performance.

Object, Representation Learning

Image Retrieval for Structure-from-Motion via Graph Convolutional Network

no code implementations • 17 Sep 2020 • Shen Yan, Yang Pen, Shiming Lai, Yu Liu, Maojun Zhang

Conventional image retrieval techniques for Structure-from-Motion (SfM) have a limited ability to recognize repetitive patterns effectively and cannot guarantee to create just enough match pairs with high precision and high recall.

Binary Classification, Image Retrieval, +1 more

Deep Learning in the Era of Edge Computing: Challenges and Opportunities

no code implementations • 17 Oct 2020 • Mi Zhang, Faen Zhang, Nicholas D. Lane, Yuanchao Shu, Xiao Zeng, Biyi Fang, Shen Yan, Hui Xu

The era of edge computing has arrived.

Edge-computing

The Twelvefold Way of Non-Sequential Lossless Compression

no code implementations • 8 Nov 2020 • Taha Ameen ur Rahman, Alton S. Barbehenn, Xinan Chen, Hassan Dbouk, James A. Douglas, Yuncong Geng, Ian George, John B. Harvill, Sung Woo Jeon, Kartik K. Kansal, Kiwook Lee, Kelly A. Levick, Bochao Li, Ziyue Li, Yashaswini Murthy, Adarsh Muthuveeru-Subramaniam, S. Yagiz Olmez, Matthew J. Tomei, Tanya Veeravalli, Xuechao Wang, Eric A. Wayman, Fan Wu, Peng Xu, Shen Yan, Heling Zhang, Yibo Zhang, Yifan Zhang, Yibo Zhao, Sourya Basu, Lav R. Varshney

Many information sources are not just sequences of distinguishable symbols but rather have invariances governed by alternative counting paradigms such as permutations, combinations, and partitions.

Information Theory

Quantifying the Temperature of Heated Microdevices Using Scanning Thermal Probes

no code implementations • 8 Feb 2021 • Amin Reihani, Shen Yan, Yuxuan Luan, Rohith Mittapally, Edgar Meyhofer, Pramod Reddy

Quantifying the temperature of microdevices is critical for probing nanoscale energy transport. Such quantification is often accomplished by integrating resistance thermometers into microdevices.

Mesoscale and Nanoscale Physics

FairFed: Enabling Group Fairness in Federated Learning

no code implementations • 2 Oct 2021 • Yahya H. Ezzeldin, Shen Yan, Chaoyang He, Emilio Ferrara, Salman Avestimehr

Training ML models which are fair across different demographic groups is of critical importance due to the increased integration of ML in crucial decision-making scenarios such as healthcare and recruitment.

Decision Making, Fairness, +1 more

VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners

no code implementations • 9 Dec 2022 • Shen Yan, Tao Zhu, ZiRui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu

We explore an efficient approach to establish a foundational video-text model.

Ranked #1 on Video Captioning on ActivityNet Captions (using extra training data)

Question Answering, Retrieval, +9 more

Render-and-Compare: Cross-View 6 DoF Localization from Noisy Prior

no code implementations • 13 Feb 2023 • Shen Yan, Xiaoya Cheng, Yuxiang Liu, Juelin Zhu, Rouwan Wu, Yu Liu, Maojun Zhang

Despite the significant progress in 6-DoF visual localization, researchers are mostly driven by ground-level benchmarks.

Pose Estimation, Visual Localization

Long-term Visual Localization with Mobile Sensors

no code implementations • CVPR 2023 • Shen Yan, Yu Liu, Long Wang, Zehong Shen, Zhen Peng, Haomin Liu, Maojun Zhang, Guofeng Zhang, Xiaowei Zhou

Despite the remarkable advances in image matching and pose estimation, image-based localization of a camera in a temporally-varying outdoor environment is still a challenging problem due to huge appearance disparity between query and reference images caused by illumination, seasonal and structural changes.

Image-Based Localization, Pose Estimation, +1 more

AutoTaskFormer: Searching Vision Transformers for Multi-task Learning

no code implementations • 18 Apr 2023 • Yang Liu, Shen Yan, Yuge Zhang, Kan Ren, Quanlu Zhang, Zebin Ren, Deng Cai, Mi Zhang

Vision Transformers have shown great performance in single tasks such as classification and segmentation.

Multi-Task Learning, Neural Architecture Search

Deep Active Contours for Real-time 6-DoF Object Tracking

no code implementations • ICCV 2023 • Long Wang, Shen Yan, Jianan Zhen, Yu Liu, Maojun Zhang, Guofeng Zhang, Xiaowei Zhou

Specifically, given an initial pose, we project the object model to the image plane to obtain the initial contour and use a lightweight network to predict how the contour should move to match the true object boundary, which provides the gradients to optimize the object pose.

Computational Efficiency, Object, +1 more

Pixel Aligned Language Models

no code implementations • 14 Dec 2023 • Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid

When taking locations as inputs, the model performs location-conditioned captioning, which generates captions for the indicated object or region.

Language Modelling

UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization

no code implementations • 11 Jan 2024 • Rouwan Wu, Xiaoya Cheng, Juelin Zhu, Xuxiang Liu, Maojun Zhang, Shen Yan

Despite significant progress in global localization of Unmanned Aerial Vehicles (UAVs) in GPS-denied environments, existing methods remain constrained by the availability of datasets.

Synthetic Data Generation, Visual Localization

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

no code implementations • 16 Feb 2024 • Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan, Boyu Wang

Our research undertakes a thorough exploration of the state-of-the-art perceiver resampler architecture and builds a strong baseline.

Language Modelling, Question Answering, +1 more

VideoPrism: A Foundational Visual Encoder for Video Understanding

no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.

Question Answering, Video Question Answering, +1 more