Search Results for author: Longyin Wen

Found 43 papers, 18 papers with code

Accurate and Fast Compressed Video Captioning

1 code implementation ICCV 2023 Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang

Addressing this, we study video captioning from a different perspective in compressed domain, which brings multi-fold advantages over the existing pipeline: 1) Compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) The captioning model is more efficient in inference as smaller and less redundant information is processed.

Video Captioning

Exploring the Role of Audio in Video Captioning

no code implementations21 Jun 2023 YuHan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang

Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Text with Knowledge Graph Augmented Transformer for Video Captioning

no code implementations CVPR 2023 Xin Gu, Guang Chen, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen

Meanwhile, the internal stream is designed to exploit the multi-modality information in videos (e. g., the appearance of video frames, speech transcripts, and video captions) to ensure the quality of caption results.

Video Captioning

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

1 code implementation6 Mar 2023 Wei Li, Linchao Zhu, Longyin Wen, Yi Yang

This decoder is both data-efficient and computation-efficient: 1) it only requires the text data for training, easing the burden on the collection of paired data.

Image Captioning Text Generation

Dual-Stream Transformer for Generic Event Boundary Captioning

1 code implementation7 Jul 2022 Xin Gu, Hanhua Ye, Guang Chen, YuFei Wang, Libo Zhang, Longyin Wen

This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition.

Boundary Captioning

SC-Transformer++: Structured Context Transformer for Generic Event Boundary Detection

1 code implementation25 Jun 2022 Dexiang Hong, Xiaoqi Ma, Xinyao Wang, CongCong Li, YuFei Wang, Longyin Wen

This report presents the algorithm used in the submission of Generic Event Boundary Detection (GEBD) Challenge at CVPR 2022.

Boundary Detection Optical Flow Estimation

Structured Context Transformer for Generic Event Boundary Detection

no code implementations7 Jun 2022 CongCong Li, Xinyao Wang, Dexiang Hong, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen

To capture temporal context information of each frame, we design the structure context transformer (SC-Transformer) by re-partitioning input frame sequence.

Boundary Detection

Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark

1 code implementation ICCV 2021 Boying Wang, Libo Zhang, Longyin Wen, Xianglong Liu, Yanjun Wu

Towards real-world prohibited item detection, we collect a large-scale dataset, named as PIDray, which covers various cases in real-world scenarios for prohibited item detection, especially for deliberately hidden items.

Generic Event Boundary Detection Challenge at CVPR 2021 Technical Report: Cascaded Temporal Attention Network (CASTANET)

1 code implementation1 Jul 2021 Dexiang Hong, CongCong Li, Longyin Wen, Xinyao Wang, Libo Zhang

In this work, we design a Cascaded Temporal Attention Network (CASTANET) for GEBD, which is formed by three parts, the backbone network, the temporal attention module, and the classification module.

Boundary Detection Event Segmentation

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

1 code implementation CVPR 2021 Longyin Wen, Dawei Du, Pengfei Zhu, QinGhua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu

To promote the developments of object detection, tracking and counting algorithms in drone-captured videos, we construct a benchmark with a new drone-captured largescale dataset, named as DroneCrowd, formed by 112 video clips with 33, 600 HD frames in various scenarios.

object-detection Object Detection

Efficient Pig Counting in Crowds with Keypoints Tracking and Spatial-aware Temporal Response Filtering

no code implementations27 May 2020 Guang Chen, Shiwen Shen, Longyin Wen, Si Luo, Liefeng Bo

Existing methods only focused on pig counting using single image, and its accuracy is challenged by several factors, including pig movements, occlusion and overlapping.


Rethinking Object Detection in Retail Stores

1 code implementation18 Mar 2020 Yuan-Qiang Cai, Longyin Wen, Libo Zhang, Dawei Du, Weiqiang Wang

In this paper, we propose a new task, ie, simultaneously object localization and counting, abbreviated as Locount, which requires algorithms to localize groups of objects of interest with the number of instances.

object-detection Object Detection +1

Multi-Drone based Single Object Tracking with Agent Sharing Network

1 code implementation16 Mar 2020 Pengfei Zhu, Jiayu Zheng, Dawei Du, Longyin Wen, Yiming Sun, QinGhua Hu

Moreover, an agent sharing network (ASNet) is proposed by self-supervised template sharing and view-aware fusion of the target from multiple drones, which can improve the tracking accuracy significantly compared with single drone tracking.

Object Tracking

Detection and Tracking Meet Drones Challenge

2 code implementations16 Jan 2020 Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, QinGhua Hu, Haibin Ling

We provide a large-scale drone captured dataset, VisDrone, which includes four tracks, i. e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking.

Multi-Object Tracking object-detection +1

Learning Semantic Neural Tree for Human Parsing

no code implementations ECCV 2020 Ruyi Ji, Dawei Du, Libo Zhang, Longyin Wen, Yanjun Wu, Chen Zhao, Feiyue Huang, Siwei Lyu

In this paper, we design a novel semantic neural tree for human parsing, which uses a tree architecture to encode physiological structure of human body, and designs a coarse to fine process in a cascade manner to generate accurate results.

Human Parsing Semantic Segmentation

SiamMan: Siamese Motion-aware Network for Visual Tracking

no code implementations11 Dec 2019 Wenzhang Zhou, Longyin Wen, Libo Zhang, Dawei Du, Tiejian Luo, Yanjun Wu

To reduce the impact of manually designed anchor boxes to adapt to different target motion patterns, we design the localization branch, which aims to coarsely localize the target to help the regression branch to generate accurate results.

General Classification regression +1

Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

1 code implementation4 Dec 2019 Longyin Wen, Dawei Du, Pengfei Zhu, QinGhua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu

This paper proposes a space-time multi-scale attention network (STANet) to solve density map estimation, localization and tracking in dense crowds of video clips captured by drones with arbitrary crowd density, perspective, and flight altitude.

Crowd Counting

Guided Attention Network for Object Detection and Counting on Drones

no code implementations25 Sep 2019 Yuan-Qiang Cai, Dawei Du, Libo Zhang, Longyin Wen, Weiqiang Wang, Yanjun Wu, Siwei Lyu

Object detection and counting are related but challenging problems, especially for drone based scenes with small objects and cluttered background.

object-detection Object Detection

Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

2 code implementations CVPR 2020 Ruyi Ji, Longyin Wen, Libo Zhang, Dawei Du, Yanjun Wu, Chen Zhao, Xianglong Liu, Feiyue Huang

Specifically, we incorporate convolutional operations along edges of the tree structure, and use the routing functions in each node to determine the root-to-leaf computational paths within the tree.

Fine-Grained Image Classification Fine-Grained Visual Categorization

ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition

no code implementations29 Jul 2019 Jun Wan, Chi Lin, Longyin Wen, Yunan Li, Qiguang Miao, Sergio Escalera, Gholamreza Anbarjafari, Isabelle Guyon, Guodong Guo, Stan Z. Li

The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than $200$ teams round the world.

Gesture Recognition

Spatiotemporal CNN for Video Object Segmentation

1 code implementation CVPR 2019 Kai Xu, Longyin Wen, Guorong Li, Liefeng Bo, Qingming Huang

Specifically, the temporal coherence branch pretrained in an adversarial fashion from unlabeled video data, is designed to capture the dynamic appearance and motion cues of video sequences to guide object segmentation.

Segmentation Semantic Segmentation +4

Learning Non-Uniform Hypergraph for Multi-Object Tracking

no code implementations10 Dec 2018 Longyin Wen, Dawei Du, Shengkun Li, Xiao Bian, Siwei Lyu

The majority of Multi-Object Tracking (MOT) algorithms based on the tracking-by-detection scheme do not use higher order dependencies among objects or tracklets, which makes them less effective in handling complex scenarios.

Multi-Object Tracking

Evolvement Constrained Adversarial Learning for Video Style Transfer

no code implementations6 Nov 2018 Wenbo Li, Longyin Wen, Xiao Bian, Siwei Lyu

Video style transfer is a useful component for applications such as augmented reality, non-photorealistic rendering, and interactive games.

Optical Flow Estimation Style Transfer +1

ScratchDet: Training Single-Shot Object Detectors from Scratch

1 code implementation CVPR 2019 Rui Zhu, Shifeng Zhang, Xiaobo Wang, Longyin Wen, Hailin Shi, Liefeng Bo, Tao Mei

Taking this advantage, we are able to explore various types of networks for object detection, without suffering from the poor convergence.

General Classification object-detection +1

Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd

no code implementations ECCV 2018 Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li

Pedestrian detection in crowded scenes is a challenging problem since the pedestrians often gather together and occlude each other.

Ranked #10 on Pedestrian Detection on Caltech (using extra training data)

Pedestrian Detection

Vision Meets Drones: A Challenge

no code implementations20 Apr 2018 Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Ling, QinGhua Hu

In this paper we present a large-scale visual object detection and tracking benchmark, named VisDrone2018, aiming at advancing visual understanding tasks on the drone platform.

Multi-Object Tracking object-detection +1

Single-Shot Refinement Neural Network for Object Detection

10 code implementations CVPR 2018 Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li

For object detection, the two-stage approach (e. g., Faster R-CNN) has been achieving the highest accuracy, whereas the one-stage approach (e. g., SSD) has the advantage of high efficiency.

object-detection Object Detection

Contrast Enhancement Estimation for Digital Image Forensics

no code implementations13 Jun 2017 Longyin Wen, Honggang Qi, Siwei Lyu

Our method recovers the original pixel histogram and the contrast enhancement simultaneously from a single image with an iterative algorithm.

Image Forensics

Stochastic Online AUC Maximization

no code implementations NeurIPS 2016 Yiming Ying, Longyin Wen, Siwei Lyu

From this saddle representation, a stochastic online algorithm (SOLAM) is proposed which has time and space complexity of one datum.

Geometric Hypergraph Learning for Visual Tracking

no code implementations18 Mar 2016 Dawei Du, Honggang Qi, Longyin Wen, Qi Tian, Qingming Huang, Siwei Lyu

Graph based representation is widely used in visual tracking field by finding correct correspondences between target parts in consecutive frames.

Visual Tracking

Category-Blind Human Action Recognition: A Practical Recognition System

no code implementations ICCV 2015 Wenbo Li, Longyin Wen, Mooi Choo Chuah, Siwei Lyu

In this paper, we propose the category-blind human recognition method (CHARM) which can recognize a human action without making assumptions of the action category.

Action Recognition Temporal Action Localization

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

no code implementations13 Nov 2015 Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang, Siwei Lyu

In this work, we perform a comprehensive quantitative study on the effects of object detection accuracy to the overall MOT performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset.

Multi-Object Tracking object-detection +1

JOTS: Joint Online Tracking and Segmentation

no code implementations CVPR 2015 Longyin Wen, Dawei Du, Zhen Lei, Stan Z. Li, Ming-Hsuan Yang

We present a novel Joint Online Tracking and Segmentation (JOTS) algorithm which integrates the multi-part tracking and segmentation into a unified energy optimization framework to handle the video segmentation task.

Segmentation Video Segmentation +1

The Fastest Deformable Part Model for Object Detection

no code implementations CVPR 2014 Junjie Yan, Zhen Lei, Longyin Wen, Stan Z. Li

Three prohibitive steps in cascade version of DPM are accelerated, including 2D correlation between root filter and feature map, cascade part pruning and HOG feature extraction.

Face Detection object-detection +1

A Probabilistic Framework for Multitarget Tracking with Mutual Occlusions

no code implementations CVPR 2014 Menglong Yang, Yiguang Liu, Longyin Wen, Zhisheng You, Stan Z. Li

Mutual occlusions among targets can cause track loss or target position deviation, because the observation likelihood of an occluded target may vanish even when we have the estimated location of the target.

Cannot find the paper you are looking for? You can Submit a new open access paper.