Search Results for author: Junwei Liang

Found 30 papers, 20 papers with code

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

1 code implementation • ECCV 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann

We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras.

Adversarial Attack · Adversarial Defense +2

VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting

1 code implementation • 25 Mar 2024 • Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang

Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics.

Prioritized Semantic Learning for Zero-shot Instance Navigation

no code implementations • 18 Mar 2024 • Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang

Furthermore, for the popular HM3D environment, we present an Instance Navigation (InstanceNav) task that requires going to a specific object instance with detailed descriptions, as opposed to the Object Navigation (ObjectNav) task where the goal is defined merely by the object category.

Language Modelling · Object

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition

no code implementations • 22 Jan 2024 • Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng

With the proposed ActionHub dataset, we further propose a novel Cross-modality and Cross-action Modeling (CoCo) framework for ZSAR, which consists of a Dual Cross-modality Alignment module and a Cross-action Invariance Mining module.

Action Recognition · Video Description +1

GeoDeformer: Geometric Deformable Transformer for Action Recognition

no code implementations • 29 Nov 2023 • Jinhui Ye, Jiaming Zhou, Hui Xiong, Junwei Liang

Specifically, at the core of GeoDeformer is the Geometric Deformation Predictor, a module designed to identify and quantify potential spatial and temporal geometric deformations within the given video.

Action Recognition
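A rough, hypothetical sketch of what a "geometric deformation predictor" of the kind the GeoDeformer abstract describes could look like; the module name, layer choices, and the spatial-only scope below are assumptions for illustration, not the paper's actual code. A small head predicts per-location offsets, and the feature map is resampled at the deformed locations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDeformationPredictor(nn.Module):
    """Hypothetical sketch: predicts per-pixel (dx, dy) offsets and warps the
    feature map accordingly (spatial only; temporal deformation omitted)."""
    def __init__(self, channels):
        super().__init__()
        self.offset_head = nn.Conv2d(channels, 2, kernel_size=3, padding=1)

    def forward(self, feats):                                    # feats: (N, C, H, W)
        n, _, h, w = feats.shape
        offsets = self.offset_head(feats).permute(0, 2, 3, 1)    # (N, H, W, 2)
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feats.device),
            torch.linspace(-1, 1, w, device=feats.device),
            indexing="ij",
        )
        base_grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
        # Resample features at the deformed locations.
        return F.grid_sample(feats, base_grid + offsets, align_corners=True)

feats = torch.randn(2, 64, 14, 14)
warped = ToyDeformationPredictor(64)(feats)                      # same shape as feats
```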

PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting

1 code implementation • 4 Oct 2023 • Yujin Tang, Jiaming Zhou, Xiang Pan, Zeying Gong, Junwei Liang

To address these limitations, we introduce the PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting.

NWP Post-processing · Precipitation Forecasting

PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting

2 code implementations • 1 Oct 2023 • Zeying Gong, Yujin Tang, Junwei Liang

Although the Transformer has been the dominant architecture for time series forecasting tasks in recent years, a fundamental challenge remains: the permutation-invariant self-attention mechanism within Transformers leads to a loss of temporal information.

Time Series · Time Series Forecasting
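A minimal sketch (not from the PatchMixer paper) of the permutation-invariance issue its abstract refers to: plain self-attention followed by pooling yields the same representation for a time series and for a shuffled copy of it, so temporal ordering is lost unless positional information is injected. The toy single-head attention below uses identity projections for brevity.

```python
import numpy as np

def self_attention(x):
    """Toy single-head self-attention with identity Q/K/V projections."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
series = rng.normal(size=(96, 8))   # 96 time steps, 8 channels
perm = rng.permutation(96)          # shuffle the time axis

pooled_original = self_attention(series).mean(axis=0)
pooled_shuffled = self_attention(series[perm]).mean(axis=0)

# Identical pooled features: attention alone cannot tell the orderings apart.
print(np.allclose(pooled_original, pooled_shuffled))  # True
```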

TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation

no code implementations • 14 Sep 2023 • Rong Li, Shijie Li, Xieyuanli Chen, Teli Ma, Juergen Gall, Junwei Liang

In this paper, we present TFNet, a range-image-based LiDAR semantic segmentation method that utilizes temporal information to address this issue.

Autonomous Driving · LIDAR Semantic Segmentation +1

An Examination of the Compositionality of Large Generative Vision-Language Models

1 code implementation • 21 Aug 2023 • Teli Ma, Rong Li, Junwei Liang

A challenging new task is subsequently added to evaluate the robustness of GVLMs against their inherent inclination toward syntactical correctness.

Visual Reasoning

Spatial-Temporal Alignment Network for Action Recognition

no code implementations • 19 Aug 2023 • Jinhui Ye, Junwei Liang

This paper studies introducing viewpoint-invariant feature representations into existing action recognition architectures.

Action Recognition

SoccerNet 2022 Challenges Results

7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei Zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting · Camera Calibration +3

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

no code implementations • 27 Sep 2022 • Chengzhi Lin, AnCong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen

To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features.

Cross-Modal Retrieval · Retrieval +2
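A hypothetical sketch of the general idea behind multiple visual prototype matching as described in the abstract above; the class name, the learnable-query aggregation, and the max-over-prototypes scoring are assumptions for illustration, not the paper's code. Learnable queries aggregate video token features into several prototypes, and a text embedding is scored against the best-matching one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPrototypeMatcher(nn.Module):
    """Hypothetical sketch: aggregates video tokens into K prototypes and scores
    a text embedding against the best-matching prototype."""
    def __init__(self, dim, num_prototypes=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_prototypes, dim))

    def forward(self, video_tokens, text_emb):
        # video_tokens: (T, dim); text_emb: (dim,)
        attn = torch.softmax(
            self.queries @ video_tokens.T / video_tokens.shape[-1] ** 0.5, dim=-1
        )
        prototypes = attn @ video_tokens                                 # (K, dim)
        sims = F.cosine_similarity(prototypes, text_emb.unsqueeze(0), dim=-1)
        return sims.max()                                                # best prototype wins

score = ToyPrototypeMatcher(dim=512)(torch.randn(20, 512), torch.randn(512))
```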

Multi-dataset Training of Transformers for Robust Action Recognition

1 code implementation • 26 Sep 2022 • Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition.

Action Recognition · Temporal Action Localization

Spatial-Temporal Alignment Network for Action Recognition and Detection

no code implementations • 4 Dec 2020 • Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann

The experimental results show that the STAN model can consistently improve the state of the art in both action detection and action recognition tasks.

Action Detection · Action Recognition

From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video

4 code implementations • 20 Nov 2020 • Junwei Liang

With the advancement of deep learning in computer vision, systems can now analyze an unprecedented amount of rich visual information from videos to enable applications such as autonomous driving, socially-aware robot assistants, and public safety monitoring.

Action Detection · Autonomous Driving +1

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

1 code implementation • CVPR 2020 • Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann

The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real-world trajectory data and then extrapolated by human annotators to achieve different latent goals.

Autonomous Driving · Human motion prediction +5

Focal Visual-Text Attention for Visual Question Answering

2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.

Memex Question Answering · Question Answering +1

MemexQA: Visual Memex Question Answering

1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.

Memex Question Answering · Question Answering +1

Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning

1 code implementation • 16 Jul 2016 • Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann

Learning video concept detectors automatically from big but noisy web data, with no additional manual annotations, is a novel but challenging area for the multimedia and machine learning communities.

BIG-bench Machine Learning
