1 code implementation • ECCV 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann
We approach this problem in a real-data-free setting, in which the model is trained only on 3D simulation data and applied out of the box to a wide variety of real cameras.
Ranked #1 on Trajectory Forecasting on ActEV
no code implementations • 15 Dec 2024 • Xinli Xu, Wenhang Ge, Dicong Qiu, Zhifei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, HanFeng Zhao, Shunsi Zhang, Junwei Liang, Ying-Cong Chen
We demonstrate that 3D Gaussians with physical property annotations enable applications in physics-based dynamic simulation and robotic grasping.
no code implementations • 5 Dec 2024 • Rong Li, Shijie Li, Lingdong Kong, Xulei Yang, Junwei Liang
3D Visual Grounding (3DVG) aims to locate objects in 3D scenes based on textual descriptions, which is essential for applications like augmented reality and robotics.
no code implementations • 26 Nov 2024 • Sheng Wang, Yao Tian, Xiaodong Mei, Ge Sun, Jie Cheng, Fulong Ma, Pedro V. Sander, Junwei Liang
However, these algorithms typically assess the current and historical plans independently, leading to discontinuities in driving intention and error accumulation at each step of a discontinuous plan.
no code implementations • 19 Nov 2024 • Teli Ma, Zifan Wang, Jiaming Zhou, Mengmeng Wang, Junwei Liang
To address these limitations, we propose GLOVER, a unified Generalizable Open-Vocabulary Affordance Reasoning framework, which fine-tunes the Large Language Models (LLMs) to predict visual affordance of graspable object parts within RGB feature space.
Common Sense Reasoning • Human-Object Interaction Detection • +2
1 code implementation • 20 Sep 2024 • Zeying Gong, Tianshuai Hu, Ronghe Qiu, Junwei Liang
We conduct a detailed experimental analysis with the state-of-the-art learning-based method and two classic rule-based path-planning algorithms on the new benchmark.
no code implementations • 18 Jul 2024 • Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher
In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding.
no code implementations • 26 Jun 2024 • Dicong Qiu, Wenzong Ma, Zhenfu Pan, Hui Xiong, Junwei Liang
Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments.
no code implementations • 20 Jun 2024 • Jiaming Zhou, Teli Ma, Kun-Yu Lin, Zifan Wang, Ronghe Qiu, Junwei Liang
Our method employs a human-robot contrastive alignment loss to align the semantics of human and robot videos, adapting pre-trained models to the robot domain in a parameter-efficient manner.
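The human-robot contrastive alignment loss described above can be sketched as an InfoNCE-style objective over paired clip embeddings. The function below is a hypothetical illustration, not the paper's implementation: the name, the batch construction (i-th human clip paired with i-th robot clip), and the temperature value are all assumptions.

```python
import numpy as np

def contrastive_alignment_loss(human_emb, robot_emb, temperature=0.07):
    """InfoNCE-style loss: the i-th human clip and i-th robot clip form a
    positive pair; the other robot clips in the batch serve as negatives."""
    # L2-normalize so dot products become cosine similarities
    h = human_emb / np.linalg.norm(human_emb, axis=1, keepdims=True)
    r = robot_emb / np.linalg.norm(robot_emb, axis=1, keepdims=True)
    logits = h @ r.T / temperature                       # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the diagonal (the true pairs) as the target class
    return -float(np.mean(np.diag(log_probs)))
```

Minimizing this loss pulls each human clip toward its paired robot clip in the shared embedding space while pushing it away from the rest of the batch.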
1 code implementation • 14 Jun 2024 • Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang
Real-time detection and prediction of extreme weather protect human lives and infrastructure.
no code implementations • 14 Jun 2024 • Teli Ma, Jiaming Zhou, Zifan Wang, Ronghe Qiu, Junwei Liang
Developing robots capable of executing various manipulation tasks, guided by natural language instructions and visual observations of intricate real-world environments, remains a significant challenge in robotics.
1 code implementation • 23 May 2024 • Jinhui Ye, Xing Wang, Wenxiang Jiao, Junwei Liang, Hui Xiong
In this paper, we identify a representation density problem that could be a bottleneck in restricting the performance of gloss-free SLT.
Contrastive Learning • Gloss-free Sign Language Translation • +2
no code implementations • 16 May 2024 • Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang
Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts.
no code implementations • 19 Apr 2024 • Sheng Wang, Ge Sun, Fulong Ma, Tianshuai Hu, Qiang Qin, Yongkang Song, Lei Zhu, Junwei Liang
Inspired by DragGAN in image generation, we propose DragTraffic, a generalized, interactive, and controllable traffic scene generation framework based on conditional diffusion.
1 code implementation • 25 Mar 2024 • Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang
Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics.
no code implementations • 24 Mar 2024 • Xiaoyu Zhu, Junwei Liang, Po-Yao Huang, Alex Hauptmann
The second is a Masked Consistency Learning module to learn class-discriminative representations.
1 code implementation • 18 Mar 2024 • Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang
Furthermore, for the popular HM3D environment, we present an Instance Navigation (InstanceNav) task that requires going to a specific object instance with detailed descriptions, as opposed to the Object Navigation (ObjectNav) task where the goal is defined merely by the object category.
no code implementations • 22 Jan 2024 • Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng
With the proposed ActionHub dataset, we further propose a novel Cross-modality and Cross-action Modeling (CoCo) framework for ZSAR, which consists of a Dual Cross-modality Alignment module and a Cross-action Invariance Mining module.
no code implementations • 29 Nov 2023 • Jinhui Ye, Jiaming Zhou, Hui Xiong, Junwei Liang
Specifically, at the core of GeoDeformer is the Geometric Deformation Predictor, a module designed to identify and quantify potential spatial and temporal geometric deformations within the given video.
no code implementations • 28 Nov 2023 • Jiaming Zhou, Hanjun Li, Kun-Yu Lin, Junwei Liang
To this end, this work aims to build a weakly supervised end-to-end framework for training recognition models on long videos, with only video-level action category labels.
Ranked #1 on Action Segmentation on Breakfast
1 code implementation • 4 Oct 2023 • Yujin Tang, Jiaming Zhou, Xiang Pan, Zeying Gong, Junwei Liang
Accurate precipitation forecasting is a vital challenge of societal importance.
2 code implementations • 1 Oct 2023 • Zeying Gong, Yujin Tang, Junwei Liang
Although the Transformer has been the dominant architecture for time series forecasting tasks in recent years, a fundamental challenge remains: the permutation-invariant self-attention mechanism within Transformers leads to a loss of temporal information.
Ranked #1 on Time Series Forecasting on ETTh2 (192) Multivariate
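The permutation-invariance claim is easy to verify numerically: plain dot-product self-attention with no positional encoding is permutation-equivariant, so it cannot tell a time series apart from any reshuffling of its steps. A minimal NumPy demonstration, assuming identity query/key/value projections for brevity:

```python
import numpy as np

def self_attention(x):
    """Plain dot-product self-attention with identity projections and
    no positional encoding (illustration only)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))    # 6 time steps, 4 features
perm = rng.permutation(6)

out = self_attention(x)
out_perm = self_attention(x[perm])
# Permuting the input time steps just permutes the output rows:
# the mechanism itself carries no notion of temporal order.
print(np.allclose(out[perm], out_perm))  # True
```

This is why Transformer forecasters must inject order through positional encodings, and why architectures that model order natively remain competitive on these benchmarks.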
no code implementations • 14 Sep 2023 • Rong Li, Shijie Li, Xieyuanli Chen, Teli Ma, Juergen Gall, Junwei Liang
In this paper, we present TFNet, a range-image-based LiDAR semantic segmentation method that utilizes temporal information to address this issue.
Ranked #1 on Semantic Segmentation on SemanticPOSS
1 code implementation • 21 Aug 2023 • Teli Ma, Rong Li, Junwei Liang
A challenging new task is subsequently added to evaluate the robustness of GVLMs against their inherent inclination toward syntactic correctness.
Ranked #83 on Visual Reasoning on Winoground
no code implementations • 19 Aug 2023 • Jinhui Ye, Junwei Liang
This paper studies introducing viewpoint invariant feature representations in existing action recognition architecture.
1 code implementation • CVPR 2023 • Xiaoyu Zhu, Po-Yao Huang, Junwei Liang, Celso M. de Melo, Alexander Hauptmann
The model uses a hierarchical transformer with intra-frame offset attention and inter-frame self-attention.
7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.
no code implementations • 27 Sep 2022 • Chengzhi Lin, AnCong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen
To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features.
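The idea of adaptively aggregating video token features into several prototypes and matching text against the best one can be sketched as follows. This is a simplified illustration, not the paper's architecture: `queries` stands in for learned aggregation queries, and single-vector text embeddings are assumed.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_prototype_similarity(video_tokens, text_emb, queries):
    """Aggregate video tokens (N, d) into K prototypes via learned
    queries (K, d), then score the text against the best prototype."""
    # each query attends over the video tokens, yielding one prototype
    attn = softmax(queries @ video_tokens.T / np.sqrt(video_tokens.shape[1]),
                   axis=1)                               # (K, N)
    prototypes = attn @ video_tokens                     # (K, d)
    prototypes = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    # a text matches a video if it matches any one of its prototypes
    return float((prototypes @ t).max())
```

Taking the maximum over prototypes lets a single video agree with several distinct captions, each matching a different aspect of its content.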
1 code implementation • 26 Sep 2022 • Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
1 code implementation • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2022 • Junwei Liang, He Zhu, Enwei Zhang, Jun Zhang
Distracted driver actions can be dangerous and cause severe accidents.
no code implementations • ICCV 2021 • Xiaoyu Zhu, Jeffrey Chen, Xiangrui Zeng, Junwei Liang, Chengqi Li, Sinuo Liu, Sima Behpour, Min Xu
We propose a novel weakly supervised approach for 3D semantic segmentation on volumetric images.
no code implementations • 4 Dec 2020 • Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann
The experimental results show that the STAN model can consistently improve the state of the art on both action detection and action recognition tasks.
4 code implementations • 20 Nov 2020 • Junwei Liang
With advances in deep learning for computer vision, systems are now able to analyze an unprecedented amount of rich visual information from videos, enabling applications such as autonomous driving, socially aware robot assistants, and public safety monitoring.
1 code implementation • 30 Jun 2020 • Xiaoyu Zhu, Junwei Liang, Alexander Hauptmann
This provides the first benchmark for quantitative evaluation of models to assess building damage using aerial videos.
1 code implementation • 4 Apr 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann
We refer to our method as SimAug.
Ranked #2 on Trajectory Prediction on ActEV
1 code implementation • Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops 2020 • Wenhe Liu, Guoliang Kang, Po-Yao Huang, Xiaojun Chang, Yijun Qian, Junwei Liang, Liangke Gui, Jing Wen, Peng Chen
We propose Argus, an efficient activity detection system for extended video analysis in surveillance scenarios.
1 code implementation • CVPR 2020 • Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann
The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators to achieve different latent goals.
Ranked #1 on Multi-future Trajectory Prediction on ForkingPaths
2 code implementations • 26 May 2019 • Junwei Liang, Jay D. Aronson, Alexander Hauptmann
Among other uses, VERA enables the localization of a shooter from just a few videos that include the sound of gunshots.
2 code implementations • CVPR 2019 • Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander Hauptmann, Li Fei-Fei
To facilitate the training, the network is learned with an auxiliary task of predicting future location in which the activity will happen.
Ranked #1 on Activity Prediction on ActEV
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Yannis Kalantidis, Li-Jia Li, Alexander Hauptmann
In addition to a text answer, a few grounding photos are also given to justify the answer.
Ranked #1 on Memex Question Answering on MemexQA
2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.
Ranked #1 on Memex Question Answering on MemexQA
1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann
This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.
1 code implementation • 16 Jul 2016 • Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann
Automatically learning video concept detectors from big but noisy web data, with no additional manual annotation, is a novel but challenging area in the multimedia and machine learning communities.