no code implementations • 15 Jun 2024 • Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen
However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut and add effects/modifications to the raw video before publishing it on social media platforms.
1 code implementation • 9 May 2024 • Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen
Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks.
Ranked #1 on visual instruction following on LLaVA-Bench
no code implementations • 24 Mar 2024 • Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, YuFei Wang, Tiejian Luo, Sijie Zhu
Each video in our dataset is rendered by various image/video materials with a single editing component, which supports atomic visual understanding of different editing components.
1 code implementation • CVPR 2023 • Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing.
no code implementations • 6 Apr 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
1 code implementation • CVPR 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
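The matching step described in this excerpt can be illustrated with a minimal retrieval sketch. It is not the method of the listed paper: it assumes each image has already been reduced to a global descriptor vector, and the names `retrieve_location`, `ref_descs`, and `ref_locations` are hypothetical.

```python
import numpy as np

def retrieve_location(query_desc, ref_descs, ref_locations, top_k=1):
    """Return the locations of the top_k reference images most similar to the query.

    query_desc: (D,) descriptor of the query image (assumed precomputed).
    ref_descs: (N, D) descriptors of the reference database images.
    ref_locations: list of N locations, e.g. (lat, lon) tuples.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc)
    r = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    sims = r @ q
    # Sort reference images by descending similarity and keep the best top_k.
    order = np.argsort(-sims)[:top_k]
    return [ref_locations[i] for i in order]
```

The query's location is then estimated from the locations attached to the best-matching reference images.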
no code implementations • 31 Mar 2022 • Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
To move a step further, this paper proposes GALA (Geometry-and-Lighting-Aware), a generic foreground object search method with discriminative modeling on geometry and lighting compatibility for open-world image compositing.
1 code implementation • CVPR 2022 • Sijie Zhu, Mubarak Shah, Chen Chen
It does not rely on polar transform and infers faster than CNN-based methods.
Ranked #3 on Image-Based Localization on VIGOR Cross Area
1 code implementation • 16 May 2021 • Yu Shen, Sijie Zhu, Taojiannan Yang, Chen Chen, Delu Pan, Jianyu Chen, Liang Xiao, Qian Du
With a pair of pre- and post-disaster satellite images, building damage assessment aims at predicting the extent of damage to buildings.
Ranked #2 on 2D Semantic Segmentation on xBD
1 code implementation • 14 May 2021 • Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen
MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.
1 code implementation • 18 Mar 2021 • Weiping Yu, Sijie Zhu, Taojiannan Yang, Chen Chen
Unlike most recent works that focused on applying active learning for image classification, we propose an effective Consistency-based Active Learning method for object Detection (CALD), which fully explores the consistency between original and augmented data.
3 code implementations • ICCV 2021 • Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding
Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation.
Ranked #13 on 3D Human Pose Estimation on HumanEva-I
1 code implementation • 24 Dec 2020 • Ce Zheng, Wenhan Wu, Chen Chen, Taojiannan Yang, Sijie Zhu, Ju Shen, Nasser Kehtarnavaz, Mubarak Shah
Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included.
1 code implementation • CVPR 2021 • Sijie Zhu, Taojiannan Yang, Chen Chen
In this paper, we redefine this problem with a more realistic assumption that the query image can be arbitrary in the area of interest and the reference images are captured before the queries emerge.
no code implementations • 24 Nov 2020 • Sijie Zhu, Taojiannan Yang, Matias Mendieta, Chen Chen
Even under the same computational constraints, the performance of our adaptive networks can be significantly boosted over the baseline counterparts by the mutual training along three dimensions.
no code implementations • 27 Oct 2020 • Yu Shen, Sijie Zhu, Taojiannan Yang, Chen Chen
Fast and effective responses are required when a natural disaster (e.g., earthquake, hurricane, etc.)
Ranked #3 on 2D Semantic Segmentation on xBD
1 code implementation • 2 Aug 2020 • Yu Shen, Sijie Zhu, Chen Chen, Qian Du, Liang Xiao, Jianyu Chen, Delu Pan
Therefore, to incorporate the long-range contextual information, a deep fully convolutional network (FCN) with an efficient non-local module, named ENL-FCN, is proposed for HSI classification.
1 code implementation • NeurIPS 2020 • Taojiannan Yang, Sijie Zhu, Chen Chen
The key idea is to use randomly transformed training samples to regularize a set of sub-networks, which are obtained by sampling the width of the original network, during training.
no code implementations • 23 May 2020 • Sijie Zhu, Taojiannan Yang, Chen Chen
Street-to-aerial image geo-localization, which matches a query street-view image to the GPS-tagged aerial images in a reference set, has attracted increasing attention recently.
1 code implementation • 12 Apr 2020 • Changlin Li, Taojiannan Yang, Sijie Zhu, Chen Chen, Shanyue Guan
Specifically, we propose a Density-Map guided object detection Network (DMNet), which is inspired by the observation that the object density map of an image shows how objects are distributed in terms of the pixel intensity of the map.
no code implementations • 1 Apr 2020 • Sijie Zhu, Chen Chen, Waqas Sultani
Temporal localization (i.e., indicating the start and end frames of the anomaly event in a video) is referred to as frame-level detection.
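As an illustration of what frame-level detection outputs, the sketch below converts per-frame anomaly scores into (start, end) frame intervals. It is an assumption-laden example rather than the listed paper's method: the function name `anomaly_segments` and the fixed `threshold` are hypothetical.

```python
def anomaly_segments(frame_scores, threshold=0.5):
    """Group consecutive frames scoring >= threshold into (start, end) intervals.

    Frame indices are 0-based and the end index is inclusive.
    """
    segments = []
    start = None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            # An anomalous run begins at this frame.
            start = i
        elif score < threshold and start is not None:
            # The run ended on the previous frame.
            segments.append((start, i - 1))
            start = None
    if start is not None:
        # The run extends to the last frame of the video.
        segments.append((start, len(frame_scores) - 1))
    return segments
```

Each returned pair marks the start and end frames of one detected anomaly event.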
2 code implementations • ECCV 2020 • Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis
We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable under dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime.
1 code implementation • 27 Sep 2019 • Sijie Zhu, Taojiannan Yang, Chen Chen
This work explores the visual explanation for deep metric learning and its applications.
no code implementations • 25 Sep 2019 • Taojiannan Yang, Sijie Zhu, Yan Shen, Mi Zhang, Andrew Willis, Chen Chen
We propose a framework to mutually learn from different input resolutions and network widths.