Search Results for author: Zhaoyang Li

Found 16 papers, 5 papers with code

Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing

no code implementations • 1 Mar 2025 • Yanjun Li, Zhaoyang Li, Honghui Chen, Lizhi Xu

Video Scene Graph Generation (VidSGG) aims to capture dynamic relationships among entities by sequentially analyzing video frames and integrating visual and semantic information.

Graph Generation • Scene Graph Generation • +2

When Should We Prefer State-to-Visual DAgger Over Visual Reinforcement Learning?

1 code implementation • 18 Dec 2024 • Tongzhou Mu, Zhaoyang Li, Stanisław Wiktor Strzelecki, Xiu Yuan, Yunchao Yao, Litian Liang, Hao Su

Visual reinforcement learning is a promising approach that directly trains policies from visual observations, although it faces challenges in sample efficiency and computational costs.
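For context: the state-to-visual DAgger recipe first trains an expert policy with RL on privileged low-dimensional states, then distills it into an image-based student by relabeling the student's own rollouts with expert actions. A minimal sketch of that distillation loop, with a dummy environment and expert standing in for trained components (all names here are illustrative, not from the paper):

```python
# Hypothetical sketch of the state-to-visual distillation step: a student
# policy acting from images is supervised by an expert that sees the state.
# DummyEnv and state_expert are stand-ins, not components from the paper.
import torch
import torch.nn as nn

class DummyEnv:
    """Toy environment exposing both a state vector and an image."""
    def reset(self):
        return torch.randn(8), torch.randn(3, 64, 64)
    def step(self, action):
        return torch.randn(8), torch.randn(3, 64, 64)

def state_expert(state):
    """Placeholder for a policy already trained with state-based RL."""
    return torch.tanh(state[:2])  # 2-D continuous action

student = nn.Sequential(           # visual student policy
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 13 * 13, 2), nn.Tanh(),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

env = DummyEnv()
for _ in range(3):                 # DAgger iterations
    state, img = env.reset()
    imgs, acts = [], []
    for _ in range(32):            # roll out the student, label with the expert
        with torch.no_grad():
            action = student(img.unsqueeze(0))[0]
        imgs.append(img)
        acts.append(state_expert(state))
        state, img = env.step(action)
    loss = nn.functional.mse_loss(student(torch.stack(imgs)), torch.stack(acts))
    opt.zero_grad(); loss.backward(); opt.step()
```

Rolling out the student rather than the expert is what distinguishes DAgger from plain behavior cloning: the expert labels the observations the student actually visits.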

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

no code implementations • 22 Nov 2024 • Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu

To address this, we introduce AbilityLens, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics.
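To make the accuracy/stability pairing concrete, here is a hypothetical scoring loop; the ability names and scores are placeholders, and the benchmark's actual stability metric may be defined differently (std across repeated runs is used here only as an illustrative proxy):

```python
# Illustrative per-ability scoring: accuracy as the mean over runs, and a
# stability proxy as the std across runs. All numbers below are fabricated
# placeholders, not benchmark results.
import statistics

# ability -> scores from several evaluation runs / checkpoints
runs = {
    "counting":  [0.61, 0.58, 0.64],
    "OCR":       [0.72, 0.71, 0.73],
    "grounding": [0.55, 0.49, 0.60],
}

for ability, scores in runs.items():
    accuracy = statistics.mean(scores)
    stability = statistics.stdev(scores)  # lower = more stable
    print(f"{ability:10s} acc={accuracy:.3f} stability(std)={stability:.3f}")
```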

Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation

no code implementations • 25 Aug 2024 • Zhaoyang Li, YuAn Wang, Wangkai Li, Rui Sun, Tianzhu Zhang

Point cloud few-shot semantic segmentation (PC-FSS) aims to segment targets of novel categories in a given query point cloud with only a few annotated support samples.

Diversity • Few-Shot Semantic Segmentation • +1
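For orientation, a common PC-FSS baseline (which decoupled frameworks like this one refine) builds a class prototype from the annotated support points and labels query points by feature similarity. A self-contained sketch, with random features standing in for a point-cloud backbone:

```python
# A generic prototype baseline for PC-FSS (not the paper's decoupled method):
# pool support features inside the annotated mask into a class prototype,
# then label each query point by cosine similarity to that prototype.
import torch
import torch.nn.functional as F

def prototype_segment(support_feats, support_mask, query_feats, thresh=0.5):
    """support_feats: (Ns, C), support_mask: (Ns,) in {0,1}, query_feats: (Nq, C)."""
    proto = (support_feats * support_mask[:, None]).sum(0) / support_mask.sum()
    sim = F.cosine_similarity(query_feats, proto[None, :], dim=1)  # (Nq,)
    return (sim > thresh).long()                                   # per-point labels

# toy usage with random point features
Ns, Nq, C = 1024, 2048, 64
pred = prototype_segment(torch.randn(Ns, C), (torch.rand(Ns) > 0.5).float(),
                         torch.randn(Nq, C))
print(pred.shape, pred.float().mean())
```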

Recognize Anything: A Strong Image Tagging Model

2 code implementations • 6 Jun 2023 • Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang

We are releasing the RAM at https://recognize-anything.github.io/ to foster the advancement of large models in computer vision.

Semantic Parsing

Extracting knowledge from features with multilevel abstraction

no code implementations • 4 Dec 2021 • Jinhong Lin, Zhaoyang Li

Knowledge distillation aims to transfer knowledge from a large teacher model to a small student model, yielding substantial improvements in the student's performance.

Data Augmentation • Self-Knowledge Distillation
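For reference, the standard distillation objective (Hinton et al.) that feature-based methods extend blends a temperature-softened teacher target with the hard-label loss; the paper's multilevel feature transfer is not reproduced here:

```python
# The classic knowledge-distillation loss: KL divergence between softened
# teacher and student distributions, mixed with ordinary cross-entropy.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend soft teacher targets (temperature T) with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                               # rescale gradients for temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy usage
s, t = torch.randn(8, 10), torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(kd_loss(s, t, y).item())
```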

FaceInpainter: High Fidelity Face Adaptation to Heterogeneous Domains

no code implementations • CVPR 2021 • Jia Li, Zhaoyang Li, Jie Cao, Xingguang Song, Ran He

In this work, we propose a novel two-stage framework named FaceInpainter to implement controllable Identity-Guided Face Inpainting (IGFI) under heterogeneous domains.

Attribute • Facial Inpainting • +1

Information Bottleneck Disentanglement for Identity Swapping

1 code implementation • CVPR 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He

In this work, we propose a novel information disentangling and swapping network, called InfoSwap, to extract the most expressive information for identity representation from a pre-trained face recognition model.

Disentanglement • Face Recognition • +1
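As background, a generic variational information-bottleneck layer of the kind such disentanglement methods build on compresses a feature into a stochastic code and penalizes its KL divergence to a standard normal; InfoSwap's actual formulation differs in its details:

```python
# A generic variational information-bottleneck layer (illustrative, not the
# paper's design): compress a feature into a stochastic code and penalize
# its KL divergence to a standard normal as the compression cost.
import torch
import torch.nn as nn

class IBLayer(nn.Module):
    def __init__(self, dim_in, dim_z):
        super().__init__()
        self.mu = nn.Linear(dim_in, dim_z)
        self.logvar = nn.Linear(dim_in, dim_z)

    def forward(self, feat):
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(1).mean()
        return z, kl  # kl acts as the compression penalty

feat = torch.randn(4, 512)          # e.g. a face-recognition feature
z, kl = IBLayer(512, 64)(feat)
print(z.shape, kl.item())
```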

A Neural Network for Detailed Human Depth Estimation from a Single Image

1 code implementation • ICCV 2019 • Sicong Tang, Feitong Tan, Kelvin Cheng, Zhaoyang Li, Siyu Zhu, Ping Tan

To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively.

Depth Estimation
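The base-plus-residual decomposition reads directly as a two-branch network. A schematic sketch (layer sizes are illustrative, not the paper's architecture):

```python
# Schematic of the two-branch idea: a shared encoder feeds a smooth
# base-depth branch and a residual-detail branch; the final depth map is
# their sum.
import torch
import torch.nn as nn

class TwoBranchDepth(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.base = nn.Conv2d(32, 1, 3, padding=1)      # smooth base shape
        self.residual = nn.Conv2d(32, 1, 3, padding=1)  # fine detail shape

    def forward(self, img):
        feat = self.encoder(img)
        return self.base(feat) + self.residual(feat)    # final depth map

depth = TwoBranchDepth()(torch.randn(1, 3, 128, 128))
print(depth.shape)  # (1, 1, 128, 128)
```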

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

no code implementations • 8 Nov 2016 • Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li

Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition.

Action Recognition • Temporal Action Localization

Combining ConvNets with Hand-Crafted Features for Action Recognition Based on an HMM-SVM Classifier

no code implementations • 1 Feb 2016 • Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li

This paper proposes a new framework for RGB-D-based action recognition that takes advantage of hand-designed features from skeleton data and deeply learned features from depth maps, and effectively exploits both local and global temporal information.

Action Recognition • Temporal Action Localization
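Sketching only the fusion step: hand-crafted skeleton features and deep depth-map features are concatenated and fed to an SVM. The HMM stage that models temporal structure in the paper is omitted, and the data here is random:

```python
# Fusion step only: concatenate hand-designed skeleton features with ConvNet
# features from depth maps and classify with an SVM. Feature dimensions and
# labels below are placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 100
skeleton_feats = rng.normal(size=(n, 60))   # hand-designed skeleton features
depth_feats = rng.normal(size=(n, 128))     # ConvNet features from depth maps
labels = rng.integers(0, 5, size=n)         # 5 action classes

fused = np.concatenate([skeleton_feats, depth_feats], axis=1)
clf = SVC(kernel="linear").fit(fused, labels)
print(clf.predict(fused[:3]))
```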
