Search Results for author: Kun Yan

Found 13 papers, 4 papers with code

Voila-A: Aligning Vision-Language Models with User's Gaze Attention

no code implementations • 22 Dec 2023 • Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
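
Voila-A has no released code; purely as a sketch of the general idea of gaze alignment under assumed names (`gaze_alignment_loss`, the attention/heatmap shapes, and the KL formulation are all illustrative choices, not the paper's method), one way to nudge a model's spatial attention toward a user gaze heatmap looks like this:

```python
# Hypothetical sketch of gaze-guided attention alignment: penalize
# divergence between a VLM's spatial attention map and a heatmap built
# from user gaze fixations. Not the authors' implementation.
import torch
import torch.nn.functional as F

def gaze_alignment_loss(attn_logits: torch.Tensor,
                        gaze_heatmap: torch.Tensor,
                        eps: float = 1e-8) -> torch.Tensor:
    """KL(gaze || attention) over flattened spatial positions.

    attn_logits:  (B, H*W) unnormalized attention scores from the model.
    gaze_heatmap: (B, H*W) non-negative gaze density (e.g. fixations
                  blurred with a Gaussian), not necessarily normalized.
    """
    attn_logprob = F.log_softmax(attn_logits, dim=-1)
    gaze = gaze_heatmap / (gaze_heatmap.sum(dim=-1, keepdim=True) + eps)
    return F.kl_div(attn_logprob, gaze, reduction="batchmean")

# Toy usage: 2 images, a 16x16 attention grid flattened to 256 positions.
attn = torch.randn(2, 256, requires_grad=True)
gaze = torch.rand(2, 256)
loss = gaze_alignment_loss(attn, gaze)
loss.backward()
```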

GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

1 code implementation • 27 Jun 2023 • Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou

Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and further fine-tune the model on annotated data.

Tasks: Natural Language Queries
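
GroundNLQ's released code is the authoritative reference; the skeleton below only illustrates the two-stage recipe the snippet describes, with hypothetical loaders and a placeholder span-regression loss standing in for the actual grounding objective:

```python
# Schematic of the two-stage recipe: pre-train on video narrations used
# as pseudo language queries, then fine-tune on the smaller annotated
# NLQ set. All names and hyperparameters here are illustrative.
import torch

def run_epochs(model, loader, optimizer, epochs):
    for _ in range(epochs):
        for video_feats, query_tokens, target_span in loader:
            pred_span = model(video_feats, query_tokens)   # (B, 2) start/end
            loss = torch.nn.functional.l1_loss(pred_span, target_span)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

def two_stage_train(model, narration_loader, annotated_loader):
    # Stage 1: large-scale pre-training on narration-derived pseudo queries.
    pretrain_opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    run_epochs(model, narration_loader, pretrain_opt, epochs=5)
    # Stage 2: fine-tuning on human-annotated data at a lower learning rate.
    finetune_opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
    run_epochs(model, annotated_loader, finetune_opt, epochs=10)
```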

Two-shot Video Object Segmentation

1 code implementation • CVPR 2023 • Kun Yan, Xiao Li, Fangyun Wei, Jinglu Wang, Chenbin Zhang, Ping Wang, Yan Lu

The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data.

Tasks: Object, Pseudo Label (+5 more)
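
A minimal sketch of the pseudo-labeling idea in the snippet above, assuming a confidence-thresholded scheme (the threshold and masking details here are assumptions; the released code documents the actual scheme):

```python
# Confidence-thresholded pseudo-labeling for semi-supervised segmentation:
# predict on unlabeled frames, keep only confident pixels as targets, and
# train on the mix of labeled and pseudo-labeled data. Illustrative only.
import torch

@torch.no_grad()
def make_pseudo_labels(model, frames: torch.Tensor, threshold: float = 0.9):
    """frames: (B, C, H, W) unlabeled frames. Returns hard labels (B, H, W)
    and a boolean mask of pixels confident enough to train on."""
    probs = model(frames).softmax(dim=1)           # (B, K, H, W)
    confidence, labels = probs.max(dim=1)          # (B, H, W) each
    keep = confidence >= threshold                 # ignore uncertain pixels
    return labels, keep

def semi_supervised_loss(model, frames, labels, keep):
    logits = model(frames)                         # (B, K, H, W)
    loss = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
    return (loss * keep).sum() / keep.sum().clamp(min=1)
```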

HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

no code implementations • 10 Oct 2022 • Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma

Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds.


CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

1 code implementation • 22 Sep 2022 • Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

This paper tackles an emerging and challenging problem of long video temporal grounding (VTG) that localizes video moments related to a natural language (NL) query.

Tasks: Contrastive Learning, Video Grounding
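
As a rough illustration of a coarse-to-fine pipeline like CONE's (a deliberate simplification; the window embeddings, `fine_model` interface, and score fusion below are assumptions, not the paper's design):

```python
# Coarse-to-fine long-video grounding sketch: cheap window retrieval
# first, expensive moment localization only inside the selected windows.
import torch

def coarse_to_fine_ground(query_emb, window_embs, fine_model, windows, k=5):
    """query_emb:  (D,) sentence embedding of the NL query.
    window_embs:   (N, D) one embedding per video window.
    windows:       list of N (features, start_time) tuples.
    fine_model:    callable that scores moments inside a single window."""
    # Coarse stage: rank all windows by cosine similarity to the query.
    sims = torch.nn.functional.cosine_similarity(
        window_embs, query_emb.unsqueeze(0), dim=-1)        # (N,)
    top_idx = sims.topk(min(k, len(windows))).indices
    # Fine stage: localize the moment only within the top-k windows.
    best = None
    for i in top_idx.tolist():
        feats, start = windows[i]
        span, score = fine_model(feats, query_emb)  # window-relative span
        cand = (score + sims[i].item(), start + span[0], start + span[1])
        if best is None or cand[0] > best[0]:
            best = cand
    return best  # (fused score, global start, global end)
```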

Control Image Captioning Spatially and Temporally

no code implementations • ACL 2021 • Kun Yan, Lei Ji, Huaishao Luo, Ming Zhou, Nan Duan, Shuai Ma

Moreover, the controllability and explainability of LoopCAG are validated by analyzing spatial and temporal sensitivity during the generation process.

Tasks: Contrastive Learning, Image Captioning (+1 more)
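
A hypothetical probe in the spirit of the sensitivity analysis the snippet mentions, with an assumed model interface (image plus caption tokens in, per-token logits out); occlude one spatial region and measure how much the caption token distribution shifts:

```python
# Crude spatial-sensitivity probe (not LoopCAG's analysis code): zero out
# one image region and compare the caption token distributions with and
# without it. A large shift suggests the region controls the caption.
import torch

@torch.no_grad()
def spatial_sensitivity(model, image, caption_tokens, region):
    """region: (y0, y1, x0, x1) box to occlude in the image tensor."""
    base_logits = model(image, caption_tokens)             # (T, V)
    occluded = image.clone()
    y0, y1, x0, x1 = region
    occluded[..., y0:y1, x0:x1] = 0.0                      # zero out region
    occ_logits = model(occluded, caption_tokens)
    # Mean per-token KL divergence between the two output distributions.
    p = base_logits.log_softmax(-1)
    q = occ_logits.softmax(-1)
    return torch.nn.functional.kl_div(p, q, reduction="batchmean")
```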

CETransformer: Causal Effect Estimation via Transformer Based Representation Learning

no code implementations • 19 Jul 2021 • Zhenyu Guo, Shuai Zheng, Zhizhe Liu, Kun Yan, Zhenfeng Zhu

Treatment effect estimation, which refers to the estimation of causal effects and aims to measure the strength of the causal relationship, is of great importance in many fields but is a challenging problem in practice.

Tasks: Counterfactual, Representation Learning (+1 more)
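
A minimal potential-outcomes sketch of treatment effect estimation, using a plain two-headed MLP rather than the paper's transformer-based representation learner: a shared representation feeds separate outcome heads for treated and control, and the average treatment effect (ATE) is the mean predicted difference.

```python
# Two-headed outcome model over a shared representation (illustrative
# stand-in for CETransformer's architecture, not a reproduction of it).
import torch
import torch.nn as nn

class TwoHeadEstimator(nn.Module):
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.repr = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.head_t = nn.Linear(hidden, 1)   # outcome under treatment
        self.head_c = nn.Linear(hidden, 1)   # outcome under control

    def forward(self, x):
        z = self.repr(x)
        return self.head_t(z).squeeze(-1), self.head_c(z).squeeze(-1)

def factual_loss(model, x, t, y):
    """t: (B,) binary treatment indicator; supervise only the observed arm."""
    y_t, y_c = model(x)
    pred = torch.where(t.bool(), y_t, y_c)
    return torch.nn.functional.mse_loss(pred, y)

@torch.no_grad()
def estimate_ate(model, x):
    y_t, y_c = model(x)
    return (y_t - y_c).mean()   # average treatment effect estimate
```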

Few-shot Image Classification with Multi-Facet Prototypes

no code implementations • 1 Feb 2021 • Kun Yan, Zied Bouraoui, Ping Wang, Shoaib Jameel, Steven Schockaert

The aim of few-shot learning (FSL) is to learn how to recognize image categories from a small number of training examples.

Tasks: Classification, Few-Shot Image Classification (+2 more)
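
For context, the standard single-prototype baseline that the paper builds on looks like the sketch below; the paper's contribution keeps several facet-specific prototypes per class, which this baseline does not reproduce:

```python
# Prototypical classification baseline: one prototype (class mean) per
# class, queries assigned to the nearest prototype in embedding space.
import torch

def prototype_classify(support: torch.Tensor, support_labels: torch.Tensor,
                       queries: torch.Tensor, num_classes: int):
    """support: (S, D) embedded support examples; queries: (Q, D)."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                             # (C, D) class means
    dists = torch.cdist(queries, prototypes)       # (Q, C) Euclidean
    return (-dists).argmax(dim=-1)                 # nearest prototype wins

# Toy 5-way 1-shot episode with random 64-d embeddings.
sup = torch.randn(5, 64)
lbl = torch.arange(5)
qry = torch.randn(10, 64)
pred = prototype_classify(sup, lbl, qry, num_classes=5)
```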
