Search Results for author: Xiaohan Wang

Found 55 papers, 36 papers with code

Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models

1 code implementation • 19 Mar 2024 • Elaine Sui, Xiaohan Wang, Serena Yeung-Levy

Advancements in vision-language models (VLMs) have propelled the field of computer vision, particularly in the zero-shot learning setting.

Prompt Engineering Zero-shot Generalization +1

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

no code implementations • 15 Mar 2024 • Xiaohan Wang, Yuhui Zhang, Orr Zohar, Serena Yeung-Levy

Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences.

Ranked #1 on Zero-Shot Video Question Answer on NExT-QA

Language Modelling Large Language Model +2

Editing Conceptual Knowledge for Large Language Models

1 code implementation • 10 Mar 2024 • Xiaohan Wang, Shengyu Mao, Ningyu Zhang, Shumin Deng, Yunzhi Yao, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs).

knowledge editing

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

1 code implementation • 19 Jan 2024 • Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang

(2) Equipping the visual and text encoder with separated prompts failed to mitigate the visual-text modality gap.

Retrieval Video Retrieval

Describing Differences in Image Sets with Natural Language

1 code implementation • 5 Dec 2023 • Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

To aid in this discovery process, we explore the task of automatically describing the differences between two sets of images, which we term Set Difference Captioning.

Language Modelling

Exploring Large Language Models for Human Mobility Prediction under Public Events

no code implementations • 29 Nov 2023 • Yuebing Liang, Yichao Liu, Xiaohan Wang, Zhan Zhao

Accurate human mobility prediction for public events is thus crucial for event planning as well as traffic or crowd management.

Misinformation

IcoCap: Improving Video Captioning by Compounding Images

no code implementations • IEEE Transactions on Multimedia 2023 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang

Video captioning is a more challenging task compared to image captioning, primarily due to differences in content density.

Ranked #5 on Video Captioning on VATEX (using extra training data)

Image Captioning Video Captioning

Editing Personality for Large Language Models

1 code implementation • 3 Oct 2023 • Shengyu Mao, Xiaohan Wang, Mengru Wang, Yong Jiang, Pengjun Xie, Fei Huang, Ningyu Zhang

This task seeks to adjust the models' responses to opinion-related questions on specified topics since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits.

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

1 code implementation • 4 Sep 2023 • Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite the recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity.

Ranked #2 on Motion Synthesis on HumanML3D (using extra training data)

Language Modelling Motion Synthesis

EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models

2 code implementations • 14 Aug 2023 • Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, Huajun Chen

Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data.

knowledge editing

Bird's-Eye-View Scene Graph for Vision-Language Navigation

1 code implementation • ICCV 2023 • Rui Liu, Xiaohan Wang, Wenguan Wang, Yi Yang

Vision-language navigation (VLN), which requires an agent to navigate 3D environments following human instructions, has shown great advances.

Navigate Vision-Language Navigation

Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey

no code implementations • 9 Aug 2023 • Liping Wang, Jiawei Li, Lifan Zhao, Zhizhuo Kou, Xiaohan Wang, Xinyi Zhu, Hao Wang, Yanyan Shen, Lei Chen

Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market.

Stock Price Prediction

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

1 code implementation • ICCV 2023 • Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D&3D aligned results in a coarse-to-fine manner and a novel 3D joint contrastive learning approach for adding explicit global supervision for the 3D feature space.

Contrastive Learning Human Mesh Recovery

Clustering based Point Cloud Representation Learning for 3D Analysis

1 code implementation • ICCV 2023 • Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng

The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.

Clustering Point Cloud Segmentation +2

Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation

1 code implementation • 25 Jul 2023 • Haitian Zeng, Xiaohan Wang, Wenguan Wang, Yi Yang

We introduce a novel speaker model, Kefa, for navigation instruction generation.

Vision and Language Navigation

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023

1 code implementation • 15 Jun 2023 • Jiayi Shao, Xiaohan Wang, Ruijie Quan, Yi Yang

This report presents the ReLER submission to two tracks in the Ego4D Episodic Memory Benchmark in CVPR 2023, including Natural Language Queries and Moment Queries.

Ranked #1 on Moment Queries on Ego4D

Moment Queries Natural Language Queries

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval

no code implementations • 3 Jun 2023 • Xu Zhang, Zhedong Zheng, Xiaohan Wang, Yi Yang

We propose a novel Consensus Network (Css-Net) that self-adaptively learns from noisy triplets to minimize the negative effects of triplet ambiguity.

Ranked #1 on Image Retrieval with Multi-Modal Query on Fashion200k

Image Retrieval Image Retrieval with Multi-Modal Query +1

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

1 code implementation • 29 May 2023 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Given a single test sample, the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.

Image Captioning Image Classification +5

Whitening-based Contrastive Learning of Sentence Embeddings

1 code implementation • 28 May 2023 • Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang

Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment.

Contrastive Learning Semantic Textual Similarity +4

Action Sensitivity Learning for Temporal Action Localization

no code implementations • ICCV 2023 • Jiayi Shao, Xiaohan Wang, Ruijie Quan, Junjun Zheng, Jiang Yang, Yi Yang

Temporal action localization (TAL), which involves recognizing and locating action instances, is a challenging task in video understanding.

Ranked #9 on Temporal Action Localization on THUMOS’14

Moment Queries Temporal Action Localization +1

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

1 code implementation • 23 May 2023 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Ruijie Quan, Yi Yang

With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.

Ranked #1 on Scene Text Recognition on WOST (using extra training data)

Language Modelling Scene Text Recognition

Gloss-Free End-to-End Sign Language Translation

1 code implementation • 22 May 2023 • Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang

In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations.

Sign Language Translation Translation

LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

1 code implementation • 22 May 2023 • Yuqi Zhu, Xiaohan Wang, Jing Chen, Shuofei Qiao, Yixin Ou, Yunzhi Yao, Shumin Deng, Huajun Chen, Ningyu Zhang

We engage in experiments across eight diverse datasets, focusing on four representative tasks encompassing entity and relation extraction, event extraction, link prediction, and question-answering, thereby thoroughly exploring LLMs' performance in the domain of construction and inference.

Event Extraction graph construction +4

Continual Multimodal Knowledge Graph Construction

1 code implementation • 15 May 2023 • Xiang Chen, Ningyu Zhang, Jintian Zhang, Xiaohan Wang, Tongtong Wu, Xi Chen, Yongheng Wang, Huajun Chen

Multimodal Knowledge Graph Construction (MKGC) involves creating structured representations of entities and relations using multiple modalities, such as text and images.

Continual Learning graph construction +1

How to Unleash the Power of Large Language Models for Few-shot Relation Extraction?

2 code implementations • 2 May 2023 • Xin Xu, Yuqi Zhu, Xiaohan Wang, Ningyu Zhang

Scaling language models has revolutionized a wide range of NLP tasks, yet few-shot relation extraction with large language models has not been comprehensively explored.

In-Context Learning Language Modelling +3

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

1 code implementation • CVPR 2023 • Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

However, with a single kind of modeling structure it is difficult to balance the learning of short-term and long-term temporal correlations, and the network may be biased toward one of them, leading to undesirable predictions like global location shift, temporal inconsistency, and insufficient local details.

Ranked #47 on 3D Human Pose Estimation on 3DPW

3D human pose and shape estimation

Lana: A Language-Capable Navigator for Instruction Following and Generation

1 code implementation • CVPR 2023 • Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang

Recently, visual-language navigation (VLN), which requires robot agents to follow navigation instructions, has shown great advances.

Instruction Following Text Generation

MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects

1 code implementation • ICCV 2023 • Yuanzhi Liang, Xiaohan Wang, Linchao Zhu, Yi Yang

Experimental results and visualizations, based on a large-scale dataset PartNet-Mobility, show the effectiveness of MAAL in learning multi-modal data and solving the 3D articulated object affordance problem.

Object

Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation

no code implementations • CVPR 2023 • Guangrui Li, Guoliang Kang, Xiaohan Wang, Yunchao Wei, Yi Yang

With the help of adversarial training, the masking module can learn to generate source masks to mimic the pattern of irregular target noise, thereby narrowing the domain gap.

Point Cloud Segmentation Semantic Segmentation

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations • CVPR 2023 • Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Ranked #1 on Zero-Shot Action Recognition on ActivityNet

Action Classification Action Recognition +3

EASpace: Enhanced Action Space for Policy Transfer

1 code implementation • 7 Dec 2022 • Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, Tianjiang Hu

In this paper, a novel algorithm named EASpace (Enhanced Action Space) is proposed, which formulates macro actions in an alternative form to accelerate the learning process using multiple available sub-optimal expert policies.

Q-Learning Transfer Learning

Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification

1 code implementation • IEEE Transactions on Neural Networks and Learning Systems 2022 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang

Second, we instantiate the loss function and provide a strong baseline for FGVC, where the performance of a naive backbone can be boosted and be comparable with recent methods.

Ranked #27 on Fine-Grained Image Classification on CUB-200-2011

Fine-Grained Image Classification Fine-Grained Visual Recognition

ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

1 code implementation • 17 Nov 2022 • Jiayi Shao, Xiaohan Wang, Yi Yang

Moreover, in order to better capture the long-term temporal dependencies in the long videos, we propose a segment-level recurrence mechanism.

Moment Queries Temporal Action Localization

LambdaKG: A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings

2 code implementations • 1 Oct 2022 • Xin Xie, Zhoubo Li, Xiaohan Wang, Zekun Xi, Ningyu Zhang

Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph structure and text-rich entity/relation information.

Graph Representation Learning Knowledge Graph Embeddings +3

Slimmable Networks for Contrastive Self-supervised Learning

no code implementations • 30 Sep 2022 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

In this work, we present a one-stage solution to obtain pre-trained small models without the need for extra teachers, namely, slimmable networks for contrastive self-supervised learning (SlimCLR).

Contrastive Learning Knowledge Distillation +1

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

no code implementations • 8 Jul 2022 • Yucheng Suo, Zhedong Zheng, Xiaohan Wang, Bang Zhang, Yi Yang

We optimize the two losses and keypoint detector network in an end-to-end manner.

Image Animation Text Generation +1

ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022

1 code implementation • 1 Jul 2022 • Naiyuan Liu, Xiaohan Wang, Xiaobo Li, Yi Yang, Yueting Zhuang

In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2022.

Ranked #3 on Natural Language Queries on Ego4D

Data Augmentation Natural Language Queries

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

1 code implementation • 2 May 2022 • Shuai Zhao, Linchao Zhu, Xiaohan Wang, Yi Yang

In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.

Ranked #11 on Video Retrieval on MSVD (using extra training data)

Clustering Retrieval +1

Scalable Video Object Segmentation with Identification Mechanism

2 code implementations • 22 Mar 2022 • Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).

Ranked #3 on Semi-Supervised Video Object Segmentation on YouTube-VOS 2019

Object Segmentation +3

Multi-robot Cooperative Pursuit via Potential Field-Enhanced Reinforcement Learning

no code implementations • 9 Mar 2022 • Zheng Zhang, Xiaohan Wang, Qingrui Zhang, Tianjiang Hu

It is shown by numerical simulations that the proposed hybrid design outperforms the pursuit policies either learned from vanilla reinforcement learning or designed by the potential field method.

reinforcement-learning Reinforcement Learning (RL)

Action Keypoint Network for Efficient Video Recognition

no code implementations • 17 Jan 2022 • Xu Chen, Yahong Han, Xiaohan Wang, Yifan Sun, Yi Yang

An effective approach is to select informative content from the holistic video, yielding a popular family of dynamic video recognition methods.

Ranked #42 on Action Recognition on Something-Something V1

Action Recognition Point Cloud Classification +1

Reasoning Through Memorization: Nearest Neighbor Knowledge Graph Embeddings

1 code implementation • 14 Jan 2022 • Peng Wang, Xin Xie, Xiaohan Wang, Ningyu Zhang

Previous knowledge graph embedding approaches usually map entities to representations and utilize score functions to predict the target entities, yet they typically struggle to reason about rare or emerging unseen entities.

Ranked #1 on Link Prediction on FB15k-237-ind

Knowledge Graph Embedding Knowledge Graph Embeddings +2

Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark

1 code implementation • CVPR 2022 • Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang

In contrast, our large-scale VIdeo Panoptic Segmentation in the Wild (VIPSeg) dataset provides 3,536 videos and 84,750 frames with pixel-level panoptic annotations, covering a wide range of real-world scenarios and categories.

Segmentation Video Panoptic Segmentation

A Simple Episodic Linear Probe Improves Visual Recognition in the Wild

2 code implementations • CVPR 2022 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang

In this paper, we propose an episodic linear probing (ELP) classifier to reflect the generalization of visual representations in an online manner.

Ranked #13 on Fine-Grained Image Classification on CUB-200-2011

Fine-Grained Image Classification Long-tail Learning +1

Self-supervised Point Cloud Representation Learning via Separating Mixed Shapes

1 code implementation • 1 Sep 2021 • Chao Sun, Zhedong Zheng, Xiaohan Wang, Mingliang Xu, Yi Yang

Albeit simple, the pre-trained encoder can capture the key points of an unseen point cloud and surpasses the encoder trained from scratch on downstream tasks.

Ranked #43 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification +3

PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion

no code implementations • ICCV 2021 • Haitian Zeng, Yuchao Dai, Xin Yu, Xiaohan Wang, Yi Yang

As NRSfM is a highly under-constrained problem, we propose two new pairwise regularizations to further regularize the reconstruction.

Less is More: Sparse Sampling for Dense Reaction Predictions

no code implementations • 3 Jun 2021 • Kezhou Lin, Xiaohan Wang, Zhedong Zheng, Linchao Zhu, Yi Yang

Obtaining viewer responses from videos can be useful for creators and streaming platforms to analyze the video performance and improve the future user experience.

Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

1 code implementation • 31 May 2021 • Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Yi Yang, Hongxia Yang

In this paper, we apply one new modality, i.e., the language description, to search for the vehicle of interest and explore the potential of this task in real-world scenarios.

Language Modelling Management +2

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval

1 code implementation • CVPR 2021 • Xiaohan Wang, Linchao Zhu, Yi Yang

Moreover, a global alignment method is proposed to provide a global cross-modal measurement that is complementary to the local perspective.

Retrieval Video Retrieval

Learning to Anticipate Egocentric Actions by Imagination

no code implementations • 13 Jan 2021 • Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu

We further improve ImagineRNN by residual anticipation, i.e., changing its target to predicting the feature difference of adjacent frames instead of the frame content.

Ranked #3 on Action Anticipation on EPIC-KITCHENS-55 (Unseen test set (S2))

Action Anticipation Autonomous Driving +1

Interactive Prototype Learning for Egocentric Action Recognition

no code implementations • ICCV 2021 • Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang

To avoid these additional costs, we propose an end-to-end Interactive Prototype Learning (IPL) framework to learn better active object representations by leveraging the motion cues from the actor.

Action Recognition Object +1

Variable-Viewpoint Representations for 3D Object Recognition

no code implementations • 8 Feb 2020 • Tengyu Ma, Joel Michelson, James Ainooson, Deepayan Sanyal, Xiaohan Wang, Maithilee Kunda

For the problem of 3D object recognition, researchers using deep learning methods have developed several very different input representations, including "multi-view" snapshots taken from discrete viewpoints around an object, as well as "spherical" representations consisting of a dense map of essentially ray-traced samples of the object from all directions.

3D Object Recognition Object

Symbiotic Attention with Privileged Information for Egocentric Action Recognition

no code implementations • 8 Feb 2020 • Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang

Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other branch for noun classification.

Ranked #4 on Egocentric Activity Recognition on EGTEA

Action Recognition Egocentric Activity Recognition +5

Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019

no code implementations • 22 Jun 2019 • Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang

In this report, we present the Baidu-UTS submission to the EPIC-Kitchens Action Recognition Challenge in CVPR 2019.

Action Recognition Object +2

The Toybox Dataset of Egocentric Visual Object Transformations

no code implementations • 15 Jun 2018 • Xiaohan Wang, Tengyu Ma, James Ainooson, Seunghwan Cha, Xiaotian Wang, Azhar Molla, Maithilee Kunda

In object recognition research, many commonly used datasets (e.g., ImageNet and similar) contain relatively sparse distributions of object instances and views, e.g., one might see a thousand different pictures of a thousand different giraffes, mostly taken from a few conventionally photographed angles.

Object Object Recognition +1
