Search Results for author: Wenxuan Xie

Found 10 papers, 2 papers with code

Unsupervised Visual Representation Learning by Tracking Patches in Video

1 code implementation CVPR 2021 Guangting Wang, Yizhou Zhou, Chong Luo, Wenxuan Xie, Wenjun Zeng, Zhiwei Xiong

The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame.

Action Classification, Action Recognition +1

Learning to Update for Object Tracking with Recurrent Meta-learner

no code implementations 19 Jun 2018 Bi Li, Wenxuan Xie, Wen-Jun Zeng, Wenyu Liu

Generally, model update is formulated as an online learning problem where a target model is learned over the online training set.

Meta-Learning, Visual Object Tracking +1

A Semi-supervised Sensing Rate Learning based CMAB Scheme to Combat COVID-19 by Trustful Data Collection in the Crowd

no code implementations 17 Jan 2023 Jianheng Tang, Kejia Fan, Wenxuan Xie, Luomin Zeng, Feijiang Han, Guosheng Huang, Tian Wang, Anfeng Liu, Shaobo Zhang

In this paper, an incentive mechanism named Semi-supervision based Combinatorial Multi-Armed Bandit reverse Auction (SCMABA) is proposed to solve the recruitment problem of multiple unknown and strategic workers in MCS.

Unifying Layout Generation with a Decoupled Diffusion Model

no code implementations CVPR 2023 Mude Hui, Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yuwang Wang, Yan Lu

Since different attributes have their individual semantics and characteristics, we propose to decouple the diffusion processes for them to improve the diversity of training samples and learn the reverse process jointly to exploit global-scope contexts for facilitating generation.

Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators

no code implementations 2 Jun 2023 Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yan Lu

In specific, we present Responsible Task Automation (ResponsibleTA) as a fundamental framework to facilitate responsible collaboration between LLM-based coordinators and executors for task automation with three empowered capabilities: 1) predicting the feasibility of the commands for executors; 2) verifying the completeness of executors; 3) enhancing the security (e.g., the protection of users' privacy).

Prompt Engineering

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

no code implementations 7 Oct 2023 Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu

In this work, we build a multimodal model to ground natural language instructions in given UI screenshots as a generic UI task automation executor.

document understanding, Reinforcement Learning (RL)

Retrieval-based Video Language Model for Efficient Long Video Question Answering

no code implementations 8 Dec 2023 Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

To address these issues, we introduce a simple yet effective retrieval-based video language model (R-VLM) for efficient and interpretable long video QA.

Language Modelling, Natural Language Understanding +4

Slot-VLM: SlowFast Slots for Video-Language Modeling

no code implementations 20 Feb 2024 Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs.

Language Modelling, Object +3
