no code implementations • 19 Jun 2018 • Bi Li, Wenxuan Xie, Wen-Jun Zeng, Wenyu Liu
Generally, model update is formulated as an online learning problem where a target model is learned over the online training set.
Ranked #1 on Visual Tracking on OTB-100
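The online-learning formulation of model update can be made concrete with a minimal sketch. It rests on several assumptions that do not come from the paper. The target model is a linear logistic classifier over fixed-length feature vectors. The online training set is a bounded buffer that evicts its oldest samples first. Each update runs a few full-batch gradient steps over the whole buffer. `OnlineTargetModel` and its parameters are hypothetical names for illustration, and this is not the authors' update method.

```python
import math
from collections import deque


class OnlineTargetModel:
    """Hypothetical linear target model refit on a bounded online training set."""

    def __init__(self, dim, capacity=50, lr=0.1, steps=5):
        self.w = [0.0] * dim
        self.b = 0.0
        # Online training set: once full, the oldest (feature, label) pairs are evicted.
        self.train_set = deque(maxlen=capacity)
        self.lr = lr
        self.steps = steps

    def score(self, x):
        """Probability (logistic) that feature vector x belongs to the target."""
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, samples):
        """Add this frame's (feature, label) pairs, then run a few gradient steps
        of the logistic loss averaged over the whole online training set."""
        self.train_set.extend(samples)
        n = len(self.train_set)
        if n == 0:
            return
        for _ in range(self.steps):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, y in self.train_set:
                err = self.score(x) - y  # derivative of logistic loss w.r.t. the logit
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi
                grad_b += err
            self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
```

In a tracker, `update` would be called once per frame with newly labeled samples, and `score` would rank candidate regions. The bounded buffer is one simple way to keep old appearances from dominating the learned model.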
no code implementations • 13 Nov 2018 • Hao Luo, Wenxuan Xie, Xinggang Wang, Wen-Jun Zeng
Trackers are in general more efficient than detectors but bear the risk of drifting.
no code implementations • 17 Jan 2023 • Jianheng Tang, Kejia Fan, Wenxuan Xie, Luomin Zeng, Feijiang Han, Guosheng Huang, Tian Wang, Anfeng Liu, Shaobo Zhang
In this paper, an incentive mechanism named Semi-supervision based Combinatorial Multi-Armed Bandit reverse Auction (SCMABA) is proposed to solve the recruitment problem of multiple unknown and strategic workers in MCS.
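This snippet does not describe SCMABA's internals. As a hedged illustration of the multi-armed bandit ingredient only, the sketch below uses a standard combinatorial UCB rule (CUCB-style) to recruit k workers whose quality is unknown. The exploration constant 1.5, the function names, and the reward model are assumptions. The reverse-auction pricing and the semi-supervision component are not modeled.

```python
import math


def ucb_index(mean, count, t, c=1.5):
    """Optimistic quality estimate for one worker; untried workers get +inf."""
    if count == 0:
        return math.inf
    return mean + math.sqrt(c * math.log(t) / count)


def select_workers(means, counts, t, k):
    """Recruit the k workers with the largest UCB indices in round t (t >= 1)."""
    ranked = sorted(
        range(len(means)),
        key=lambda i: ucb_index(means[i], counts[i], t),
        reverse=True,
    )
    return ranked[:k]


def record_rewards(means, counts, chosen, rewards):
    """Update the running-average quality of each recruited worker.

    rewards maps worker index -> observed quality in this round.
    """
    for i in chosen:
        counts[i] += 1
        means[i] += (rewards[i] - means[i]) / counts[i]
```

Each round would call `select_workers`, observe the recruited workers' data quality, and then call `record_rewards`. The infinite index for untried workers forces each worker to be explored once before the rule starts to favor high-quality workers.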
no code implementations • CVPR 2023 • Mude Hui, Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yuwang Wang, Yan Lu
Since different attributes have their individual semantics and characteristics, we propose to decouple the diffusion processes for them to improve the diversity of training samples and learn the reverse process jointly to exploit global-scope contexts for facilitating generation.
no code implementations • 2 Jun 2023 • Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yan Lu
Specifically, we present Responsible Task Automation (ResponsibleTA) as a fundamental framework to facilitate responsible collaboration between LLM-based coordinators and executors for task automation, with three empowered capabilities: 1) predicting the feasibility of the commands for executors; 2) verifying the completeness of executors; 3) enhancing the security (e.g., the protection of users' privacy).
no code implementations • 7 Oct 2023 • Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu
In this work, we build a multimodal model to ground natural language instructions in given UI screenshots as a generic UI task automation executor.
no code implementations • 8 Dec 2023 • Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu
To address these issues, we introduce a simple yet effective retrieval-based video language model (R-VLM) for efficient and interpretable long video QA.
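The snippet does not say how retrieval is performed. One common pattern, assumed here purely for illustration, works in three steps. First, split the long video into chunks. Next, embed each chunk and the question into the same vector space. Finally, keep only the k chunks most similar to the question by cosine similarity, and pass just those chunks to the language model. The embeddings are taken as given, and `cosine` and `retrieve_chunks` are hypothetical helpers, not R-VLM's actual components.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_chunks(question_emb, chunk_embs, k=2):
    """Return indices of the k chunks most similar to the question,
    re-sorted into temporal order before they are handed to the language model."""
    scores = [cosine(question_emb, c) for c in chunk_embs]
    top = sorted(range(len(chunk_embs)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

Restoring temporal order after ranking is a design choice. It keeps the retained chunks in the sequence in which they occur in the video.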
no code implementations • 20 Feb 2024 • Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu
A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs.
1 code implementation • CVPR 2021 • Guangting Wang, Yizhou Zhou, Chong Luo, Wenxuan Xie, Wenjun Zeng, Zhiwei Xiong
The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame.
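To make "estimate the position and size" concrete, the following minimal sketch scores predicted boxes against the patch's true boxes across the frames of a sequence using intersection-over-union (IoU). The `(x, y, w, h)` box convention (top-left corner plus size) and the use of IoU are assumptions made for illustration. They are not necessarily the paper's training objective or evaluation protocol.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x, y, w, h), where (x, y) is the top-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap region (zero if the boxes do not intersect).
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred_boxes, true_boxes):
    """Average IoU over the frames of one sequence (one box per frame)."""
    if len(pred_boxes) != len(true_boxes) or not pred_boxes:
        raise ValueError("need one predicted and one true box per frame")
    return sum(box_iou(p, t) for p, t in zip(pred_boxes, true_boxes)) / len(pred_boxes)
```

As an example, two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 region. Their IoU is therefore 1 / (4 + 4 − 1) = 1/7.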
2 code implementations • 12 Sep 2021 • Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng
Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module.
Ranked #394 on Image Classification on ImageNet
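The snippet states only that a sparse MLP replaces the token-mixing MLP. The sketch below illustrates one form of sparse token mixing. Each token on an H×W grid mixes only with tokens in its own row and its own column, rather than with all H·W tokens. The specific structure is our assumption for illustration and is not a verified copy of the paper's sMLP module. That structure consists of a width branch, a height branch, and an identity branch, concatenated along channels and fused per token by a linear map. The function names are hypothetical, and plain nested lists stand in for tensors.

```python
def mix_along_width(x, w_mix):
    """out[i][j][c] = sum_k w_mix[j][k] * x[i][k][c]: each token mixes only within its row."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w_mix[j][k] * x[i][k][c] for k in range(W)) for c in range(C)]
             for j in range(W)]
            for i in range(H)]


def mix_along_height(x, h_mix):
    """out[i][j][c] = sum_k h_mix[i][k] * x[k][j][c]: each token mixes only within its column."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    return [[[sum(h_mix[i][k] * x[k][j][c] for k in range(H)) for c in range(C)]
             for j in range(W)]
            for i in range(H)]


def sparse_token_mixing(x, w_mix, h_mix, fuse):
    """Fuse identity, height-mixed and width-mixed features for every token.

    x: H x W x C nested lists; w_mix: W x W; h_mix: H x H;
    fuse: C x 3C matrix applied to the channel concatenation [x, x_h, x_w].
    """
    x_w = mix_along_width(x, w_mix)
    x_h = mix_along_height(x, h_mix)
    out = []
    for i in range(len(x)):
        row = []
        for j in range(len(x[0])):
            cat = x[i][j] + x_h[i][j] + x_w[i][j]  # list concatenation: 3C values
            row.append([sum(f * v for f, v in zip(f_row, cat)) for f_row in fuse])
        out.append(row)
    return out
```

The motivation for this kind of sparsity is cost. A dense token-mixing MLP over N = H·W tokens needs an N×N weight matrix. The row and column branches need only W×W and H×H matrices, and each output token depends only on tokens in its own row and column.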