no code implementations • 27 Mar 2024 • Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.
no code implementations • 3 Mar 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Multimodal Large Language Models (MLLMs) have experienced significant advancements recently.
Ranked #37 on Visual Question Answering on MM-Vet
no code implementations • 17 Jan 2024 • Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang
We argue that datasets with diverse, high-quality, detailed instruction-following annotations are essential and adequate for instruction fine-tuning (IFT) of MLLMs.
Ranked #43 on Visual Question Answering on MM-Vet
no code implementations • 10 Jan 2024 • Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang
In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions.
no code implementations • 3 Dec 2023 • Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou
In light of the remarkable success of in-context learning in large language models, its potential extension to the vision domain, particularly with visual foundation models like Stable Diffusion, has sparked considerable interest.
no code implementations • 28 Nov 2023 • Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang
Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts.
no code implementations • 20 Nov 2023 • Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang
To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.
no code implementations • 10 Oct 2023 • Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou
Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling.
no code implementations • 7 Jun 2023 • Andre Abrantes, Jiang Wang, Peng Chu, Quanzeng You, Zicheng Liu
We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context.
Ranked #3 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)
no code implementations • 14 Jun 2022 • Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu
We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context.
no code implementations • CVPR 2023 • Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen
Improving the generalization ability of Deep Neural Networks (DNNs), a longstanding challenge, is critical for their practical use.
no code implementations • 25 Jan 2022 • Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu
As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations.
no code implementations • CVPR 2022 • Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha
In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.
Domain Adaptive Person Re-Identification, Knowledge Distillation, +4 more
no code implementations • 30 Nov 2021 • Xiaotian Han, Quanzeng You, Chunyu Wang, Zhizheng Zhang, Peng Chu, Houdong Hu, Jiang Wang, Zicheng Liu
This dataset provides a more reliable benchmark of multi-camera, multi-object tracking systems in cluttered and crowded environments.
Ranked #2 on Object Tracking on MMPTRACK
no code implementations • ACL 2021 • Xingyi Yang, Muchao Ye, Quanzeng You, Fenglong Ma
Medical report generation is one of the most challenging tasks in medical image analysis.
no code implementations • 1 Apr 2021 • Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu
TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs.
Ranked #2 on Multi-Object Tracking on 2DMOT15 (using extra training data)
no code implementations • 25 Mar 2021 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen
Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.
1 code implementation • COLING 2022 • Junyu Luo, Zifei Zheng, Hanzhong Ye, Muchao Ye, Yaqing Wang, Quanzeng You, Cao Xiao, Fenglong Ma
To fairly evaluate the performance, we also propose three specific evaluation metrics.
no code implementations • 26 Mar 2020 • Quanzeng You, Hao Jiang
Our DMCT consists of 1) a fast and novel perspective-aware Deep GroudPoint Network, 2) a fusion procedure for ground-plane occupancy heatmap estimation, 3) a novel Deep Glimpse Network for person detection and 4) a fast and accurate online tracker.
Ranked #5 on Multi-Object Tracking on Wildtrack
no code implementations • CVPR 2019 • Quanzeng You, Hao Jiang
Recognizing every person's action in a crowded and cluttered environment is a challenging task in computer vision.
no code implementations • 5 Mar 2019 • Hao Jiang, Quanzeng You
Different from the traditional multiple view approaches, which find key points in 2D and then triangulate to recover the 3D locations, our method directly processes the dynamic 3D data that involve both clutter and crowd.
no code implementations • ECCV 2018 • Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo
It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context.
no code implementations • 20 Jul 2018 • Yuxiao Chen, Jianbo Yuan, Quanzeng You, Jiebo Luo
Sentiment analysis on large-scale social media data is important for bridging the gap between social media content and real-world activities, including political election prediction and the monitoring and analysis of individual and public emotional status.
no code implementations • 10 Jul 2018 • Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo
It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context.
no code implementations • 6 Jun 2018 • Quanzeng You, Hao Jiang
In this paper, we propose a real-time action recognition method, Action4D, which gives reliable and accurate results in real-world settings.
no code implementations • CVPR 2018 • Quanzeng You, Zhengyou Zhang, Jiebo Luo
Usually, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are employed for learning image and sentence representations, respectively.
no code implementations • 30 Jan 2018 • Quanzeng You, Hailin Jin, Jiebo Luo
In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions.
no code implementations • 19 Jun 2017 • Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, Jing Gao
Existing work solves this problem by employing recurrent neural networks (RNNs) to model EHR data and using a simple attention mechanism to interpret the results.
no code implementations • 24 May 2017 • Quanzeng You, Darío García-García, Manohar Paluri, Jiebo Luo, Jungseock Joo
Online social media is a social vehicle in which people share various moments of their lives with their friends, such as playing sports, cooking dinner or just taking a selfie for fun, via visual means, that is, photographs.
no code implementations • 28 Nov 2016 • Quanzeng You, Ran Pang, Liangliang Cao, Jiebo Luo
Real estate appraisal, which is the process of estimating the price of real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction.
2 code implementations • 9 May 2016 • Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang
We hope that this data set encourages further research on visual emotion analysis.
no code implementations • CVPR 2016 • Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, Jiebo Luo
Automatically generating a natural language description of an image has attracted interest recently, both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing.
no code implementations • 20 Sep 2015 • Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang
Sentiment analysis of such large-scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content is complementary to textual sentiment analysis.