Search Results for author: Quanzeng You

Found 34 papers, 3 papers with code

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

no code implementations 28 May 2024 Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang

In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking pre-trained vision encoders with Large Language Models (LLMs).

Language Modelling, Large Language Model

ViTAR: Vision Transformer with Any Resolution

no code implementations 27 Mar 2024 Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.

Self-Supervised Learning, Semantic Segmentation

COCO is "ALL" You Need for Visual Instruction Fine-tuning

no code implementations 17 Jan 2024 Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang

We argue that datasets with diverse, high-quality, detailed instruction-following annotations are essential and adequate for MLLM instruction fine-tuning (IFT).

Image Captioning, Instruction Following

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

no code implementations 10 Jan 2024 Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions.

Multimodal Reasoning

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

no code implementations 3 Dec 2023 Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou

In light of the remarkable success of in-context learning in large language models, its potential extension to the vision domain, particularly with visual foundation models like Stable Diffusion, has sparked considerable interest.

In-Context Learning

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

no code implementations 28 Nov 2023 Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang

Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts.

Image Generation

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

no code implementations 20 Nov 2023 Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.

RefineVIS: Video Instance Segmentation with Temporal Attention Refinement

no code implementations 7 Jun 2023 Andre Abrantes, Jiang Wang, Peng Chu, Quanzeng You, Zicheng Liu

We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context.

Ranked #5 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)

Contrastive Learning, Denoising

Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention

no code implementations 14 Jun 2022 Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu

We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context.

Instance Segmentation, Object

Deep Frequency Filtering for Domain Generalization

no code implementations CVPR 2023 Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen

Improving the generalization ability of Deep Neural Networks (DNNs), a longstanding challenge, is critical for their practical use.

Domain Generalization, Retrieval

SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

no code implementations 25 Jan 2022 Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

As a multi-modality task, visual question answering is challenging because it requires not only visual and textual understanding but also the ability to align cross-modality representations.

Question Answering, Visual Question Answering

TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

no code implementations 1 Apr 2021 Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu

TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs.

Ranked #2 on Multi-Object Tracking on 2DMOT15 (using extra training data)

Decoder, Multi-Object Tracking

Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification

no code implementations 25 Mar 2021 Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen

Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.

Disentanglement, Diversity

Real-time 3D Deep Multi-Camera Tracking

no code implementations 26 Mar 2020 Quanzeng You, Hao Jiang

Our DMCT consists of 1) a fast and novel perspective-aware Deep GroundPoint Network, 2) a fusion procedure for ground-plane occupancy heatmap estimation, 3) a novel Deep Glimpse Network for person detection, and 4) a fast and accurate online tracker.

Human Detection, Multi-Object Tracking

Action4D: Online Action Recognition in the Crowd and Clutter

no code implementations CVPR 2019 Quanzeng You, Hao Jiang

Recognizing every person's action in a crowded and cluttered environment is a challenging task in computer vision.

Action Recognition, Temporal Action Localization

Real-time Multiple People Hand Localization in 4D Point Clouds

no code implementations 5 Mar 2019 Hao Jiang, Quanzeng You

Different from the traditional multiple view approaches, which find key points in 2D and then triangulate to recover the 3D locations, our method directly processes the dynamic 3D data that involve both clutter and crowd.

"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

no code implementations ECCV 2018 Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo

It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context.

Image Captioning

Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM

no code implementations 20 Jul 2018 Yuxiao Chen, Jianbo Yuan, Quanzeng You, Jiebo Luo

Sentiment analysis on large-scale social media data is important for bridging the gap between social media content and real-world activities, including political election prediction and the monitoring and analysis of individual and public emotional states.

Twitter Sentiment Analysis

"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

no code implementations 10 Jul 2018 Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo

It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context.

Image Captioning

Action4D: Real-time Action Recognition in the Crowd and Clutter

no code implementations 6 Jun 2018 Quanzeng You, Hao Jiang

In this paper, we propose a real-time action recognition method, Action4D, which gives reliable and accurate results in the real-world settings.

Action Recognition, Temporal Action Localization

End-to-End Convolutional Semantic Embeddings

no code implementations CVPR 2018 Quanzeng You, Zhengyou Zhang, Jiebo Luo

Usually, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are employed for learning image and sentence representations, respectively.

Sentence

Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks

no code implementations 19 Jun 2017 Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, Jing Gao

Existing work addresses this problem by employing recurrent neural networks (RNNs) to model EHR data and a simple attention mechanism to interpret the results.

Cultural Diffusion and Trends in Facebook Photographs

no code implementations 24 May 2017 Quanzeng You, Darío García-García, Manohar Paluri, Jiebo Luo, Jungseock Joo

Online social media is a social vehicle through which people share moments of their lives with friends, such as playing sports, cooking dinner, or just taking a selfie for fun, via visual means, that is, photographs.

Image Based Appraisal of Real Estate Properties

no code implementations 28 Nov 2016 Quanzeng You, Ran Pang, Liangliang Cao, Jiebo Luo

Real estate appraisal, the process of estimating the price of real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction.

Image Captioning with Semantic Attention

no code implementations CVPR 2016 Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, Jiebo Luo

Automatically generating a natural language description of an image has recently attracted interest, both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing.

Image Captioning

Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

no code implementations 20 Sep 2015 Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang

Sentiment analysis of such large-scale visual content can help better extract user sentiment toward events or topics, such as those in image tweets, making sentiment prediction from visual content complementary to textual sentiment analysis.

Sentiment Analysis
