Search Results for author: Quanzeng You

Found 33 papers, 2 papers with code

ViTAR: Vision Transformer with Any Resolution

no code implementations • 27 Mar 2024 • Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.

Self-Supervised Learning Semantic Segmentation

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

no code implementations • 3 Mar 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Multimodal Large Language Models (MLLMs) have experienced significant advancements recently.

Ranked #37 on Visual Question Answering on MM-Vet

Visual Question Answering

COCO is "ALL" You Need for Visual Instruction Fine-tuning

no code implementations • 17 Jan 2024 • Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang

We argue that datasets with diverse and high-quality detailed instruction following annotations are essential and adequate for MLLMs IFT.

Ranked #43 on Visual Question Answering on MM-Vet

Image Captioning Instruction Following +1

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

no code implementations • 10 Jan 2024 • Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions.

Multimodal Reasoning

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

no code implementations • 3 Dec 2023 • Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou

In light of the remarkable success of in-context learning in large language models, its potential extension to the vision domain, particularly with visual foundation models like Stable Diffusion, has sparked considerable interest.

In-Context Learning

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

no code implementations • 28 Nov 2023 • Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang

Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts.

Image Generation

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

no code implementations • 20 Nov 2023 • Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

no code implementations • 10 Oct 2023 • Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou

Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling.

Image Generation

RefineVIS: Video Instance Segmentation with Temporal Attention Refinement

no code implementations • 7 Jun 2023 • Andre Abrantes, Jiang Wang, Peng Chu, Quanzeng You, Zicheng Liu

We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context.

Ranked #3 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)

Contrastive Learning Denoising +4

Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention

no code implementations • 14 Jun 2022 • Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu

We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context.

Instance Segmentation Object +3

Deep Frequency Filtering for Domain Generalization

no code implementations • CVPR 2023 • Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen

Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge.

Domain Generalization Retrieval

SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

no code implementations • 25 Jan 2022 • Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations.

Question Answering Visual Question Answering

Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation

no code implementations • CVPR 2022 • Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha

In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.

Domain Adaptive Person Re-Identification Knowledge Distillation +4

MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark

no code implementations • 30 Nov 2021 • Xiaotian Han, Quanzeng You, Chunyu Wang, Zhizheng Zhang, Peng Chu, Houdong Hu, Jiang Wang, Zicheng Liu

This dataset provides a more reliable benchmark of multi-camera, multi-object tracking systems in cluttered and crowded environments.

Ranked #2 on Object Tracking on MMPTRACK

Multi-Object Tracking Multiple People Tracking +1

Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation

no code implementations • ACL 2021 • Xingyi Yang, Muchao Ye, Quanzeng You, Fenglong Ma

Medical report generation is one of the most challenging tasks in medical image analysis.

Medical Report Generation Retrieval +1

TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

no code implementations • 1 Apr 2021 • Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu

TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs.

Ranked #2 on Multi-Object Tracking on 2DMOT15 (using extra training data)

Multi-Object Tracking Multiple Object Tracking +2

Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification

no code implementations • 25 Mar 2021 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen

Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.

Disentanglement Domain Adaptive Person Re-Identification +2

Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation

1 code implementation • COLING 2022 • Junyu Luo, Zifei Zheng, Hanzhong Ye, Muchao Ye, Yaqing Wang, Quanzeng You, Cao Xiao, Fenglong Ma

To fairly evaluate the performance, we also propose three specific evaluation metrics.

Benchmarking Machine Translation +2

Real-time 3D Deep Multi-Camera Tracking

no code implementations • 26 Mar 2020 • Quanzeng You, Hao Jiang

Our DMCT consists of 1) a fast and novel perspective-aware Deep GroundPoint Network, 2) a fusion procedure for ground-plane occupancy heatmap estimation, 3) a novel Deep Glimpse Network for person detection and 4) a fast and accurate online tracker.

Ranked #5 on Multi-Object Tracking on Wildtrack

Human Detection Multi-Object Tracking

Action4D: Online Action Recognition in the Crowd and Clutter

no code implementations • CVPR 2019 • Quanzeng You, Hao Jiang

Recognizing every person's action in a crowded and cluttered environment is a challenging task in computer vision.

Action Recognition Temporal Action Localization

Real-time Multiple People Hand Localization in 4D Point Clouds

no code implementations • 5 Mar 2019 • Hao Jiang, Quanzeng You

Different from the traditional multiple view approaches, which find key points in 2D and then triangulate to recover the 3D locations, our method directly processes the dynamic 3D data that involve both clutter and crowd.

"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

no code implementations • ECCV 2018 • Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo

It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context.

Image Captioning

Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM

no code implementations • 20 Jul 2018 • Yuxiao Chen, Jianbo Yuan, Quanzeng You, Jiebo Luo

Sentiment analysis on large-scale social media data is important to bridge the gaps between social media contents and real world activities including political election prediction, individual and public emotional status monitoring and analysis, and so on.

Twitter Sentiment Analysis

"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

no code implementations • 10 Jul 2018 • Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo

It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context.

Image Captioning

Action4D: Real-time Action Recognition in the Crowd and Clutter

no code implementations • 6 Jun 2018 • Quanzeng You, Hao Jiang

In this paper, we propose a real-time action recognition method, Action4D, which gives reliable and accurate results in real-world settings.

Action Recognition Temporal Action Localization

End-to-End Convolutional Semantic Embeddings

no code implementations • CVPR 2018 • Quanzeng You, Zhengyou Zhang, Jiebo Luo

Usually, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are employed for learning image and sentence representations, respectively.

Sentence

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

no code implementations • 30 Jan 2018 • Quanzeng You, Hailin Jin, Jiebo Luo

In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions.

Image Captioning Natural Language Understanding

Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks

no code implementations • 19 Jun 2017 • Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, Jing Gao

Existing work solves this problem by employing recurrent neural networks (RNNs) to model EHR data and utilizing a simple attention mechanism to interpret the results.

Cultural Diffusion and Trends in Facebook Photographs

no code implementations • 24 May 2017 • Quanzeng You, Darío García-García, Manohar Paluri, Jiebo Luo, Jungseock Joo

Online social media is a social vehicle in which people share various moments of their lives with their friends, such as playing sports, cooking dinner or just taking a selfie for fun, via visual means, that is, photographs.

Image Based Appraisal of Real Estate Properties

no code implementations • 28 Nov 2016 • Quanzeng You, Ran Pang, Liangliang Cao, Jiebo Luo

Real estate appraisal, which is the process of estimating the price for real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction.

Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark

2 code implementations • 9 May 2016 • Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang

We hope that this data set encourages further research on visual emotion analysis.

Benchmarking Emotion Recognition

Image Captioning with Semantic Attention

no code implementations • CVPR 2016 • Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, Jiebo Luo

Automatically generating a natural language description of an image has attracted interest recently, both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing.

Image Captioning

Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

no code implementations • 20 Sep 2015 • Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang

Sentiment analysis of such large scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content is complementary to textual sentiment analysis.

Sentiment Analysis
