Search Results for author: Jianbo Yuan

Found 22 papers, 9 papers with code

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

no code implementations • 10 Jan 2024 • Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions.

Multimodal Reasoning

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

1 code implementation • 10 Jan 2024 • Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Qianli Ma, Guoyin Wang, Xuwu Wang, Jing Su, Jingjing Xu, Ming Zhu, Yao Cheng, Jianbo Yuan, Jiwei Li, Kun Kuang, Yang Yang, Hongxia Yang, Fei Wu

In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks.

Benchmarking

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

no code implementations • 3 Dec 2023 • Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou

In light of the remarkable success of in-context learning in large language models, its potential extension to the vision domain, particularly with visual foundation models like Stable Diffusion, has sparked considerable interest.

In-Context Learning

Self-Infilling Code Generation

1 code implementation • 29 Nov 2023 • Lin Zheng, Jianbo Yuan, Zhi Zhang, Hongxia Yang, Lingpeng Kong

This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding.

Code Generation

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

no code implementations • 28 Nov 2023 • Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang

Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts.

Image Generation

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

no code implementations • 20 Nov 2023 • Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.

LoBaSS: Gauging Learnability in Supervised Fine-tuning Data

no code implementations • 16 Oct 2023 • Haotian Zhou, Tingkai Liu, Qianli Ma, Jianbo Yuan, PengFei Liu, Yang You, Hongxia Yang

In this paper, we introduce a new dimension in SFT data selection: learnability.

LEMON: Lossless model expansion

no code implementations • 12 Oct 2023 • Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang

Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.

Let Models Speak Ciphers: Multiagent Debate through Embeddings

no code implementations • 10 Oct 2023 • Chau Pham, Boyi Liu, Yingxiang Yang, Zhengyu Chen, Tianyi Liu, Jianbo Yuan, Bryan A. Plummer, Zhaoran Wang, Hongxia Yang

Although natural language is an obvious choice for communication given LLMs' language understanding capability, the token sampling step needed to generate natural language poses a potential risk of information loss, because it uses only one token to represent the model's belief across the entire vocabulary.
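To make the information-loss argument concrete, here is a toy sketch. It assumes that the embedding-based alternative passes a probability-weighted average of token embeddings instead of a single token; the vocabulary, embedding values, and function names are hypothetical, and greedy selection stands in for the sampling step. It illustrates the argument only and is not the authors' implementation.

```python
import math

# Hypothetical toy vocabulary of 4 tokens with 2-d embeddings (illustrative values only).
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token_embedding(probs, embeddings):
    """Embedding of the single most likely token; the rest of the distribution is discarded.
    Greedy selection is used here as a stand-in for the token sampling step."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return embeddings[idx]


def expected_embedding(probs, embeddings):
    """Probability-weighted average of all token embeddings (assumed embedding-style message)."""
    dim = len(embeddings[0])
    return [sum(p * e[d] for p, e in zip(probs, embeddings)) for d in range(dim)]


# Two beliefs with the same top token but very different runner-ups.
probs_a = softmax([2.0, 1.9, 0.0, 0.0])   # token 1 is almost as likely as token 0
probs_b = softmax([2.0, -5.0, 0.0, 0.0])  # token 1 is nearly ruled out

if __name__ == "__main__":
    print(greedy_token_embedding(probs_a, EMBEDDINGS), greedy_token_embedding(probs_b, EMBEDDINGS))
    print(expected_embedding(probs_a, EMBEDDINGS), expected_embedding(probs_b, EMBEDDINGS))
```

Both beliefs collapse to the same token embedding `[1.0, 0.0]`, while the averaged embeddings differ in sign along the second dimension: that difference is the distributional information a single-token message drops.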

Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

1 code implementation • CVPR 2023 • Yuxiao Chen, Jianbo Yuan, Yu Tian, Shijie Geng, Xinyu Li, Ding Zhou, Dimitris N. Metaxas, Hongxia Yang

However, directly aligning cross-modal information using such representations is challenging, as visual patches and text tokens differ in semantic level and granularity.

Contrastive Learning

HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention

1 code implementation • 6 Mar 2023 • Shijie Geng, Jianbo Yuan, Yu Tian, Yuxiao Chen, Yongfeng Zhang

The success of large-scale contrastive vision-language pretraining (CLIP) has benefited both visual recognition and multimodal content understanding.

A Reparameterized Discrete Diffusion Model for Text Generation

1 code implementation • 11 Feb 2023 • Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong

This work studies discrete diffusion probabilistic models with applications to natural language generation.

Text Generation

Efficient Attention via Control Variates

1 code implementation • 9 Feb 2023 • Lin Zheng, Jianbo Yuan, Chong Wang, Lingpeng Kong

Building on previous progress in random-feature attention (RFA), we characterize this gap through the lens of control variates and show that RFA can be decomposed into a sum of multiple control variate estimators, one for each element in the sequence.

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

1 code implementation • 20 Jul 2022 • Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult.

Action Detection, Action Recognition (+3 more)

More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

no code implementations • 20 May 2021 • Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas

Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to their capability of learning fine-grained relevance across different modalities.

Contrastive Learning, Image Captioning (+4 more)

Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment

no code implementations • 22 Jul 2019 • Jianbo Yuan, Haofu Liao, Rui Luo, Jiebo Luo

In addition, in order to enrich the decoder with descriptive semantics and enforce the correctness of the deterministic medical-related contents such as mentions of organs or diagnoses, we extract medical concepts based on the radiology reports in the training data and fine-tune the encoder to extract the most frequent medical concepts from the x-ray images.

Descriptive Image Captioning +2

Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction

1 code implementation • 5 Jun 2019 • Haofu Liao, Wei-An Lin, Jianbo Yuan, S. Kevin Zhou, Jiebo Luo

Extensive experiments show that our method significantly outperforms the existing unsupervised models for image-to-image translation problems, and achieves comparable performance to existing supervised models on a synthesized dataset.

Computed Tomography (CT), Disentanglement (+3 more)

Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM

no code implementations • 20 Jul 2018 • Yuxiao Chen, Jianbo Yuan, Quanzeng You, Jiebo Luo

Sentiment analysis on large-scale social media data is important to bridge the gaps between social media contents and real world activities including political election prediction, individual and public emotional status monitoring and analysis, and so on.

Twitter Sentiment Analysis

Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach

no code implementations • 16 Nov 2016 • Jianbo Yuan, Walid Shalaby, Mohammed Korayem, David Lin, Khalifeh Aljadda, Jiebo Luo

One of the most important features of the proposed technique is the fact that it can be applied on top of any existing CF based recommendation engine without changing the CF core.

Collaborative Filtering
