no code implementations • 29 May 2024 • Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li
Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries.
2 code implementations • 7 Apr 2024 • Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo
Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.
no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan
In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.
Ranked #58 on Visual Question Answering on MM-Vet
1 code implementation • 28 Jan 2024 • Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan
At inference, we generate images with arbitrary expansion multiples by inputting an anchor image and its corresponding positional embeddings.
1 code implementation • 13 Nov 2023 • Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo
Our investigation begins with a preliminary quantitative analysis for each task using existing benchmark datasets, followed by a careful review of the results and a selection of qualitative samples that illustrate GPT-4V's potential in understanding multimodal social media content.
1 code implementation • 9 Nov 2023 • Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Chenyu You, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton
Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face.
no code implementations • 4 Aug 2023 • Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng
(ii) We explore intra-entity and cross-entity interactions among the superpixels to enrich fine-grained interactions between entities at an earlier stage.
4 code implementations • 20 May 2023 • Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen
In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.
1 code implementation • 15 May 2023 • Jingyi Wang, Jinfa Huang, Can Zhang, Zhidong Deng
In this paper, we propose a Time-variant Relation-aware TRansformer (TR$^2$), which aims to model the temporal change of relations in dynamic scene graphs.
4 code implementations • CVPR 2023 • Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen
Contrastive learning-based video-language representation learning approaches, e.g., CLIP, which pursue semantic interaction upon pre-defined video-text pairs, have achieved outstanding performance.
Ranked #8 on Video Question Answering on MSRVTT-QA
4 code implementations • 21 Nov 2022 • Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen
Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs.
Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)
no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen
Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grained spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.
no code implementations • SEMEVAL 2020 • YingMei Guo, Jinfa Huang, Yanlong Dong, Mingxing Xu
In our system, we utilize five types of data representations as inputs to base classifiers to extract information from different aspects.