Search Results for author: Cong Wei

Found 12 papers, 7 papers with code

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

no code implementations • 1 Dec 2024 • Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen

Current large multimodal models (LMMs) face significant challenges in processing and comprehending long-duration or high-resolution videos, mainly due to the lack of high-quality datasets.

Instruction Following · Video Understanding

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

1 code implementation • 26 Nov 2024 • Cong Wei, Yujie Zhong, Haoxian Tan, Yong Liu, Zheng Zhao, Jie Hu, Yujiu Yang

This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs).

 Ranked #1 on Referring Expression Segmentation on RefCOCO+ val (using extra training data)

Large Language Model · Open Vocabulary Semantic Segmentation +8

MANTIS: Interleaved Multi-Image Instruction Tuning

1 code implementation • 2 May 2024 • Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max Ku, Qian Liu, Wenhu Chen

We further evaluate Mantis on single-image benchmarks and demonstrate that Mantis also maintains a strong single-image performance on par with CogVLM and Emu2.

LaSagnA: Language-based Segmentation Assistant for Complex Queries

1 code implementation • 12 Apr 2024 • Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma

Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.

Segmentation · Semantic Segmentation

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

1 code implementation • 21 Mar 2024 • Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen

AnyV2V can leverage any existing image editing tools to support an extensive array of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, which were unattainable by previous methods.

Image to Video Generation · Style Transfer +1

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

1 code implementation • 6 Feb 2024 • Weiming Ren, Huan Yang, Ge Zhang, Cong Wei, Xinrun Du, Wenhao Huang, Wenhu Chen

To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation.

Image to Video Generation

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

no code implementations • 22 Dec 2023 • Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen

In the rapidly advancing field of conditional image generation, challenges such as limited explainability make it difficult to effectively evaluate the performance and capabilities of various models.

Conditional Image Generation · General Knowledge

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

no code implementations • 28 Nov 2023 • Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen

Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image.

Benchmarking · Information Retrieval +2

DreamEdit: Subject-driven Image Editing

no code implementations • 22 Jun 2023 • Tianle Li, Max Ku, Cong Wei, Wenhu Chen

In this work, we aspire to fill the void and propose two novel subject-driven sub-tasks, i.e., Subject Replacement and Subject Addition.

Image Generation · Position

Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

1 code implementation • CVPR 2023 • Cong Wei, Brendan Duke, Ruowei Jiang, Parham Aarabi, Graham W. Taylor, Florian Shkurti

Equipped with the learned unstructured attention pattern, sparse attention ViT (Sparsifiner) produces a superior Pareto-optimal trade-off between FLOPs and top-1 accuracy on ImageNet compared to token sparsity.
