Search Results for author: Bin Wen

Found 7 papers, 4 papers with code

EVLM: An Efficient Vision-Language Model for Visual Understanding

no code implementations • 19 Jul 2024 • Kaibing Chen, Dong Shen, Hanwen Zhong, Huasong Zhong, Kui Xia, Di Xu, Wei Yuan, Yifei Hu, Bin Wen, Tianke Zhang, Changyi Liu, Dewen Fan, Huihui Xiao, JiaHong Wu, Fan Yang, Size Li, Di Zhang

However, when dealing with long sequences of visual signals or inputs such as videos, the self-attention mechanism of language models can lead to significant computational overhead.

Image Captioning Language Modelling +1

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

1 code implementation • 15 Jun 2024 • Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen

To address this gap, we introduce CoMM, a high-quality Coherent interleaved image-text MultiModal dataset designed to enhance the coherence, consistency, and alignment of generated multimodal content.

In-Context Learning Visual Storytelling

Optimization Efficient Open-World Visual Region Recognition

1 code implementation • 2 Nov 2023 • Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu

Building on the success of powerful image-level vision-language (ViL) foundation models like CLIP, recent efforts have sought to harness their capabilities by either training a contrastive model from scratch with an extensive collection of region-label pairs or aligning the outputs of a detection model with image-level representations of region proposals.

object-detection Object Recognition +1

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

1 code implementation • 9 Oct 2022 • Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan

As a result, our model can effectively extract both static appearance and dynamic motion spontaneously, leading to superior spatiotemporal representation learning capability.

Representation Learning Semantic Segmentation +2

MetaFormer: A Unified Meta Framework for Fine-Grained Recognition

2 code implementations • 5 Mar 2022 • Qishuai Diao, Yi Jiang, Bin Wen, Jia Sun, Zehuan Yuan

Fine-Grained Visual Classification (FGVC) is the task of recognizing objects that belong to multiple subordinate categories of a super-category.

Attribute Fine-Grained Image Classification

Unbiased Scene Graph Generation via Rich and Fair Semantic Extraction

no code implementations • 1 Feb 2020 • Bin Wen, Jie Luo, Xianglong Liu, Lei Huang

Extracting graph representations of visual scenes in images is a challenging task in computer vision.

Graph Generation Relation +1

An extended description logic system with knowledge element based on ALC

no code implementations • 16 Apr 2019 • Bin Wen, Jianhou Gan, Juan L. G. Guirao, Wei Gao

With the rise of knowledge management and knowledge economy, the knowledge elements that directly link and embody the knowledge system have become the research focus and hotspot in certain areas.

Attribute Management +1
