Search Results for author: Yi Bin

Found 20 papers, 16 papers with code

Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation

no code implementations 10 Dec 2024 Thong Thanh Nguyen, Xiaobao Wu, Yi Bin, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

To overcome this limitation, we introduce a contrastive representation learning framework that focuses on motion patterns for temporal scene graph generation.

Contrastive Learning Graph Generation +3

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

1 code implementation 11 Oct 2024 Yue Yang, Shuibai Zhang, Wenqi Shao, Kaipeng Zhang, Yi Bin, Yu Wang, Ping Luo

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across multimodal tasks such as visual perception and reasoning, leading to good performance on various multimodal evaluation benchmarks.

MME Question Answering +1

PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization

1 code implementation 7 Oct 2024 Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo

In this work, we propose PrefixQuant, a novel quantization method that achieves state-of-the-art performance across various precision levels (W4A4KV4 and W4A8KV4) and granularities (dynamic and static quantization) by effectively isolating token-wise outliers.

Common Sense Reasoning Quantization

MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

1 code implementation 8 Aug 2024 Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua

We study an emerging and intriguing problem of multimodal temporal event forecasting with large language models.

Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

1 code implementation 1 Aug 2024 Yi Bin, Junrong Liao, Yujuan Ding, Haoxuan Li, Yang Yang, See-Kiong Ng, Heng Tao Shen

The iterative cross-modal boosting also functions in inference to further enhance coherence prediction in each modality.

GalleryGPT: Analyzing Paintings with Large Multimodal Models

1 code implementation 1 Aug 2024 Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

Specifically, we first propose a task of composing paragraph analysis for artworks, i.e., paintings in this paper, focusing only on visual characteristics to formulate a more comprehensive understanding of artworks.

Art Analysis

Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection

1 code implementation 17 Jul 2024 Zhenni Yu, Xiaoqin Zhang, Li Zhao, Yi Bin, Guobao Xiao

It maximizes the utilization of depth features while synergizing with RGB features to achieve multimodal complementarity, thereby overcoming the segmentation limitations of SAM and improving its accuracy in COD.

Knowledge Distillation object-detection +2

MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

1 code implementation 4 Jul 2024 Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

To address these problems, we propose MAMA, a new approach to learning video-language representations by utilizing a contrastive objective with a subtractive angular margin to regularize cross-modal representations in their effort to reach perfect similarity.

Language Modeling Language Modelling +4

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

1 code implementation 25 Jun 2024 Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions.

Diversity Math +2

Ensemble Diversity Facilitates Adversarial Transferability

1 code implementation CVPR 2024 Bowen Tang, Zheng Wang, Yi Bin, Qi Dou, Yang Yang, Heng Tao Shen

With the advent of ensemble-based attacks, the transferability of generated adversarial examples is elevated by a noticeable margin, despite many methods only employing superficial integration yet ignoring the diversity between ensemble models.

Diversity reinforcement-learning +1

Non-Autoregressive Sentence Ordering

1 code implementation 19 Oct 2023 Yi Bin, Wenhao Shi, Bin Ji, Jipeng Zhang, Yujuan Ding, Yang Yang

Existing sentence ordering approaches generally employ encoder-decoder frameworks with the pointer net to recover the coherence by recurrently predicting each sentence step-by-step.

Decoder Sentence +1

Solving Math Word Problems with Reexamination

1 code implementation 14 Oct 2023 Yi Bin, Wenhao Shi, Yujuan Ding, Yang Yang, See-Kiong Ng

Math word problem (MWP) solving aims to understand the descriptive math problem and calculate the result, for which previous efforts are mostly devoted to upgrading different technical modules.

Descriptive Math

Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval

1 code implementation 8 Aug 2023 Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen

Specifically, on two key tasks, i.e., image-to-text and text-to-image retrieval, HAT achieves 7.6% and 16.7% relative score improvement of Recall@1 on MSCOCO, and 4.4% and 11.6% on Flickr30k, respectively.

Cross-Modal Retrieval Image Retrieval +2

Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination

1 code implementation 8 Aug 2023 Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen

Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively.

Image-text matching Representation Learning +2

Non-Autoregressive Math Word Problem Solver with Unified Tree Structure

1 code implementation 8 May 2023 Yi Bin, Mengqun Han, Wenhao Shi, Lei Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

For evaluating the possible expression variants, we design a path-based metric to evaluate the partial accuracy of expressions of a unified tree.

Math valid

Graph-to-Tree Learning for Solving Math Word Problems

1 code implementation ACL 2020 Jipeng Zhang, Lei Wang, Roy Ka-Wei Lee, Yi Bin, Yan Wang, Jie Shao, Ee-Peng Lim

While recent tree-based neural models have demonstrated promising results in generating solution expressions for the math word problem (MWP), most of these models do not capture the relationships and order information among the quantities well.

Decoder Math +1
