Search Results for author: En Yu

Found 22 papers, 4 papers with code

Causal-Informed Contrastive Learning: Towards Bias-Resilient Pre-training under Concept Drift

no code implementations • 11 Feb 2025 • Xiaoyu Yang, Jie Lu, En Yu

The evolution of large-scale contrastive pre-training propelled by top-tier datasets has reached a transition point in the scaling law.

Causal Inference • Contrastive Learning

PerPO: Perceptual Preference Optimization via Discriminative Rewarding

no code implementations • 5 Feb 2025 • Zining Zhu, Liang Zhao, Kangheng Lin, Jinze Yang, En Yu, Chenglong Liu, Haoran Wei, Jianjian Sun, Zheng Ge, Xiangyu Zhang

This paper presents Perceptual Preference Optimization (PerPO), a perception alignment method aimed at addressing the visual discrimination challenges in generative pre-trained multimodal large language models (MLLMs).

Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection

no code implementations • 3 Feb 2025 • Tianlin Zhang, En Yu, Yi Shao, Shuai Li, Sujuan Hou, Jiande Sun

Multimodal fake news detection has garnered significant attention due to its profound implications for social security.

Fake News Detection

Cross-View Referring Multi-Object Tracking

1 code implementation • 23 Dec 2024 • Sijia Chen, En Yu, Wenbing Tao

It introduces cross-view information to obtain the appearances of objects from multiple views, avoiding the problem of object appearances being invisible in the RMOT task.

Cross-view Referring Multi-Object Tracking • Object

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

1 code implementation • 12 Dec 2024 • Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang

This paper introduces RuleArena, a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning.

Logical Reasoning • Long-Context Understanding

Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards

1 code implementation • 22 May 2024 • Xiaoyu Yang, Jie Lu, En Yu

This mainly includes gradual drift due to long-tailed data and sudden drift from Out-Of-Distribution (OOD) data, both of which have increasingly drawn the attention of the research community.

Language Modeling • Language Modelling +1
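
The entry above distinguishes two drift types: gradual drift from long-tailed data and sudden drift from Out-Of-Distribution (OOD) data. The toy Python sketch below only makes that terminology concrete; the stream size, the power-law decay of the class priors, and the use of label -1 for OOD samples are illustrative assumptions, not code or settings from the paper.

```python
# Toy illustration of the two drift types mentioned above (not from the paper):
# gradual drift as class priors become increasingly long-tailed over time, and
# sudden drift as out-of-distribution (OOD) samples are injected at one step.
import numpy as np

rng = np.random.default_rng(0)
num_classes, steps, batch = 10, 5, 1000

for t in range(steps):
    # Gradual drift: priors follow a power law whose exponent grows with time,
    # so the head class slowly dominates while tail classes shrink.
    priors = np.array([1.0 / (c + 1) ** (0.5 * t) for c in range(num_classes)])
    priors /= priors.sum()
    labels = rng.choice(num_classes, size=batch, p=priors)

    # Sudden drift: at step 3, 20% of the batch is replaced by OOD samples,
    # marked here with the (assumed) label -1.
    if t == 3:
        labels[rng.random(batch) < 0.2] = -1

    print(f"step {t}: head-class share={np.mean(labels == 0):.2f}, "
          f"OOD share={np.mean(labels == -1):.2f}")
```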

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

1 code implementation • CVPR 2024 • Sijia Chen, En Yu, Jinyang Li, Wenbing Tao

In this study, we pioneer an exploration into the distribution patterns of tracking data and identify a pronounced long-tail distribution issue within existing MOT datasets.

Data Augmentation • Multiple Object Tracking +1

Small Language Model Meets with Reinforced Vision Vocabulary

no code implementations • 23 Jan 2024 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang

In Vary-toy, we introduce an improved vision vocabulary, allowing the model not only to possess all the features of Vary but also to gain more generality.

Language Modeling • Language Modelling +4

Online Boosting Adaptive Learning under Concept Drift for Multistream Classification

no code implementations • 17 Dec 2023 • En Yu, Jie Lu, Bin Zhang, Guangquan Zhang

Specifically, OBAL operates in a dual-phase mechanism: in the first phase, we design an Adaptive COvariate Shift Adaptation (AdaCOSA) algorithm to construct an initialized ensemble model using archived data from various source streams, mitigating covariate shift while learning the dynamic correlations via an adaptive re-weighting strategy.
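
The snippet above describes building an initialized ensemble from archived source-stream data and adaptively re-weighting its members under covariate shift. The scikit-learn sketch below illustrates that general idea only; the synthetic streams, the accuracy-based weighting rule, and the smoothing factor are assumptions made for this example, not the actual AdaCOSA/OBAL algorithm.

```python
# Generic sketch of adaptively re-weighting classifiers trained on different
# source streams (illustrative only; NOT the paper's AdaCOSA/OBAL algorithm).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_stream(shift, n=500):
    """Hypothetical stream with shifted covariates but a shared concept."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    return X, y

# Archived source streams -> one base model per stream, uniform initial weights.
sources = [make_stream(s) for s in (0.0, 0.5, 1.0)]
models = [LogisticRegression().fit(X, y) for X, y in sources]
weights = np.ones(len(models)) / len(models)

# Target stream arrives in mini-batches; weights drift toward the sources whose
# models currently perform best, a simple stand-in for adaptive re-weighting.
X_t, y_t = make_stream(0.8, n=200)
for start in range(0, len(X_t), 50):
    Xb, yb = X_t[start:start + 50], y_t[start:start + 50]
    acc = np.array([m.score(Xb, yb) for m in models])   # recent per-source accuracy
    weights = 0.7 * weights + 0.3 * acc / acc.sum()     # smoothed re-weighting
    probs = sum(w * m.predict_proba(Xb)[:, 1] for w, m in zip(weights, models))
    print(f"batch@{start}: weights={np.round(weights, 2)}, "
          f"acc={np.mean((probs > 0.5) == yb):.2f}")
```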

Merlin: Empowering Multimodal LLMs with Foresight Minds

no code implementations • 30 Nov 2023 • En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Visual Question Answering
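
As a purely illustrative companion to the foresight instruction tuning (FIT) described above, the hypothetical sample below shows the two-step structure of first predicting trajectories and then reasoning about future events; the field names, box format, and dialogue text are invented for this sketch and do not reflect Merlin's actual data format.

```python
# Hypothetical FIT-style sample (assumed structure, not Merlin's real format):
# step 1 asks for the trajectory of a referred object across frames,
# step 2 asks the model to reason about what happens next based on it.
fit_sample = {
    "frames": ["frame_000.jpg", "frame_001.jpg", "frame_002.jpg"],
    "conversation": [
        {"role": "user",
         "content": "Track the pedestrian near the crosswalk and report its "
                    "trajectory as [x1, y1, x2, y2] boxes per frame."},
        {"role": "assistant",
         "content": "Frame 0: [120, 200, 160, 320]; Frame 1: [140, 198, 180, 318]; "
                    "Frame 2: [162, 196, 202, 316]."},
        {"role": "user",
         "content": "Based on that trajectory, what is likely to happen next?"},
        {"role": "assistant",
         "content": "The pedestrian is moving steadily toward the crosswalk and "
                    "will likely start crossing within the next few frames."},
    ],
}

if __name__ == "__main__":
    for turn in fit_sample["conversation"]:
        print(f"{turn['role']}: {turn['content']}")
```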

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instructions, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity, including mouse clicks, drag-and-drop, and drawing boxes, providing a more flexible and seamless interactive experience.

Instruction Following • Language Modeling +3

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

no code implementations • 18 Jul 2023 • Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang

Besides, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while its inference speed is nearly 7x faster and its FLOPs are only 13.3% of PersFormer's.

3D Lane Detection

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking

no code implementations • 23 May 2023 • En Yu, Tiancai Wang, Zhuoling Li, Yuang Zhang, Xiangyu Zhang, Wenbing Tao

Although end-to-end multi-object trackers like MOTR enjoy the merits of simplicity, they suffer severely from the conflict between detection and association, resulting in unsatisfactory convergence dynamics.

Denoising • Multi-Object Tracking +1

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

no code implementations • 3 Dec 2022 • En Yu, Songtao Liu, Zhuoling Li, Jinrong Yang, Zeming Li, Shoudong Han, Wenbing Tao

VLM joins the information from the generated visual prompts and the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual descriptions, which are domain-invariant across different tracking scenes.

Domain Generalization • Multi-Object Tracking +1

Quality Matters: Embracing Quality Clues for Robust 3D Multi-Object Tracking

no code implementations • 23 Aug 2022 • Jinrong Yang, En Yu, Zeming Li, Xiaoping Li, Wenbing Tao

Recent advanced works generally employ a series of object attributes, e.g., position, size, velocity, and appearance, to provide clues for association in 3D MOT.

3D Multi-Object Tracking • 3D Object Detection +2
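
The snippet above summarizes the common attribute-based association pipeline in 3D MOT. The generic sketch below shows one way such clues can be combined into a cost matrix and matched with the Hungarian algorithm; the attribute weights and data layout are invented for illustration, and this is not the quality-aware method the paper proposes.

```python
# Generic attribute-based association for 3D MOT (illustrative only):
# combine position / size / velocity / appearance distances into a cost
# matrix and solve the assignment with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def association_cost(tracks, detections, w=(1.0, 0.5, 0.5, 1.0)):
    """tracks/detections: dicts with 'pos', 'size', 'vel' (3,) and 'feat' (d,)."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            pos = np.linalg.norm(t["pos"] - d["pos"])
            size = np.linalg.norm(t["size"] - d["size"])
            vel = np.linalg.norm(t["vel"] - d["vel"])
            # Cosine distance between appearance embeddings.
            app = 1.0 - np.dot(t["feat"], d["feat"]) / (
                np.linalg.norm(t["feat"]) * np.linalg.norm(d["feat"]) + 1e-8)
            cost[i, j] = w[0] * pos + w[1] * size + w[2] * vel + w[3] * app
    return cost

rng = np.random.default_rng(1)
make = lambda: {"pos": rng.normal(size=3), "size": rng.uniform(1, 3, size=3),
                "vel": rng.normal(size=3), "feat": rng.normal(size=16)}
tracks, detections = [make() for _ in range(3)], [make() for _ in range(4)]

rows, cols = linear_sum_assignment(association_cost(tracks, detections))
print(list(zip(rows.tolist(), cols.tolist())))  # matched (track, detection) pairs
```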

RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation

no code implementations • 10 May 2021 • En Yu, Zhuoling Li, Shoudong Han, Hongwei Wang

Existing online multiple object tracking (MOT) algorithms often consist of two subtasks, detection and re-identification (ReID).

Multiple Object Tracking • Object +1

MAT: Motion-Aware Multi-Object Tracking

no code implementations • 10 Sep 2020 • Shoudong Han, Piao Huang, Hongwei Wang, En Yu, Donghaisheng Liu, Xiaofeng Pan, Jun Zhao

Modern multi-object tracking (MOT) systems usually model the trajectories by associating per-frame detections.

Multi-Object Tracking • Object

Refinements in Motion and Appearance for Online Multi-Object Tracking

no code implementations • 16 Mar 2020 • Piao Huang, Shoudong Han, Jun Zhao, Donghaisheng Liu, Hongwei Wang, En Yu, Alex ChiChung Kot

Modern multi-object tracking (MOT) systems usually involve separate modules, such as a motion model for location and an appearance model for data association.

Blocking • Multi-Object Tracking +1

Fusion-supervised Deep Cross-modal Hashing

no code implementations • 25 Apr 2019 • Li Wang, Lei Zhu, En Yu, Jiande Sun, Huaxiang Zhang

Deep hashing has recently received attention in cross-modal retrieval for its impressive advantages.

Cross-Modal Retrieval • Deep Hashing
