Search Results for author: Yang Jin

Found 12 papers, 5 papers with code

Harder Tasks Need More Experts: Dynamic Routing in MoE Models

1 code implementation • 12 Mar 2024 • Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty.

Computational Efficiency
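
The abstract above describes activating a variable number of experts based on input difficulty. Below is a minimal, hypothetical sketch of threshold-based (top-p) dynamic routing in PyTorch; the function name, the 0.5 threshold, and the renormalization step are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dynamic_route(router_logits: torch.Tensor, p_threshold: float = 0.5):
    """Select a variable number of experts per token (top-p style).

    Experts are added in order of router probability until their cumulative
    mass reaches `p_threshold`, so tokens with a flatter (more uncertain)
    router distribution activate more experts.
    """
    probs = F.softmax(router_logits, dim=-1)                 # [tokens, num_experts]
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum_before = sorted_p.cumsum(dim=-1) - sorted_p          # cumulative mass *before* each expert
    keep = cum_before < p_threshold                          # smallest prefix reaching the threshold
    keep[..., 0] = True                                      # always keep at least the top-1 expert
    mask = torch.zeros_like(probs).scatter(-1, sorted_idx, keep.float()).bool()
    weights = probs * mask                                   # zero out unselected experts
    weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize combination weights
    return weights, mask

# Example: 4 tokens routed over 8 experts; more uncertain tokens keep more experts.
weights, mask = dynamic_route(torch.randn(4, 8))
print(mask.sum(dim=-1))
```
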

TransGOP: Transformer-Based Gaze Object Prediction

no code implementations • 21 Feb 2024 • Binglu Wang, Chenxi Guo, Yang Jin, Haisheng Xia, Nian Liu

Gaze object prediction aims to predict the location and category of the object that is watched by a human.

Gaze Estimation • Object • +2

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.

Video Understanding • Visual Question Answering

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens, like a foreign language that an LLM can read.

Language Modelling • Large Language Model • +1
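
The idea of turning an image into discrete tokens an LLM can read can be pictured as a vector-quantization-style codebook lookup. The sketch below is a hypothetical simplification (the feature dimension, vocabulary size, and function name are illustrative), not the tokenizer proposed in the paper.

```python
import torch

def quantize_image_features(features: torch.Tensor, codebook: torch.Tensor):
    """Map continuous patch features to discrete visual token ids by
    nearest-neighbour lookup in a learned codebook (VQ-style).

    features: [num_patches, dim]    codebook: [vocab_size, dim]
    """
    dists = torch.cdist(features, codebook)     # [num_patches, vocab_size] pairwise distances
    token_ids = dists.argmin(dim=-1)            # one discrete token per patch
    quantized = codebook[token_ids]             # embeddings usable downstream (e.g. as LLM inputs)
    return token_ids, quantized

# Example: 196 patch features mapped into a 16384-word "visual vocabulary".
feats = torch.randn(196, 256)
codebook = torch.randn(16384, 256)
ids, _ = quantize_image_features(feats, codebook)
print(ids.shape)  # torch.Size([196]) -> a token sequence an LLM could consume
```
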

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

no code implementations • CVPR 2023 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Extensive experimental results show that, without further fine-tuning, ECLIP surpasses existing methods by a large margin on a broad range of downstream tasks, demonstrating the strong transferability to real-world E-commerce applications.

Video Action Segmentation via Contextually Refined Temporal Keypoints

no code implementations • ICCV 2023 • Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu

Video action segmentation refers to the task of densely casting each video frame or short segment in an untrimmed video into some pre-specified action categories.

Action Segmentation • Graph Matching • +1
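
To make the task definition concrete, the following sketch shows only the expected output format of action segmentation: a dense per-frame labeling merged into contiguous segments. It is a trivial per-frame argmax for illustration, not the keypoint-based method proposed in the paper.

```python
import torch

def frames_to_segments(frame_logits: torch.Tensor, class_names: list):
    """Densely label every frame, then merge runs of identical labels
    into (start_frame, end_frame, action) segments."""
    labels = frame_logits.argmax(dim=-1).tolist()   # one class id per frame
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t - 1, class_names[labels[start]]))
            start = t
    return segments

# Example: 300 frames scored over 5 pre-specified action categories.
logits = torch.randn(300, 5)
print(frames_to_segments(logits, ["pour", "stir", "cut", "mix", "background"])[:3])
```
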

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

1 code implementation • 27 Sep 2022 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Spatio-Temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object depicted by a free-form textual expression.

Spatio-Temporal Video Grounding • Video Grounding

Full-Resolution Network and Dual-Threshold Iteration for Retinal Vessel and Coronary Angiograph Segmentation

1 code implementation • JBHI 2022 • Wentao Liu, Huihua Yang, Tong Tian, Zhiwei Cao, Xipeng Pan, Weijin Xu, Yang Jin, Feng Gao

The results demonstrate that FR-UNet outperforms state-of-the-art methods by achieving the highest Sen, AUC, F1, and IOU on most of the above-mentioned datasets with fewer parameters, and that DTI enhances vessel connectivity while greatly improving sensitivity.

Retinal Vessel Segmentation • Segmentation
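
The dual-threshold iteration (DTI) mentioned above can be pictured as hysteresis-style post-processing of the vessel probability map. The sketch below is an assumed simplification (the 0.5/0.3 thresholds and the dilation-based growth rule are illustrative), not the paper's implementation.

```python
import numpy as np
from scipy import ndimage

def dual_threshold_iteration(prob_map: np.ndarray, high: float = 0.5, low: float = 0.3):
    """Start from confident vessel pixels (>= high) and iteratively absorb
    weaker pixels (>= low) that touch the current foreground, which tends
    to reconnect thin or faint vessel segments."""
    weak = prob_map >= low
    mask = prob_map >= high
    while True:
        # Grow the mask by one binary-dilation step, restricted to weak pixels.
        grown = ndimage.binary_dilation(mask) & weak
        if np.array_equal(grown, mask):
            return mask
        mask = grown

# Example on a random probability map (stands in for a network's output).
seg = dual_threshold_iteration(np.random.rand(64, 64))
print(seg.sum())
```
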

Complex Video Action Reasoning via Learnable Markov Logic Network

no code implementations • CVPR 2022 • Yang Jin, Linchao Zhu, Yadong Mu

The main contributions of this work are two-fold: 1) Different from existing black-box models, the proposed model simultaneously implements the localization of temporal boundaries and the recognition of action categories by grounding the logical rules of MLN in videos.

Action Recognition • Human-Object Interaction Detection • +1

Capsule Network Performance on Complex Data

no code implementations • 10 Dec 2017 • Edgar Xi, Selina Bing, Yang Jin

The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%.

Data Augmentation
