Search Results for author: Yang Jin

Found 14 papers, 7 papers with code

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

1 code implementation • 23 May 2024 • Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Liwei Chen, Hao Jiang, Di Zhang, Yang Song, Kun Gai, Yadong Mu

Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators.

Image Generation

DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model

no code implementations • 12 May 2024 • Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu

In this paper, we propose DiffGen, a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model to enable automatic and efficient generation of robot demonstrations.

Language Modelling Robot Manipulation

Harder Tasks Need More Experts: Dynamic Routing in MoE Models

1 code implementation • 12 Mar 2024 • Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty.

Computational Efficiency

TransGOP: Transformer-Based Gaze Object Prediction

1 code implementation • 21 Feb 2024 • Binglu Wang, Chenxi Guo, Yang Jin, Haisheng Xia, Nian Liu

Gaze object prediction aims to predict the location and category of the object that is watched by a human.

Gaze Estimation Object +2

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.

Ranked #64 on Visual Question Answering on MM-Vet

Video Understanding Visual Question Answering

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens, like a foreign language that an LLM can read.

Language Modelling Large Language Model +1

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

no code implementations • CVPR 2023 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Extensive experimental results show that, without further fine-tuning, ECLIP surpasses existing methods by a large margin on a broad range of downstream tasks, demonstrating the strong transferability to real-world E-commerce applications.

Decoder

Video Action Segmentation via Contextually Refined Temporal Keypoints

no code implementations • ICCV 2023 • Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu

Video action segmentation refers to the task of densely casting each video frame or short segment in an untrimmed video into some pre-specified action categories.

Action Segmentation Graph Matching +1

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

1 code implementation • 27 Sep 2022 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Spatio-Temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object depicted by a free-form textual expression.

Decoder Spatio-Temporal Video Grounding +1

Full-Resolution Network and Dual-Threshold Iteration for Retinal Vessel and Coronary Angiograph Segmentation

1 code implementation • JBHI 2022 • Wentao Liu, Huihua Yang, Tong Tian, Zhiwei Cao, Xipeng Pan, Weijin Xu, Yang Jin, Feng Gao

The results demonstrate that FR-UNet outperforms state-of-the-art methods by achieving the highest Sen, AUC, F1, and IOU on most of the above-mentioned datasets with fewer parameters, and that DTI enhances vessel connectivity while greatly improving sensitivity.

Ranked #1 on Retinal Vessel Segmentation on DRIVE

Retinal Vessel Segmentation Segmentation

Complex Video Action Reasoning via Learnable Markov Logic Network

no code implementations • CVPR 2022 • Yang Jin, Linchao Zhu, Yadong Mu

The main contributions of this work are two-fold: 1) Different from existing black-box models, the proposed model simultaneously implements the localization of temporal boundaries and the recognition of action categories by grounding the logical rules of MLN in videos.

Action Recognition Human-Object Interaction Detection +1

Capsule Network Performance on Complex Data

no code implementations • 10 Dec 2017 • Edgar Xi, Selina Bing, Yang Jin

The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%.

Data Augmentation

Using Relevant Public Posts to Enhance News Article Summarization

no code implementations • COLING 2016 • Chen Li, Zhongyu Wei, Yang Liu, Yang Jin, Fei Huang

A news article summary usually consists of 2-3 key sentences that reflect the gist of that news article.

Sentence Sentence Compression

A Preliminary Study of Disputation Behavior in Online Debating Forum

no code implementations • WS 2016 • Zhongyu Wei, Yandi Xia, Chen Li, Yang Liu, Zachary Stallbohm, Yi Li, Yang Jin

Argument Mining
