1 code implementation • 23 May 2024 • Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Liwei Chen, Hao Jiang, Di Zhang, Yang song, Kun Gai, Yadong Mu
Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators.
no code implementations • 12 May 2024 • Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu
In this paper, we propose DiffGen, a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model to enable automatic and efficient generation of robot demonstrations.
1 code implementation • 12 Mar 2024 • Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng
In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty.
1 code implementation • 21 Feb 2024 • Binglu Wang, Chenxi Guo, Yang Jin, Haisheng Xia, Nian Liu
Gaze object prediction aims to predict the location and category of the object that is watched by a human.
1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang song, Kun Gai, Yadong Mu
In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.
Ranked #64 on Visual Question Answering on MM-Vet
1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu
Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens like a foreign language that LLM can read.
no code implementations • CVPR 2023 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu
Extensive experimental results show that, without further fine-tuning, ECLIP surpasses existing methods by a large margin on a broad range of downstream tasks, demonstrating the strong transferability to real-world E-commerce applications.
no code implementations • ICCV 2023 • Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu
Video action segmentation refers to the task of densely casting each video frame or short segment in an untrimmed video into some pre-specified action categories.
1 code implementation • 27 Sep 2022 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu
Spatio-Temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object depicted by a free-form textual expression.
1 code implementation • JBHI 2022 • Wentao Liu,Huihua Yang, Tong Tian, Zhiwei Cao, Xipeng Pan, Weijin Xu, Yang Jin, Feng Gao
The results demonstrate that FR-UNet outperforms state-of-the-art methods by achieving the highest Sen, AUC, F1, and IOU on most of the above-mentioned datasets with fewer parameters, and that DTI enhances vessel connectivity while greatly improving sensitivity.
Ranked #1 on Retinal Vessel Segmentation on DRIVE
no code implementations • CVPR 2022 • Yang Jin, Linchao Zhu, Yadong Mu
The main contributions of this work are two-fold: 1) Different from existing black-box models, the proposed model simultaneously implements the localization of temporal boundaries and the recognition of action categories by grounding the logical rules of MLN in videos.
no code implementations • 10 Dec 2017 • Edgar Xi, Selina Bing, Yang Jin
The capsule network has shown its potential by achieving a state-of-the-art result of 0. 25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0. 39%.
no code implementations • COLING 2016 • Chen Li, Zhongyu Wei, Yang Liu, Yang Jin, Fei Huang
A news article summary usually consists of 2-3 key sentences that reflect the gist of that news article.