Search Results for author: Jiaming Li

Found 18 papers, 10 papers with code

ReactDiff: Latent Diffusion for Facial Reaction Generation

1 code implementation 20 May 2025 Jiaming Li, Sheng Wang, Xin Wang, Yitao Zhu, Honglin Xiong, Zixu Zhuang, Qian Wang

Given the audio-visual clip of the speaker, facial reaction generation aims to predict the listener's facial reactions.

Decoder Diversity

Kimi-VL Technical Report

1 code implementation 10 Apr 2025 Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, HaoNing Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).

Long-Context Understanding Mathematical Reasoning +4

Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis

no code implementations 3 Mar 2025 Yitao Zhu, Yuan Yin, Jiaming Li, Mengjie Xu, Zihao Zhao, Honglin Xiong, Sheng Wang, Qian Wang

The adoption of visual foundation models has become a common practice in computer-aided diagnosis (CAD).

Diagnostic

MITracker: Multi-View Integration for Visual Object Tracking

no code implementations CVPR 2025 Mengjie Xu, Yitao Zhu, Haotian Jiang, Jiaming Li, Zhenrong Shen, Sheng Wang, Haolin Huang, Xinyu Wang, Qing Yang, Han Zhang, Qian Wang

Multi-view object tracking (MVOT) offers promising solutions to challenges such as occlusion and target loss, which are common in traditional single-view tracking.

Object Visual Object Tracking

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

1 code implementation 8 Jan 2025 Run Luo, Ting-En Lin, Haonan Zhang, Yuchuan Wu, Xiong Liu, Min Yang, Yongbin Li, Longze Chen, Jiaming Li, Lei Zhang, Yangyi Chen, Hamid Alinejad-Rokny, Fei Huang

In the alignment phase, a pre-trained speech model is further trained on text-image tasks to generalize from vision to speech in a (near) zero-shot manner, outperforming models trained on tri-modal datasets.

Decoder Emotional Speech Synthesis +3

Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding

1 code implementation 3 Jan 2025 Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li

In this method, we design a Cross-Modal Value-Enhanced Decoding (CMVED) module to alleviate hallucination by a novel contrastive decoding mechanism.

Hallucination Language Modeling +2

Benchmarking Large Language Models for Image Classification of Marine Mammals

1 code implementation 22 Oct 2024 Yijiashun Qi, Shuzhang Cai, Zunduo Zhao, Jiaming Li, Yanbin Lin, Zhiqiang Wang

Further progress has been made in multimodal LLMs, with many datasets created to evaluate LLMs with vision abilities.

Benchmarking image-classification +1

PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation

no code implementations 2 Oct 2024 Jing Luo, Run Luo, Longze Chen, Liang Zhu, Chang Ao, Jiaming Li, Yukun Chen, Xin Cheng, Wen Yang, Jiayuan Su, Chengming Li, Min Yang

To bridge this gap, we propose a data augmentation approach and introduce PersonaMathQA, a dataset derived from MATH and GSM8K, on which we train the PersonaMath models.

Data Augmentation Diversity +3

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

1 code implementation 27 Sep 2024 Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, Yuelin Bai, Run Luo, Longze Chen, Min Yang

Specifically, Ruler equips LLMs with the ability to generate responses of a specified length based on length constraints within the instructions.

Instruction Following

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

1 code implementation 26 Jun 2024 Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang

We applied the HCP strategy in experiments with six Repo-Code LLMs, and the results demonstrate that our proposed method can significantly enhance completion accuracy while substantially reducing the length of input.

Code Completion

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

no code implementations CVPR 2024 Jiaming Li, Jiacheng Zhang, Jichang Li, Ge Li, Si Liu, Liang Lin, Guanbin Li

Specifically, we devise three modules: Background Category-specific Prompt, Background Object Discovery, and Inference Probability Rectification, to empower the detector to discover, represent, and leverage implicit object knowledge explored from background proposals.

Knowledge Distillation Object +4

Instruction-Guided Visual Masking

1 code implementation 30 May 2024 Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan

To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMM and robot model.

Instruction Following Visual Grounding +1

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

2 code implementations CVPR 2024 Yuhao Sun, Lingyun Yu, Hongtao Xie, Jiaming Li, Yongdong Zhang

In this paper, we propose a novel face protection approach, dubbed DiffAM, which leverages the powerful generative ability of diffusion models to generate high-quality protected face images with adversarial makeup transferred from reference images.

Adversarial Attack Face Recognition

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

no code implementations CVPR 2024 Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Additionally, we present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients.

Monocular 3D Object Detection object-detection +1

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

1 code implementation ICCV 2023 Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, YingYing Li, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

To tackle the confirmation bias from incorrect pseudo labels of minority classes, the class-rebalancing sampling module resamples unlabeled data following the guidance of the gradient-based reweighting module.

object-detection Object Detection +1

Single-pixel imaging based on deep learning

no code implementations 25 Oct 2023 Kai Song, Yaoxing Bian, Ku Wu, Hongrui Liu, Shuangping Han, Jiaming Li, Jiazhao Tian, Chengbin Qin, Jianyong Hu, Liantuan Xiao

Single-pixel imaging can collect images at the wavelengths outside the reach of conventional focal plane array detectors.

Deep Learning Super-Resolution

Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection

no code implementations CVPR 2021 Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, Yongdong Zhang

Face forgery detection is raising ever-increasing interest in computer vision since facial manipulation technologies cause serious worries.
