Search Results for author: Liming Zhao

Found 16 papers, 5 papers with code

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

no code implementations 20 Mar 2025 Zhihang Liu, Chen-Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie

Specifically, the instruction condition is injected into the grouped visual tokens at the local level and the learnable tokens at the global level, and we conduct the attention mechanism to complete the conditional compression.

Multiple-choice Video Understanding

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

1 code implementation 5 Mar 2025 Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan

Trained using only a basic MSE diffusion loss for reconstruction, along with KL term and LPIPS perceptual loss from scratch, extensive experiments demonstrate that CDT achieves state-of-the-art performance in video reconstruction tasks with just a single-step sampling.

Decoder Video Compression +2

ContextHOI: Spatial Context Learning for Human-Object Interaction Detection

no code implementations 12 Dec 2024 Mingda Jia, Liming Zhao, Ge Li, Yun Zheng

To enhance the capabilities of object detectors for HOI detection, we present a dual-branch framework named ContextHOI, which efficiently captures both object detection features and spatial contexts.

Human-Object Interaction Detection Object +2

Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection

no code implementations 11 Dec 2024 Mingda Jia, Liming Zhao, Ge Li, Yun Zheng

Human-object interaction (HOI) detectors with popular query-transformer architecture have achieved promising performance.

Human-Object Interaction Detection

Improved Video VAE for Latent Video Diffusion Model

no code implementations 10 Nov 2024 Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha

Specifically, the KTC architecture divides the latent space into two branches, in which one half completely inherits the compression prior of keyframes from a lower-dimension image VAE while the other half involves temporal compression through 3D group causal convolution, reducing temporal-spatial conflicts and accelerating the convergence speed of video VAE.

model Video Reconstruction

Learning Restricted Boltzmann Machines with greedy quantum search

no code implementations 25 Sep 2023 Liming Zhao, Aman Agrawal, Patrick Rebentrost

Restricted Boltzmann Machines (RBMs) are widely used probabilistic undirected graphical models with visible and latent nodes, playing an important role in statistics and machine learning.

Provable learning of quantum states with graphical models

no code implementations 17 Sep 2023 Liming Zhao, Naixu Guo, Ming-Xing Luo, Patrick Rebentrost

Several works consider subclasses of quantum states that can be learned in polynomial sample complexity such as stabilizer states or high-temperature Gibbs states.

PAC learning

MomentDiff: Generative Video Moment Retrieval from Random to Real

1 code implementation NeurIPS 2023 Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.

Moment Retrieval Retrieval

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval

1 code implementation ICCV 2023 Pandeng Li, Chen-Wei Xie, Liming Zhao, Hongtao Xie, Jiannan Ge, Yun Zheng, Deli Zhao, Yongdong Zhang

In the event-sentence prototype matching phase, we design a temporal prototype generation mechanism to associate intra-frame objects and interact inter-frame temporal relations.

Diversity Object +3

Variational Quantum Circuit Model for Knowledge Graphs Embedding

no code implementations 19 Feb 2019 Yunpu Ma, Volker Tresp, Liming Zhao, Yuyi Wang

In this work, we propose the first quantum Ansätze for the statistical relational learning on knowledge graphs using parametric quantum circuits.

Knowledge Graph Embedding Knowledge Graphs +3

Geometry-Aware Scene Text Detection With Instance Transformation Network

no code implementations CVPR 2018 Fangfang Wang, Liming Zhao, Xi Li, Xinchao Wang, DaCheng Tao

Localizing text in the wild is challenging in the situations of complicated geometric layout of the targets like random orientation and large aspect ratio.

General Classification Multi-Task Learning +5

Deeply-Learned Part-Aligned Representations for Person Re-Identification

1 code implementation ICCV 2017 Liming Zhao, Xi Li, Jingdong Wang, Yueting Zhuang

In this paper, we address the problem of person re-identification, which refers to associating the persons captured from different cameras.

Person Re-Identification Triplet

Deep Convolutional Neural Networks with Merge-and-Run Mappings

4 code implementations 23 Nov 2016 Liming Zhao, Jingdong Wang, Xi Li, Zhuowen Tu, Wen-Jun Zeng

A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow.

DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection

no code implementations 19 Oct 2015 Xi Li, Liming Zhao, Lina Wei, Ming-Hsuan Yang, Fei Wu, Yueting Zhuang, Haibin Ling, Jingdong Wang

A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner.

Image Segmentation Multi-Task Learning +6

Metric Learning Driven Multi-Task Structured Output Optimization for Robust Keypoint Tracking

no code implementations 4 Dec 2014 Liming Zhao, Xi Li, Jun Xiao, Fei Wu, Yueting Zhuang

As an important and challenging problem in computer vision and graphics, keypoint-based object tracking is typically formulated in a spatio-temporal statistical learning framework.

Metric Learning Object Tracking
