Search Results for author: Jinhui Ye

Found 8 papers, 5 papers with code

Re-thinking Temporal Search for Long-Form Video Understanding

1 code implementation • 3 Apr 2025 • Jinhui Ye, Zihan Wang, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li

Specifically, under an inference budget of 32 frames, T* improves GPT-4o's performance from 50.5% to 53.1% and LLaVA-OneVision-72B's performance from 56.5% to 62.4% on the LongVideoBench XL subset.

Computational Efficiency • Form • +2

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

no code implementations • 17 Mar 2025 • Weiyu Guo, Ziyang Chen, Shaoguang Wang, Jianxiang He, Yijie Xu, Jinhui Ye, Ying Sun, Hui Xiong

Understanding long video content is a complex endeavor that often relies on densely sampled frame captions or end-to-end feature selectors, yet these techniques commonly overlook the logical relationships between textual queries and visual elements.

Attribute • MME • +4

SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction

1 code implementation • 3 Mar 2025 • Lu Dai, Yijie Xu, Jinhui Ye, Hao Liu, Hui Xiong

Large Language Models (LLMs) have demonstrated improved generation performance by incorporating externally retrieved knowledge, a process known as retrieval-augmented generation (RAG).

RAG • Retrieval

Improving Gloss-free Sign Language Translation by Reducing Representation Density

1 code implementation • 23 May 2024 • Jinhui Ye, Xing Wang, Wenxiang Jiao, Junwei Liang, Hui Xiong

In this paper, we identify a representation density problem that could be a bottleneck in restricting the performance of gloss-free SLT.

Contrastive Learning • Gloss-free Sign Language Translation • +2

GeoDeformer: Geometric Deformable Transformer for Action Recognition

no code implementations • 29 Nov 2023 • Jinhui Ye, Jiaming Zhou, Hui Xiong, Junwei Liang

Specifically, at the core of GeoDeformer is the Geometric Deformation Predictor, a module designed to identify and quantify potential spatial and temporal geometric deformations within the given video.

Action Recognition

Spatial-Temporal Alignment Network for Action Recognition

no code implementations • 19 Aug 2023 • Jinhui Ye, Junwei Liang

This paper studies how to introduce viewpoint-invariant feature representations into existing action recognition architectures.

Action Recognition

Cross-modality Data Augmentation for End-to-End Sign Language Translation

1 code implementation • 18 May 2023 • Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Hui Xiong

To tackle these challenges, we propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation (i.e., video-to-text) by exploiting pseudo gloss-text pairs from the sign gloss translation model.

Data Augmentation • Knowledge Distillation • +3

Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation

1 code implementation • 13 Oct 2022 • Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu

In this paper, to overcome the limitation, we propose a Prompt-based domain text Generation (PGEN) approach to produce large-scale in-domain spoken language text data.

Language Modelling • Text Generation • +1
