Search Results for author: Yifeng Geng

Found 23 papers, 14 papers with code

AnyText: Multilingual Visual Text Generation And Editing

1 code implementation • 6 Nov 2023 • Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

Based on AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.

Optical Character Recognition (OCR) • Text Generation

3,716 GitHub stars

FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation

1 code implementation • CVPR 2023 • Junjie He, Pengyu Li, Yifeng Geng, Xuansong Xie

In this paper, we show the strong potential of query-based models on efficient instance segmentation algorithm designs.

Real-time Instance Segmentation • Segmentation • +1

156 GitHub stars

Hypergraph Transformer for Skeleton-based Action Recognition

1 code implementation • 17 Nov 2022 • Yuxuan Zhou, Zhi-Qi Cheng, Chao Li, Yanwen Fang, Yifeng Geng, Xuansong Xie, Margret Keuper

Skeleton-based action recognition aims to recognize human actions given human joint coordinates with skeletal interconnections.

Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D 120

Action Recognition • Skeleton Based Action Recognition

Diversity Transfer Network for Few-Shot Learning

1 code implementation • 31 Dec 2019 • Mengting Chen, Yuxin Fang, Xinggang Wang, Heng Luo, Yifeng Geng, Xin-Yu Zhang, Chang Huang, Wenyu Liu, Bo Wang

The learning problem of the sample generation (i.e., diversity transfer) is solved via minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works.

Few-Shot Learning

Tracking with Human-Intent Reasoning

1 code implementation • 29 Dec 2023 • Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie

The perception component then generates the tracking results based on the embeddings.

Language Modelling • Object • +4

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie

Human pose estimation is a challenging task due to its structured data sequence nature.

Ranked #74 on 3D Human Pose Estimation on Human3.6M

3D Human Pose Estimation • 3D Pose Estimation • +1

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie

As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.

Action Recognition • Skeleton Based Action Recognition

ProContEXT: Exploring Progressive Context Transformer for Tracking

2 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.

Object • Visual Object Tracking

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.

3D Human Pose Estimation • Domain Adaptation

Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning

1 code implementation • ICCV 2023 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Yifeng Geng, Xuansong Xie

Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.

Depth Estimation • Panoptic Segmentation • +1

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.

Autonomous Driving

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie

By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.

Benchmarking • Face Recognition • +3

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie

Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture.

3D Human Pose Estimation

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition

no code implementations • 13 Jan 2021 • Mengting Chen, Xinggang Wang, Heng Luo, Yifeng Geng, Wenyu Liu

By applying the proposed feature matching block in different layers of the few-shot recognition network, multi-scale information among the compared images can be incorporated into the final cascaded matching feature, which boosts the recognition performance further and generalizes better by learning on relationships.

Few-Shot Learning

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection

no code implementations • CVPR 2023 • Xiaolin Song, Binghui Chen, Pengyu Li, Jun-Yan He, Biao Wang, Yifeng Geng, Xuansong Xie, Honggang Zhang

End-to-end pedestrian detection focuses on training a pedestrian detection model via discarding the Non-Maximum Suppression (NMS) post-processing.

Pedestrian Detection

PGformer: Proxy-Bridged Game Transformer for Multi-Person Highly Interactive Extreme Motion Prediction

no code implementations • 6 Jun 2023 • Yanwen Fang, Jintai Chen, Peng-Tao Jiang, Chao Li, Yifeng Geng, Eddy K. F. LAM, Guodong Li

Multi-person motion prediction is a challenging task, especially for real-world scenarios of highly interacted persons.

Ranked #2 on Multi-Person Pose forecasting on Expi - common actions split

motion prediction • Multi-Person Pose forecasting

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).

Language Modelling • Large Language Model

FMViT: A multiple-frequency mixing Vision Transformer

no code implementations • 9 Nov 2023 • Wei Tan, Yifeng Geng, Xuansong Xie

On CoreML, FMViT outperforms MobileOne by 2.6% in top-1 accuracy on the ImageNet dataset, with inference latency comparable to MobileOne (78.5% vs. 75.9%).

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

Data-efficient Event Camera Pre-training via Disentangled Masked Modeling

no code implementations • 1 Mar 2024 • Zhenpeng Huang, Chao Li, Hao Chen, Yongjian Deng, Yifeng Geng, LiMin Wang

Our pre-training overcomes the limitations of previous methods, which either sacrifice temporal information by converting event sequences into 2D images for utilizing pre-trained image models or directly employ paired image data for knowledge distillation to enhance the learning of event streams.

Knowledge Distillation • Self-Supervised Learning

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

no code implementations • 7 Apr 2024 • Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, WangMeng Zuo

Specifically, we propose a shoe-wearing system, called Shoe-Model, to generate plausible images of human legs interacting with the given shoes.

Image Generation • Marketing

Strictly-ID-Preserved and Controllable Accessory Advertising Image Generation

no code implementations • 7 Apr 2024 • Youze Xue, Binghui Chen, Yifeng Geng, Xuansong Xie, Jiansheng Chen, Hongbing Ma

Customized generative text-to-image models have the ability to produce images that closely resemble a given subject.

Image Generation

