Search Results for author: Yifeng Geng

Found 23 papers, 14 papers with code

AnyText: Multilingual Visual Text Generation And Editing

1 code implementation • 6 Nov 2023 • Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

Based on the AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.

Optical Character Recognition (OCR), Text Generation

Diversity Transfer Network for Few-Shot Learning

1 code implementation • 31 Dec 2019 • Mengting Chen, Yuxin Fang, Xinggang Wang, Heng Luo, Yifeng Geng, Xin-Yu Zhang, Chang Huang, Wenyu Liu, Bo Wang

The learning problem of the sample generation (i.e., diversity transfer) is solved via minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works.

Few-Shot Learning

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie

As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.

Action Recognition, Skeleton Based Action Recognition

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.

Autonomous Driving

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition

no code implementations • 13 Jan 2021 • Mengting Chen, Xinggang Wang, Heng Luo, Yifeng Geng, Wenyu Liu

By applying the proposed feature matching block in different layers of the few-shot recognition network, multi-scale information among the compared images can be incorporated into the final cascaded matching feature, which boosts the recognition performance further and generalizes better by learning on relationships.

Few-Shot Learning

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection

no code implementations • CVPR 2023 • Xiaolin Song, Binghui Chen, Pengyu Li, Jun-Yan He, Biao Wang, Yifeng Geng, Xuansong Xie, Honggang Zhang

End-to-end pedestrian detection focuses on training a pedestrian detection model via discarding the Non-Maximum Suppression (NMS) post-processing.

Pedestrian Detection

FMViT: A multiple-frequency mixing Vision Transformer

no code implementations • 9 Nov 2023 • Wei Tan, Yifeng Geng, Xuansong Xie

On CoreML, FMViT outperforms MobileOne by 2.6% in top-1 accuracy on the ImageNet dataset, with inference latency comparable to MobileOne (78.5% vs. 75.9%).

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

Data-efficient Event Camera Pre-training via Disentangled Masked Modeling

no code implementations • 1 Mar 2024 • Zhenpeng Huang, Chao Li, Hao Chen, Yongjian Deng, Yifeng Geng, LiMin Wang

Our pre-training overcomes the limitations of previous methods, which either sacrifice temporal information by converting event sequences into 2D images for utilizing pre-trained image models or directly employ paired image data for knowledge distillation to enhance the learning of event streams.

Knowledge Distillation, Self-Supervised Learning

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

no code implementations • 7 Apr 2024 • Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, WangMeng Zuo

Specifically, we propose a shoe-wearing system, called Shoe-Model, to generate plausible images of human legs interacting with the given shoes.

Image Generation, Marketing

Strictly-ID-Preserved and Controllable Accessory Advertising Image Generation

no code implementations • 7 Apr 2024 • Youze Xue, Binghui Chen, Yifeng Geng, Xuansong Xie, Jiansheng Chen, Hongbing Ma

Customized generative text-to-image models have the ability to produce images that closely resemble a given subject.

Image Generation
