Search Results for author: Zhiyuan Fang

Found 16 papers, 9 papers with code

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

no code implementations · 25 Mar 2024 · Yingshan Chang, Yasi Zhang, Zhiyuan Fang, YingNian Wu, Yonatan Bisk, Feng Gao

We hypothesize that the underlying phenomenological coverage has not been proportionally scaled up, leading to a skew of the presented phenomenon which harms generalization.

Relational Reasoning · Text-to-Image Generation

End-to-end Knowledge Retrieval with Multi-modal Queries

1 code implementation · 1 Jun 2023 · Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral

We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.

Benchmarking · Cross-Modal Retrieval · +2
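
A minimal sketch of dual-encoder retrieval with a fused image-plus-text query, in the spirit of the task described above; the class and function names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: score a multi-modal query (image + text) against a pool of
# pre-encoded knowledge passages with a dual-encoder. All names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalQueryEncoder(nn.Module):
    """Fuses an image embedding and a text embedding into a single query vector."""
    def __init__(self, img_dim=512, txt_dim=512, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(img_dim + txt_dim, out_dim)

    def forward(self, img_feat, txt_feat):
        # Simple late fusion by concatenation; the actual method may fuse differently.
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return F.normalize(self.proj(fused), dim=-1)

def retrieve(query_vec, passage_vecs, k=5):
    """Dot-product retrieval over pre-encoded knowledge passages."""
    scores = passage_vecs @ query_vec          # (num_passages,)
    return torch.topk(scores, k=k).indices     # indices of the top-k passages

# Usage with random stand-in features
encoder = MultiModalQueryEncoder()
query = encoder(torch.randn(512), torch.randn(512))
passages = F.normalize(torch.randn(1000, 256), dim=-1)
top_idx = retrieve(query, passages)
```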

Mining Unseen Classes via Regional Objectness: A Simple Baseline for Incremental Segmentation

1 code implementation · 13 Nov 2022 · Zekang Zhang, Guangyu Gao, Zhiyuan Fang, Jianbo Jiao, Yunchao Wei

Our MicroSeg is built on the assumption that background regions with strong objectness likely belong to concepts from historical or future learning stages.

Class-Incremental Semantic Segmentation · Continual Learning · +1
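
A rough illustration of the stated assumption, assuming class-agnostic mask proposals with objectness scores are available; the proposal format and the reserved label id are placeholders, not MicroSeg's actual code.

```python
# Background pixels covered by a confident class-agnostic proposal are relabeled as a
# possible unseen concept rather than plain background. Illustrative only.
import torch

UNSEEN = 255  # hypothetical label id reserved for possible past/future classes

def relabel_background(gt_mask, proposals, objectness, thresh=0.7):
    """gt_mask: (H, W) labels with 0 = background.
    proposals: (N, H, W) binary class-agnostic masks; objectness: (N,) scores."""
    out = gt_mask.clone()
    for mask, score in zip(proposals, objectness):
        if score < thresh:
            continue
        # Only background pixels inside a confident proposal are relabeled.
        region = mask.bool() & (gt_mask == 0)
        out[region] = UNSEEN
    return out

# Usage with toy tensors
new_mask = relabel_background(torch.zeros(64, 64, dtype=torch.long),
                              torch.rand(3, 64, 64) > 0.5,
                              torch.tensor([0.9, 0.4, 0.8]))
```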

Injecting Semantic Concepts into End-to-End Image Captioning

1 code implementation · CVPR 2022 · Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we pursue a better-performing detector-free image captioning model and propose a pure vision-transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting regional features.

Caption Generation · Image Captioning
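
A minimal sketch of the detector-free idea, assuming a generic ViT-style encoder over image patches and a transformer decoder that attends to the resulting grid tokens; dimensions and module choices are placeholders, not the ViTCAP architecture.

```python
# Grid-feature captioning: patch tokens from an encoder serve as the decoder's memory,
# so no region detector is involved. Illustrative dimensions only.
import torch
import torch.nn as nn

class GridCaptioner(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, patches=196, patch_dim=768):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, d_model)     # project flattened patches
        self.pos = nn.Parameter(torch.zeros(1, patches, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, patch_feats, caption_tokens):
        # patch_feats: (B, 196, 768) flattened image patches.
        grid = self.encoder(self.patch_embed(patch_feats) + self.pos)
        tgt = self.tok_embed(caption_tokens)                 # (B, T, d_model)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        dec = self.decoder(tgt, grid, tgt_mask=causal)
        return self.lm_head(dec)                             # next-token logits

logits = GridCaptioner()(torch.randn(2, 196, 768), torch.randint(0, 10000, (2, 12)))
```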

Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations · ICCV 2021 · Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a large transformer-based VL model into a small VL model.

Image Captioning · Knowledge Distillation · +2
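
A generic knowledge-distillation objective of the kind used to train a small student against a large teacher is sketched below; the temperature and weighting are assumptions rather than the paper's recipe.

```python
# Standard soft-target distillation: the student matches the teacher's softened output
# distribution in addition to the usual hard-label task loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Combine a soft KL term against the teacher with the ground-truth loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)          # rescale gradients for temperature
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 100), torch.randn(8, 100), torch.randint(0, 100, (8,)))
```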

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

no code implementations · 21 Jun 2020 · Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang

Referring attention is our proposed mechanism that acts as a scoring function for temporally grounding the given queries over frames.
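
A hedged sketch of a query-to-frame scoring mechanism in this spirit: a sentence embedding attends over per-frame features to produce a temporal relevance distribution. This is a generic attention scorer, not the paper's exact referring-attention design.

```python
# A textual query embedding scores each frame via scaled dot-product attention,
# yielding temporal weights that indicate where the query is grounded.
import torch
import torch.nn as nn

class TemporalQueryScorer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.f_proj = nn.Linear(dim, dim)

    def forward(self, query, frames):
        # query: (B, dim) sentence embedding; frames: (B, T, dim) per-frame features.
        q = self.q_proj(query).unsqueeze(1)              # (B, 1, dim)
        k = self.f_proj(frames)                          # (B, T, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5     # (B, T) scaled dot products
        return scores.softmax(dim=-1)                    # temporal attention weights

weights = TemporalQueryScorer()(torch.randn(2, 256), torch.randn(2, 40, 256))
```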

HRDNet: High-resolution Detection Network for Small Objects

no code implementations · 13 Jun 2020 · Ziming Liu, Guangyu Gao, Lin Sun, Zhiyuan Fang

By extracting various features from high to low resolutions, the MD-IPN is able to improve small object detection while maintaining performance on medium and large objects.

Object · object-detection · +2
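
An illustrative take on multi-resolution feature extraction: the same image is processed at several input scales so that small objects retain enough pixels at the finest scale. The backbone and scale factors are placeholders, not the MD-IPN itself.

```python
# Build a feature pyramid by running one backbone on an image pyramid.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(                # tiny stand-in for a real detection backbone
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

def multi_resolution_features(image, scales=(1.0, 0.5, 0.25)):
    """Return one feature map per input resolution, highest resolution first."""
    feats = []
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        feats.append(backbone(scaled))
    return feats

pyramid = multi_resolution_features(torch.randn(1, 3, 512, 512))
```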

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

2 code implementations · ECCV 2020 · Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang

Person search by natural language aims at retrieving, from a large-scale image pool, the specific person that matches a given textual description.

Attribute · Contrastive Learning · +2
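
A standard InfoNCE-style image-text contrastive loss is sketched below as a reference point for this kind of retrieval; it is not necessarily ViTAA's exact alignment objective.

```python
# Bidirectional contrastive alignment: matched image/description pairs are pulled
# together, mismatched pairs within the batch are pushed apart.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs."""
    logits = img_emb @ txt_emb.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric loss: image-to-text and text-to-image retrieval directions.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

img = F.normalize(torch.randn(16, 256), dim=-1)
txt = F.normalize(torch.randn(16, 256), dim=-1)
loss = contrastive_alignment_loss(img, txt)
```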

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs

no code implementations · 28 May 2019 · Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral

Identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.

Modularized Textual Grounding for Counterfactual Resilience

1 code implementation · CVPR 2019 · Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang

Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.

Attribute · counterfactual · +4

Weakly Supervised Attention Learning for Textual Phrases Grounding

no code implementations · 1 May 2018 · Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang

Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.

Range Loss for Deep Face Recognition With Long-Tailed Training Data

no code implementations · ICCV 2017 · Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Unlike these works, this paper investigates how long-tailed data impact the training of face CNNs and develops a novel loss function, called range loss, to effectively utilize the tailed data in the training process.

Face Recognition

Range Loss for Deep Face Recognition with Long-tail

2 code implementations · 28 Nov 2016 · Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Convolutional neural networks have achieved great improvements in face recognition in recent years because of their extraordinary ability to learn discriminative features of people with different identities.

Face Recognition
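
A hedged sketch of a range-style loss in the spirit of the two entries above: an intra-class term built from the largest within-class distances (via their harmonic mean) and an inter-class term enforcing a margin between the closest class centers in a batch. The value of k, the margin, and other details are assumptions, not the paper's settings.

```python
# Illustrative range-style loss for long-tailed identity data.
import torch

def range_style_loss(features, labels, k=2, margin=10.0):
    """features: (B, D) embeddings, labels: (B,) identity ids. Illustrative only."""
    intra = features.new_zeros(())
    centers = []
    for c in labels.unique():
        feats = features[labels == c]
        centers.append(feats.mean(dim=0))
        if feats.size(0) < 2:
            continue  # singleton identities contribute no intra-class term
        d = torch.cdist(feats, feats)
        pair = d[torch.triu(torch.ones_like(d), diagonal=1).bool()]   # unique pair distances
        top = pair.topk(min(k, pair.numel())).values                  # k largest ranges
        intra = intra + top.numel() / (1.0 / (top + 1e-8)).sum()      # harmonic mean
    centers = torch.stack(centers)
    dc = torch.cdist(centers, centers)
    dc = dc + torch.eye(dc.size(0), device=dc.device) * 1e9           # mask self-distances
    inter = torch.clamp(margin - dc.min(), min=0.0)                   # hinge on closest centers
    return intra + inter

loss = range_style_loss(torch.randn(16, 128), torch.randint(0, 5, (16,)))
```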
