Search Results for author: Zhiyuan Fang

Found 16 papers, 9 papers with code

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

no code implementations · 25 Mar 2024 · Yingshan Chang, Yasi Zhang, Zhiyuan Fang, YingNian Wu, Yonatan Bisk, Feng Gao

We hypothesize that the underlying phenomenological coverage has not been proportionally scaled up, leading to a skew of the presented phenomenon which harms generalization.

Relational Reasoning · Text-to-Image Generation

End-to-end Knowledge Retrieval with Multi-modal Queries

1 code implementation · 1 Jun 2023 · Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral

We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.

Benchmarking · Cross-Modal Retrieval · +2
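
A minimal sketch of dual-encoder retrieval with a fused image-plus-text query, in the spirit of the task described above; the class and function names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: score a multi-modal query (image + text) against a pool of
# pre-encoded knowledge passages with a dual-encoder. All names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalQueryEncoder(nn.Module):
    """Fuses an image embedding and a text embedding into a single query vector."""
    def __init__(self, img_dim=512, txt_dim=512, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(img_dim + txt_dim, out_dim)

    def forward(self, img_feat, txt_feat):
        # Simple late fusion by concatenation; the actual method may fuse differently.
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return F.normalize(self.proj(fused), dim=-1)

def retrieve(query_vec, passage_vecs, k=5):
    """Dot-product retrieval over pre-encoded knowledge passages."""
    scores = passage_vecs @ query_vec          # (num_passages,)
    return torch.topk(scores, k=k).indices     # indices of the top-k passages

# Usage with random stand-in features
encoder = MultiModalQueryEncoder()
query = encoder(torch.randn(512), torch.randn(512))
passages = F.normalize(torch.randn(1000, 256), dim=-1)
top_idx = retrieve(query, passages)
```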

Mining Unseen Classes via Regional Objectness: A Simple Baseline for Incremental Segmentation

1 code implementation · 13 Nov 2022 · Zekang Zhang, Guangyu Gao, Zhiyuan Fang, Jianbo Jiao, Yunchao Wei

Our MicroSeg is built on the assumption that background regions with strong objectness likely belong to concepts from historical or future learning stages.

Class-Incremental Semantic Segmentation · Continual Learning · +1
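
A rough illustration of the stated assumption, assuming class-agnostic mask proposals with objectness scores are available; the proposal format and the reserved label id are placeholders, not MicroSeg's actual code.

```python
# Background pixels covered by a confident class-agnostic proposal are relabeled as a
# possible unseen concept rather than plain background. Illustrative only.
import torch

UNSEEN = 255  # hypothetical label id reserved for possible past/future classes

def relabel_background(gt_mask, proposals, objectness, thresh=0.7):
    """gt_mask: (H, W) labels with 0 = background.
    proposals: (N, H, W) binary class-agnostic masks; objectness: (N,) scores."""
    out = gt_mask.clone()
    for mask, score in zip(proposals, objectness):
        if score < thresh:
            continue
        # Only background pixels inside a confident proposal are relabeled.
        region = mask.bool() & (gt_mask == 0)
        out[region] = UNSEEN
    return out

# Usage with toy tensors
new_mask = relabel_background(torch.zeros(64, 64, dtype=torch.long),
                              torch.rand(3, 64, 64) > 0.5,
                              torch.tensor([0.9, 0.4, 0.8]))
```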

Injecting Semantic Concepts into End-to-End Image Captioning

1 code implementation · CVPR 2022 · Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we pursue a better-performing detector-free image captioning model and propose a pure vision-transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting regional features.

Caption Generation · Image Captioning
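
A minimal sketch of the detector-free idea, assuming a generic ViT-style encoder over image patches and a transformer decoder that attends to the resulting grid tokens; dimensions and module choices are placeholders, not the ViTCAP architecture.

```python
# Grid-feature captioning: patch tokens from an encoder serve as the decoder's memory,
# so no region detector is involved. Illustrative dimensions only.
import torch
import torch.nn as nn

class GridCaptioner(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, patches=196, patch_dim=768):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, d_model)     # project flattened patches
        self.pos = nn.Parameter(torch.zeros(1, patches, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, patch_feats, caption_tokens):
        # patch_feats: (B, 196, 768) flattened image patches.
        grid = self.encoder(self.patch_embed(patch_feats) + self.pos)
        tgt = self.tok_embed(caption_tokens)                 # (B, T, d_model)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        dec = self.decoder(tgt, grid, tgt_mask=causal)
        return self.lm_head(dec)                             # next-token logits

logits = GridCaptioner()(torch.randn(2, 196, 768), torch.randint(0, 10000, (2, 12)))
```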

Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations · ICCV 2021 · Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a large transformer-based VL model into a small VL model.

Image Captioning · Knowledge Distillation · +2
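
A generic knowledge-distillation objective of the kind used to train a small student against a large teacher is sketched below; the temperature and weighting are assumptions rather than the paper's recipe.

```python
# Standard soft-target distillation: the student matches the teacher's softened output
# distribution in addition to the usual hard-label task loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Combine a soft KL term against the teacher with the ground-truth loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)          # rescale gradients for temperature
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 100), torch.randn(8, 100), torch.randint(0, 100, (8,)))
```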

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

no code implementations · 21 Jun 2020 · Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang

Referring attention is our proposed mechanism that acts as a scoring function for temporally grounding the given queries over frames.
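
A hedged sketch of a query-to-frame scoring mechanism in this spirit: a sentence embedding attends over per-frame features to produce a temporal relevance distribution. This is a generic attention scorer, not the paper's exact referring-attention design.

```python
# A textual query embedding scores each frame via scaled dot-product attention,
# yielding temporal weights that indicate where the query is grounded.
import torch
import torch.nn as nn

class TemporalQueryScorer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.f_proj = nn.Linear(dim, dim)

    def forward(self, query, frames):
        # query: (B, dim) sentence embedding; frames: (B, T, dim) per-frame features.
        q = self.q_proj(query).unsqueeze(1)              # (B, 1, dim)
        k = self.f_proj(frames)                          # (B, T, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5     # (B, T) scaled dot products
        return scores.softmax(dim=-1)                    # temporal attention weights

weights = TemporalQueryScorer()(torch.randn(2, 256), torch.randn(2, 40, 256))
```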

HRDNet: High-resolution Detection Network for Small Objects

no code implementations · 13 Jun 2020 · Ziming Liu, Guangyu Gao, Lin Sun, Zhiyuan Fang

By extracting various features from high to low resolutions, the MD-IPN is able to improve small object detection while maintaining performance on medium and large objects.

Object · object-detection · +2
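
An illustrative take on multi-resolution feature extraction: the same image is processed at several input scales so that small objects retain enough pixels at the finest scale. The backbone and scale factors are placeholders, not the MD-IPN itself.

```python
# Build a feature pyramid by running one backbone on an image pyramid.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(                # tiny stand-in for a real detection backbone
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

def multi_resolution_features(image, scales=(1.0, 0.5, 0.25)):
    """Return one feature map per input resolution, highest resolution first."""
    feats = []
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        feats.append(backbone(scaled))
    return feats

pyramid = multi_resolution_features(torch.randn(1, 3, 512, 512))
```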

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

2 code implementations · ECCV 2020 · Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang

Person search by natural language aims at retrieving, from a large-scale image pool, the specific person that matches a given textual description.

Attribute · Contrastive Learning · +2
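
A standard InfoNCE-style image-text contrastive loss is sketched below as a reference point for this kind of retrieval; it is not necessarily ViTAA's exact alignment objective.

```python
# Bidirectional contrastive alignment: matched image/description pairs are pulled
# together, mismatched pairs within the batch are pushed apart.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs."""
    logits = img_emb @ txt_emb.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric loss: image-to-text and text-to-image retrieval directions.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

img = F.normalize(torch.randn(16, 256), dim=-1)
txt = F.normalize(torch.randn(16, 256), dim=-1)
loss = contrastive_alignment_loss(img, txt)
```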

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs

no code implementations · 28 May 2019 · Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral

Identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.

Modularized Textual Grounding for Counterfactual Resilience

1 code implementation · CVPR 2019 · Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang

Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.

Attribute · counterfactual · +4

Weakly Supervised Attention Learning for Textual Phrases Grounding

no code implementations · 1 May 2018 · Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang

Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.

Range Loss for Deep Face Recognition With Long-Tailed Training Data

no code implementations · ICCV 2017 · Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Unlike these works, this paper investigates how long-tailed data impact the training of face CNNs and develops a novel loss function, called range loss, to effectively utilize the tailed data in the training process.

Face Recognition

Range Loss for Deep Face Recognition with Long-tail

2 code implementations · 28 Nov 2016 · Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Convolutional neural networks have achieved great improvements in face recognition in recent years because of their extraordinary ability to learn discriminative features of people with different identities.

Face Recognition
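
A hedged sketch of a range-style loss in the spirit of the two entries above: an intra-class term built from the largest within-class distances (via their harmonic mean) and an inter-class term enforcing a margin between the closest class centers in a batch. The value of k, the margin, and other details are assumptions, not the paper's settings.

```python
# Illustrative range-style loss for long-tailed identity data.
import torch

def range_style_loss(features, labels, k=2, margin=10.0):
    """features: (B, D) embeddings, labels: (B,) identity ids. Illustrative only."""
    intra = features.new_zeros(())
    centers = []
    for c in labels.unique():
        feats = features[labels == c]
        centers.append(feats.mean(dim=0))
        if feats.size(0) < 2:
            continue  # singleton identities contribute no intra-class term
        d = torch.cdist(feats, feats)
        pair = d[torch.triu(torch.ones_like(d), diagonal=1).bool()]   # unique pair distances
        top = pair.topk(min(k, pair.numel())).values                  # k largest ranges
        intra = intra + top.numel() / (1.0 / (top + 1e-8)).sum()      # harmonic mean
    centers = torch.stack(centers)
    dc = torch.cdist(centers, centers)
    dc = dc + torch.eye(dc.size(0), device=dc.device) * 1e9           # mask self-distances
    inter = torch.clamp(margin - dc.min(), min=0.0)                   # hinge on closest centers
    return intra + inter

loss = range_style_loss(torch.randn(16, 128), torch.randint(0, 5, (16,)))
```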
