Search Results for author: Zhiyang Chen

Found 8 papers, 3 papers with code

Mitigating Hallucination in Visual Language Models with Visual Supervision

no code implementations 27 Nov 2023 Zhiyang Chen, Yousong Zhu, Yufei Zhan, Zhaowen Li, Chaoyang Zhao, Jinqiao Wang, Ming Tang

Large vision-language models (LVLMs) frequently suffer from hallucination, occasionally generating responses that clearly contradict the image content.

Hallucination

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

1 code implementation 24 Nov 2023 Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang

More importantly, we present Griffon, a purely LVLM-based baseline, which does not require the introduction of any special tokens, expert models, or additional detection modules.

Referring Expression Referring Expression Comprehension

Efficient Masked Autoencoders with Self-Consistency

no code implementations 28 Feb 2023 Zhaowen Li, Yousong Zhu, Zhiyang Chen, Wei Li, Chaoyang Zhao, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

However, its high random mask ratio would result in two serious problems: 1) the data are not efficiently exploited, which makes pre-training inefficient (e.g., 1600 epochs for MAE vs. 300 epochs for supervised training), and 2) the pre-trained model shows high uncertainty and inconsistency, i.e., the prediction for the same patch may be inconsistent under different mask rounds.

Language Modelling Masked Language Modeling +3

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

2 code implementations 28 Sep 2022 Zhiyang Chen, Yousong Zhu, Zhaowen Li, Fan Yang, Wei Li, Haixin Wang, Chaoyang Zhao, Liwei Wu, Rui Zhao, Jinqiao Wang, Ming Tang

Obj2Seq can flexibly determine input categories to satisfy customized requirements, and can be easily extended to different visual tasks.

Multi-Label Classification Object +2

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

no code implementations CVPR 2022 Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

Furthermore, our method can also exploit single-centric-object datasets such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing, and surpasses current self-supervised object detection methods on the COCO dataset, demonstrating its universality and potential.

Image Classification Object +4

DPT: Deformable Patch-based Transformer for Visual Recognition

1 code implementation 30 Jul 2021 Zhiyang Chen, Yousong Zhu, Chaoyang Zhao, Guosheng Hu, Wei Zeng, Jinqiao Wang, Ming Tang

To address this problem, we propose a new Deformable Patch (DePatch) module which learns to adaptively split the images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches.

Image Classification object-detection +2

MST: Masked Self-Supervised Transformer for Visual Representation

no code implementations NeurIPS 2021 Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is better suited to downstream dense prediction tasks.

Language Modelling Masked Language Modeling +3

Adversarially Robust Neural Networks via Optimal Control: Bridging Robustness with Lyapunov Stability

no code implementations ICLR 2020 Zhiyang Chen, Hang Su

From this viewpoint, training neural nets is equivalent to finding an optimal control of the discrete dynamical system, which allows one to train neural nets with the method of successive approximations, an optimal control algorithm based on Pontryagin's maximum principle.

Adversarial Robustness
