Search Results for author: Yiyang Zhou

Found 25 papers, 13 papers with code

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

1 code implementation | 5 Jul 2024 | Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities.

Hallucination Text-to-Image Generation

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

1 code implementation | 10 Jun 2024 | Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, ZongYuan Ge, Gang Li, James Zou, Huaxiu Yao

Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare.

Fairness

Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

2 code implementations | 24 May 2024 | Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, YuHang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao

In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the need for external models or data.

Hallucination Image Comprehension +2

Calibrated Self-Rewarding Vision Language Models

1 code implementation | 23 May 2024 | Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao

In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input.

Hallucination Language Modelling +1

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

1 code implementation | 18 Feb 2024 | Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

This procedure is not perfect and can cause the model to hallucinate, i.e., provide answers that do not accurately reflect the image, even when the core LLM is highly factual and the vision backbone has sufficiently complete representations.

Hallucination Instruction Following +1

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

1 code implementation | 27 Nov 2023 | Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie

Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.

Adversarial Robustness Visual Question Answering (VQA) +1

Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges

1 code implementation | 6 Nov 2023 | Chenhang Cui, Yiyang Zhou, Xinyu Yang, Shirley Wu, Linjun Zhang, James Zou, Huaxiu Yao

To bridge this gap, we introduce a new benchmark, namely, the Bias and Interference Challenges in Visual Language Models (Bingo).

Hallucination

Evaluation and Analysis of Hallucination in Large Vision-Language Models

1 code implementation | 29 Aug 2023 | Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang

In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.

Hallucination Hallucination Evaluation

Semantic-Human: Neural Rendering of Humans from Monocular Video with Human Parsing

no code implementations | 19 Aug 2023 | Jie Zhang, Pengcheng Shi, Zaiwang Gu, Yiyang Zhou, Zhi Wang

In this paper, we present Semantic-Human, a novel method that achieves both photorealistic details and viewpoint-consistent human parsing for the neural rendering of humans.

Denoising Human Parsing +2

Overlap Bias Matching is Necessary for Point Cloud Registration

no code implementations | 18 Aug 2023 | Pengcheng Shi, Jie Zhang, Haozhe Cheng, Junyang Wang, Yiyang Zhou, Chenlin Zhao, Jihua Zhu

Specifically, we propose a plug-and-play Overlap Bias Matching Module (OBMM) comprising two integral components: an overlap sampling module and a bias prediction module.

Point Cloud Registration

DualGenerator: Information Interaction-based Generative Network for Point Cloud Completion

no code implementations | 16 May 2023 | Pengcheng Shi, Haozhe Cheng, Xu Han, Yiyang Zhou, Jihua Zhu

To tackle these challenges, we propose an information interaction-based generative network for point cloud completion (DualGenerator).

Point Cloud Completion

Contrastive Label Enhancement

no code implementations | 16 May 2023 | Yifei Wang, Yiyang Zhou, Jihua Zhu, Xinyuan Liu, Wenbiao Yan, Zhiqiang Tian

Label distribution learning (LDL) is a new machine learning paradigm for solving label ambiguity.

Contrastive Learning

Semantically Consistent Multi-view Representation Learning

no code implementations | 8 Mar 2023 | Yiyang Zhou, Qinghai Zheng, Shunshun Bai, Jihua Zhu

In this work, we devote ourselves to the challenging task of Unsupervised Multi-view Representation Learning (UMRL), which requires learning a unified feature representation from multiple views in an unsupervised manner.

Contrastive Learning Representation Learning

Multi-view Semantic Consistency based Information Bottleneck for Clustering

no code implementations | 28 Feb 2023 | Wenbiao Yan, Jihua Zhu, Yiyang Zhou, Yifei Wang, Qinghai Zheng

In this way, the semantic consistency learned from multi-view data can improve the information bottleneck, so that it distinguishes the consistent information more precisely and learns a unified, more discriminative feature representation for clustering.

Clustering

MCoCo: Multi-level Consistency Collaborative Multi-view Clustering

no code implementations | 26 Feb 2023 | Yiyang Zhou, Qinghai Zheng, Wenbiao Yan, Yifei Wang, Pengcheng Shi, Jihua Zhu

Further, we designed a multi-level consistency collaboration strategy, which utilizes the consistent information of semantic space as a self-supervised signal to collaborate with the cluster assignments in feature space.

Clustering Contrastive Learning +2

Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects

no code implementations | 26 Sep 2022 | Philip Jacobson, Yiyang Zhou, Wei Zhan, Masayoshi Tomizuka, Ming C. Wu

In this work, we propose a novel approach, Center Feature Fusion (CFF), in which we leverage center-based detection networks in both the camera and LiDAR streams to identify relevant object locations.

Autonomous Vehicles Object +3

What Matters for 3D Scene Flow Network

1 code implementation | 19 Jul 2022 | Guangming Wang, Yunzhe Hu, Zhe Liu, Yiyang Zhou, Masayoshi Tomizuka, Wei Zhan, Hesheng Wang

Our proposed model surpasses all existing methods by at least 38.2% on the FlyingThings3D dataset and 24.7% on the KITTI Scene Flow dataset for the EPE3D metric.

Scene Flow Estimation

SST-Calib: Simultaneous Spatial-Temporal Parameter Calibration between LIDAR and Camera

no code implementations | 8 Jul 2022 | Akio Kodaira, Yiyang Zhou, Pengwei Zang, Wei Zhan, Masayoshi Tomizuka

With information from multiple input modalities, sensor fusion-based algorithms usually outperform their single-modality counterparts in robotics.

Optical Flow Estimation Segmentation +2

DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection

1 code implementation | 17 Mar 2022 | Jinhyung Park, Chenfeng Xu, Yiyang Zhou, Masayoshi Tomizuka, Wei Zhan

While numerous 3D detection works leverage the complementary relationship between RGB images and point clouds, developments in the broader framework of semi-supervised object recognition remain uninfluenced by multi-modal fusion.

Object Detection +2

A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding

1 code implementation | 6 Mar 2021 | Di Feng, Yiyang Zhou, Chenfeng Xu, Masayoshi Tomizuka, Wei Zhan

Detecting dynamic objects and predicting static road information such as drivable areas and ground heights are crucial for safe autonomous driving.

3D Object Detection Autonomous Driving +2

Labels Are Not Perfect: Inferring Spatial Uncertainty in Object Detection

no code implementations | 18 Dec 2020 | Di Feng, Zining Wang, Yiyang Zhou, Lars Rosenbaum, Fabian Timm, Klaus Dietmayer, Masayoshi Tomizuka, Wei Zhan

As a result, an in-depth evaluation among different object detection methods remains challenging, and the training process of object detectors is sub-optimal, especially in probabilistic object detection.

Autonomous Driving Object +2

Inferring Spatial Uncertainty in Object Detection

no code implementations | 7 Mar 2020 | Zining Wang, Di Feng, Yiyang Zhou, Lars Rosenbaum, Fabian Timm, Klaus Dietmayer, Masayoshi Tomizuka, Wei Zhan

Based on the spatial distribution, we further propose an extension of IoU, called the Jaccard IoU (JIoU), as a new evaluation metric that incorporates label uncertainty.

Autonomous Driving Object +2
