Search Results for author: Yuanhan Zhang

Found 21 papers, 16 papers with code

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

1 code implementation 1 Apr 2024 Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

Preference modeling techniques, such as direct preference optimization (DPO), have been shown to be effective in enhancing the generalization abilities of large language models (LLMs).

Instruction Following Language Modelling +3

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation CVPR 2024 Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation Video Generation

OtterHD: A High-Resolution Multi-modality Model

1 code implementation 7 Nov 2023 Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu

In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision.

Visual Question Answering

FunQA: Towards Surprising Video Comprehension

1 code implementation 26 Jun 2023 Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.

Question Answering Text Generation +3

Learning without Forgetting for Vision-Language Models

no code implementations 30 May 2023 Da-Wei Zhou, Yuanhan Zhang, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

While traditional CIL methods focus on visual information to grasp core features, recent advances in Vision-Language Models (VLM) have shown promising capabilities in learning generalizable representations with the aid of textual information.

Class Incremental Learning Incremental Learning

Latent Distribution Adjusting for Face Anti-Spoofing

2 code implementations 16 May 2023 Qinghong Sun, Zhenfei Yin, Yichao Wu, Yuanhan Zhang, Jing Shao

In this work, we propose a unified framework called Latent Distribution Adjusting (LDA), which is latent, discriminative, adaptive, and generic, to improve the robustness of the FAS model by adjusting complex data distribution with multiple prototypes.

Face Anti-Spoofing Prototype Selection

Otter: A Multi-Modal Model with In-Context Instruction Tuning

1 code implementation 5 May 2023 Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Jingkang Yang, Ziwei Liu

Large language models (LLMs) have demonstrated significant universal capabilities as few/zero-shot learners in various tasks due to their pre-training on vast amounts of text data, as exemplified by GPT-3, which was extended into InstructGPT and ChatGPT, models that effectively follow natural language instructions to accomplish real-world tasks.

In-Context Learning Instruction Following +2

What Makes Good Examples for Visual In-Context Learning?

1 code implementation NeurIPS 2023 Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

To overcome the problem, we propose a prompt retrieval framework to automate the selection of in-context examples.

In-Context Learning Retrieval

On-Device Domain Generalization

2 code implementations 15 Sep 2022 Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu

Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD.

Data Augmentation Domain Generalization +2

Benchmarking Omni-Vision Representation through the Lens of Visual Realms

1 code implementation 14 Jul 2022 Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

We benchmark ReCo and other advances in omni-vision representation studies that are different in architectures (from CNNs to transformers) and in learning paradigms (from supervised learning to self-supervised learning) on OmniBenchmark.

Benchmarking Contrastive Learning +2

Neural Prompt Search

1 code implementation 9 Jun 2022 Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer.

Ranked #1 on Image Classification on OmniBenchmark (using extra training data)

Few-Shot Learning Image Classification +3

Robust Face Anti-Spoofing with Dual Probabilistic Modeling

no code implementations 27 Apr 2022 Yuanhan Zhang, Yichao Wu, Zhenfei Yin, Jing Shao, Ziwei Liu

In this work, we attempt to fill this gap by automatically addressing the noise problem from both label and data perspectives in a probabilistic manner.

Face Anti-Spoofing
