Search Results for author: Yaser Yacoob

Found 10 papers, 5 papers with code

MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning

3 code implementations • 15 Nov 2023 • Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu

Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose the MultiModal Chart Benchmark (MMC-Benchmark), a human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts.

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

3 code implementations • 26 Jun 2023 • Fuxiao Liu, Kevin Lin, Linjie Li, JianFeng Wang, Yaser Yacoob, Lijuan Wang

To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts.

Tasks: Hallucination, Visual Question Answering

COVID-VTS: Fact Extraction and Verification on Short Video Platforms

1 code implementation • 15 Feb 2023 • Fuxiao Liu, Yaser Yacoob, Abhinav Shrivastava

We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID19-focused information from both the real world and machine generation.

Tasks: Fact Checking, Fact Selection (+1 more)

One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2

no code implementations • 15 Feb 2023 • Trevine Oorloff, Yaser Yacoob

While recent research has progressively overcome the low-resolution constraint of one-shot face video re-enactment with the help of StyleGAN's high-fidelity portrait generation, these approaches rely on at least one of the following: explicit 2D/3D priors, optical-flow-based warping as motion descriptors, off-the-shelf encoders, etc., which constrain their performance (e.g., inconsistent predictions, inability to capture fine facial details and accessories, poor generalization, artifacts).

Tasks: Attribute, Disentanglement (+2 more)

Robust One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2

no code implementations • ICCV 2023 • Trevine Oorloff, Yaser Yacoob

Addressing these limitations, we propose a novel framework exploiting the implicit 3D prior and inherent latent properties of StyleGAN2 to facilitate one-shot face re-enactment at 1024x1024 (1) with zero dependencies on explicit structural priors, (2) accommodating attribute edits, and (3) robust to diverse facial expressions and head poses of the source frame.

Tasks: Attribute

Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space

2 code implementations • 28 Mar 2022 • Trevine Oorloff, Yaser Yacoob

To this end, we propose an end-to-end expressive face video encoding approach that facilitates data-efficient high-quality video re-synthesis by optimizing low-dimensional edits of a single Identity-latent.

Modeling Colors of Single Attribute Variations with Application to Food Appearance

no code implementations • 18 Dec 2015 • Yaser Yacoob

This paper considers the intra-image color-space of an object or a scene when these are subject to a dominant single-source of variation.

Tasks: Attribute, Object (+2 more)
