Search Results for author: Faisal Ahmed

Found 10 papers, 5 papers with code

MM-VID: Advancing Video Understanding with GPT-4V(ision)

no code implementations 30 Oct 2023 Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Video Understanding

Towards Zero-power 3D Imaging: VLC-assisted Passive ToF Sensing

no code implementations 17 Jul 2023 Faisal Ahmed, Miguel Heredia Conde, Paula López Martínez

Passive Time-of-Flight (ToF) imaging can be enabled by optical wireless communication (OWC).

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation CVPR 2022 Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e.g., video question answering).

Question Answering Video Captioning +2

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

1 code implementation 23 Nov 2021 Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

On grounded captioning, UniTAB presents a simpler solution with a single output head, and significantly outperforms state of the art in both grounding and captioning evaluations.

Image Captioning Language Modelling +5

UNITER: UNiversal Image-TExt Representation Learning

7 code implementations ECCV 2020 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).

Image-text matching Language Modelling +12

UNITER: Learning UNiversal Image-TExt Representations

no code implementations 25 Sep 2019 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding.

Image-text matching Language Modelling +10

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

no code implementations 15 Nov 2017 Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.

Efficient Exploration Q-Learning +4

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

1 code implementation ACL 2017 Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, Li Deng

In this paper, we address this limitation by replacing symbolic queries with an induced "soft" posterior distribution over the KB that indicates which entities the user is interested in.

reinforcement-learning Reinforcement Learning (RL) +2
