Search Results for author: Faisal Ahmed

Found 11 papers, 5 papers with code

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

no code implementations • 17 Aug 2016 • Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.

Efficient Exploration Q-Learning +4

Paper
Add Code

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

1 code implementation • ACL 2017 • Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, Li Deng

In this paper, we address this limitation by replacing symbolic queries with an induced "soft" posterior distribution over the KB that indicates which entities the user is interested in.

reinforcement-learning Reinforcement Learning (RL) +2

186

Paper
Code

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

no code implementations • 15 Nov 2017 • Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.

Efficient Exploration Q-Learning +4

Paper
Add Code

UNITER: UNiversal Image-TExt Representation Learning

7 code implementations • ECCV 2020 • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i. e., masked language/region modeling is conditioned on full observation of image/text).

Ranked #3 on Visual Question Answering (VQA) on VCR (Q-A) test

Image-text matching Language Modelling +12

761

Paper
Code

UNITER: Learning UNiversal Image-TExt Representations

no code implementations • 25 Sep 2019 • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding.

Image-text matching Language Modelling +10

Paper
Add Code

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

1 code implementation • 23 Nov 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

On grounded captioning, UniTAB presents a simpler solution with a single output head, and significantly outperforms state of the art in both grounding and captioning evaluations.

Image Captioning Language Modelling +5

Paper
Code

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation • CVPR 2022 • Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e. g., video question answering).

Caption Generation Question Answering +3

224

Paper
Code

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

1 code implementation • 20 Mar 2023 • Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.

Ranked #22 on Visual Question Answering on MM-Vet

Multimodal Reasoning Visual Question Answering

903

Paper
Code

Towards Zero-power 3D Imaging: VLC-assisted Passive ToF Sensing

no code implementations • 17 Jul 2023 • Faisal Ahmed, Miguel Heredia Conde, Paula López Martínez

Passive Time-of-Flight (ToF) imaging can be enabled by optical wireless communication (OWC).

Paper
Add Code

MM-VID: Advancing Video Understanding with GPT-4V(ision)

no code implementations • 30 Oct 2023 • Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Video Understanding

Paper
Add Code

FedAuxHMTL: Federated Auxiliary Hard-Parameter Sharing Multi-Task Learning for Network Edge Traffic Classification

no code implementations • 11 Apr 2024 • Faisal Ahmed, Myungjin Lee, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin

Federated Learning (FL) has garnered significant interest recently due to its potential as an effective solution for tackling many challenges in diverse application scenarios, for example, data privacy in network edge traffic classification.

Federated Learning Multi-Task Learning +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.