Search Results for author: Zhiwei Jia

Found 15 papers, 7 papers with code

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

no code implementations • 28 May 2023 • Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani

Image ad understanding is a crucial task with wide real-world applications.

Chain-of-Thought Predictive Control

1 code implementation • 3 Apr 2023 • Zhiwei Jia, Fangchen Liu, Vineet Thumuluri, Linghao Chen, Zhiao Huang, Hao Su

We study generalizable policy learning from demonstrations for complex low-level control tasks (e.g., contact-rich object manipulations).

Imitation Learning

MetaCLUE: Towards Comprehensive Visual Metaphors Research

no code implementations • CVPR 2023 • Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor.

Image Generation, Question Answering

Improving Policy Optimization with Generalist-Specialist Learning

1 code implementation • 26 Jun 2022 • Zhiwei Jia, Xuanlin Li, Zhan Ling, Shuang Liu, Yiran Wu, Hao Su

Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations.

Imitation Learning

Learning to Act with Affordance-Aware Multimodal Neural SLAM

1 code implementation • 24 Jan 2022 • Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme

With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23.48% on the test unseen scenes.

Efficient Exploration, Test unseen

TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance

no code implementations • 16 Nov 2021 • Yue Tao, Zhiwei Jia, Runze Ma, Shugong Xu

We propose a 1-D split to address the challenges of complexity and replace the CNN with the transformer encoder to reduce the need for a context modeling module.

Inductive Bias, Scene Text Recognition

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

1 code implementation • 10 Nov 2021 • Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme

However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.

Indoor Scene Synthesis, Scene Generation

IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text Recognition

no code implementations • 13 Aug 2021 • Zhiwei Jia, Shugong Xu, Shiyi Mu, Yue Tao, Shan Cao, Zhiyong Chen

In this paper, we propose an Iterative Fusion based Recognizer (IFR) for low quality scene text recognition, taking advantage of refined text images input and robust feature representation.

Image Restoration, Scene Text Recognition

ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

3 code implementations • 30 Jul 2021 • Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, Hao Su

Here we propose SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator.

Tracking Based Semi-Automatic Annotation for Scene Text Videos

no code implementations • 29 Mar 2021 • Jiajun Zhu, Xiufeng Jiang, Zhiwei Jia, Shugong Xu, Shan Cao

Moreover, a paired low-quality scene text video dataset named Text-RBL is proposed, consisting of raw videos, blurry videos, and low-resolution videos, labeled by the proposed convenient semi-automatic labeling strategy.

Scene Text Detection, Text Annotation