Search Results for author: Junhua Mao

Found 13 papers, 5 papers with code

Generation and Comprehension of Unambiguous Object Descriptions

1 code implementation • CVPR 2016 • Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.

Image Captioning, Object, +1 more

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

1 code implementation • ICCV 2015 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task.

Image Captioning, Novel Concepts, +1 more

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

2 code implementations • 20 Dec 2014 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions.

8k, Image Captioning, +1 more

CNN-RNN: A Unified Framework for Multi-label Image Classification

1 code implementation • CVPR 2016 • Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, Wei Xu

While deep convolutional neural networks (CNNs) have shown great success in single-label image classification, it is important to note that real-world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image.

Classification, General Classification, +2 more

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

1 code implementation • NeurIPS 2015 • Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu

The quality of the generated answers of our mQA model on this dataset is evaluated by human judges through a Turing Test.

Question Answering, Sentence, +1 more

Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images

no code implementations • NeurIPS 2016 • Junhua Mao, Jiajing Xu, Yushi Jing, Alan Yuille

In this paper, we focus on training and evaluating effective word embeddings with both text and visual information.

Image Retrieval, Sentence, +1 more

Attention Correctness in Neural Image Captioning

no code implementations • 31 May 2016 • Chenxi Liu, Junhua Mao, Fei Sha, Alan Yuille

Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision.

Image Captioning

Explain Images with Multimodal Recurrent Neural Networks

no code implementations • 4 Oct 2014 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images.

8k, Retrieval, +1 more

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question

no code implementations • NeurIPS 2015 • Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu

The quality of the generated answers of our mQA model on this dataset is evaluated by human judges through a Turing Test.

Question Answering, Sentence, +1 more

Learning From Weakly Supervised Data by The Expectation Loss SVM (e-SVM) algorithm

no code implementations • NeurIPS 2014 • Jun Zhu, Junhua Mao, Alan L. Yuille

We propose a novel learning algorithm called expectation loss SVM (e-SVM), designed for problems where only the "positiveness" of each training sample, rather than a binary label, is available.

Object Detection, +1 more

STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

no code implementations • CVPR 2020 • Zhishuai Zhang, Jiyang Gao, Junhua Mao, Yukai Liu, Dragomir Anguelov, Cong-Cong Li

On the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, which establish the state of the art for both tasks.

Autonomous Driving, Object Detection, +3 more

Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving

no code implementations • 22 Dec 2021 • Jingxiao Zheng, Xinwei Shi, Alexander Gorban, Junhua Mao, Yang Song, Charles R. Qi, Ting Liu, Visesh Chari, Andre Cornman, Yin Zhou, CongCong Li, Dragomir Anguelov

3D human pose estimation (HPE) in autonomous vehicles (AVs) differs from other use cases in many respects, including the 3D resolution and range of the data, the absence of dense depth maps, LiDAR failure modes, the relative location of the camera and LiDAR, and a high bar for estimation accuracy.

3D Human Pose Estimation, Autonomous Driving

Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

no code implementations • 1 Jun 2023 • Jiachen Li, Xinwei Shi, Feiyu Chen, Jonathan Stroud, Zhishuai Zhang, Tian Lan, Junhua Mao, Jeonhyung Kang, Khaled S. Refaat, Weilong Yang, Eugene Ie, CongCong Li

Accurate understanding and prediction of human behaviors are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas.

Action Recognition, Autonomous Vehicles, +3 more
