Grounded Language-Image Pre-training

1 code implementation 7 Dec 2021 Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Ranked #1 on Phrase Grounding on Flickr30k Entities Test (using extra training data)

Object Detection, Phrase Grounding

Forecasting battery capacity and power degradation with multi-task learning

no code implementations 29 Nov 2021 Weihan Li, Haotian Zhang, Bruis van Vlijmen, Philipp Dechent, Dirk Uwe Sauer

In this paper, we propose a data-driven prognostics framework to predict both capacity and power fade simultaneously with multi-task learning.

Multi-Task Learning

TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers

1 code implementation 29 Nov 2021 Yikang Ding, Wentao Yuan, Qingtian Zhu, Haotian Zhang, Xiangyue Liu, Yuanjiang Wang, Xiao Liu

We analogize MVS back to its nature of a feature matching task and therefore propose a powerful Feature Matching Transformer (FMT) to leverage intra- (self-) and inter- (cross-) attention to aggregate long-range context information within and across images.

3D Reconstruction

MegLoc: A Robust and Accurate Visual Localization Pipeline

no code implementations 25 Nov 2021 Shuxue Peng, Zihang He, Haotian Zhang, Ran Yan, Chuting Wang, Qingtian Zhu, Xiao Liu

In this paper, we present a visual localization pipeline, namely MegLoc, for robust and accurate 6-DoF pose estimation under varying scenarios, including indoor and outdoor scenes, different times of day, different seasons of the year, and even across years.

Autonomous Driving, Pose Estimation

Method Towards CVPR 2021 Image Matching Challenge

no code implementations 10 Aug 2021 Xiaopeng Bi, Yu Chen, Xinyang Liu, Dehao Zhang, Ran Yan, Zheng Chai, Haotian Zhang, Xiao Liu

This report describes the Megvii-3D team's approach to the CVPR 2021 Image Matching Workshop.

Method Towards CVPR 2021 SimLocMatch Challenge

no code implementations 10 Aug 2021 Xiaopeng Bi, Ran Yan, Zheng Chai, Haotian Zhang, Xiao Liu

This report describes the Megvii-3D team's approach to the SimLocMatch Challenge @ CVPR 2021 Image Matching Workshop.

Amortized Variational Deep Q Network

1 code implementation 3 Nov 2020 Haotian Zhang, Yuhao Wang, Jianyong Sun, Zongben Xu

Efficient exploration is one of the most important issues in deep reinforcement learning.

Efficient Exploration, OpenAI Gym

Recurrent Inference in Text Editing

1 code implementation Findings of the Association for Computational Linguistics 2020 Ning Shi, Ziheng Zeng, Haotian Zhang, Yichen Gong

In neural text editing, prevalent sequence-to-sequence based approaches directly map the unedited text either to the edited text or the editing operations, in which the performance is degraded by the limited source text encoding and long, varying decoding steps.

IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency

no code implementations 24 Jun 2020 Jiarui Cai, Yizhou Wang, Haotian Zhang, Hung-Min Hsu, Chengqian Ma, Jenq-Neng Hwang

Meanwhile, the spatial attention, which focuses on the foreground within the bounding boxes, is generated from the given instance masks and applied to the extracted embedding features.

Multi-Object Tracking, Multiple Object Tracking

Learning to be Global Optimizer

no code implementations 10 Mar 2020 Haotian Zhang, Jianyong Sun, Zongben Xu

This paper proposes to learn a two-phase (including a minimization phase and an escaping phase) global optimization algorithm for smooth non-convex functions.

Image Classification

On Hyper-parameter Tuning for Stochastic Optimization Algorithms

no code implementations 4 Mar 2020 Haotian Zhang, Jianyong Sun, Zongben Xu

This paper proposes the first-ever algorithmic framework for tuning hyper-parameters of stochastic optimization algorithms based on reinforcement learning.

reinforcement-learning, Stochastic Optimization

Adaptive Structural Hyper-Parameter Configuration by Q-Learning

no code implementations 2 Mar 2020 Haotian Zhang, Jianyong Sun, Zongben Xu

Tuning hyper-parameters for evolutionary algorithms is an important issue in computational intelligence.

Q-Learning, reinforcement-learning

Learning Neural Surrogate Model for Warm-Starting Bayesian Optimization

no code implementations ICLR 2020 Haotian Zhang, Jian Sun, Zongben Xu

Bayesian optimization is an effective tool for optimizing black-box functions and is popular for hyper-parameter tuning in machine learning.

Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval

no code implementations IJCNLP 2019 Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, Jimmy Lin

This paper applies BERT to ad hoc document retrieval on news articles, which requires addressing two challenges: relevance judgments in existing test collections are typically provided only at the document level, and documents often exceed the length that BERT was designed to handle.

Applying BERT to Document Retrieval with Birch

no code implementations IJCNLP 2019 Zeynep Akkalyoncu Yilmaz, Shengjin Wang, Wei Yang, Haotian Zhang, Jimmy Lin

We present Birch, a system that applies BERT to document retrieval via integration with the open-source Anserini information retrieval toolkit to demonstrate end-to-end search over large document collections.

Information Retrieval

Eye in the Sky: Drone-Based Object Tracking and 3D Localization

no code implementations 18 Oct 2019 Haotian Zhang, Gaoang Wang, Zhichao Lei, Jenq-Neng Hwang

Drones, or general UAVs, equipped with a single camera have been widely deployed in a broad range of applications, such as aerial photography, fast goods delivery, and, most importantly, surveillance.

drone-based object tracking, Multi-Object Tracking

GetNet: Get Target Area for Image Pairing

no code implementations 8 Oct 2019 Henry H. Yu, Jiang Liu, Hao Sun, Ziwen Wang, Haotian Zhang

Image pairing is an important research task in the field of computer vision.

Person Re-Identification

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

1 code implementation 7 Oct 2019 Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian

Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film.

An Internal Learning Approach to Video Inpainting

1 code implementation ICCV 2019 Haotian Zhang, Long Mai, Ning Xu, Zhaowen Wang, John Collomosse, Hailin Jin

We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent 'Deep Image Prior' (DIP) that exploits convolutional network architectures to enforce plausible texture in static images.

Optical Flow Estimation, Video Inpainting

Simple Applications of BERT for Ad Hoc Document Retrieval

2 code implementations 26 Mar 2019 Wei Yang, Haotian Zhang, Jimmy Lin

Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval.

Ad-Hoc Information Retrieval, Question Answering

TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes

1 code implementation CVPR 2019 Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Nießner, Leonidas Guibas

We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps).

3D Semantic Segmentation

Exploit the Connectivity: Multi-Object Tracking with TrackletNet

1 code implementation 18 Nov 2018 Gaoang Wang, Yizhou Wang, Haotian Zhang, Renshu Gu, Jenq-Neng Hwang

Multi-object tracking (MOT) is an important and practical task related to both surveillance systems and moving camera applications, such as autonomous driving and robotic vision.

Autonomous Driving, Multi-Object Tracking

Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval

no code implementations 23 Mar 2018 Haotian Zhang, Gordon V. Cormack, Maura R. Grossman, Mark D. Smucker

This study uses a novel simulation framework to evaluate whether the time and effort necessary to achieve high recall using active learning is reduced by presenting the reviewer with isolated sentences, as opposed to full documents, for relevance feedback.

Active Learning, Information Retrieval

Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams

no code implementations 25 Jul 2017 Jinfeng Rao, Hua He, Haotian Zhang, Ferhan Ture, Royal Sequiera, Salman Mohammed, Jimmy Lin

To our knowledge, we are the first to integrate lexical and temporal signals in an end-to-end neural network architecture, in which existing neural ranking models are used to generate query-document similarity vectors that feed into a bidirectional LSTM layer for temporal modeling.

Density Estimation, Document Ranking

Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering

no code implementations 25 Jul 2017 Royal Sequiera, Gaurav Baruah, Zhucheng Tu, Salman Mohammed, Jinfeng Rao, Haotian Zhang, Jimmy Lin

Most work on natural language question answering today focuses on answer selection: given a candidate list of sentences, determine which contains the answer.

Answer Selection

