Search Results for author: Hong Cai

Found 28 papers, 7 papers with code

SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations

no code implementations • 11 Apr 2024 • Jamie Menjay Lin, Jisoo Jeong, Hong Cai, Risheek Garrepalli, Kai Wang, Fatih Porikli

Optical flow estimation is crucial to a variety of vision tasks.

Optical Flow Estimation regression

OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

no code implementations • 26 Mar 2024 • Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli

We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongside optical flows in between.

Data Augmentation Optical Flow Estimation

FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

no code implementations • 19 Mar 2024 • Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli

In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by making it learn to predict the future at training.

Ranked #2 on Monocular Depth Estimation on KITTI Eigen split

Future prediction Monocular Depth Estimation

DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

no code implementations • 18 Mar 2024 • Yunxiao Shi, Manish Kumar Singh, Hong Cai, Fatih Porikli

Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features.

Depth Completion
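
The "uplift to a 3D point cloud" step in the DeCoTR summary above can be illustrated with a standard pinhole back-projection. This is a minimal sketch under assumptions, not the authors' implementation: the function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and do not come from the listing. In the paper's setting, each point would also carry its 2D feature vector.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift every pixel with a valid depth to a 3D point in the camera frame.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); these intrinsics are illustrative assumptions.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:                     # skip pixels with missing depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth map with one missing value (0.0).
depth = [[2.0, 0.0], [4.0, 2.0]]
cloud = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

The resulting point list could then be passed to any point-based network. The 3D point transformer named in the summary is not sketched here.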

Neural Mesh Fusion: Unsupervised 3D Planar Surface Understanding

no code implementations • 26 Feb 2024 • Farhad G. Zanjani, Hong Cai, Yinhao Zhu, Leyla Mirvakhabova, Fatih Porikli

This paper presents Neural Mesh Fusion (NMF), an efficient approach for joint optimization of polygon mesh from multi-view image observations and unsupervised 3D planar-surface parsing of the scene.

Neural Rendering

HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

no code implementations • 15 Jan 2024 • Antoine Mercier, Ramin Nakhli, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger

Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D assets from textual prompts remains a difficult task.

3D Generation Text to 3D

Efficient option pricing with unary-based photonic computing chip and generative adversarial learning

no code implementations • 8 Aug 2023 • Hui Zhang, Lingxiao Wan, Sergi Ramos-Calderer, Yuancheng Zhan, Wai-Keong Mok, Hong Cai, Feng Gao, Xianshu Luo, Guo-Qiang Lo, Leong Chuan Kwek, José Ignacio Latorre, Ai Qun Liu

In the modern financial industry system, the structure of products has become more and more complex, and the bottleneck constraint of classical computing power has already restricted the development of the financial industry.

Generative Adversarial Network

MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation

no code implementations • IEEE/CVF International Conference on Computer Vision (ICCV) 2023 • Rajeev Yasarla, Hong Cai, Jisoo Jeong, Yunxiao Shi, Risheek Garrepalli, Fatih Porikli

We propose MAMo, a novel memory and attention framework for monocular video depth estimation.

Ranked #12 on Monocular Depth Estimation on KITTI Eigen split

Depth Prediction Monocular Depth Estimation

X-Align++: cross-modal cross-view alignment for Bird's-eye-view segmentation

no code implementations • 6 Jun 2023 • Shubhankar Borse, Senthil Yogamani, Marvin Klingner, Varun Ravi, Hong Cai, Abdulaziz Almuzairee, Fatih Porikli

Bird's-eye-view (BEV) grid is a typical representation of the perception of road components, e.g., drivable area, in autonomous driving.

Autonomous Driving Segmentation

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

1 code implementation • NeurIPS 2023 • Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su

Due to their alignment with CLIP embeddings, our learned shape representations can also be integrated with off-the-shelf CLIP-based models for various applications, such as point cloud captioning and point cloud-conditioned image generation.

Ranked #5 on Zero-Shot Transfer 3D Point Cloud Classification on ModelNet40 (using extra training data)

3D Classification 3D Shape Representation +4

Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation

1 code implementation • ICCV 2023 • Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi

Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene.

Inverse Rendering Lighting Estimation

EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation

no code implementations • 6 Apr 2023 • Yunxiao Shi, Hong Cai, Amin Ansari, Fatih Porikli

the number of views and frames.

Autonomous Driving Depth Estimation

4D Panoptic Segmentation as Invariant and Equivariant Field Prediction

no code implementations • ICCV 2023 • Minghan Zhu, Shizhong Han, Hong Cai, Shubhankar Borse, Maani Ghaffari, Fatih Porikli

In this paper, we develop rotation-equivariant neural networks for 4D panoptic segmentation.

4D Panoptic Segmentation Autonomous Driving +2

DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling

no code implementations • CVPR 2023 • Jisoo Jeong, Hong Cai, Risheek Garrepalli, Fatih Porikli

We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models by introducing realistic distractions to the input frames.

Data Augmentation Optical Flow Estimation

DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction

no code implementations • CVPR 2023 • Shubhankar Borse, Debasmit Das, Hyojin Park, Hong Cai, Risheek Garrepalli, Fatih Porikli

Next, we use a conditional regenerator, which takes the redacted image and the dense predictions as inputs, and reconstructs the original image by filling in the missing structural information.

Depth Estimation

TransAdapt: A Transformative Framework for Online Test Time Adaptive Semantic Segmentation

no code implementations • 24 Feb 2023 • Debasmit Das, Shubhankar Borse, Hyojin Park, Kambiz Azarian, Hong Cai, Risheek Garrepalli, Fatih Porikli

Test-time adaptive (TTA) semantic segmentation adapts a source pre-trained image semantic segmentation model to unlabeled batches of target domain test images, which differs from real-world settings, where samples arrive one by one in an online fashion.

Segmentation Semantic Segmentation +1

PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models

2 code implementations • CVPR 2023 • Minghua Liu, Yinhao Zhu, Hong Cai, Shizhong Han, Zhan Ling, Fatih Porikli, Hao Su

Generalizable 3D part segmentation is important but challenging in vision and robotics.

3D Part Segmentation Language Modelling +1

X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation

no code implementations • 13 Oct 2022 • Shubhankar Borse, Marvin Klingner, Varun Ravi Kumar, Hong Cai, Abdulaziz Almuzairee, Senthil Yogamani, Fatih Porikli

Bird's-eye-view (BEV) grid is a common representation for the perception of road components, e.g., drivable area, in autonomous driving.

Autonomous Driving Segmentation

Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild

1 code implementation • 13 Oct 2022 • Kaifeng Zhang, Yang Fu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang

While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from being solved due to the lack of annotations.

6D Pose Estimation 6D Pose Estimation using RGB +2

Learning Implicit Feature Alignment Function for Semantic Segmentation

1 code implementation • 17 Jun 2022 • Hanzhe Hu, Yinbo Chen, Jiarui Xu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang

As such, IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.

Segmentation Semantic Segmentation

Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation

no code implementations • CVPR 2022 • Shubhankar Borse, Hyojin Park, Hong Cai, Debasmit Das, Risheek Garrepalli, Fatih Porikli

A Panoptic Relational Attention (PRA) module is then applied to the encodings and the global feature map from the backbone.

Panoptic Segmentation Segmentation

A photonic chip-based machine learning approach for the prediction of molecular properties

no code implementations • 3 Mar 2022 • Hui Zhang, Jonathan Wei Zhong Lau, Lingxiao Wan, Liang Shi, Hong Cai, Xianshu Luo, Patrick Lo, Chee-Kong Lee, Leong-Chuan Kwek, Ai Qun Liu

Machine learning methods have revolutionized the discovery process of new molecules and materials.

BIG-bench Machine Learning Drug Discovery +1

Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device

no code implementations • CVPR 2022 • Hyojin Park, Alan Yessenbayev, Tushar Singhal, Navin Kumar Adhikari, Yizhe Zhang, Shubhankar Mangesh Borse, Hong Cai, Nilesh Prasad Pandey, Fei Yin, Frank Mayer, Balaji Calidas, Fatih Porikli

Such a deployment scheme best utilizes the available processing power on the smartphone and enables real-time operation of our adaptive video segmentation algorithm.

Segmentation Semantic Segmentation +2

HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation

no code implementations • 3 Nov 2021 • Shubhankar Borse, Hong Cai, Yizhe Zhang, Fatih Porikli

While deeply supervised networks are common in recent literature, they typically impose the same learning objective on all transitional layers despite their varying representation powers.

Ranked #4 on Semantic Segmentation on Cityscapes test (using extra training data)

Segmentation Semantic Segmentation

Perceptual Consistency in Video Segmentation

no code implementations • 24 Oct 2021 • Yizhe Zhang, Shubhankar Borse, Hong Cai, Ying Wang, Ning Bi, Xiaoyun Jiang, Fatih Porikli

More specifically, by measuring the perceptual consistency between the predicted segmentation and the available ground truth on a nearby frame and combining it with the segmentation confidence, we can accurately assess the classification correctness on each pixel.

Segmentation Semantic Segmentation +2

AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally Consistent Video Semantic Segmentation

1 code implementation • 24 Oct 2021 • Yizhe Zhang, Shubhankar Borse, Hong Cai, Fatih Porikli

Since inconsistency mainly arises from the model's uncertainty in its output, we propose an adaptation scheme where the model learns from its own segmentation decisions as it streams a video, which allows producing more confident and temporally consistent labeling for similarly-looking pixels across frames.

Optical Flow Estimation Segmentation +4

X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation

no code implementations • 24 Oct 2021 • Hong Cai, Janarbek Matai, Shubhankar Borse, Yizhe Zhang, Amin Ansari, Fatih Porikli

In order to enable such knowledge distillation across two different visual tasks, we introduce a small, trainable network that translates the predicted depth map to a semantic segmentation map, which can then be supervised by the teacher network.

Ranked #17 on Monocular Depth Estimation on KITTI Eigen split unsupervised

Knowledge Distillation Monocular Depth Estimation +2

PieAPP: Perceptual Image-Error Assessment through Pairwise Preference

1 code implementation • CVPR 2018 • Ekta Prashnani, Hong Cai, Yasamin Mostofi, Pradeep Sen

Our key observation is that our trained network can then be used separately with only one distorted image and a reference to predict its perceptual error, without ever being trained on explicit human perceptual-error labels.

Ranked #1 on Video Quality Assessment on MSU SR-QA Dataset

Video Quality Assessment
