no code implementations • 16 Jan 2025 • Deepti Hegde, Rajeev Yasarla, Hong Cai, Shizhong Han, Apratim Bhattacharyya, Shweta Mahajan, Litian Liu, Risheek Garrepalli, Vishal M. Patel, Fatih Porikli
Training with DiMA results in a 37% reduction in the L2 trajectory error and an 80% reduction in the collision rate of the vision-based planner, as well as a 44% trajectory error reduction in long-tail scenarios.
no code implementations • 2 Dec 2024 • Farhad G. Zanjani, Hong Cai, Hanno Ackermann, Leila Mirvakhabova, Fatih Porikli
This paper presents Planar Gaussian Splatting (PGS), a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene, directly from multiple RGB images.
no code implementations • 16 Jul 2024 • Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli
We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying framework designed to replace the conventional self-attention mechanism in transformer models.
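As a rough illustration of what a polynomial, drop-in substitute for softmax self-attention can look like, the sketch below uses a standard kernelized-attention form with an elementwise degree-2 feature map; it is not the PADRe construction itself, and all module names are placeholders.

```python
import torch
import torch.nn as nn

class PolyKernelAttention(nn.Module):
    """Linear-complexity attention with an elementwise polynomial feature map."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    @staticmethod
    def phi(x):
        x = torch.relu(x)                        # keep features non-negative
        return torch.cat([x, x * x], dim=-1)     # degree-2 polynomial map

    def forward(self, x):                        # x: (B, N, C)
        q, k, v = self.phi(self.q(x)), self.phi(self.k(x)), self.v(x)
        kv = torch.einsum("bnd,bnc->bdc", k, v)  # O(N), no N x N attention matrix
        norm = q @ k.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6
        return (q @ kv) / norm

x = torch.randn(2, 196, 64)
print(PolyKernelAttention(64)(x).shape)          # torch.Size([2, 196, 64])
```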
no code implementations • 13 Jun 2024 • Manish Kumar Singh, Rajeev Yasarla, Hong Cai, Mingu Lee, Fatih Porikli
In this way, we reduce the quadratic computation and memory costs, as fewer tokens participate in self-attention, while maintaining features for all image patches throughout the network, which allows the model to be used for dense prediction tasks.
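A minimal PyTorch sketch of this idea: attend over a learned subset of tokens while the full token set flows through unchanged. The `TokenSubsetAttention` name, top-k scoring, and keep ratio are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TokenSubsetAttention(nn.Module):
    def __init__(self, dim, num_heads=4, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)           # learned token importance
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.keep_ratio = keep_ratio

    def forward(self, x):                        # x: (B, N, C), all patch tokens
        B, N, C = x.shape
        k = max(1, int(N * self.keep_ratio))
        scores = self.score(x).squeeze(-1)       # (B, N)
        idx = scores.topk(k, dim=1).indices      # tokens chosen to attend
        batch = torch.arange(B).unsqueeze(-1)
        sub = x[batch, idx]                      # (B, k, C) selected tokens
        sub, _ = self.attn(sub, sub, sub)        # quadratic cost only in k
        out = x.clone()
        out[batch, idx] = sub                    # unselected tokens pass through
        return out                               # full token set preserved

tokens = torch.randn(2, 196, 64)
print(TokenSubsetAttention(64)(tokens).shape)    # torch.Size([2, 196, 64])
```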
no code implementations • 11 Apr 2024 • Jamie Menjay Lin, Jisoo Jeong, Hong Cai, Risheek Garrepalli, Kai Wang, Fatih Porikli
Optical flow estimation is crucial to a variety of vision tasks.
no code implementations • CVPR 2024 • Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli
We propose OCAI, a method that supports robust frame interpolation by jointly generating intermediate video frames and the optical flows between them.
no code implementations • 19 Mar 2024 • Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli
In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by learning to predict the future during training.
Ranked #3 on Monocular Depth Estimation on KITTI Eigen split
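A hedged sketch of the training signal described: besides the depth loss, an auxiliary head predicts next-frame features so the network picks up motion cues. The stand-in modules and the 0.1 loss weight are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

encoder = nn.Conv2d(3, 32, 3, padding=1)          # stand-in feature extractor
depth_head = nn.Conv2d(32, 1, 1)                  # predicts depth at frame t
future_head = nn.Conv2d(32, 32, 3, padding=1)     # predicts frame t+1 features

frame_t, frame_t1 = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
gt_depth = torch.rand(2, 1, 64, 64)

feat_t = encoder(frame_t)
with torch.no_grad():                             # target: next-frame features
    feat_t1 = encoder(frame_t1)

depth_loss = (depth_head(feat_t) - gt_depth).abs().mean()
future_loss = (future_head(feat_t) - feat_t1).abs().mean()
loss = depth_loss + 0.1 * future_loss             # 0.1 weight is illustrative
loss.backward()
```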
no code implementations • CVPR 2024 • Yunxiao Shi, Manish Kumar Singh, Hong Cai, Fatih Porikli
Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features.
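A minimal sketch of the 2D-to-3D uplifting step, assuming a pinhole camera with known intrinsics; the 3D point transformer that would consume these points is omitted.

```python
import torch

def uplift_to_points(feat, depth, K):
    """feat: (C, H, W) features, depth: (H, W), K: (3, 3) intrinsics."""
    C, H, W = feat.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)   # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).T                      # back-project rays
    xyz = rays * depth.unsqueeze(-1)                        # scale by depth
    points = xyz.reshape(-1, 3)                             # (H*W, 3)
    point_feats = feat.reshape(C, -1).T                     # (H*W, C)
    return points, point_feats

K = torch.tensor([[500., 0., 160.], [0., 500., 120.], [0., 0., 1.]])
pts, f = uplift_to_points(torch.randn(64, 240, 320), torch.rand(240, 320) * 10, K)
print(pts.shape, f.shape)   # torch.Size([76800, 3]) torch.Size([76800, 64])
```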
no code implementations • 26 Feb 2024 • Farhad G. Zanjani, Hong Cai, Yinhao Zhu, Leyla Mirvakhabova, Fatih Porikli
This paper presents Neural Mesh Fusion (NMF), an efficient approach for joint optimization of polygon mesh from multi-view image observations and unsupervised 3D planar-surface parsing of the scene.
no code implementations • 15 Jan 2024 • Antoine Mercier, Ramin Nakhli, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger
Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D assets from textual prompts remains a difficult task.
no code implementations • 8 Aug 2023 • Hui Zhang, Lingxiao Wan, Sergi Ramos-Calderer, Yuancheng Zhan, Wai-Keong Mok, Hong Cai, Feng Gao, Xianshu Luo, Guo-Qiang Lo, Leong Chuan Kwek, José Ignacio Latorre, Ai Qun Liu
In the modern financial industry, product structures have become increasingly complex, and the bottleneck of classical computing power now constrains the industry's further development.
no code implementations • IEEE/CVF International Conference on Computer Vision (ICCV) 2023 • Rajeev Yasarla, Hong Cai, Jisoo Jeong, Yunxiao Shi, Risheek Garrepalli, Fatih Porikli
We propose MAMo, a novel memory and attention framework for monocular video depth estimation.
Ranked #14 on Monocular Depth Estimation on KITTI Eigen split
no code implementations • 6 Jun 2023 • Shubhankar Borse, Senthil Yogamani, Marvin Klingner, Varun Ravi, Hong Cai, Abdulaziz Almuzairee, Fatih Porikli
The bird's-eye-view (BEV) grid is a typical representation for perceiving road components, e.g., the drivable area, in autonomous driving.
1 code implementation • NeurIPS 2023 • Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su
Due to their alignment with CLIP embeddings, our learned shape representations can also be integrated with off-the-shelf CLIP-based models for various applications, such as point cloud captioning and point cloud-conditioned image generation.
Ranked #5 on Zero-shot 3D Point Cloud Classification on OmniObject3D (Pretrained on ShapeNet) (using extra training data)
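A tiny sketch of why CLIP alignment enables zero-shot use: a shape embedding can be scored directly against CLIP text embeddings of class prompts. The random 512-d embeddings below are placeholders for the real encoders.

```python
import torch
import torch.nn.functional as F

shape_emb = F.normalize(torch.randn(1, 512), dim=-1)    # from a shape encoder
text_embs = F.normalize(torch.randn(5, 512), dim=-1)    # from the CLIP text tower
classes = ["chair", "table", "lamp", "sofa", "bed"]

logits = shape_emb @ text_embs.T                        # cosine similarities
print(classes[logits.argmax().item()])                  # zero-shot prediction
```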
1 code implementation • ICCV 2023 • Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi
Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene.
no code implementations • 6 Apr 2023 • Yunxiao Shi, Hong Cai, Amin Ansari, Fatih Porikli
the number of views and frames.
no code implementations • ICCV 2023 • Minghan Zhu, Shizhong Han, Hong Cai, Shubhankar Borse, Maani Ghaffari, Fatih Porikli
In this paper, we develop rotation-equivariant neural networks for 4D panoptic segmentation.
Ranked #2 on 4D Panoptic Segmentation on SemanticKITTI
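The sketch below only illustrates why rotations are a natural symmetry here: features built from pairwise distances are unchanged under rotation (invariance, the simplest relative of equivariance). The paper's equivariant architecture is considerably more involved.

```python
import math
import torch

def sorted_distances(points):                    # points: (N, 3)
    return torch.cdist(points, points).sort(dim=1).values

pts = torch.randn(32, 3)
c, s = math.cos(0.7), math.sin(0.7)
R = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # z-axis rotation
print(torch.allclose(sorted_distances(pts), sorted_distances(pts @ R.T),
                     atol=1e-5))                 # True: rotation leaves it unchanged
```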
no code implementations • CVPR 2023 • Jisoo Jeong, Hong Cai, Risheek Garrepalli, Fatih Porikli
We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models by introducing realistic distractions to the input frames.
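A hedged sketch of this style of augmentation: blend an unrelated "distractor" image into the second frame of a flow pair while keeping the original flow supervision. The blend weight is illustrative.

```python
import torch

def distract(frame2, distractor, alpha=0.7):
    """Mix a distractor image into the second frame of a flow pair."""
    return alpha * frame2 + (1.0 - alpha) * distractor

f1, f2 = torch.rand(3, 128, 128), torch.rand(3, 128, 128)
d = torch.rand(3, 128, 128)                    # any unrelated image
f2_aug = distract(f2, d)                       # train the flow model on (f1, f2_aug)
```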
no code implementations • CVPR 2023 • Shubhankar Borse, Debasmit Das, Hyojin Park, Hong Cai, Risheek Garrepalli, Fatih Porikli
Next, we use a conditional regenerator, which takes the redacted image and the dense predictions as inputs, and reconstructs the original image by filling in the missing structural information.
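A toy sketch of the conditional regenerator idea: concatenate the redacted image with the dense prediction and train a small network to reconstruct the original image, which forces the predictions to carry structural information. All layer sizes and the random mask are placeholders.

```python
import torch
import torch.nn as nn

regen = nn.Sequential(                     # toy regenerator
    nn.Conv2d(3 + 1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1))

image = torch.rand(2, 3, 64, 64)
pred = torch.rand(2, 1, 64, 64)            # e.g., a predicted depth/seg map
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
redacted = image * mask                    # drop structural information

recon = regen(torch.cat([redacted, pred], dim=1))
recon_loss = (recon - image).abs().mean()  # reconstruction supervises the pipeline
recon_loss.backward()
```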
no code implementations • 24 Feb 2023 • Debasmit Das, Shubhankar Borse, Hyojin Park, Kambiz Azarian, Hong Cai, Risheek Garrepalli, Fatih Porikli
Test-time adaptive (TTA) semantic segmentation adapts a source-pretrained image semantic segmentation model to unlabeled batches of target-domain test images; this differs from real-world deployment, where samples arrive one by one in an online fashion.
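A minimal sketch of the online setting this entry targets, using entropy minimization as a generic test-time-adaptation objective; this is not claimed to be the paper's method.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 19, 1)                      # toy segmentation head
opt = torch.optim.SGD(model.parameters(), lr=1e-4)

for _ in range(3):                               # stream of target-domain images
    x = torch.rand(1, 3, 64, 64)                 # one sample at a time
    logits = model(x)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    opt.zero_grad()
    entropy.backward()                           # adapt on this sample, then move on
    opt.step()
```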
2 code implementations • CVPR 2023 • Minghua Liu, Yinhao Zhu, Hong Cai, Shizhong Han, Zhan Ling, Fatih Porikli, Hao Su
Generalizable 3D part segmentation is important but challenging in vision and robotics.
no code implementations • 13 Oct 2022 • Shubhankar Borse, Marvin Klingner, Varun Ravi Kumar, Hong Cai, Abdulaziz Almuzairee, Senthil Yogamani, Fatih Porikli
The bird's-eye-view (BEV) grid is a common representation for perceiving road components, e.g., the drivable area, in autonomous driving.
1 code implementation • 13 Oct 2022 • Kaifeng Zhang, Yang Fu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang
While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from being solved due to the lack of annotations.
1 code implementation • 17 Jun 2022 • Hanzhe Hu, Yinbo Chen, Jiarui Xu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang
As such, IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
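A hedged sketch of decoding at arbitrary resolution by sampling a coarse feature map at continuous coordinates and applying a shared MLP per query point; the real IFA module differs in its details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = torch.randn(1, 32, 16, 16)                 # low-resolution feature map
mlp = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 19))

H, W = 100, 150                                   # any output resolution
ys = torch.linspace(-1, 1, H)
xs = torch.linspace(-1, 1, W)
grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
grid = grid.flip(-1).unsqueeze(0)                 # grid_sample expects (x, y)

sampled = F.grid_sample(feat, grid, align_corners=True)   # (1, 32, H, W)
logits = mlp(sampled.permute(0, 2, 3, 1))                 # (1, H, W, 19)
print(logits.shape)
```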
no code implementations • CVPR 2022 • Shubhankar Borse, Hyojin Park, Hong Cai, Debasmit Das, Risheek Garrepalli, Fatih Porikli
A Panoptic Relational Attention (PRA) module is then applied to the encodings and the global feature map from the backbone.
no code implementations • 3 Mar 2022 • Hui Zhang, Jonathan Wei Zhong Lau, Lingxiao Wan, Liang Shi, Hong Cai, Xianshu Luo, Patrick Lo, Chee-Kong Lee, Leong-Chuan Kwek, Ai Qun Liu
Machine learning methods have revolutionized the discovery process of new molecules and materials.
no code implementations • CVPR 2022 • Hyojin Park, Alan Yessenbayev, Tushar Singhal, Navin Kumar Adhikari, Yizhe Zhang, Shubhankar Mangesh Borse, Hong Cai, Nilesh Prasad Pandey, Fei Yin, Frank Mayer, Balaji Calidas, Fatih Porikli
Such a deployment scheme best utilizes the available processing power on the smartphone and enables real-time operation of our adaptive video segmentation algorithm.
no code implementations • 3 Nov 2021 • Shubhankar Borse, Hong Cai, Yizhe Zhang, Fatih Porikli
While deeply supervised networks are common in recent literature, they typically impose the same learning objective on all transitional layers despite their varying representation powers.
Ranked #4 on Semantic Segmentation on Cityscapes test
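For contrast, a toy sketch of plain deep supervision, the baseline this entry critiques: every transitional layer receives the same auxiliary objective regardless of its representational power.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

blocks = nn.ModuleList([nn.Conv2d(3 if i == 0 else 16, 16, 3, padding=1)
                        for i in range(3)])
aux_heads = nn.ModuleList([nn.Conv2d(16, 19, 1) for _ in range(3)])

x, target = torch.rand(1, 3, 32, 32), torch.randint(0, 19, (1, 32, 32))
loss = 0.0
for block, head in zip(blocks, aux_heads):
    x = F.relu(block(x))
    loss = loss + F.cross_entropy(head(x), target)   # same objective at every layer
loss.backward()
```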
no code implementations • 24 Oct 2021 • Hong Cai, Janarbek Matai, Shubhankar Borse, Yizhe Zhang, Amin Ansari, Fatih Porikli
In order to enable such knowledge distillation across two different visual tasks, we introduce a small, trainable network that translates the predicted depth map to a semantic segmentation map, which can then be supervised by the teacher network.
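A toy sketch of this cross-task distillation: a small trainable translator maps the student's depth prediction to a segmentation map that a frozen segmentation teacher supervises. All module sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

depth_net = nn.Conv2d(3, 1, 3, padding=1)        # student (depth)
translator = nn.Conv2d(1, 19, 3, padding=1)      # depth -> segmentation translation
teacher = nn.Conv2d(3, 19, 3, padding=1).eval()  # frozen segmentation teacher

x = torch.rand(2, 3, 64, 64)
seg_from_depth = translator(depth_net(x))
with torch.no_grad():
    teacher_seg = teacher(x).softmax(dim=1)

kd_loss = F.kl_div(seg_from_depth.log_softmax(dim=1), teacher_seg,
                   reduction="batchmean")
kd_loss.backward()                               # gradients reach depth_net
```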
1 code implementation • 24 Oct 2021 • Yizhe Zhang, Shubhankar Borse, Hong Cai, Fatih Porikli
Since inconsistency mainly arises from the model's uncertainty in its output, we propose an adaptation scheme where the model learns from its own segmentation decisions as it streams a video, which allows producing more confident and temporally consistent labeling for similar-looking pixels across frames.
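A hedged sketch of learning from the model's own decisions while streaming: confident pixels become pseudo-labels for online adaptation. The 0.9 threshold is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Conv2d(3, 19, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-4)

for _ in range(3):                                 # frames of a video stream
    frame = torch.rand(1, 3, 64, 64)
    logits = model(frame)
    conf, pseudo = logits.softmax(dim=1).max(dim=1)
    keep = conf > 0.9                              # trust only confident pixels
    if keep.any():
        loss = F.cross_entropy(logits.permute(0, 2, 3, 1)[keep], pseudo[keep])
        opt.zero_grad()
        loss.backward()                            # adapt on own confident decisions
        opt.step()
```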
no code implementations • 24 Oct 2021 • Yizhe Zhang, Shubhankar Borse, Hong Cai, Ying Wang, Ning Bi, Xiaoyun Jiang, Fatih Porikli
More specifically, by measuring the perceptual consistency between the predicted segmentation and the available ground truth on a nearby frame and combining it with the segmentation confidence, we can accurately assess the classification correctness on each pixel.
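A deliberately crude sketch of fusing consistency with confidence to estimate per-pixel correctness; the paper measures perceptual consistency with a learned comparison, whereas plain label agreement, the 0.5/0.5 fusion, and the 0.8 threshold below are stand-ins.

```python
import torch

pred = torch.rand(1, 19, 64, 64).softmax(dim=1)    # predicted class probabilities
conf, labels = pred.max(dim=1)                      # per-pixel confidence + label
nearby_gt = torch.randint(0, 19, (1, 64, 64))       # ground truth on a nearby frame
consistency = (labels == nearby_gt).float()         # crude agreement stand-in
score = 0.5 * conf + 0.5 * consistency              # fused correctness estimate
print((score > 0.8).float().mean())                 # fraction of pixels judged correct
```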
1 code implementation • CVPR 2018 • Ekta Prashnani, Hong Cai, Yasamin Mostofi, Pradeep Sen
Our key observation is that our trained network can then be used separately with only one distorted image and a reference to predict its perceptual error, without ever being trained on explicit human perceptual-error labels.
Ranked #1 on Video Quality Assessment on MSU SR-QA Dataset
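A minimal sketch of the inference-time usage described: a trained network takes a distorted image plus its reference and returns a scalar perceptual error. The tiny network is a placeholder, not the paper's model.

```python
import torch
import torch.nn as nn

class PerceptualErrorNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16, 1)

    def forward(self, distorted, reference):
        x = torch.cat([distorted, reference], dim=1)   # image pair as 6 channels
        return self.head(self.features(x).flatten(1))  # scalar error per image

net = PerceptualErrorNet()
err = net(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(err.shape)    # torch.Size([1, 1])
```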