Search Results for author: Shitao Tang

Found 12 papers, 8 papers with code

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

no code implementations20 Feb 2024 Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan

MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A ``pose-free architecture'' where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) A ``view dropout strategy'' that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time.

3D Object Reconstruction 3D Reconstruction +2

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

2 code implementations18 Feb 2024 Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization, text question-answering, and etc.

Question Answering Text Summarization

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

1 code implementation NeurIPS 2023 Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa

This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e. g., perspective crops from a panorama or multi-view images given depth maps and poses).

Image Generation

NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

1 code implementation CVPR 2023 Shitao Tang, Sicong Tang, Andrea Tagliasacchi, Ping Tan, Yasutaka Furukawa

State-of-the-art feature matching methods require each scene to be stored as a 3D point cloud with per-point features, consuming several gigabytes of storage per scene.

Camera Localization Decoder +1

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

no code implementations26 Jul 2022 Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated.

Image Retrieval Retrieval +1

QuadTree Attention for Vision Transformers

1 code implementation ICLR 2022 Shitao Tang, Jiahui Zhang, Siyu Zhu, Ping Tan

Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependency.

object-detection Object Detection +2

Learning Camera Localization via Dense Scene Matching

1 code implementation CVPR 2021 Shitao Tang, Chengzhou Tang, Rui Huang, Siyu Zhu, Ping Tan

We present a new method for scene agnostic camera localization using dense scene matching (DSM), where a cost volume is constructed between a query image and a scene.

Camera Localization

Channel Equilibrium Networks for Learning Deep Representation

1 code implementation ICML 2020 Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation.

Channel Equilibrium Networks

no code implementations25 Sep 2019 Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

However, over-sparse CNNs have many collapsed channels (i. e. many channels with undesired zero values), impeding their learning ability.

Learning Efficient Detector with Semi-supervised Adaptive Distillation

1 code implementation2 Jan 2019 Shitao Tang, Litong Feng, Wenqi Shao, Zhanghui Kuang, Wei zhang, Yimin Chen

ADL enlarges the distillation loss for hard-to-learn and hard-to-mimic samples and reduces distillation loss for the dominant easy samples, enabling distillation to work on the single-stage detector first time, even if the student and the teacher are identical.

Image Classification Knowledge Distillation +1

Fast Video Shot Transition Localization with Deep Structured Models

4 code implementations13 Aug 2018 Shitao Tang, Litong Feng, Zhangkui Kuang, Yimin Chen, Wei zhang

In order to train a high-performance shot transition detector, we contribute a new database ClipShots, which contains 128636 cut transitions and 38120 gradual transitions from 4039 online videos.

Ranked #3 on Camera shot boundary detection on ClipShots (using extra training data)

Camera shot boundary detection

Feature Extraction via Recurrent Random Deep Ensembles and its Application in Gruop-level Happiness Estimation

no code implementations24 Jul 2017 Shitao Tang, Yichen Pan

This paper presents a novel ensemble framework to extract highly discriminative feature representation of image and its application for group-level happpiness intensity prediction in wild.

Cannot find the paper you are looking for? You can Submit a new open access paper.