no code implementations • 24 Jul 2017 • Shitao Tang, Yichen Pan
This paper presents a novel ensemble framework for extracting highly discriminative feature representations of images and applies it to group-level happiness intensity prediction in the wild.
4 code implementations • 13 Aug 2018 • Shitao Tang, Litong Feng, Zhanghui Kuang, Yimin Chen, Wei Zhang
In order to train a high-performance shot transition detector, we contribute a new database ClipShots, which contains 128636 cut transitions and 38120 gradual transitions from 4039 online videos.
Ranked #3 on Camera shot boundary detection on ClipShots (using extra training data)
1 code implementation • 2 Jan 2019 • Shitao Tang, Litong Feng, Wenqi Shao, Zhanghui Kuang, Wei Zhang, Yimin Chen
ADL enlarges the distillation loss for hard-to-learn and hard-to-mimic samples and reduces the distillation loss for the dominant easy samples, enabling distillation to work on single-stage detectors for the first time, even when the student and the teacher are identical.
no code implementations • 25 Sep 2019 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability.
1 code implementation • ICML 2020 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
Unlike prior work that simply removes the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed the Channel Equilibrium (CE) block, which enables channels in the same layer to contribute equally to the learned representation.
1 code implementation • CVPR 2021 • Shitao Tang, Chengzhou Tang, Rui Huang, Siyu Zhu, Ping Tan
We present a new method for scene-agnostic camera localization using dense scene matching (DSM), where a cost volume is constructed between a query image and a scene.
1 code implementation • ICLR 2022 • Shitao Tang, Jiahui Zhang, Siyu Zhu, Ping Tan
Transformers have been successful in many vision tasks, thanks to their ability to capture long-range dependencies.
no code implementations • 26 Jul 2022 • Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan
Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated.
1 code implementation • CVPR 2023 • Shitao Tang, Sicong Tang, Andrea Tagliasacchi, Ping Tan, Yasutaka Furukawa
State-of-the-art feature matching methods require each scene to be stored as a 3D point cloud with per-point features, consuming several gigabytes of storage per scene.
1 code implementation • NeurIPS 2023 • Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses).
2 code implementations • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo
Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question answering.
no code implementations • 20 Feb 2024 • Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan
MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) a "pose-free architecture", where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) a "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time.