Search Results for author: Thomas H. Li

Found 30 papers, 21 papers with code

StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences

1 code implementation 28 Nov 2023 Shangkun Sun, Jiaming Liu, Thomas H. Li, Huaxia Li, Guoqing Liu, Wei Gao

To address this issue, multi-frame optical flow methods leverage adjacent frames to mitigate the local ambiguity.

Optical Flow Estimation

Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding

1 code implementation 25 Nov 2023 Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li

Large-scale image-language pretrained models, e.g., CLIP, have demonstrated remarkable proficiency in acquiring general multi-modal knowledge through web-scale image-text data.

Video Understanding

Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction

1 code implementation NeurIPS 2023 Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan

Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images.

Image Super-Resolution Test-time Adaptation

$A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

no code implementations 15 Aug 2023 Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan

We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet challenging problem in which an agent learns to navigate following a path described by language instructions without requiring any path-instruction annotation data.

Navigate Robot Navigation +1

Learning Vision-and-Language Navigation from YouTube Videos

1 code implementation ICCV 2023 Kunyang Lin, Peihao Chen, Diwei Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

In this paper, we propose to learn an agent from these videos by creating a large-scale dataset which comprises reasonable path-instruction pairs from house tour videos and pre-training the agent on it.

Navigate Vision and Language Navigation

Hard Sample Matters a Lot in Zero-Shot Quantization

1 code implementation CVPR 2023 Huantong Li, Xiangmiao Wu, Fanbing Lv, Daihai Liao, Thomas H. Li, Yonggang Zhang, Bo Han, Mingkui Tan

Nonetheless, we find that the synthetic samples constructed in existing ZSQ methods can be easily fitted by models.

Quantization

Detecting the open-world objects with the help of the Brain

1 code implementation 21 Mar 2023 Shuailei Ma, Yuefeng Wang, Ying WEI, Peihao Chen, Zhixiang Ye, Jiaqi Fan, Enming Zhang, Thomas H. Li

We propose leveraging the VL as the "Brain" of the open-world detector by simply generating unknown labels.

Object object-detection +1

Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring

1 code implementation CVPR 2023 Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li

In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain.

Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Representation Learning Retrieval +3

CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection

no code implementations CVPR 2023 Shuailei Ma, Yuefeng Wang, Jiaqi Fan, Ying WEI, Thomas H. Li, Hongli Liu, Fanbing Lv

Open-world object detection (OWOD), as a more general and challenging goal, requires the model trained from data on known objects to detect both known and unknown objects and incrementally learn to identify these unknown objects.

object-detection Open World Object Detection

Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering

no code implementations CVPR 2023 Nan Zhang, Zhiyi Pan, Thomas H. Li, Wei Gao, Ge Li

Recently, self-attention networks have achieved impressive performance in point cloud segmentation due to their superiority in modeling long-range dependencies.

Point Cloud Segmentation

Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

1 code implementation 14 Oct 2022 Peihao Chen, Dongyu Ji, Kunyang Lin, Runhao Zeng, Thomas H. Li, Mingkui Tan, Chuang Gan

To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.

Navigate Vision and Language Navigation

Learning Active Camera for Multi-Object Navigation

no code implementations 14 Oct 2022 Peihao Chen, Dongyu Ji, Kunyang Lin, Weiwen Hu, Wenbing Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

How to make robots perceive the environment as efficiently as humans is a fundamental problem in robotics.

Navigate Object

Frequency-Aware Self-Supervised Monocular Depth Estimation

1 code implementation 11 Oct 2022 Xingyu Chen, Thomas H. Li, Ruonan Zhang, Ge Li

We present two versatile methods to generally enhance self-supervised monocular depth estimation (MDE) models.

Depth Prediction Monocular Depth Estimation +1

Deep Geometry Post-Processing for Decompressed Point Clouds

1 code implementation 29 Apr 2022 Xiaoqing Fan, Ge Li, Dingquan Li, Yurui Ren, Wei Gao, Thomas H. Li

Point cloud compression plays a crucial role in reducing the huge cost of data storage and transmission.

Quantization

Neural Texture Extraction and Distribution for Controllable Person Image Synthesis

1 code implementation CVPR 2022 Yurui Ren, Xiaoqing Fan, Ge Li, Shan Liu, Thomas H. Li

Our model is trained to predict human images in arbitrary poses, which encourages it to extract disentangled and expressive neural textures representing the appearance of different semantic entities.

Image Generation

PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering

1 code implementation ICCV 2021 Yurui Ren, Ge Li, Yuanqi Chen, Thomas H. Li, Shan Liu

The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.

Image Generation Neural Rendering

Combining Attention with Flow for Person Image Synthesis

no code implementations 4 Aug 2021 Yurui Ren, Yubo Wu, Thomas H. Li, Shan Liu, Ge Li

Pose-guided person image synthesis aims to synthesize person images by transforming reference images into target poses.

Image Generation

Deep Image Spatial Transformation for Person Image Generation

2 code implementations CVPR 2020 Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, Ge Li

Finally, we warp the source features using a content-aware sampling method with the obtained local attention coefficients.

Image Generation

Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds

no code implementations 18 Apr 2019 Wei Yan, Yiting Shao, Shan Liu, Thomas H. Li, Zhu Li, Ge Li

The point cloud is a fundamental 3D representation that is widely used in real-world applications such as autonomous driving.

Autonomous Driving Image Compression

Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector

no code implementations 9 Jul 2018 Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li

Weakly supervised temporal action detection is a Herculean task in understanding untrimmed videos, since no supervisory signal except the video-level category label is available on training data.

Action Detection Temporal Localization

Exploiting the Value of the Center-dark Channel Prior for Salient Object Detection

no code implementations 14 May 2018 Chunbiao Zhu, Wen-Hao Zhang, Thomas H. Li, Ge Li

In this paper, we propose a novel salient object detection algorithm for RGB-D images using center-dark channel priors.

object-detection RGB Salient Object Detection +2
