Search Results for author: Dongmei Fu

Found 15 papers, 5 papers with code

Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction

no code implementations • 15 Dec 2023 • Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Zijiang Yang, Jiaxin Dai, Lingwei Ma, Dawei Zhang

In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG using semantic nodes and numerical proxy nodes.
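The excerpt only names the graph construction, so here is a minimal, hypothetical sketch (not the paper's NR-KG code) of how semantic nodes and numerical proxy nodes could coexist in one cross-modal graph, using networkx and made-up material-science entries.

```python
# Hypothetical cross-modal KG sketch in the spirit of the abstract above:
# semantic nodes carry concept labels, numerical proxy nodes wrap raw
# measurements and attach to the concept they quantify. Not the paper's code.
import networkx as nx

kg = nx.DiGraph()

# Semantic nodes (concepts).
kg.add_node("alloy:Q235", kind="semantic", label="low-carbon steel Q235")
kg.add_node("property:corrosion_rate", kind="semantic", label="corrosion rate")

# Numerical proxy node: keeps the graph symbolic while the scalar value
# stays reachable for regression-style reasoning (values are illustrative).
kg.add_node("proxy:Q235_corrosion_rate", kind="numerical_proxy",
            value=0.042, unit="mm/year")

# Cross-modal edges linking the two kinds of nodes.
kg.add_edge("alloy:Q235", "proxy:Q235_corrosion_rate", relation="has_measurement")
kg.add_edge("proxy:Q235_corrosion_rate", "property:corrosion_rate",
            relation="instance_of")

# A property-prediction model would embed semantic nodes from their labels,
# proxy nodes from their values, then predict unseen proxy values.
print(kg.nodes["proxy:Q235_corrosion_rate"]["value"])
```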

Knowledge Graphs • Property Prediction

PixelLM: Pixel Reasoning with Large Multimodal Model

no code implementations • 4 Dec 2023 • Zhongwei Ren, Zhicheng Huang, Yunchao Wei, Yao Zhao, Dongmei Fu, Jiashi Feng, Xiaojie Jin

PixelLM excels across various pixel-level image reasoning and understanding tasks, outperforming well-established methods on multiple benchmarks, including MUSE and single- and multi-referring segmentation.

Segmentation

MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field

no code implementations • 24 Sep 2023 • Zijiang Yang, Zhongwei Qiu, Chang Xu, Dongmei Fu

3D style transfer aims to generate stylized views of 3D scenes with specified styles, which requires high-quality generation while maintaining multi-view consistency.

Incremental Learning • Style Transfer

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng

Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.

 Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)

Question Answering • Retrieval • +6

Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge

no code implementations • 22 Nov 2022 • Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu

Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.
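The excerpt describes a pretext task of judging which of two image points lies closer to the camera. Below is a generic sketch of such a pairwise depth-ordering objective in PyTorch; the feature dimensions, head architecture, and loss form are assumptions, not the paper's WSP implementation.

```python
# Generic pairwise depth-ordering pretext loss (assumed formulation): given
# backbone features at two pixel locations, a small head predicts which point
# is closer, supervised by weak ordinal labels.
import torch
import torch.nn as nn

class DepthOrderHead(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),  # logit > 0  <=>  point A is closer than point B
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([feat_a, feat_b], dim=-1)).squeeze(-1)

head = DepthOrderHead()
criterion = nn.BCEWithLogitsLoss()

# Dummy batch: per-point features from a backbone and weak ordinal labels
# (1 if point A is nearer to the camera than point B).
feat_a, feat_b = torch.randn(8, 256), torch.randn(8, 256)
labels = torch.randint(0, 2, (8,)).float()

loss = criterion(head(feat_a, feat_b), labels)
loss.backward()
```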

3D Human Pose Estimation • 3D Pose Estimation

IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation

no code implementations • 6 Aug 2022 • Zhongwei Qiu, Qiansheng Yang, Jian Wang, Dongmei Fu

In particular, we first formulate video frames as a series of instance-guided tokens, with each token responsible for predicting the 3D pose of a human instance.
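As a rough, assumed illustration of the "instance-guided token" idea, the sketch below uses a DETR-style decoder in which each learned query token attends to frame features and is decoded into the 3D pose of one human instance. It is not the IVT architecture, only the generic pattern the sentence describes.

```python
# Assumed, simplified instance-token decoder: one learned query per human
# instance, each decoded token regressed to a (joints x 3) pose.
import torch
import torch.nn as nn

class InstancePoseDecoder(nn.Module):
    def __init__(self, d_model: int = 256, num_instances: int = 10, num_joints: int = 17):
        super().__init__()
        self.num_joints = num_joints
        self.queries = nn.Parameter(torch.randn(num_instances, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.pose_head = nn.Linear(d_model, num_joints * 3)  # (x, y, z) per joint

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (batch, num_tokens, d_model) features from a video backbone.
        b = frame_tokens.size(0)
        queries = self.queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.decoder(queries, frame_tokens)  # one decoded token per instance
        return self.pose_head(decoded).view(b, -1, self.num_joints, 3)

poses = InstancePoseDecoder()(torch.randn(2, 196, 256))
print(poses.shape)  # torch.Size([2, 10, 17, 3])
```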

Ranked #10 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)

2D Pose Estimation • 3D Multi-Person Pose Estimation • +1

Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

1 code implementation • 5 Aug 2022 • Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu

First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band.
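This patch-to-frequency step can be reproduced generically with a blockwise 2D DCT; the sketch below assumes an 8x8 patch size and grayscale input, which are illustrative choices rather than the paper's configuration.

```python
# Generic blockwise 2D DCT, illustrating the "patch -> DCT spectral map" step
# described above (patch size and channel layout are assumptions).
import numpy as np
from scipy.fft import dctn

def frame_to_dct_spectral_maps(frame: np.ndarray, patch: int = 8) -> np.ndarray:
    """frame: (H, W) grayscale image with H and W divisible by `patch`.
    Returns (H//patch, W//patch, patch*patch): each output channel holds one
    DCT coefficient, i.e. one frequency band, across all patches."""
    h, w = frame.shape
    maps = np.empty((h // patch, w // patch, patch * patch), dtype=np.float32)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            coeffs = dctn(frame[i:i + patch, j:j + patch], norm="ortho")
            maps[i // patch, j // patch] = coeffs.reshape(-1)
    return maps

spectra = frame_to_dct_spectral_maps(np.random.rand(64, 64).astype(np.float32))
print(spectra.shape)  # (8, 8, 64); channel 0 is the DC band of every patch
```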

Video Enhancement • Video Super-Resolution

Contrastive Masked Autoencoders are Stronger Vision Learners

1 code implementation • 27 Jul 2022 • Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng

The momentum encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart.
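The sentence describes the familiar momentum-encoder pattern: an EMA copy of the online encoder processes the full images, and the two branches are coupled by a contrastive loss. The sketch below shows that generic pattern only; the tiny encoder, InfoNCE loss, and hyperparameters are assumptions, not the CMAE implementation.

```python
# Generic momentum-encoder update plus an InfoNCE term between the online and
# momentum branches (illustrative pattern, not the paper's code).
import copy
import torch
import torch.nn.functional as F

online = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
momentum = copy.deepcopy(online)
for p in momentum.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def ema_update(online, momentum, m=0.996):
    # Momentum encoder tracks an exponential moving average of the online weights.
    for po, pm in zip(online.parameters(), momentum.parameters()):
        pm.mul_(m).add_(po, alpha=1.0 - m)

def info_nce(q, k, temperature=0.2):
    # Positive pairs sit on the diagonal of the similarity matrix.
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.t() / temperature
    return F.cross_entropy(logits, torch.arange(q.size(0)))

masked_views = torch.randn(16, 3, 32, 32)  # online branch: masked/augmented crops
full_views = torch.randn(16, 3, 32, 32)    # momentum branch: the full images

loss = info_nce(online(masked_views), momentum(full_views).detach())
loss.backward()
ema_update(online, momentum)
```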

Contrastive Learning • Image Classification • +3

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

3 code implementations • CVPR 2021 • Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu

As region-based visual features usually represent only parts of an image, it is challenging for existing vision-language models to fully understand the semantics of the paired natural language.

Representation Learning • Retrieval • +3

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

1 code implementation • 2 Apr 2020 • Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu

We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs, instead of using region-based image features as in most recent vision-and-language methods.

Image-text matching • Language Modelling • +7

Geodesic Clustering in Deep Generative Models

no code implementations • 13 Sep 2018 • Tao Yang, Georgios Arvanitidis, Dongmei Fu, Xiaogang Li, Søren Hauberg

Deep generative models are tremendously successful in learning low-dimensional latent representations that describe the data well.

Clustering
