Search Results for author: Zehan Wang

Found 16 papers, 9 papers with code

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

no code implementations • 23 Dec 2023 • Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao

However, talking head translation, converting audio-visual speech (i.e., talking head video) from one language into another, still confronts several challenges compared to audio speech: (1) Existing methods invariably rely on cascading, synthesizing via both audio and text, resulting in delays and cascading errors.

Self-Supervised Learning • Speech-to-Speech Translation • +1

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

no code implementations • 21 Dec 2023 • Haifeng Huang, Yang Zhao, Zehan Wang, Yan Xia, Zhou Zhao

Thus, to address this issue and enhance model performance on new scenes, we explore the TVG task in an unsupervised domain adaptation (UDA) setting across scenes for the first time, where the video-query pairs in the source scene (domain) are labeled with temporal boundaries, while those in the target scene are not.

Unsupervised Domain Adaptation • Video Grounding
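
The abstract describes the setting rather than the method: grounding labels exist only in the source scene. Below is a minimal sketch of one training step in that setting, using a gradient-reversal domain classifier as the unsupervised term; this is a common UDA baseline, not necessarily the paper's adaptation loss, and the `model` interface is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass,
    negated (scaled) gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def uda_step(model, domain_clf, src_batch, tgt_batch, grounding_loss, lambd=0.1):
    # Supervised grounding loss on labeled source video-query pairs.
    src_feat, src_pred = model(src_batch["video"], src_batch["query"])
    loss_sup = grounding_loss(src_pred, src_batch["boundaries"])

    # Unsupervised domain-confusion loss on unlabeled target pairs.
    tgt_feat, _ = model(tgt_batch["video"], tgt_batch["query"])
    feats = torch.cat([src_feat, tgt_feat], dim=0)
    domains = torch.cat([torch.zeros(len(src_feat)),
                         torch.ones(len(tgt_feat))]).long()
    logits = domain_clf(GradReverse.apply(feats, lambd))
    loss_dom = F.cross_entropy(logits, domains)
    return loss_sup + loss_dom
```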

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

2 code implementations • 13 Dec 2023 • Haifeng Huang, Zehan Wang, Rongjie Huang, Luping Liu, Xize Cheng, Yang Zhao, Tao Jin, Zhou Zhao

These object-identifier tokens capture the object's attributes and spatial relationships with surrounding objects in the 3D scene.

Attribute • Object • +1
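
A minimal sketch of the object-identifier idea: register one new token per object proposal so the language model can refer to scene objects unambiguously. The `gpt2` backbone and the `<obj_i>` naming are placeholders, not the paper's actual choices.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical base model; the actual Chat-3D v2 backbone may differ.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# One identifier token per detected object proposal in the scene.
num_objects = 32
obj_tokens = [f"<obj_{i}>" for i in range(num_objects)]
tok.add_tokens(obj_tokens, special_tokens=True)
model.resize_token_embeddings(len(tok))

# A scene encoder would then write each object's fused attribute/spatial
# embedding into the corresponding new embedding row, so a prompt such as
# "What is next to <obj_3>?" grounds directly to that object.
prompt = "What is next to <obj_3>?"
ids = tok(prompt, return_tensors="pt").input_ids
```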

Extending Multi-modal Contrastive Representations

1 code implementation • 13 Oct 2023 • Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao

Inspired by the recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method that flexibly learns a unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces.

3D Object Classification • Representation Learning • +1
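
A minimal sketch of the extension idea, with toy dimensions: the base MCR space stays frozen and only a light projector for the new space is trained, aligned through a modality the two spaces share. The projector shape and the InfoNCE objective are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dimensions; real MCR spaces (e.g., CLIP/CLAP) are larger.
d_new, d_base = 512, 768

# The base MCR space is frozen; only this projector is trained.
projector = nn.Sequential(
    nn.Linear(d_new, d_base), nn.GELU(), nn.Linear(d_base, d_base)
)

def align_on_overlap(emb_new, emb_base, temperature=0.05):
    """InfoNCE between overlap-modality embeddings projected from the
    new space and the same concepts embedded in the frozen base space."""
    z = F.normalize(projector(emb_new), dim=-1)
    b = F.normalize(emb_base, dim=-1)
    logits = z @ b.t() / temperature
    targets = torch.arange(len(z))
    return F.cross_entropy(logits, targets)
```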

Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes

1 code implementation • 17 Aug 2023 • Zehan Wang, Haifeng Huang, Yang Zhao, Ziang Zhang, Zhou Zhao

This paper presents Chat-3D, which combines the 3D visual perception of pre-trained 3D representations with the impressive reasoning and conversation capabilities of advanced LLMs to achieve the first universal dialogue system for 3D scenes.

Language Modelling • Large Language Model • +1
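
A minimal sketch of the bridging design such a system implies: a small trainable projector maps frozen pre-trained 3D object features into the LLM's token-embedding space. The dimensions and the linear projector are assumptions, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: d_3d for the pre-trained 3D encoder's object
# features, d_llm for the LLM's token-embedding width.
d_3d, d_llm = 1024, 4096

# A small trainable projector bridges frozen 3D features to the LLM; the
# 3D encoder and (initially) the LLM stay frozen for data efficiency.
projector = nn.Linear(d_3d, d_llm)

object_feats = torch.randn(16, d_3d)    # 16 objects from a 3D scene
scene_tokens = projector(object_feats)  # (16, d_llm); these rows are
                                        # prepended to the text embeddings
                                        # of the user's question.
```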

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding

no code implementations • 25 Jul 2023 • Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

3D visual grounding aims to localize the target object in a 3D point cloud given a free-form language description.

Object • Position • +3
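
A minimal sketch of the relative-position features the title refers to: pairwise offsets and distances between proposal centers, which a grounding model can feed into its reasoning over object pairs. The exact feature set in the paper may differ.

```python
import torch

def relative_position_features(centers: torch.Tensor) -> torch.Tensor:
    """Pairwise relative-position features for N proposal centers (N, 3):
    the offset vector from proposal i to proposal j plus their Euclidean
    distance, returned with shape (N, N, 4)."""
    offsets = centers[None, :, :] - centers[:, None, :]  # (N, N, 3)
    dists = offsets.norm(dim=-1, keepdim=True)           # (N, N, 1)
    return torch.cat([offsets, dists], dim=-1)
```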

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

1 code implementation • ICCV 2023 • Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

To accomplish this, we design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.

Object • Semantic Similarity • +3
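
A minimal sketch of coarse-to-fine matching under simple assumptions: score each object proposal against the whole-sentence embedding first, then refine with word-level similarities. The aggregation (max over words) is illustrative, not the paper's exact model.

```python
import torch
import torch.nn.functional as F

def coarse_scores(obj_emb, sent_emb):
    """Coarse stage: cosine similarity between each object proposal and
    the whole-sentence embedding. obj_emb: (N, d), sent_emb: (d,)."""
    return F.cosine_similarity(obj_emb, sent_emb[None, :], dim=-1)  # (N,)

def fine_scores(obj_emb, word_emb):
    """Fine stage: score each proposal against individual word embeddings
    and aggregate, refining the coarse ranking. word_emb: (T, d)."""
    sim = F.normalize(obj_emb, dim=-1) @ F.normalize(word_emb, dim=-1).t()
    return sim.max(dim=-1).values  # best-matching word per proposal

obj_emb = torch.randn(8, 256)
sent_emb, word_emb = torch.randn(256), torch.randn(12, 256)
scores = coarse_scores(obj_emb, sent_emb) + fine_scores(obj_emb, word_emb)
best = scores.argmax()  # proposal most consistent with the description
```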

Connecting Multi-modal Contrastive Representations

no code implementations • NeurIPS 2023 • Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao

This paper proposes Connecting Multi-modal Contrastive Representations (C-MCR), a novel, training-efficient method for learning MCR without paired data.

3D Point Cloud Classification • counterfactual • +4
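
A minimal sketch of the connection idea, assuming CLIP and CLAP as two pre-trained spaces that share the text modality: both large encoders stay frozen, and only two small projectors are trained so that the same sentences, embedded by either text encoder, coincide in a new shared space. The InfoNCE form and dimensions are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_clip, d_clap, d_shared = 512, 512, 256

# One small projector per pre-trained space; the encoders stay frozen.
proj_a = nn.Linear(d_clip, d_shared)  # e.g., CLIP (vision-text) side
proj_b = nn.Linear(d_clap, d_shared)  # e.g., CLAP (audio-text) side

def connect_loss(text_a, text_b, temperature=0.05):
    """Align the two spaces through the overlapping text modality: the
    same batch of sentences embedded by both text encoders should match
    row-for-row after projection into the shared space."""
    za = F.normalize(proj_a(text_a), dim=-1)
    zb = F.normalize(proj_b(text_b), dim=-1)
    logits = za @ zb.t() / temperature
    targets = torch.arange(len(za))
    return F.cross_entropy(logits, targets)
```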

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

2 code implementations • ICCV 2023 • Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao

However, despite researchers exploring cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of cross-lingual studies on visual speech.

Lip Reading • Machine Translation • +4
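
A minimal sketch of the cross-modal mixup idea, assuming temporally aligned audio and visual feature streams; the Beta-sampled coefficient follows standard mixup, and the paper's actual mixing scheme may differ.

```python
import torch

def av_mixup(audio_feat, visual_feat, alpha=0.5):
    """Mixup-style interpolation between temporally aligned audio and
    visual speech features, so the recognizer learns from blended
    cross-modal input. Both tensors: (batch, time, dim)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * audio_feat + (1.0 - lam) * visual_feat, lam

audio = torch.randn(4, 100, 256)
visual = torch.randn(4, 100, 256)
mixed, lam = av_mixup(audio, visual)
```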

Frame Interpolation with Multi-Scale Deep Loss Functions and Generative Adversarial Networks

no code implementations • 16 Nov 2017 • Joost van Amersfoort, Wenzhe Shi, Alejandro Acosta, Francisco Massa, Johannes Totz, Zehan Wang, Jose Caballero

To improve the quality of synthesised intermediate video frames, our network is jointly supervised at different levels with a perceptual loss function that consists of an adversarial and two content losses.

Generative Adversarial Network
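
A minimal sketch of the joint supervision described above, at a single scale (the paper applies its losses at multiple scales): an adversarial term plus two content terms. The feature network, loss choices, and weights are assumptions.

```python
import torch
import torch.nn.functional as F

def perceptual_loss(pred, target, disc, feat_net,
                    w_adv=1e-3, w_pix=1.0, w_feat=0.1):
    """Adversarial term plus two content terms for a synthesised frame.
    `disc` is a discriminator returning logits; `feat_net` extracts deep
    features (e.g., a VGG slice) -- both are assumptions here."""
    loss_pix = F.l1_loss(pred, target)                        # content loss 1: pixels
    loss_feat = F.mse_loss(feat_net(pred), feat_net(target))  # content loss 2: features
    logits = disc(pred)                                       # generator-side adversarial term
    loss_adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return w_pix * loss_pix + w_feat * loss_feat + w_adv * loss_adv
```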

Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize

3 code implementations • 10 Jul 2017 • Andrew Aitken, Christian Ledig, Lucas Theis, Jose Caballero, Zehan Wang, Wenzhe Shi

Compared to sub-pixel convolution initialized with schemes designed for standard convolution kernels, it is free from checkerboard artifacts immediately after initialization.
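
A minimal sketch of the initialization idea: fill the sub-pixel convolution kernel so that every sub-pixel position starts with identical weights, making the layer initially equivalent to nearest-neighbour resize followed by convolution. The Kaiming fill and layer sizes are incidental choices.

```python
import torch
import torch.nn as nn

def icnr_(weight: torch.Tensor, scale: int = 2):
    """Initialize a sub-pixel convolution weight of shape
    (C_out * scale**2, C_in, k, k) so that the conv + PixelShuffle pair
    is checkerboard-free immediately after initialization."""
    c_out = weight.shape[0] // (scale ** 2)
    sub = torch.empty(c_out, *weight.shape[1:])
    nn.init.kaiming_normal_(sub)
    # Repeat each sub-kernel scale**2 times so every sub-pixel position
    # of a given output channel starts with identical weights.
    weight.data.copy_(sub.repeat_interleave(scale ** 2, dim=0))

conv = nn.Conv2d(64, 3 * 2 ** 2, kernel_size=3, padding=1)
icnr_(conv.weight, scale=2)
upsample = nn.Sequential(conv, nn.PixelShuffle(2))
```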

Is the deconvolution layer the same as a convolutional layer?

6 code implementations • 22 Sep 2016 • Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszár, Andrew Aitken, Christian Ledig, Zehan Wang

In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented.
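
The note's core point is that a strided deconvolution and a sub-pixel convolution (an ordinary convolution followed by a periodic shuffle) produce the same output geometry and can represent the same function. A minimal sketch comparing the two; layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 8, 8)
r = 2  # upscaling factor

# Deconvolution (transposed convolution) with stride r...
deconv = nn.ConvTranspose2d(64, 32, kernel_size=2 * r, stride=r, padding=r // 2)

# ...and a sub-pixel convolution: a standard convolution producing r**2
# times as many channels, followed by a periodic shuffle into space.
subpixel = nn.Sequential(
    nn.Conv2d(64, 32 * r ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(r),
)

# Both map (1, 64, 8, 8) to (1, 32, 16, 16); the note argues they can
# also compute the same function by rearranging the kernels.
print(deconv(x).shape, subpixel(x).shape)
```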
