1 code implementation • 9 Oct 2023 • Chen Pan, Fan Zhou, Xuanwei Hu, Xinxin Zhu, Wenxin Ning, Zi Zhuang, Siqiao Xue, James Zhang, Yunhua Hu
Deciding the best future execution time is a critical task in many business activities as time series forecasts evolve, and an optimal timing strategy, driven by observed data, provides such a solution.
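As a hedged illustration of the timing problem (a naive baseline, not the paper's learned strategy), one can pick the future step that minimizes expected cost under a point forecast; all numbers below are made up:

```python
import numpy as np

# Toy baseline: choose the cheapest future hour to execute a purchase,
# given a point forecast of prices. Not the paper's data-driven policy.
rng = np.random.default_rng(0)
horizon = 24  # hours ahead
forecast_prices = 100 + 5 * np.sin(np.arange(horizon) / 3) + rng.normal(0, 1, horizon)

best_t = int(np.argmin(forecast_prices))  # execute when the forecast cost is lowest
print(f"execute at t+{best_t}h, expected price {forecast_prices[best_t]:.2f}")
```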
no code implementations • 16 Jun 2023 • Shuai Xiao, Chen Pan, Min Wang, Xinxin Zhu, Siqiao Xue, Jing Wang, Yunhua Hu, James Zhang, Jinghua Feng
To this end, we formulate the problem as a partially observable Markov decision process (POMDP) and employ an environment-correction algorithm based on the characteristics of the business.
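For readers unfamiliar with the formulation, the defining feature of a POMDP is that the agent acts on noisy observations and a belief over states rather than the state itself. A minimal sketch of that loop follows; all dynamics here are illustrative and action-independent for brevity, not taken from the paper:

```python
import numpy as np

# Minimal POMDP interaction loop: the agent never sees the true state,
# only a noisy observation, and maintains a belief distribution over states.
rng = np.random.default_rng(1)
n_states, n_obs = 3, 3
T = np.full((n_states, n_states), 1 / n_states)  # transition probs P(s'|s)
O = np.eye(n_obs) * 0.8 + 0.1                    # observation probs P(o|s), noisy identity
O /= O.sum(axis=1, keepdims=True)

belief = np.full(n_states, 1 / n_states)         # uniform prior over states
state = rng.integers(n_states)
for step in range(5):
    obs = rng.choice(n_obs, p=O[state])          # agent observes, not the state itself
    belief = O[:, obs] * (T.T @ belief)          # Bayes filter: likelihood x predicted prior
    belief /= belief.sum()
    state = rng.choice(n_states, p=T[state])
    print(f"step {step}: obs={obs}, belief={np.round(belief, 2)}")
```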
1 code implementation • NeurIPS 2023 • Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu
Based on the proposed VAST-27M dataset, we train VAST, an omni-modality video-text foundation model that can perceive and process vision, audio, and subtitle modalities from video, and better supports various tasks including vision-text, audio-text, and multimodal video-text tasks (retrieval, captioning, and QA). A toy fusion sketch follows below.
Ranked #1 on Image Captioning on COCO Captions (SPICE metric, using extra training data)
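The sketch below illustrates the general pattern of omni-modality fusion (per-modality token embeddings joined into one sequence for a shared transformer); the dimensions and module choices are hypothetical, not the released VAST architecture:

```python
import torch
import torch.nn as nn

# Hypothetical omni-modality fusion: concatenate token embeddings from
# each modality and fuse them with a shared transformer encoder.
d = 256
vision_tokens   = torch.randn(1, 16, d)  # e.g., patch features from a vision encoder
audio_tokens    = torch.randn(1, 8, d)   # e.g., spectrogram features
subtitle_tokens = torch.randn(1, 12, d)  # e.g., subtitle text embeddings

fusion = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2,
)
multimodal = fusion(torch.cat([vision_tokens, audio_tokens, subtitle_tokens], dim=1))
print(multimodal.shape)  # (1, 36, 256): one joint sequence over all modalities
```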
1 code implementation • 25 May 2023 • Zijia Zhao, Longteng Guo, Tongtian Yue, Sihan Chen, Shuai Shao, Xinxin Zhu, Zehuan Yuan, Jing Liu
We show that language-paired two-modality data alone is sufficient to connect all modalities.
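A hedged sketch of why this works (not the paper's model): if images and audio are each aligned to the same text embedding space using only image-text and audio-text pairs, image and audio become directly comparable without any image-audio pairs. Dimensions below are made up:

```python
import torch
import torch.nn as nn

# Image and audio projections trained (hypothetically) on language-paired
# data map into a shared text space, connecting the two modalities.
d_text = 128
image_proj = nn.Linear(512, d_text)  # trained on image-text pairs
audio_proj = nn.Linear(256, d_text)  # trained on audio-text pairs

img_emb = nn.functional.normalize(image_proj(torch.randn(1, 512)), dim=-1)
aud_emb = nn.functional.normalize(audio_proj(torch.randn(1, 256)), dim=-1)
similarity = img_emb @ aud_emb.T     # cross-modal similarity via the shared text space
print(similarity.item())
```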
1 code implementation • 17 Apr 2023 • Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu
Unlike widely studied vision-language pretraining models, VALOR jointly models the relationships among vision, audio, and language in an end-to-end manner. A toy alignment sketch follows below.
Ranked #1 on Video Captioning on VATEX (using extra training data)
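One common ingredient of such tri-modal pretraining is pairwise contrastive alignment; the InfoNCE-style sketch below only illustrates that idea and does not reproduce VALOR's actual objectives:

```python
import torch
import torch.nn.functional as F

# Illustrative contrastive alignment across vision (v), audio (a), text (t).
def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.T / tau            # similarity of every a_i with every b_j
    targets = torch.arange(a.size(0)) # the i-th pair is the positive
    return F.cross_entropy(logits, targets)

v, a, t = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
loss = info_nce(v, t) + info_nce(a, t) + info_nce(v, a)  # align all three pairwise
print(loss.item())
```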
1 code implementation • 29 Mar 2023 • Jiawei Liu, Weining Wang, Sihan Chen, Xinxin Zhu, Jing Liu
In this work, we concentrate on the rarely investigated problem of text-guided sounding video generation and propose the Sounding Video Generator (SVG), a unified framework for generating realistic videos together with audio signals.
2 code implementations • CVPR 2023 • Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu
Experimental results demonstrate that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101.
no code implementations • 11 Feb 2023 • Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, Shiyu Wang, James Zhang, Xinxin Zhu, Xuanwei Hu, Yunhua Hu, Yangfei Zheng, Lei Lei, Yun Hu
Moreover, unlike most previous reconciliation methods, which either rely on strong assumptions or focus only on coherence constraints, we utilize deep neural optimization networks, which not only achieve coherency without any assumptions but also allow more flexible and realistic constraints for task-based targets, e.g., a lower under-estimation penalty and a meaningful decision-making loss that facilitates subsequent downstream tasks.
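For context on the coherence constraint itself, a classical baseline (not the paper's learned approach) projects incoherent base forecasts onto the subspace where each aggregate equals the sum of its children, via OLS reconciliation:

```python
import numpy as np

# OLS reconciliation on a two-level hierarchy: total = child1 + child2.
S = np.array([[1, 1],   # aggregation matrix: row 0 sums the bottom series
              [1, 0],
              [0, 1]], dtype=float)
y_hat = np.array([105.0, 60.0, 40.0])   # base forecasts [total, c1, c2]; incoherent: 60+40 != 105
P = S @ np.linalg.inv(S.T @ S) @ S.T    # projection onto the coherent subspace
y_tilde = P @ y_hat
print(y_tilde, "coherent:", np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2]))
```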
2 code implementations • 1 Jul 2021 • Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, Mingzhen Sun, Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang
In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, which jointly models visual, textual, and audio resources.
Ranked #1 on Image Retrieval on Localized Narratives
no code implementations • 26 Jan 2021 • Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu
In addition, thanks to the full-Transformer architecture, we provide detailed visualizations of the self-attention between patches in the encoder and the "words-to-patches" attention in the decoder.
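Such patch-to-patch attention maps can be extracted directly from a multi-head attention layer; the recipe below is a generic PyTorch illustration, not the authors' code:

```python
import torch
import torch.nn as nn

# Ask a multi-head attention layer to return its averaged attention weights,
# which can then be plotted as a patch-to-patch heatmap.
d, n_patches = 64, 10
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
patches = torch.randn(1, n_patches, d)  # stand-in for patch embeddings
_, weights = attn(patches, patches, patches, need_weights=True)
print(weights.shape)                    # (1, 10, 10)
# weights[0, i, j] ~ how much patch i attends to patch j.
```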
no code implementations • 26 Jan 2021 • Sihan Chen, Xinxin Zhu, Wei Liu, Xingjian He, Jing Liu
Depth information matters in the RGB-D semantic segmentation task, as it provides additional geometric information to complement color images.
no code implementations • 24 Jan 2021 • Longteng Guo, Jing Liu, Xinxin Zhu, Hanqing Lu
These models are autoregressive in that they generate each word by conditioning on previously generated words, which leads to high latency during inference.
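The latency claim is easy to see in code: each word costs one sequential model call, so the number of forward passes grows with caption length. The toy decoder below is a made-up stand-in, purely illustrative:

```python
import random

# Toy autoregressive decoding: one model call per generated word.
def toy_model(image_features, prefix):
    random.seed(len(prefix))  # deterministic stand-in for a real decoder
    return random.randrange(2, 10) if len(prefix) < 6 else 1  # 1 = <eos>

def generate(image_features, bos=0, eos=1, max_len=20):
    words = [bos]
    for _ in range(max_len):  # strictly sequential: the latency bottleneck
        nxt = toy_model(image_features, words)
        words.append(nxt)
        if nxt == eos:
            break
    return words[1:]

print(generate(image_features=None))  # 6 tokens -> 6 sequential model calls
```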
no code implementations • 16 Dec 2020 • Xinxin Zhu, Weining Wang, Longteng Guo, Jing Liu
The whole process involves a visual understanding module and a language generation module, which poses more challenges for the design of deep neural networks than other tasks do.
no code implementations • 10 May 2020 • Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu
In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL).
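In contrast to the autoregressive loop sketched earlier, a non-autoregressive decoder predicts every position in a single forward pass, so inference latency no longer grows with caption length. The sketch below shows only that decoding pattern; the paper's CMAL training scheme is not reproduced here:

```python
import torch
import torch.nn as nn

# Non-autoregressive decoding: logits for all positions in one pass.
d, vocab, max_len = 64, 1000, 12
decoder = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, vocab))
positions = torch.randn(1, max_len, d)  # stand-in for position queries + image context
logits = decoder(positions)             # one pass yields all word distributions
caption_ids = logits.argmax(dim=-1)     # (1, 12): all words decoded in parallel
print(caption_ids.shape)
```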
no code implementations • CVPR 2020 • Longteng Guo, Jing Liu, Xinxin Zhu, Peng Yao, Shichen Lu, Hanqing Lu
First, we propose Normalized Self-Attention (NSA), a reparameterization of SA that brings the benefits of normalization inside SA.
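One plausible way to "bring normalization inside SA" is to normalize the queries before the dot-product, as sketched below; the exact reparameterization used by NSA may place normalization differently, so treat this as an assumption:

```python
import torch
import torch.nn.functional as F

# Illustrative normalized self-attention: normalization applied to the
# queries inside the attention computation (placement is an assumption).
def normalized_self_attention(x, wq, wk, wv):
    q = F.layer_norm(x @ wq, x.shape[-1:])  # normalization inside SA
    k, v = x @ wk, x @ wv
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

d = 32
x = torch.randn(1, 10, d)
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
out = normalized_self_attention(x, wq, wk, wv)
print(out.shape)  # (1, 10, 32)
```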
no code implementations • 17 Oct 2019 • Xinxin Zhu, Longteng Guo, Peng Yao, Shichen Lu, Wei Liu, Jing Liu
This report describes our solution for the VATEX Captioning Challenge 2020, which requires generating descriptions for videos in both English and Chinese.