Search Results for author: Xinxin Zhu

Found 16 papers, 7 papers with code

Deep Optimal Timing Strategies for Time Series

1 code implementation • 9 Oct 2023 • Chen Pan, Fan Zhou, Xuanwei Hu, Xinxin Zhu, Wenxin Ning, Zi Zhuang, Siqiao Xue, James Zhang, Yunhua Hu

Deciding the best future execution time is a critical task in many business activities as time series evolve, and an optimal timing strategy, driven by observed data, provides such a solution.

Probabilistic Time Series Forecasting • Time Series

Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction

no code implementations • 16 Jun 2023 • Shuai Xiao, Chen Pan, Min Wang, Xinxin Zhu, Siqiao Xue, Jing Wang, Yunhua Hu, James Zhang, Jinghua Feng

To this end, we formulate the problem as a partially observable Markov decision process (POMDP) and employ an environment correction algorithm based on the characteristics of the business.
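For intuition, a partially observable environment exposes only noisy observations of its hidden state. Below is a minimal sketch; the class name, toy dynamics, and noise model are all hypothetical and stand in for, rather than reproduce, the paper's deduction-path environment:

```python
import random

class ToyPOMDP:
    """Minimal partially observable environment: the agent never sees
    the true state, only a (possibly corrupted) observation of it."""

    def __init__(self, n_states=5, noise=0.2, seed=0):
        self.rng = random.Random(seed)
        self.n_states = n_states
        self.noise = noise
        self.state = 0  # hidden from the agent

    def step(self, action):
        # transition: move by `action` (+1 / -1), clipped to the state space
        self.state = max(0, min(self.n_states - 1, self.state + action))
        # observation: the true state, replaced by a random one with prob `noise`
        if self.rng.random() < self.noise:
            obs = self.rng.randrange(self.n_states)
        else:
            obs = self.state
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        return obs, reward

env = ToyPOMDP()
obs, reward = env.step(+1)  # the agent must infer the state from obs alone
```

Because observations are unreliable, a policy must maintain a belief over hidden states rather than act on observations directly, which is what distinguishes a POMDP from a fully observable MDP.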

Hierarchical Reinforcement Learning • reinforcement-learning

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

1 code implementation • NeurIPS 2023 • Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu

Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundation model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better support various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning, and QA).

Ranked #1 on Image Captioning on COCO Captions (SPICE metric, using extra training data)

Audio captioning • Audio-Visual Captioning • +14

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

1 code implementation • 17 Apr 2023 • Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu

Unlike widely studied vision-language pretraining models, VALOR jointly models the relationships among vision, audio, and language in an end-to-end manner.

Ranked #1 on Video Captioning on VATEX (using extra training data)

Audio captioning • Audio-Video Question Answering (AVQA) • +16

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

1 code implementation • 29 Mar 2023 • Jiawei Liu, Weining Wang, Sihan Chen, Xinxin Zhu, Jing Liu

In this work, we concentrate on the rarely investigated problem of text-guided sounding video generation and propose the Sounding Video Generator (SVG), a unified framework for generating realistic videos along with audio signals.

Audio Generation • Contrastive Learning • +1

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

2 code implementations • CVPR 2023 • Mingzhen Sun, Weining Wang, Xinxin Zhu, Jing Liu

Experimental results demonstrate that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101.

Object • Unconditional Video Generation • +2

SLOTH: Structured Learning and Task-based Optimization for Time Series Forecasting on Hierarchies

no code implementations • 11 Feb 2023 • Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, Shiyu Wang, James Zhang, Xinxin Zhu, Xuanwei Hu, Yunhua Hu, Yangfei Zheng, Lei Lei, Yun Hu

Moreover, unlike most previous reconciliation methods, which either rely on strong assumptions or focus only on coherence constraints, we utilize deep neural optimization networks, which not only achieve coherency without any assumptions but also allow more flexible and realistic constraints for task-based targets, e.g., a lower under-estimation penalty and a meaningful decision-making loss to facilitate subsequent downstream tasks.
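For context, "coherency" in hierarchical forecasting means that each aggregate-level forecast equals the sum of its children. A minimal bottom-up reconciliation sketch illustrates the constraint (this is a simple classical baseline for intuition only, not SLOTH's neural optimization networks; the two-region hierarchy is hypothetical):

```python
def reconcile_bottom_up(base_forecasts):
    """Given possibly incoherent base forecasts [total, region_a, region_b]
    for a hypothetical two-level hierarchy (total = region_a + region_b),
    keep the bottom-level values and recompute the aggregate from them,
    so the result is coherent by construction."""
    _, region_a, region_b = base_forecasts
    return [region_a + region_b, region_a, region_b]

# base forecasts are incoherent: 4.0 + 5.0 != 10.0
coherent = reconcile_bottom_up([10.0, 4.0, 5.0])  # -> [9.0, 4.0, 5.0]
```

Bottom-up reconciliation discards the aggregate forecast entirely; learned reconciliation methods instead trade off information from all levels while still satisfying the summing constraint.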

Decision Making • Multivariate Time Series Forecasting • +1

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

2 code implementations • 1 Jul 2021 • Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, Mingzhen Sun, Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, by jointly modeling visual, text and audio resources.

Audio to Text Retrieval • Cross-Modal Retrieval • +3

CPTR: Full Transformer Network for Image Captioning

no code implementations • 26 Jan 2021 • Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu

Besides, we provide detailed visualizations of the self-attention between patches in the encoder and the "words-to-patches" attention in the decoder thanks to the full Transformer architecture.

Image Captioning

Global-Local Propagation Network for RGB-D Semantic Segmentation

no code implementations • 26 Jan 2021 • Sihan Chen, Xinxin Zhu, Wei Liu, Xingjian He, Jing Liu

Depth information matters in the RGB-D semantic segmentation task because it provides geometric information beyond what color images offer.

Scene Segmentation • Segmentation

Fast Sequence Generation with Multi-Agent Reinforcement Learning

no code implementations • 24 Jan 2021 • Longteng Guo, Jing Liu, Xinxin Zhu, Hanqing Lu

These models are autoregressive in that they generate each word by conditioning on previously generated words, which leads to heavy latency during inference.
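The latency stems from the sequential dependency of autoregressive decoding: token t cannot be predicted before tokens 0..t-1 exist, so the per-token model calls cannot run in parallel. A minimal sketch, with a hypothetical `next_word` function standing in for the learned model:

```python
def next_word(prefix):
    # hypothetical stand-in for a learned predictor: picks by prefix length
    vocab = ["a", "cat", "sat", "here", "<eos>"]
    return vocab[min(len(prefix), len(vocab) - 1)]

def autoregressive_decode(max_len=10):
    words = []
    for _ in range(max_len):      # sequential loop: step t must wait for
        w = next_word(words)      # steps 0..t-1 to finish, so the model
        if w == "<eos>":          # calls cannot be batched in parallel
            break
        words.append(w)
    return words

caption = autoregressive_decode()  # -> ["a", "cat", "sat", "here"]
```

Non-autoregressive approaches break exactly this dependency by predicting all positions at once, trading per-token conditioning for a single parallel forward pass.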

Image Captioning • Machine Translation • +5

AutoCaption: Image Captioning with Neural Architecture Search

no code implementations • 16 Dec 2020 • Xinxin Zhu, Weining Wang, Longteng Guo, Jing Liu

The whole process involves a visual understanding module and a language generation module, which poses greater challenges for the design of deep neural networks than other tasks do.

Image Captioning • Neural Architecture Search • +1

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

no code implementations • 10 May 2020 • Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu

In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL).

Image Captioning • Machine Translation • +3

Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid Reward Strategies for Video Captioning

no code implementations • 17 Oct 2019 • Xinxin Zhu, Longteng Guo, Peng Yao, Shichen Lu, Wei Liu, Jing Liu

This report describes our solution for the VATEX Captioning Challenge 2020, which requires generating descriptions for the videos in both English and Chinese languages.

Video Captioning
