CSS: Combining Self-training and Self-supervised Learning for Few-shot Dialogue State Tracking

The unlabeled data of the DST task is incorporated into the self-training iterations, where the pseudo labels are predicted by a DST model trained on limited labeled data in advance.
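
The self-training loop described above can be sketched roughly as follows. This is a toy illustration, not the paper's method: the nearest-centroid classifier stands in for a real DST model, and the names `train_centroids`, `predict`, `self_train`, and the `min_margin` confidence threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[float, int]  # (feature, label): toy stand-in for a dialogue turn and its state


def train_centroids(data: List[Example]) -> Dict[int, float]:
    """Fit a toy nearest-centroid classifier: one mean feature value per label."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[int, float], x: float) -> Tuple[int, float]:
    """Return (label, margin); margin is the distance gap between the two closest centroids."""
    ranked = sorted(model.items(), key=lambda kv: abs(x - kv[1]))
    best_label, best_centroid = ranked[0]
    if len(ranked) == 1:
        return best_label, float("inf")
    margin = abs(x - ranked[1][1]) - abs(x - best_centroid)
    return best_label, margin


def self_train(labeled: List[Example], unlabeled: List[float],
               rounds: int = 3, min_margin: float = 1.0):
    """Train on labeled data, then repeatedly add confident pseudo-labeled examples."""
    model = train_centroids(labeled)  # model trained in advance on the limited labeled data
    train_set = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        confident, remaining = [], []
        for x in pool:
            label, margin = predict(model, x)
            if margin >= min_margin:
                confident.append((x, label))  # accept the pseudo label
            else:
                remaining.append(x)           # too uncertain, keep for a later round
        if not confident:
            break
        train_set.extend(confident)
        pool = remaining
        model = train_centroids(train_set)    # retrain on labeled + pseudo-labeled data
    return model, train_set


model, train_set = self_train([(0.0, 0), (10.0, 1)], [1.0, 9.0, 5.2])
```

In this toy run the two unlabeled points near the labeled ones receive pseudo labels, while the ambiguous point at 5.2 never clears the margin threshold and is left out.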

ScaleVLAD: Improving Multimodal Sentiment Analysis via Multi-Scale Fusion of Locally Descriptors

This paper proposes a fusion model named ScaleVLAD to gather multi-Scale representation from text, video, and audio with shared Vectors of Locally Aggregated Descriptors to improve unaligned multimodal sentiment analysis.

Control Image Captioning Spatially and Temporally

The controllability and explainability of LoopCAG are validated by analyzing spatial and temporal sensitivity during the generation process.

GEM: A General Evaluation Benchmark for Multimodal Tasks

Compared with existing multimodal datasets such as MSCOCO and Flickr30K for image-language tasks, and YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering image-language and video-language tasks at the same time, but is also labeled in multiple languages.

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

In this paper, we propose a CLIP4Clip model to transfer the knowledge of the CLIP model to video-language retrieval in an end-to-end manner.

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Most existing multimodal models are pre-trained for understanding tasks, which leads to a pretrain-finetune discrepancy on generation tasks.

DOER: Dual Cross-Shared RNN for Aspect Term-Polarity Co-Extraction

This paper focuses on two related subtasks of aspect-based sentiment analysis, namely aspect term extraction and aspect sentiment classification, which we call aspect term-polarity co-extraction.

Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting

We cast the weather forecasting problem as an end-to-end deep learning problem and solve it by proposing a novel negative log-likelihood error (NLE) loss function.
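
To make the idea of a negative log-likelihood loss concrete, here is a minimal sketch that assumes each forecast is a Gaussian with a predicted mean and variance. The Gaussian form, the function name `gaussian_nll`, and its arguments are assumptions for illustration; the summary above does not specify the exact NLE formulation.

```python
import math
from typing import Sequence


def gaussian_nll(targets: Sequence[float],
                 means: Sequence[float],
                 variances: Sequence[float]) -> float:
    """Mean negative log-likelihood of targets under per-sample Gaussians N(mean, variance)."""
    if not targets or not (len(targets) == len(means) == len(variances)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu, var in zip(targets, means, variances):
        if var <= 0:
            raise ValueError("variance must be positive")
        # -log N(y | mu, var) = 0.5 * log(2 * pi * var) + (y - mu)^2 / (2 * var)
        total += 0.5 * math.log(2 * math.pi * var) + (y - mu) ** 2 / (2 * var)
    return total / len(targets)


# A confident but wrong forecast is penalized more than a hedged one.
overconfident = gaussian_nll([2.0], [0.0], [0.1])
hedged = gaussian_nll([2.0], [0.0], [4.0])
```

Because the variance appears both in the log term and as a divisor of the squared error, a model trained with such a loss is pushed to report larger uncertainty where its point forecasts are less reliable.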

Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation

The key idea is to explicitly incorporate both representations gained separately from the bottom-up and top-down propagation on the given dependency syntactic tree.

