Topic models have long been a prevailing approach for discovering latent semantics when modeling long documents.
Secondly, we explicitly cast contrastive topic modeling as a gradient-based multi-objective optimization problem, with the goal of achieving a Pareto stationary solution that balances the trade-off between the ELBO and the contrastive objective.
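As a hedged illustration of this idea (not the authors' implementation), the sketch below shows how a Pareto-stationary update direction can be obtained for two objectives such as the ELBO and a contrastive loss, using the closed-form min-norm combination of the two gradients familiar from MGDA-style multi-objective optimization; all function and variable names are hypothetical.

```python
import torch

def two_objective_update(loss_elbo, loss_contrastive, params):
    """Sketch: combine the gradients of two objectives with the min-norm
    (MGDA-style) weight so the resulting direction is a common descent
    direction, i.e. a step toward a Pareto-stationary solution."""
    g1 = torch.autograd.grad(loss_elbo, params, retain_graph=True)
    g2 = torch.autograd.grad(loss_contrastive, params, retain_graph=True)
    g1 = torch.cat([g.reshape(-1) for g in g1])
    g2 = torch.cat([g.reshape(-1) for g in g2])

    # Closed-form solution of min_a ||a*g1 + (1-a)*g2||^2 with a in [0, 1].
    diff = g1 - g2
    alpha = torch.dot(g2 - g1, g2) / (diff.dot(diff) + 1e-12)
    alpha = alpha.clamp(0.0, 1.0)

    combined = alpha * g1 + (1.0 - alpha) * g2
    return alpha, combined  # the caller applies `combined` as the gradient
```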
Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy, so that documents can be understood at the desired semantic granularity.
Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization.
Temporal Language Grounding seeks to localize video moments that semantically correspond to a natural language query.
Topic models have been studied for decades across various applications and have recently been revitalized by neural variational inference.
Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning.
Multimodal Review Helpfulness Prediction (MRHP) aims to rank product reviews based on predicted helpfulness scores and has been widely applied in e-commerce by presenting customers with useful reviews.
In this work, we propose a new paradigm, called self-supervised tuning, that solves zero-shot text classification tasks by tuning language models on unlabeled data with self-supervised learning.
Instead of the direct alignment used in previous work, we propose a mutual-information-based topic alignment method.
To overcome the data sparsity issue in short text topic modeling, existing methods commonly rely on data augmentation or the data characteristics of short texts to introduce more word co-occurrence information.
To overcome the aforementioned issues, we propose Multimodal Contrastive Learning for the Multimodal Review Helpfulness Prediction (MRHP) problem, concentrating on mutual information between input modalities to explicitly capture cross-modal relations.
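The sentence above only names the approach; a minimal, hypothetical sketch of an InfoNCE-style contrastive loss between text and image features, a standard lower bound on the mutual information between the two modalities, is shown below. The function and variable names are illustrative and not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def cross_modal_infonce(text_feats, image_feats, temperature=0.07):
    """Sketch: InfoNCE contrastive loss between text and image features
    of the same reviews. Minimizing this loss maximizes a lower bound
    on the mutual information between the two modalities.

    text_feats, image_feats: (batch, dim) tensors, where row i of each
    tensor comes from the same review (the positive pair).
    """
    text_feats = F.normalize(text_feats, dim=-1)
    image_feats = F.normalize(image_feats, dim=-1)

    # Pairwise cosine similarities; diagonal entries are positive pairs.
    logits = text_feats @ image_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over both matching directions.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_t2i + loss_i2t)
```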
Moreover, we provide a list of training datasets and downstream tasks to further sharpen the perspective on V\&L pretraining.