Effectively performing object rearrangement is an essential skill for mobile manipulators, e.g., for setting up a dinner table or organizing a desk.
In practice, the high cost of data annotation and the continual emergence of new pill categories make it worthwhile to develop a few-shot class-incremental pill recognition system.
New intent discovery is of great value in natural language processing, enabling a better understanding of user needs and more helpful services.
In this paper, a Self-Adjusting Fusion Representation Learning Model (SA-FRLM) is proposed to learn robust cross-modal fusion representations directly from unaligned text and audio sequences.
The designed modality mixup module acts as a data augmentation that mixes the acoustic and visual modalities across different videos.
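One plausible reading of such a modality mixup, sketched below with hypothetical names and NumPy in place of the authors' actual implementation: each modality's features are interpolated between different videos in a batch using a shared Beta-sampled weight.

```python
import numpy as np

def modality_mixup(acoustic, visual, alpha=0.2, rng=None):
    # Hypothetical sketch of a modality mixup augmentation: interpolate the
    # acoustic and visual features of each video with those of another,
    # randomly permuted video in the batch, using one mixing weight lam.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing weight in (0, 1)
    perm = rng.permutation(acoustic.shape[0])    # pair each video with another
    mixed_a = lam * acoustic + (1 - lam) * acoustic[perm]
    mixed_v = lam * visual + (1 - lam) * visual[perm]
    return mixed_a, mixed_v, lam
```

Using the same permutation and weight for both modalities keeps the mixed acoustic and visual streams of a sample consistent with each other.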
However, incrementally learning a new domain without catastrophically forgetting previously acquired knowledge remains a great challenge.
To overcome these shortcomings, we propose COCA, a deep Contrastive One-Class Anomaly detection method for time series that follows the normality assumptions of both contrastive learning and one-class classification.
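The combined assumptions can be illustrated by a minimal objective, sketched here with hypothetical names rather than the paper's exact loss: two augmented views of each normal window are both pulled toward a shared one-class center, so representations are simultaneously augmentation-invariant (the contrastive assumption) and compact around the center (the one-class assumption).

```python
import numpy as np

def one_class_contrastive_loss(z1, z2, center):
    # Hedged sketch of a contrastive one-class objective: z1 and z2 are the
    # encoded representations of two augmented views of the same windows;
    # both are pulled toward a shared one-class center via cosine similarity.
    def cos(a, b):
        return np.sum(a * b, axis=-1) / (
            np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8)
    # 1 - cos(.) is zero when a representation points at the center.
    return np.mean((1 - cos(z1, center)) + (1 - cos(z2, center)))
```

At test time, the distance of a window's representation to the center (e.g., `1 - cos(z, center)`) can serve as its anomaly score.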
The platform features a fully modular video sentiment analysis framework consisting of data management, feature extraction, model training, and result analysis modules.
Object rearrangement is important for many applications but remains challenging, especially in confined spaces such as shelves, where objects cannot be accessed from above and block each other's reachability.
Specifically, supervised contrastive learning based on a memory bank is first used to train on each new task so that the model can effectively learn relation representations.
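A minimal sketch of supervised contrastive learning against a memory bank, with hypothetical names (not the paper's exact formulation): each L2-normalized embedding is pulled toward stored bank entries that share its label and pushed away from the rest.

```python
import numpy as np

def supcon_memory_loss(z, labels, bank_z, bank_labels, tau=0.1):
    # Hypothetical sketch: z (n, d) and bank_z (m, d) are L2-normalized
    # embeddings; bank entries with the same label as z[i] are positives.
    sims = z @ bank_z.T / tau                          # (n, m) similarity logits
    logp = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    pos = labels[:, None] == bank_labels[None, :]      # positives share the label
    # Average log-probability over each anchor's positives, negated.
    return -np.mean((logp * pos).sum(axis=1) / pos.sum(axis=1))
```

The memory bank lets each anchor contrast against many more samples than fit in one mini-batch, which is why it is a common companion to contrastive objectives.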
It is composed of two main modules: open intent detection and open intent discovery.
Joint entity and relation extraction is an essential task in information extraction, which aims to extract all relational triples from unstructured text.
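To make the task concrete, here is a purely illustrative toy example (the sentence and relation names are invented, not from any benchmark) of the relational-triple output format a joint extractor produces:

```python
# Illustrative only: expected output format for joint entity and
# relation extraction on a toy sentence.
sentence = "Marie Curie was born in Warsaw and worked at the University of Paris."
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "worked_at", "University of Paris"),
]
```

Note that one entity ("Marie Curie") participates in multiple triples, which is exactly the overlap pattern that makes joint extraction harder than pipelined extraction.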
Leveraging constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) that extends Mask R-CNN.
DFSDP is extended to solve single-buffer, non-monotone instances, given a choice of an object and a buffer.
In this paper, we propose the Cross-Modal BERT (CM-BERT), which relies on the interaction of text and audio modality to fine-tune the pre-trained BERT model.
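A generic sketch of cross-modal reweighting in this spirit (not CM-BERT's exact formulation; all names are hypothetical): per-token attention scores are derived from both the text and audio streams, fused, and used to reweight the token features fed into fine-tuning.

```python
import numpy as np

def cross_modal_attention(text_feats, audio_feats):
    # Generic sketch: score each token against its modality's mean feature,
    # fuse the two softmax-normalized score vectors, and reweight the text
    # token features with the fused attention weights.
    t_score = text_feats @ text_feats.mean(axis=0)    # (n,) text relevance
    a_score = audio_feats @ audio_feats.mean(axis=0)  # (n,) audio relevance
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    w = softmax(t_score) * softmax(a_score)
    w = w / w.sum()                                   # fused attention weights
    return w[:, None] * text_feats                    # reweighted token features
```

The key idea the snippet illustrates is that the audio stream only modulates attention over the text tokens, so the pre-trained text backbone still does the heavy lifting.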