Recent advances in Large Language Models (LLMs) have stimulated a surge of research aimed at extending their applications to the visual domain.
In this work, we advocate for leveraging natural language supervision for the domain generalization task.
It also uses a purely-dynamic local dispersive force (Brownian motion) that shows improved performance over other methods and does not require knowledge of other particle coordinates.
Since only a single point is required to recognize the text, the proposed method enables text spotting without an arbitrarily-shaped detector or bounding polygon annotations.
Ranked #7 on Text Spotting on Total-Text
Semi-structured query systems for document-oriented databases have many real applications.
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Ranked #10 on Document Image Classification on RVL-CDIP
However, the performance of contrastive learning fundamentally depends on quality and quantity of negative data pairs.
Ranked #55 on Domain Generalization on PACS
Domain generalization (DG) methods aim to achieve generalizability to an unseen target domain by using only training data from the source domains.
Ranked #16 on Domain Generalization on TerraIncognita
Deep learning approaches to semantic parsing require a large amount of labeled data, but annotating complex logical forms is costly.
Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem by classifying each recognized input token into one of the IOB (Inside, Outside, and Beginning) categories.
Bridging the exponentially growing gap between the numbers of unlabeled and labeled protein sequences, several studies adopted semi-supervised learning for protein sequence modeling.
Parsing textual information embedded in images is important for various down- stream tasks.
OCR is inevitably linked to NLP since its final output is in text.
We present SQLova, the first Natural-language-to-SQL (NL2SQL) model to achieve human performance in WikiSQL dataset.
Since microRNAs (miRNAs) play a crucial role in post-transcriptional gene regulation, miRNA identification is one of the most essential problems in computational biology.
MicroRNAs (miRNAs) are short sequences of ribonucleic acids that control the expression of target messenger RNAs (mRNAs) by binding them.