Then we propose a Scalable Query strategy to allow LLMs to choose true neighbors of the selected samples from multiple candidate samples.
Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking and intelligent education.
The core idea in this line of work has been the strict requirement that the team of experts assigned to complete a given task should contain a superset of the skills required by the task.
To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from using historical decoding results.
Ranked #4 on Scene Text Recognition on ICDAR 2003