Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the intent of what a user wants the system to do and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers which, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building those models is very fast and takes only a few seconds. The method is also completely language independent. A comparison on different benchmarks shows that this method can outperform multiple other, more resource-demanding SLU approaches.
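To make the idea of embedding intents and entities into a transducer more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical toy grammar (the intent names, sentence templates, and slot values are invented for illustration) and represents the transducer as a plain word-level trie whose arcs emit slot labels. It only accepts exact text transcripts; how the paper combines the transducer with the Speech-to-Text model's output is not shown here.

```python
from itertools import product

# Hypothetical toy grammar: intents, templates and slot values are made up.
INTENTS = {
    "lights_on": ["turn on the light in the {room}", "switch on the {room} light"],
    "lights_off": ["turn off the light in the {room}"],
}
SLOTS = {"room": ["kitchen", "bedroom", "living room"]}


def expand(template, slots):
    """Yield every (word, label) sequence that a template can produce."""
    options = []
    for token in template.split():
        if token.startswith("{") and token.endswith("}"):
            name = token[1:-1]
            # Every word of a slot value is labelled with the slot name.
            options.append(
                [[(w, name) for w in value.split()] for value in slots[name]]
            )
        else:
            options.append([[(token, None)]])
    for combo in product(*options):
        yield [pair for part in combo for pair in part]


def build_fst(intents, slots):
    """Build a word-level transducer as nested dicts (a trie).

    Each arc maps an input word to (output label, next state). A final
    state stores its intent under the key None. If the same word could
    carry two different labels in the same state, the first one wins,
    which is a simplification of this sketch.
    """
    root = {}
    for intent, templates in intents.items():
        for template in templates:
            for path in expand(template, slots):
                state = root
                for word, label in path:
                    _, state = state.setdefault(word, (label, {}))
                state[None] = intent
    return root


def parse(fst, transcript):
    """Run a transcript through the transducer; return None if rejected."""
    state, entities = fst, {}
    for word in transcript.lower().split():
        if word not in state:
            return None
        label, state = state[word]
        if label is not None:
            # Concatenate consecutive words that belong to the same slot.
            entities[label] = (entities.get(label, "") + " " + word).strip()
    if None not in state:
        return None
    return {"intent": state[None], "entities": entities}


fst = build_fst(INTENTS, SLOTS)
print(parse(fst, "turn on the light in the living room"))
print(parse(fst, "open the window"))
```

Building the trie involves no model training, which is consistent with the abstract's claim that construction is fast. A practical system would more likely use a weighted FST toolkit so that imperfect transcriptions can still be matched; that part is outside the scope of this sketch.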
Results from the Paper
Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Spoken Language Understanding | Fluent Speech Commands | Finstreder (Conformer + AMT, character-based) | Accuracy (%) | 99.8 | #1
Spoken Language Understanding | Fluent Speech Commands | Amazon Alexa | Accuracy (%) | 98.7 | #16
Spoken Language Understanding | Fluent Speech Commands | Finstreder (Quartznet + AMT) | Accuracy (%) | 99.7 | #3
Spoken Language Understanding | Fluent Speech Commands | Finstreder (Quartznet) | Accuracy (%) | 99.2 | #12
Spoken Language Understanding | Fluent Speech Commands | Finstreder (Conformer) | Accuracy (%) | 99.5 | #8
Slot Filling | SLURP | Finstreder (Conformer) | F1 | 0.395 | #4
Slot Filling | SLURP | Finstreder (Quartznet) | F1 | 0.313 | #5
Intent Classification | SLURP | Finstreder (Quartznet) | Accuracy (%) | 43.15 | #5
Intent Classification | SLURP | Finstreder (Conformer) | Accuracy (%) | 53.11 | #4
Spoken Language Understanding | Snips-SmartLights | Finstreder (Conformer) | Accuracy (%) | 88.0 | #2
Spoken Language Understanding | Snips-SmartLights | Finstreder (Quartznet) | Accuracy (%) | 84.8 | #4
Spoken Language Understanding | Snips-SmartLights | Finstreder (Conformer, character-based) | Accuracy (%) | 89.0 | #1
Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Quartznet) | Accuracy-EN (%) | 77.6 | #3
Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Quartznet) | Accuracy-FR (%) | 77.8 | #3
Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer, character-based) | Accuracy-EN (%) | 87.9 | #1
Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer, character-based) | Accuracy-FR (%) | 86.5 | #1
Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer) | Accuracy-EN (%) | 80.4 | #2
Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer) | Accuracy-FR (%) | 78.3 | #2
Spoken Language Understanding | Timers and Such | Finstreder (Quartznet) | Accuracy (%) | 90.0 | #2
Spoken Language Understanding | Timers and Such | Finstreder (Conformer) | Accuracy (%) | 95.4 | #1