Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models

29 Jun 2022 · Daniel Bermuth, Alexander Poeppel, Wolfgang Reif

In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the intent of what a user wants the system to do and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers that, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building those models is very fast, taking only a few seconds, and the method is completely language independent. A comparison on different benchmarks shows that this method can outperform multiple other, more resource-demanding SLU approaches.
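
To make the idea of embedding intents and entities into a transducer more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a word-level, deterministic transducer applied to a finished transcript, whereas the paper combines its transducers with a Speech-to-Text model. The class `ToyFST`, its methods, and the example intents and slot names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyFST:
    """Tiny deterministic word-level transducer (illustrative assumption only)."""

    def __init__(self):
        # state -> {input_word: (next_state, slot_name_or_None)}
        self.arcs = defaultdict(dict)
        # final state -> intent label emitted when a path ends there
        self.finals = {}
        self._next_id = 1  # state 0 is the start state

    def _new_state(self):
        self._next_id += 1
        return self._next_id - 1

    def add_path(self, intent, tokens):
        """Add one command template.

        Each token is either a plain word or a (slot_name, [values]) tuple
        describing an entity with single-word alternatives.
        """
        state = 0
        for token in tokens:
            slot, words = token if isinstance(token, tuple) else (None, [token])
            # Reuse the target of an existing arc so shared prefixes merge.
            existing = [self.arcs[state][w][0] for w in words if w in self.arcs[state]]
            target = existing[0] if existing else self._new_state()
            for w in words:
                self.arcs[state].setdefault(w, (target, slot))
            state = target
        self.finals[state] = intent

    def transduce(self, transcript):
        """Map a transcript to an intent and slots, or None if no path accepts it."""
        state, slots = 0, {}
        for word in transcript.lower().split():
            arc = self.arcs.get(state, {}).get(word)
            if arc is None:
                return None
            state, slot = arc
            if slot is not None:
                slots[slot] = word
        intent = self.finals.get(state)
        return None if intent is None else {"intent": intent, "slots": slots}


# Hypothetical smart-light commands; building the transducer needs no training.
fst = ToyFST()
rooms = ("room", ["kitchen", "bedroom"])
fst.add_path("lights_on", ["turn", "on", "the", "lights", "in", "the", rooms])
fst.add_path("lights_off", ["turn", "off", "the", "lights", "in", "the", rooms])
fst.add_path("lights_on", ["turn", "on", "the", "lights"])

result = fst.transduce("turn on the lights in the kitchen")
print(result)
```

In this sketch, adding a new intent or entity value only means adding another path, which is in the spirit of the fast model building described in the abstract. A transcript that matches no path yields `None`, so the toy version has none of the robustness to recognition errors that a real system would need.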


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Spoken Language Understanding | Fluent Speech Commands | Finstreder (Conformer + AMT, character-based) | Accuracy (%) | 99.8 | #1 |
| Spoken Language Understanding | Fluent Speech Commands | Amazon Alexa | Accuracy (%) | 98.7 | #16 |
| Spoken Language Understanding | Fluent Speech Commands | Finstreder (Quartznet + AMT) | Accuracy (%) | 99.7 | #3 |
| Spoken Language Understanding | Fluent Speech Commands | Finstreder (Quartznet) | Accuracy (%) | 99.2 | #12 |
| Spoken Language Understanding | Fluent Speech Commands | Finstreder (Conformer) | Accuracy (%) | 99.5 | #8 |
| Slot Filling | SLURP | Finstreder (Conformer) | F1 | 0.395 | #4 |
| Slot Filling | SLURP | Finstreder (Quartznet) | F1 | 0.313 | #5 |
| Intent Classification | SLURP | Finstreder (Quartznet) | Accuracy (%) | 43.15 | #5 |
| Intent Classification | SLURP | Finstreder (Conformer) | Accuracy (%) | 53.11 | #4 |
| Spoken Language Understanding | Snips-SmartLights | Finstreder (Conformer) | Accuracy (%) | 88.0 | #2 |
| Spoken Language Understanding | Snips-SmartLights | Finstreder (Quartznet) | Accuracy (%) | 84.8 | #4 |
| Spoken Language Understanding | Snips-SmartLights | Finstreder (Conformer, character-based) | Accuracy (%) | 89.0 | #1 |
| Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Quartznet) | Accuracy-EN (%) | 77.6 | #3 |
| Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Quartznet) | Accuracy-FR (%) | 77.8 | #3 |
| Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer, character-based) | Accuracy-EN (%) | 87.9 | #1 |
| Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer, character-based) | Accuracy-FR (%) | 86.5 | #1 |
| Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer) | Accuracy-EN (%) | 80.4 | #2 |
| Spoken Language Understanding | Snips-SmartSpeaker | Finstreder (Conformer) | Accuracy-FR (%) | 78.3 | #2 |
| Spoken Language Understanding | Timers and Such | Finstreder (Quartznet) | Accuracy (%) | 90.0 | #2 |
| Spoken Language Understanding | Timers and Such | Finstreder (Conformer) | Accuracy (%) | 95.4 | #1 |
