


Supported functions


| Speech recognition | Speech synthesis | Speaker verification | Speaker identification |
|:---:|:---:|:---:|:---:|
| ✔️ | ✔️ | ✔️ | ✔️ |

| Spoken language identification | Audio tagging | Voice activity detection | Keyword spotting |
|:---:|:---:|:---:|:---:|
| ✔️ | ✔️ | ✔️ | ✔️ |


Supported platforms


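| Architecture | Android | iOS | Windows | macOS | Linux |
|---|:---:|:---:|:---:|:---:|:---:|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
| x86 | ✔️ | | ✔️ | | |
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | | | | ✔️ |
| riscv64 | | | | | ✔️ |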


Supported programming languages


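| C++ | C | Python | C# | Java | JavaScript | Kotlin | Swift | Go | Dart |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |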


It also supports WebAssembly.


Introduction

This repository supports running the following functions locally


Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
Text-to-speech (i.e., TTS)
Speaker identification
Speaker verification
Spoken language identification
Audio tagging
VAD (e.g., silero-vad)
Keyword spotting


on the following platforms and operating systems:


x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
Linux, macOS, Windows, openKylin
Android, WearOS
iOS
NodeJS
WebAssembly
Raspberry Pi
RV1126
LicheePi4A
VisionFive 2
旭日X3派 (Horizon Sunrise X3 Pi)
etc.


with the following APIs:


C++, C, Python, Go, C#
Java, Kotlin, JavaScript
Swift
Dart
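
To make the difference between streaming and non-streaming processing concrete, here is a minimal, self-contained sketch built around a toy voice activity detector. It does not use sherpa-onnx or silero-vad: the `EnergyVad` class, its method names, the 20 ms frame size, and the energy threshold are all hypothetical, and a plain energy threshold stands in for a neural VAD model. The point is only that a streaming component can be fed audio in arbitrary chunks and still produce the same result as processing the whole recording at once.

```python
import math


class EnergyVad:
    """Toy energy-threshold VAD (hypothetical; not silero-vad, not the sherpa-onnx API)."""

    def __init__(self, sample_rate=16000, frame_ms=20, threshold=0.01):
        self.sample_rate = sample_rate
        self.frame_len = sample_rate * frame_ms // 1000  # samples per frame
        self.threshold = threshold  # mean-square energy above this counts as speech
        self.buffer = []            # samples that do not yet fill a whole frame
        self.frame_index = 0        # number of frames consumed so far
        self.segment_start = None   # first frame of the currently open segment
        self.segments = []          # finished (start_sec, end_sec) pairs

    def _seconds(self, frame):
        return frame * self.frame_len / self.sample_rate

    def _close_segment(self):
        start = self._seconds(self.segment_start)
        end = self._seconds(self.frame_index)
        self.segments.append((start, end))
        self.segment_start = None

    def accept_waveform(self, samples):
        """Streaming entry point: feed any number of samples at a time."""
        self.buffer.extend(samples)
        while len(self.buffer) >= self.frame_len:
            frame = self.buffer[: self.frame_len]
            del self.buffer[: self.frame_len]
            energy = sum(x * x for x in frame) / self.frame_len
            is_speech = energy > self.threshold
            if is_speech and self.segment_start is None:
                self.segment_start = self.frame_index
            elif not is_speech and self.segment_start is not None:
                self._close_segment()
            self.frame_index += 1

    def flush(self):
        """Close a segment that is still open at the end of the input."""
        if self.segment_start is not None:
            self._close_segment()
        return self.segments


def make_test_signal(sample_rate=16000):
    """0.5 s of silence, 1.0 s of a 440 Hz tone, 0.5 s of silence."""
    silence = [0.0] * (sample_rate // 2)
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(sample_rate)]
    return silence + tone + silence


def detect_offline(samples):
    """Non-streaming use: hand over the whole recording in one call."""
    vad = EnergyVad()
    vad.accept_waveform(samples)
    return vad.flush()


def detect_streaming(samples, chunk_size=1000):
    """Streaming use: hand over the recording in small chunks as they 'arrive'."""
    vad = EnergyVad()
    for i in range(0, len(samples), chunk_size):
        vad.accept_waveform(samples[i:i + chunk_size])
    return vad.flush()


if __name__ == "__main__":
    signal = make_test_signal()
    print(detect_offline(signal))    # [(0.5, 1.5)]
    print(detect_streaming(signal))  # [(0.5, 1.5)]
```

Because the detector buffers leftover samples between calls, the chunk size does not have to be a multiple of the frame size. For the real models and APIs, refer to the documentation linked under "Useful links".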


Links for pre-built Android APKs


| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Here |
| Text-to-speech | Address | Here |
| Voice activity detection (VAD) | Address | Here |
| VAD + non-streaming speech recognition | Address | Here |
| Two-pass speech recognition | Address | Here |
| Audio tagging | Address | Here |
| Audio tagging (WearOS) | Address | Here |
| Speaker identification | Address | Here |
| Spoken language identification | Address | Here |
| Keyword spotting | Address | Here |


Links for pre-built Flutter apps


| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Here |


Links for pre-trained models


| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |


Useful links


Documentation: https://k2-fsa.github.io/sherpa/onnx/

Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi


How to reach us

Please see
https://k2-fsa.github.io/sherpa/social-groups.html
for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.