Introduction

Note: You need Node.js >= 16.

This repo contains examples for Node.js.
It uses node-addon-api to wrap
sherpa-onnx for Node.js and it supports multiple threads.

Note: ../nodejs-examples uses WebAssembly to wrap
sherpa-onnx for Node.js and it does not support multiple threads.

Before you continue, please first run

npm install

# For macOS x64
export DYLD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-darwin-x64:$DYLD_LIBRARY_PATH

# For macOS arm64
export DYLD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-darwin-arm64:$DYLD_LIBRARY_PATH

# For Linux x64
export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-x64:$LD_LIBRARY_PATH

# For Linux arm64, e.g., Raspberry Pi 4
export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-arm64:$LD_LIBRARY_PATH
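
The four export lines above follow one pattern: the variable is DYLD_LIBRARY_PATH on macOS and LD_LIBRARY_PATH on Linux, and the directory is node_modules/sherpa-onnx-<platform>-<arch>. The following is a minimal sketch of that mapping, using the platform and architecture names that Node.js reports in process.platform and process.arch. The helper name nativeLibraryEnv is hypothetical and is not part of sherpa-onnx.

```javascript
const path = require('path');

// Hypothetical helper (not part of sherpa-onnx): maps a platform/arch pair
// to the environment variable and directory used in the export lines above.
function nativeLibraryEnv(platform, arch, cwd) {
  // Only the four combinations listed in this README are handled.
  const supported = ['darwin-x64', 'darwin-arm64', 'linux-x64', 'linux-arm64'];
  const key = `${platform}-${arch}`;
  if (!supported.includes(key)) {
    throw new Error(`No library path is listed in this README for ${key}`);
  }
  return {
    name: platform === 'darwin' ? 'DYLD_LIBRARY_PATH' : 'LD_LIBRARY_PATH',
    dir: path.posix.join(cwd, 'node_modules', `sherpa-onnx-${key}`),
  };
}

// Print the export line for the machine this script runs on.
try {
  const { name, dir } = nativeLibraryEnv(process.platform, process.arch, process.cwd());
  console.log(`export ${name}=${dir}:$${name}`);
} catch (err) {
  console.error(err.message);
}
```

The sketch only prints the export line instead of changing process.env, because the dynamic loader generally reads these variables when the node process starts; the line still has to be run in the shell before launching an example.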


Examples

The following tables list the examples in this folder.


Add punctuation to text

| File | Description |
|---|---|
| ./test_punctuation.js | Add punctuation to input text using a CT Transformer model. It supports both Chinese and English. |


Voice activity detection (VAD)


| File | Description |
|---|---|
| ./test_vad_microphone.js | VAD with a microphone. It uses silero-vad. |


Speaker identification


| File | Description |
|---|---|
| ./test_speaker_identification.js | Speaker identification from a file |


Spoken language identification


| File | Description |
|---|---|
| ./test_vad_spoken_language_identification_microphone.js | Spoken language identification from a microphone using a multi-lingual Whisper model |


Audio tagging


| File | Description |
|---|---|
| ./test_audio_tagging_zipformer.js | Audio tagging with a Zipformer model |
| ./test_audio_tagging_ced.js | Audio tagging with a CED model |


Keyword spotting


| File | Description |
|---|---|
| ./test_keyword_spotter_transducer.js | Keyword spotting from a file using a Zipformer model |
| ./test_keyword_spotter_transducer_microphone.js | Keyword spotting from a microphone using a Zipformer model |


Streaming speech-to-text from files


| File | Description |
|---|---|
| ./test_asr_streaming_transducer.js | Streaming speech recognition from a file using a Zipformer transducer model |
| ./test_asr_streaming_ctc.js | Streaming speech recognition from a file using a Zipformer CTC model with greedy search |
| ./test_asr_streaming_ctc_hlg.js | Streaming speech recognition from a file using a Zipformer CTC model with HLG decoding |
| ./test_asr_streaming_paraformer.js | Streaming speech recognition from a file using a Paraformer model |


Streaming speech-to-text from a microphone


| File | Description |
|---|---|
| ./test_asr_streaming_transducer_microphone.js | Streaming speech recognition from a microphone using a Zipformer transducer model |
| ./test_asr_streaming_ctc_microphone.js | Streaming speech recognition from a microphone using a Zipformer CTC model with greedy search |
| ./test_asr_streaming_ctc_hlg_microphone.js | Streaming speech recognition from a microphone using a Zipformer CTC model with HLG decoding |
| ./test_asr_streaming_paraformer_microphone.js | Streaming speech recognition from a microphone using a Paraformer model |


Non-streaming speech-to-text from files

| File | Description |
|---|---|
| ./test_asr_non_streaming_transducer.js | Non-streaming speech recognition from a file with a Zipformer transducer model |
| ./test_asr_non_streaming_whisper.js | Non-streaming speech recognition from a file using Whisper |
| ./test_asr_non_streaming_nemo_ctc.js | Non-streaming speech recognition from a file using a NeMo CTC model with greedy search |
| ./test_asr_non_streaming_paraformer.js | Non-streaming speech recognition from a file using Paraformer |


Non-streaming speech-to-text from a microphone with VAD

| File | Description |
|---|---|
| ./test_vad_asr_non_streaming_transducer_microphone.js | VAD + Non-streaming speech recognition from a microphone using a Zipformer transducer model |
| ./test_vad_asr_non_streaming_whisper_microphone.js | VAD + Non-streaming speech recognition from a microphone using Whisper |
| ./test_vad_asr_non_streaming_nemo_ctc_microphone.js | VAD + Non-streaming speech recognition from a microphone using a NeMo CTC model with greedy search |
| ./test_vad_asr_non_streaming_paraformer_microphone.js | VAD + Non-streaming speech recognition from a microphone using Paraformer |


Text-to-speech


| File | Description |
|---|---|
| ./test_tts_non_streaming_vits_piper_en.js | Text-to-speech with a piper English model |
| ./test_tts_non_streaming_vits_coqui_de.js | Text-to-speech with a coqui German model |
| ./test_tts_non_streaming_vits_zh_ll.js | Text-to-speech with a Chinese model using cppjieba |
| ./test_tts_non_streaming_vits_zh_aishell3.js | Text-to-speech with a Chinese TTS model |


Voice activity detection (VAD)

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx


# To run the test with a microphone, you need to install the package naudiodon2
npm install naudiodon2

node ./test_vad_microphone.js
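
The microphone examples depend on naudiodon2, which is installed separately from the other dependencies. The following is a small sketch of a pre-flight check that uses Node's require.resolve to see whether a package can be found from the current folder. The helper name isInstalled is hypothetical and is not part of sherpa-onnx or naudiodon2.

```javascript
// Hypothetical helper: returns true if `moduleName` can be resolved by require().
function isInstalled(moduleName) {
  try {
    require.resolve(moduleName);
    return true;
  } catch (err) {
    // require.resolve throws an error with this code when the module is missing.
    if (err.code === 'MODULE_NOT_FOUND') {
      return false;
    }
    throw err;
  }
}

if (!isInstalled('naudiodon2')) {
  console.error('naudiodon2 is not installed; run "npm install naudiodon2" first.');
}
```

Running a check like this before a microphone example gives a clearer message than the module-not-found stack trace that require would otherwise produce.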


Audio tagging with zipformer

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
tar xvf sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
rm sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2

node ./test_audio_tagging_zipformer.js
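
The model downloads in this README share a pattern: fetch a release asset with wget, then, for a .tar.bz2 archive, unpack it with tar and delete the archive. The sketch below only builds those command strings and does not execute them. It assumes the URL layout seen in the wget commands of this README; RELEASE_BASE and downloadCommands are illustrative names, not part of sherpa-onnx.

```javascript
// URL prefix shared by the wget commands in this README.
const RELEASE_BASE = 'https://github.com/k2-fsa/sherpa-onnx/releases/download';

// Hypothetical helper: returns the shell commands for one release asset.
function downloadCommands(releaseTag, assetName) {
  const commands = [`wget ${RELEASE_BASE}/${releaseTag}/${assetName}`];
  // Archives are unpacked and then removed; single files (e.g. .onnx) are kept as-is.
  if (assetName.endsWith('.tar.bz2')) {
    commands.push(`tar xvf ${assetName}`, `rm ${assetName}`);
  }
  return commands;
}

console.log(
  downloadCommands(
    'audio-tagging-models',
    'sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2',
  ).join('\n'),
);
```

For the arguments shown, the printed lines match the three download commands of the Zipformer audio tagging section.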


Audio tagging with CED

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
tar xvf sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
rm sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2

node ./test_audio_tagging_ced.js


Streaming speech recognition with Zipformer transducer

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
rm sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

node ./test_asr_streaming_transducer.js

# To run the test with a microphone, you need to install the package naudiodon2
npm install naudiodon2

node ./test_asr_streaming_transducer_microphone.js


Streaming speech recognition with Zipformer CTC

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
rm sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2

node ./test_asr_streaming_ctc.js

# To decode with HLG.fst
node ./test_asr_streaming_ctc_hlg.js

# To run the test with a microphone, you need to install the package naudiodon2
npm install naudiodon2

node ./test_asr_streaming_ctc_microphone.js
node ./test_asr_streaming_ctc_hlg_microphone.js


Streaming speech recognition with Paraformer

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
tar xvf sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
rm sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2

node ./test_asr_streaming_paraformer.js

# To run the test with a microphone, you need to install the package naudiodon2
npm install naudiodon2

node ./test_asr_streaming_paraformer_microphone.js


Non-streaming speech recognition with Zipformer transducer

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-en-2023-04-01.tar.bz2
tar xvf sherpa-onnx-zipformer-en-2023-04-01.tar.bz2
rm sherpa-onnx-zipformer-en-2023-04-01.tar.bz2

node ./test_asr_non_streaming_transducer.js

# To run VAD + non-streaming ASR with transducer using a microphone
npm install naudiodon2
node ./test_vad_asr_non_streaming_transducer_microphone.js


Non-streaming speech recognition with Whisper

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
tar xvf sherpa-onnx-whisper-tiny.en.tar.bz2
rm sherpa-onnx-whisper-tiny.en.tar.bz2

node ./test_asr_non_streaming_whisper.js

# To run VAD + non-streaming ASR with Whisper using a microphone
npm install naudiodon2
node ./test_vad_asr_non_streaming_whisper_microphone.js


Non-streaming speech recognition with NeMo CTC models

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
tar xvf sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
rm sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2

node ./test_asr_non_streaming_nemo_ctc.js

# To run VAD + non-streaming ASR with NeMo CTC using a microphone
npm install naudiodon2
node ./test_vad_asr_non_streaming_nemo_ctc_microphone.js


Non-streaming speech recognition with Paraformer

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
rm sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

node ./test_asr_non_streaming_paraformer.js

# To run VAD + non-streaming ASR with Paraformer using a microphone
npm install naudiodon2
node ./test_vad_asr_non_streaming_paraformer_microphone.js


Text-to-speech with piper VITS models (TTS)

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-cori-medium.tar.bz2
tar xvf vits-piper-en_GB-cori-medium.tar.bz2
rm vits-piper-en_GB-cori-medium.tar.bz2

node ./test_tts_non_streaming_vits_piper_en.js


Text-to-speech with Coqui-ai/TTS models (TTS)

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-coqui-de-css10.tar.bz2
tar xvf vits-coqui-de-css10.tar.bz2
rm vits-coqui-de-css10.tar.bz2

node ./test_tts_non_streaming_vits_coqui_de.js


Text-to-speech with vits Chinese models (1/2)

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-vits-zh-ll.tar.bz2
tar xvf sherpa-onnx-vits-zh-ll.tar.bz2
rm sherpa-onnx-vits-zh-ll.tar.bz2

node ./test_tts_non_streaming_vits_zh_ll.js


Text-to-speech with vits Chinese models (2/2)

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-icefall-zh-aishell3.tar.bz2
tar xvf vits-icefall-zh-aishell3.tar.bz2
rm vits-icefall-zh-aishell3.tar.bz2

node ./test_tts_non_streaming_vits_zh_aishell3.js


Spoken language identification with Whisper multi-lingual models

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
tar xvf sherpa-onnx-whisper-tiny.tar.bz2
rm sherpa-onnx-whisper-tiny.tar.bz2

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/spoken-language-identification-test-wavs.tar.bz2
tar xvf spoken-language-identification-test-wavs.tar.bz2
rm spoken-language-identification-test-wavs.tar.bz2

node ./test_spoken_language_identification.js

# To run VAD + spoken language identification using a microphone
npm install naudiodon2
node ./test_vad_spoken_language_identification_microphone.js


Speaker identification

You can find more models at
https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx

git clone https://github.com/csukuangfj/sr-data

node ./test_speaker_identification.js
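
The speaker identification example needs both the downloaded model file and the cloned sr-data folder. The following is a minimal sketch of a check that lists which of them are missing, assuming the example is run from the folder where the wget and git clone commands above were executed. The helper name missingPaths is hypothetical, and the exact paths that test_speaker_identification.js reads are an assumption based on those commands.

```javascript
const fs = require('fs');

// Hypothetical helper: returns the entries of `paths` that do not exist on disk.
function missingPaths(paths) {
  return paths.filter((p) => !fs.existsSync(p));
}

// Assumed locations, taken from the wget and git clone commands above.
const required = [
  './3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx',
  './sr-data',
];

const missing = missingPaths(required);
if (missing.length > 0) {
  console.error(`Missing before running the example: ${missing.join(', ')}`);
}
```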


Add punctuation

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/punctuation-models/sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
tar xvf sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
rm sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2

node ./test_punctuation.js


Keyword spotting

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2

node ./test_keyword_spotter_transducer.js

# To run keyword spotting using a microphone
npm install naudiodon2
node ./test_keyword_spotter_transducer_microphone.js