Fangjun Kuang
Committed by GitHub

Add Python API for KittenTTS. (#2466)

... ... @@ -328,6 +328,25 @@ log "Offline TTS test"
# test waves are saved in ./tts
mkdir ./tts
log "test kitten tts"
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
tar xf kitten-nano-en-v0_1-fp16.tar.bz2
rm kitten-nano-en-v0_1-fp16.tar.bz2
python3 ./python-api-examples/offline-tts.py \
--debug=1 \
--kitten-model=./kitten-nano-en-v0_1-fp16/model.fp16.onnx \
--kitten-voices=./kitten-nano-en-v0_1-fp16/voices.bin \
--kitten-tokens=./kitten-nano-en-v0_1-fp16/tokens.txt \
--kitten-data-dir=./kitten-nano-en-v0_1-fp16/espeak-ng-data \
--num-threads=2 \
--sid=0 \
--output-filename="./tts/kitten-0.wav" \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
rm -rf kitten-nano-en-v0_1-fp16
log "kokoro-multi-lang-v1_0 test"
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kokoro-multi-lang-v1_0.tar.bz2
... ...
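The CI step above only runs the example script and writes `./tts/kitten-0.wav`; it does not inspect the result. A minimal sketch of a follow-up check, assuming the output is a plain PCM WAV file that Python's standard `wave` module can read. The helper name `wav_summary` is hypothetical and not part of this change.

```python
import wave


def wav_summary(path):
    """Return basic facts about a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as f:
        num_frames = f.getnframes()
        sample_rate = f.getframerate()
        return {
            "sample_rate": sample_rate,
            "num_frames": num_frames,
            # Duration in seconds: frames divided by frames per second.
            "seconds": num_frames / sample_rate,
        }
```

A test could call `wav_summary("./tts/kitten-0.wav")` and fail when `num_frames` is zero.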
... ... @@ -11,7 +11,7 @@ while the model is still generating.
Usage:
Example (1/7)
Example (1/8)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-amy-low.tar.bz2
tar xf vits-piper-en_US-amy-low.tar.bz2
... ... @@ -23,7 +23,7 @@ python3 ./python-api-examples/offline-tts-play.py \
--output-filename=./generated.wav \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
Example (2/7)
Example (2/8)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-aishell3.tar.bz2
tar xvf vits-zh-aishell3.tar.bz2
... ... @@ -37,7 +37,7 @@ python3 ./python-api-examples/offline-tts-play.py \
--output-filename=./liubei-21.wav \
"勿以恶小而为之,勿以善小而不为。惟贤惟德,能服于人。122334"
Example (3/7)
Example (3/8)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-vits-zh-ll.tar.bz2
tar xvf sherpa-onnx-vits-zh-ll.tar.bz2
... ... @@ -53,7 +53,7 @@ python3 ./python-api-examples/offline-tts-play.py \
--output-filename=./test-2.wav \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔。2024年5月11号,拨打110或者18920240511。123456块钱。"
Example (4/7)
Example (4/8)
curl -O -SL https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2
tar xvf matcha-icefall-zh-baker.tar.bz2
... ... @@ -71,7 +71,7 @@ python3 ./python-api-examples/offline-tts-play.py \
--output-filename=./test-matcha.wav \
"某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。"
Example (5/7)
Example (5/8)
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-en_US-ljspeech.tar.bz2
tar xvf matcha-icefall-en_US-ljspeech.tar.bz2
... ... @@ -88,7 +88,7 @@ python3 ./python-api-examples/offline-tts-play.py \
--num-threads=2 \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
Example (6/7)
Example (6/8)
(This version of kokoro supports only English)
... ... @@ -105,9 +105,9 @@ python3 ./python-api-examples/offline-tts.py \
--num-threads=2 \
--sid=10 \
--output-filename="./kokoro-10.wav" \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be a statesman, a businessman, an official, or a scholar."
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
Example (7/7)
Example (7/8)
(This version of kokoro supports English, Chinese, etc.)
... ... @@ -128,6 +128,23 @@ python3 ./python-api-examples/offline-tts-play.py \
--output-filename="./kokoro-18-zh-en.wav" \
"中英文语音合成测试。This is generated by next generation Kaldi using Kokoro without Misaki. 你觉得中英文说的如何呢?"
Example (8/8)
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
tar xf kitten-nano-en-v0_1-fp16.tar.bz2
rm kitten-nano-en-v0_1-fp16.tar.bz2
python3 ./python-api-examples/offline-tts-play.py \
--debug=1 \
--kitten-model=./kitten-nano-en-v0_1-fp16/model.fp16.onnx \
--kitten-voices=./kitten-nano-en-v0_1-fp16/voices.bin \
--kitten-tokens=./kitten-nano-en-v0_1-fp16/tokens.txt \
--kitten-data-dir=./kitten-nano-en-v0_1-fp16/espeak-ng-data \
--num-threads=2 \
--sid=0 \
--output-filename="./kitten-0.wav" \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
You can find more models at
https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
... ... @@ -285,6 +302,36 @@ def add_kokoro_args(parser):
    )


def add_kitten_args(parser):
    parser.add_argument(
        "--kitten-model",
        type=str,
        default="",
        help="Path to model.onnx for kitten",
    )

    parser.add_argument(
        "--kitten-voices",
        type=str,
        default="",
        help="Path to voices.bin for kitten",
    )

    parser.add_argument(
        "--kitten-tokens",
        type=str,
        default="",
        help="Path to tokens.txt for kitten",
    )

    parser.add_argument(
        "--kitten-data-dir",
        type=str,
        default="",
        help="Path to the dict directory of espeak-ng.",
    )


def get_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
... ... @@ -293,6 +340,7 @@ def get_args():
    add_vits_args(parser)
    add_matcha_args(parser)
    add_kokoro_args(parser)
    add_kitten_args(parser)

    parser.add_argument(
        "--tts-rule-fsts",
... ... @@ -499,6 +547,12 @@ def main():
                dict_dir=args.kokoro_dict_dir,
                lexicon=args.kokoro_lexicon,
            ),
            kitten=sherpa_onnx.OfflineTtsKittenModelConfig(
                model=args.kitten_model,
                voices=args.kitten_voices,
                tokens=args.kitten_tokens,
                data_dir=args.kitten_data_dir,
            ),
            provider=args.provider,
            debug=args.debug,
            num_threads=args.num_threads,
... ...
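All four kitten flags default to the empty string and are passed straight into `OfflineTtsKittenModelConfig`. A minimal sketch of how that mapping could be checked before building the config, using only `argparse`. The helpers `kitten_kwargs` and `missing_kitten_flags` are hypothetical and not part of this change; only the flag names and the keyword names `model`, `voices`, `tokens`, `data_dir` come from the diff.

```python
import argparse


def add_kitten_args(parser):
    # The same four flags the diff adds, each defaulting to "".
    parser.add_argument("--kitten-model", type=str, default="")
    parser.add_argument("--kitten-voices", type=str, default="")
    parser.add_argument("--kitten-tokens", type=str, default="")
    parser.add_argument("--kitten-data-dir", type=str, default="")


def kitten_kwargs(args):
    """Keyword arguments in the shape the diff passes to the kitten config."""
    return {
        "model": args.kitten_model,
        "voices": args.kitten_voices,
        "tokens": args.kitten_tokens,
        "data_dir": args.kitten_data_dir,
    }


def missing_kitten_flags(args):
    """Hypothetical check: which fields are empty once --kitten-model is set."""
    kwargs = kitten_kwargs(args)
    if not kwargs["model"]:
        return []  # kitten is not selected, so nothing to validate
    return [name for name, value in kwargs.items() if not value]
```

With `--kitten-model=m.onnx --kitten-voices=v.bin`, `missing_kitten_flags` returns `["tokens", "data_dir"]`, which a caller could turn into an early error message instead of relying on the library to reject the config.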
... ... @@ -12,7 +12,7 @@ generated audio.
Usage:
Example (1/7)
Example (1/8)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-amy-low.tar.bz2
tar xf vits-piper-en_US-amy-low.tar.bz2
... ... @@ -24,7 +24,7 @@ python3 ./python-api-examples/offline-tts.py \
--output-filename=./generated.wav \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
Example (2/7)
Example (2/8)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-icefall-zh-aishell3.tar.bz2
tar xvf vits-icefall-zh-aishell3.tar.bz2
... ... @@ -38,7 +38,7 @@ python3 ./python-api-examples/offline-tts.py \
--output-filename=./liubei-21.wav \
"勿以恶小而为之,勿以善小而不为。惟贤惟德,能服于人。122334"
Example (3/7)
Example (3/8)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-vits-zh-ll.tar.bz2
tar xvf sherpa-onnx-vits-zh-ll.tar.bz2
... ... @@ -54,7 +54,7 @@ python3 ./python-api-examples/offline-tts.py \
--output-filename=./test-2.wav \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔。2024年5月11号,拨打110或者18920240511。123456块钱。"
Example (4/7)
Example (4/8)
curl -O -SL https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2
tar xvf matcha-icefall-zh-baker.tar.bz2
... ... @@ -72,7 +72,7 @@ python3 ./python-api-examples/offline-tts.py \
--output-filename=./test-matcha.wav \
"某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。"
Example (5/7)
Example (5/8)
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-en_US-ljspeech.tar.bz2
tar xvf matcha-icefall-en_US-ljspeech.tar.bz2
... ... @@ -89,7 +89,7 @@ python3 ./python-api-examples/offline-tts.py \
--num-threads=2 \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
Example (6/7)
Example (6/8)
(This version of kokoro supports only English)
... ... @@ -106,9 +106,9 @@ python3 ./python-api-examples/offline-tts.py \
--num-threads=2 \
--sid=10 \
--output-filename="./kokoro-10.wav" \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be a statesman, a businessman, an official, or a scholar."
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
Example (7/7)
Example (7/8)
(This version of kokoro supports English, Chinese, etc.)
... ... @@ -129,6 +129,23 @@ python3 ./python-api-examples/offline-tts.py \
--output-filename="./kokoro-18-zh-en.wav" \
"中英文语音合成测试。This is generated by next generation Kaldi using Kokoro without Misaki. 你觉得中英文说的如何呢?"
Example (8/8)
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
tar xf kitten-nano-en-v0_1-fp16.tar.bz2
rm kitten-nano-en-v0_1-fp16.tar.bz2
python3 ./python-api-examples/offline-tts.py \
--debug=1 \
--kitten-model=./kitten-nano-en-v0_1-fp16/model.fp16.onnx \
--kitten-voices=./kitten-nano-en-v0_1-fp16/voices.bin \
--kitten-tokens=./kitten-nano-en-v0_1-fp16/tokens.txt \
--kitten-data-dir=./kitten-nano-en-v0_1-fp16/espeak-ng-data \
--num-threads=2 \
--sid=0 \
--output-filename="./kitten-0.wav" \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar."
You can find more models at
https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
... ... @@ -272,6 +289,36 @@ def add_kokoro_args(parser):
    )


def add_kitten_args(parser):
    parser.add_argument(
        "--kitten-model",
        type=str,
        default="",
        help="Path to model.onnx for kitten",
    )

    parser.add_argument(
        "--kitten-voices",
        type=str,
        default="",
        help="Path to voices.bin for kitten",
    )

    parser.add_argument(
        "--kitten-tokens",
        type=str,
        default="",
        help="Path to tokens.txt for kitten",
    )

    parser.add_argument(
        "--kitten-data-dir",
        type=str,
        default="",
        help="Path to the dict directory of espeak-ng.",
    )


def get_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
... ... @@ -280,6 +327,7 @@ def get_args():
    add_vits_args(parser)
    add_matcha_args(parser)
    add_kokoro_args(parser)
    add_kitten_args(parser)

    parser.add_argument(
        "--tts-rule-fsts",
... ... @@ -382,6 +430,12 @@ def main():
                dict_dir=args.kokoro_dict_dir,
                lexicon=args.kokoro_lexicon,
            ),
            kitten=sherpa_onnx.OfflineTtsKittenModelConfig(
                model=args.kitten_model,
                voices=args.kitten_voices,
                tokens=args.kitten_tokens,
                data_dir=args.kitten_data_dir,
            ),
            provider=args.provider,
            debug=args.debug,
            num_threads=args.num_threads,
... ...
... ... @@ -44,6 +44,7 @@ from _sherpa_onnx import (
    OfflineTransducerModelConfig,
    OfflineTts,
    OfflineTtsConfig,
    OfflineTtsKittenModelConfig,
    OfflineTtsKokoroModelConfig,
    OfflineTtsMatchaModelConfig,
    OfflineTtsModelConfig,
... ...
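The last hunk re-exports `OfflineTtsKittenModelConfig` from the Python package, so scripts that must also run against older installs may want to probe for it first. A minimal sketch, assuming the package is importable as `sherpa_onnx` (as in the example scripts above). Both helper names are hypothetical.

```python
import importlib


def exports_kitten_config(module):
    """True if the given module object exposes the kitten config class."""
    return hasattr(module, "OfflineTtsKittenModelConfig")


def installed_sherpa_onnx_has_kitten():
    """Probe the installed package; returns False if it cannot be imported."""
    try:
        module = importlib.import_module("sherpa_onnx")
    except ImportError:
        return False
    return exports_kitten_config(module)
```

A script could call `installed_sherpa_onnx_has_kitten()` and skip registering the `--kitten-*` flags when it returns `False`.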