xuning / sherpaonnx
Fix hotwords OOV log (#1139)

Authored by Wei Kang, 2024-07-16 19:41:31 +0800
Committed by GitHub, 2024-07-16 19:41:31 +0800
Commit 5b1fa8750ffbf779604d3c618d1aca3ab09bf02a (5b1fa875), 1 parent: 960eb752
Showing 2 changed files with 15 additions and 11 deletions
sherpa-onnx/csrc/utils.cc
sherpa-onnx/python/sherpa_onnx/utils.py
sherpa-onnx/csrc/utils.cc
@@ -62,9 +62,9 @@ static bool EncodeBase(const std::vector<std::string> &lines,
         break;
       default:
         SHERPA_ONNX_LOGE(
-            "Cannot find ID for token %s at line: %s. (Hint: words on "
-            "the same line are separated by spaces)",
-            word.c_str(), line.c_str());
+            "Cannot find ID for token %s at line: %s. (Hint: Check the "
+            "tokens.txt see if %s in it)",
+            word.c_str(), line.c_str(), word.c_str());
         has_oov = true;
         break;
     }
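The C++ hunk above only rewords the log line emitted when a hotword token is missing from the symbol table, and adds the token itself to the hint. A minimal Python sketch of the lookup logic that `EncodeBase` performs (the names `encode_line` and `token_table` are illustrative, not the real API):

```python
# Sketch of the per-line token lookup, assuming a tokens.txt-style
# table mapping token strings to integer IDs. Illustrative only.
def encode_line(line: str, token_table: dict) -> tuple:
    """Map each space-separated token to its ID; flag OOV tokens."""
    ids = []
    has_oov = False
    for word in line.split():
        if word in token_table:
            ids.append(token_table[word])
        else:
            # The new-style hint points the user at tokens.txt itself.
            print(f"Cannot find ID for token {word} at line: {line}. "
                  f"(Hint: Check the tokens.txt see if {word} in it)")
            has_oov = True
    return ids, has_oov

table = {"▁HE": 10, "LL": 11, "O": 12}
print(encode_line("▁HE LL O", table))   # ([10, 11, 12], False)
print(encode_line("▁HE XYZ", table)[1])  # True, after printing the hint
```

The real function lives in sherpa-onnx/csrc/utils.cc and also handles the byte-fallback and case-folding paths that this sketch omits.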
sherpa-onnx/python/sherpa_onnx/utils.py
@@ -4,6 +4,7 @@ import re
 from pathlib import Path
 from typing import List, Optional, Union


 def text2token(
     texts: List[str],
     tokens: str,
...
@@ -33,20 +34,20 @@ def text2token(
         is True, or it is a list of list of tokens.
     """
     try:
-        import sentencepiece as spm
+        import sentencepiece as spm
     except ImportError:
-        print('Please run')
-        print('  pip install sentencepiece')
-        print('before you continue')
+        print("Please run")
+        print("  pip install sentencepiece")
+        print("before you continue")
         raise

     try:
         from pypinyin import pinyin
         from pypinyin.contrib.tone_convert import to_initials, to_finals_tone
     except ImportError:
-        print('Please run')
-        print('  pip install pypinyin')
-        print('before you continue')
+        print("Please run")
+        print("  pip install pypinyin")
+        print("before you continue")
         raise

     assert Path(tokens).is_file(), f"File not exists, {tokens}"
...
@@ -119,7 +120,10 @@ def text2token(
             if txt in tokens_table:
                 text_list.append(tokens_table[txt] if output_ids else txt)
             else:
-                print(f"OOV token : {txt}, skipping text : {text}.")
+                print(
+                    f"Can't find token {txt} in token table, check your "
+                    f"tokens.txt see if {txt} in it. skipping text : {text}."
+                )
                 contain_oov = True
                 break
         if contain_oov:
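The Python hunk makes the matching improvement in `text2token`: instead of a terse "OOV token" line, the message names the token table the user should check. A hedged sketch of that branch (`tokenize`, and the flat `tokens_table` dict, are simplified stand-ins for the real per-text loop):

```python
# Simplified version of the OOV branch in text2token: map each token
# through the table, and skip the whole text on the first OOV token.
# The function name and signature here are illustrative only.
def tokenize(text_tokens: list, tokens_table: dict, output_ids: bool = False):
    """Return mapped tokens (or IDs), or None if any token is OOV."""
    text = " ".join(text_tokens)
    text_list = []
    contain_oov = False
    for txt in text_tokens:
        if txt in tokens_table:
            text_list.append(tokens_table[txt] if output_ids else txt)
        else:
            # New message: tell the user where to look for the token.
            print(
                f"Can't find token {txt} in token table, check your "
                f"tokens.txt see if {txt} in it. skipping text : {text}."
            )
            contain_oov = True
            break
    if contain_oov:
        return None
    return text_list

table = {"n": 1, "ǐ": 2, "h": 3, "ǎo": 4}
print(tokenize(["n", "ǐ"], table, output_ids=True))  # [1, 2]
print(tokenize(["n", "zz"], table))  # prints the hint, returns None
```

Skipping the whole text (rather than dropping the single token) mirrors the real function: a hotword phrase with any unmappable token cannot be scored, so it is excluded entirely.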