xuning / sherpaonnx
Authored by Sheldon Robinson, 2025-02-20 17:29:28 -0500
Committed by GitHub, 2025-02-21 06:29:28 +0800
Commit 9c810ce3fe7c000e6591a59522b9c5bff8e957ae
1 parent 48010941
Fix #1901: UnicodeEncodeError running export_bpe_vocab.py (#1902)
Showing 1 changed file with 5 additions and 4 deletions
scripts/export_bpe_vocab.py
@@ -26,6 +26,7 @@
 # Please install a version >=0.1.96
 import argparse
+import codecs
 from typing import Dict
 try:
@@ -56,11 +57,11 @@ def main():
     sp = spm.SentencePieceProcessor()
     sp.Load(model_file)
-    vocabs = [sp.IdToPiece(id) for id in range(sp.GetPieceSize())]
-    with open(vocab_file, "w") as vfile:
+    vocabs = [sp.id_to_piece(id) for id in range(sp.get_piece_size())]
+    with codecs.open(vocab_file, "w", "utf-8") as vfile:
         for v in vocabs:
-            id = sp.PieceToId(v)
-            vfile.write(f"{v}\t{sp.GetScore(id)}\n")
+            id = sp.piece_to_id(v)
+            vfile.write(f"{v}\t{sp.get_score(id)}\n")
     print(f"Vocabulary file is written to {vocab_file}")
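Note: the change above replaces a plain open(vocab_file, "w"), which falls back to the platform's default text encoding, with an explicit UTF-8 writer. The sketch below is not part of the commit; it is a minimal standalone illustration of why that matters. It assumes a hypothetical SAMPLE_VOCAB list in place of a real SentencePiece model, and uses the ASCII codec as a stand-in for a non-UTF-8 default encoding. The helper names write_vocab and demo are likewise illustrative only.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for (piece, score) pairs; in the real script these
# come from the SentencePiece model. U+2581 is the SentencePiece word marker.
SAMPLE_VOCAB = [("<unk>", 0.0), ("\u2581the", -3.25), ("\u4f60\u597d", -7.5)]


def write_vocab(path, vocab, encoding):
    """Write one 'piece<TAB>score' line per entry using the given encoding."""
    with codecs.open(path, "w", encoding) as vfile:
        for piece, score in vocab:
            vfile.write(f"{piece}\t{score}\n")


def demo():
    tmpdir = tempfile.mkdtemp()
    ascii_path = os.path.join(tmpdir, "ascii.vocab")
    utf8_path = os.path.join(tmpdir, "utf8.vocab")

    # A narrow codec (ASCII here, standing in for a non-UTF-8 default)
    # cannot represent U+2581, so the write raises UnicodeEncodeError.
    try:
        write_vocab(ascii_path, SAMPLE_VOCAB, "ascii")
        ascii_failed = False
    except UnicodeEncodeError:
        ascii_failed = True

    # With an explicit UTF-8 codec every piece can be written.
    write_vocab(utf8_path, SAMPLE_VOCAB, "utf-8")
    with open(utf8_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    return ascii_failed, lines


if __name__ == "__main__":
    failed, lines = demo()
    print("ASCII write failed:", failed)
    print("UTF-8 lines written:", len(lines))
```

The built-in open(path, "w", encoding="utf-8") is a common alternative to codecs.open and should give the same result for this use.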