xuning / sherpaonnx

sherpaonnx / scripts / t-one / generate_tokens.py
  • Export models from https://github.com/voicekit-team/T-one to sherpa-onnx (#2571) · e4f48ce6 ...
    This PR exports models from the T-one repository (https://github.com/voicekit-team/T-one) to sherpa-onnx format, creating a complete pipeline for Russian speech recognition using streaming CTC models.
    
    - Adds scripts to download, process, and test T-one models in sherpa-onnx format
    - Creates GitHub workflow for automated model export and publishing
    - Updates kaldi-native-fbank dependency to version 1.22.1
    Fangjun Kuang authored 2025-09-08 17:22:23 +0800
generate_tokens.py 459 bytes
#!/usr/bin/env python3
# Copyright    2025  Xiaomi Corp.        (authors: Fangjun Kuang)

import json


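def main():
    # vocab.json maps each token string to its integer ID.
    # Read it as UTF-8 explicitly so the result does not depend on the
    # platform's default encoding.
    with open("vocab.json", encoding="utf-8") as f:
        token2id = json.load(f)

    # Write one "<token> <id>" pair per line.
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for s, i in token2id.items():
            if s == "|":
                # Write "|" as a literal space.
                s = " "
            if s == "[PAD]":
                # Rename the padding token to <blk>.
                s = "<blk>"

            f.write(f"{s} {i}\n")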


if __name__ == "__main__":
    main()
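
Because the script writes "|" as a literal space, the line for that token begins with a space, and splitting a tokens.txt line on whitespace would lose it. The sketch below shows one way to read the file back by splitting on the last space only. It is not part of the repository: the helper names parse_token_line and read_tokens are hypothetical, and the sample lines in the demo are made up for illustration, not taken from the real vocab.json.

```python
#!/usr/bin/env python3
# Hypothetical helper (not part of the repository): read tokens.txt back.


def parse_token_line(line):
    """Split one "<token> <id>" line into (token, id).

    Split on the LAST space only, so a token that is itself a space
    (written by generate_tokens.py in place of "|") is preserved.
    """
    token, idx = line.rstrip("\n").rsplit(" ", 1)
    return token, int(idx)


def read_tokens(path):
    """Return a dict mapping integer ID to token string."""
    id2token = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip("\n") == "":
                continue  # skip empty lines
            token, idx = parse_token_line(line)
            id2token[idx] = token
    return id2token


if __name__ == "__main__":
    # Made-up sample lines; the real token IDs come from vocab.json.
    print(parse_token_line("<blk> 0"))  # ('<blk>', 0)
    print(parse_token_line("  5"))      # (' ', 5)
```

Calling line.split() instead would return only ["5"] for the space token, which is why the sketch uses rsplit with a maximum of one split.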