- 12 Sep, 2025 4 commits
-
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
This PR adds RK NPU support for SenseVoice non-streaming ASR models by implementing a new RKNN backend with greedy CTC decoding.
  - Adds offline RKNN implementation for SenseVoice models including model loading, feature processing, and CTC decoding
  - Introduces export tools to convert SenseVoice models from PyTorch to ONNX and then to RKNN format
  - Implements provider-aware validation to prevent mismatched model and provider usage
Fangjun Kuang authored
-
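For readers unfamiliar with the greedy CTC decoding named in the RKNN SenseVoice entry above, here is a minimal sketch of the general technique. It is not taken from the PR: the function name `greedy_ctc_decode`, the nested-list input format, and the blank ID of 0 are assumptions made for illustration.

```python
from typing import List, Sequence


def greedy_ctc_decode(frames: Sequence[Sequence[float]], blank_id: int = 0) -> List[int]:
    """Greedy CTC decoding over per-frame token scores.

    `frames` holds one score vector per time step. `blank_id` is assumed
    to be 0 here; the real model may use a different blank index.
    """
    decoded: List[int] = []
    prev = blank_id
    for frame in frames:
        # Pick the highest-scoring token for this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only non-blank tokens that differ from the previous frame's token.
        if best != blank_id and best != prev:
            decoded.append(best)
        prev = best
    return decoded
```

Repeated tokens collapse into one, and a blank between two identical tokens keeps both, which is the behaviour CTC decoding relies on.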
Fangjun Kuang authored
-
- 10 Sep, 2025 4 commits
-
-
Fangjun Kuang authored
-
This PR adds support for Wenet non-streaming CTC models to sherpa-onnx by introducing the SherpaOnnxOfflineWenetCtcModelConfig struct and integrating it across all language bindings and APIs. The implementation follows the same pattern as other CTC model types like Zipformer CTC.
  - Introduces SherpaOnnxOfflineWenetCtcModelConfig struct with a single model field for the ONNX model path
  - Adds the new config to SherpaOnnxOfflineModelConfig and updates all language bindings (C++, Pascal, Kotlin, Java, Go, C#, Swift, JavaScript, etc.)
  - Provides comprehensive examples and tests across all supported platforms and languages
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
- 09 Sep, 2025 4 commits
-
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
This PR adds support for streaming T-one Russian ASR models across various language bindings in the sherpa-onnx library. The changes enable T-one CTC (Connectionist Temporal Classification) model integration by adding new configuration structures and example implementations.
  - Introduces OnlineToneCtcModelConfig structures across all language bindings (C, C++, Swift, Java, Kotlin, Go, etc.)
  - Adds T-one CTC model support to WASM implementations for both ASR and keyword spotting
  - Provides comprehensive example implementations demonstrating T-one model usage in multiple programming languages
Fangjun Kuang authored
-
This PR adds support for T-one streaming Russian ASR models in both the C++ and Python APIs. The T-one model is a CTC-based Russian speech recognition model with specific characteristics, including float16 state handling, 300 ms frame lengths, and an 8 kHz sampling rate.
  - Adds a new OnlineToneCtcModel implementation with specialized processing for T-one models
  - Integrates T-one support into the existing CTC model pipeline and Python bindings
  - Adds a Python example and test scripts for the new functionality
Fangjun Kuang authored
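The frame length and sampling rate quoted in the entry above imply 8000 × 0.3 = 2400 samples per frame. The sketch below only illustrates that arithmetic; the helper `split_into_frames` and the zero-padding of the final frame are assumptions, not a description of how sherpa-onnx feeds audio to the model.

```python
from typing import Iterator, List

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the PR description
FRAME_MS = 300         # frame length stated in the PR description
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 2400 samples


def split_into_frames(samples: List[float]) -> Iterator[List[float]]:
    """Yield consecutive 300 ms frames of audio samples.

    Zero-padding the last, shorter frame is an assumption made for this sketch.
    """
    for start in range(0, len(samples), SAMPLES_PER_FRAME):
        frame = samples[start:start + SAMPLES_PER_FRAME]
        if len(frame) < SAMPLES_PER_FRAME:
            frame = frame + [0.0] * (SAMPLES_PER_FRAME - len(frame))
        yield frame
```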
-
- 08 Sep, 2025 1 commit
-
-
This PR exports models from the T-one repository (https://github.com/voicekit-team/T-one) to sherpa-onnx format, creating a complete pipeline for Russian speech recognition using streaming CTC models.
  - Adds scripts to download, process, and test T-one models in sherpa-onnx format
  - Creates GitHub workflow for automated model export and publishing
  - Updates kaldi-native-fbank dependency to version 1.22.1
Fangjun Kuang authored
-
- 05 Sep, 2025 2 commits
-
-
This PR adds documentation for MentraOS, a smart glasses operating system that integrates sherpa-onnx for speech recognition functionality. The addition showcases another real-world application using the sherpa-onnx library.
  - Adds a new section documenting MentraOS integration with sherpa-onnx
  - Includes description of MentraOS features and platform support
  - References related pull request for implementation details
Fangjun Kuang authored
-
This PR adds a helpful hint for Android developers who are trying to load model files from the SD card instead of the app's assets. The change detects when an absolute path is provided while an asset manager is still being used, which is a common configuration mistake.
  - Adds validation to detect absolute paths when using Android asset manager
  - Provides clear error messages guiding users to set assetManager to null for SD card file access
  - References the related issue for additional context (#2562)
Fangjun Kuang authored
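The check described in the entry above can be sketched as follows. This is an illustration in Python rather than the library's actual Android code; the function name `asset_path_hint` and the message wording are hypothetical.

```python
from typing import Optional


def asset_path_hint(model_path: str, asset_manager: Optional[object]) -> Optional[str]:
    """Return a hint when an absolute path is combined with an asset manager.

    Asset paths are relative to the app's assets, so an absolute path
    together with a non-null asset manager suggests a misconfiguration.
    """
    if asset_manager is not None and model_path.startswith("/"):
        return (
            f"'{model_path}' is an absolute path, but an asset manager is set. "
            "To load files from the SD card, set assetManager to null."
        )
    return None
```

Passing `None` as the asset manager, or using a relative asset path, produces no hint.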
-
- 04 Sep, 2025 3 commits
-
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
This PR disables loading native libraries from JAR resources specifically on Android platforms. The change prevents potential issues with JAR-based library loading on Android while maintaining compatibility with other platforms.
Fangjun Kuang authored
-
- 02 Sep, 2025 1 commit
-
-
凌封 authored
-
- 01 Sep, 2025 6 commits
-
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
This PR fixes the C API by adding proper support for durations in offline recognition results. It addresses problems introduced in a previous PR, where the durations field was added to the C API struct but not properly handled across all language bindings. Key changes:
  - Adds durations field handling across multiple language bindings (Swift, Kotlin, Java, C#)
  - Fixes field ordering in C API struct to ensure ABI compatibility
  - Updates JNI implementation to properly extract and pass durations data
Fangjun Kuang authored
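Why field ordering matters for ABI compatibility can be shown with a small `ctypes` sketch. The struct and field names below (`ResultV1`, `text`, `count`, `durations`) are hypothetical and do not reproduce the real sherpa-onnx result struct; the point is only that inserting a field in the middle moves the fields after it, while appending does not.

```python
import ctypes


class ResultV1(ctypes.Structure):
    # Hypothetical original layout that existing bindings were built against.
    _fields_ = [("text", ctypes.c_char_p), ("count", ctypes.c_int32)]


class ResultAppended(ctypes.Structure):
    # New field added at the end: offsets of existing fields are unchanged.
    _fields_ = [
        ("text", ctypes.c_char_p),
        ("count", ctypes.c_int32),
        ("durations", ctypes.POINTER(ctypes.c_float)),
    ]


class ResultInserted(ctypes.Structure):
    # New field inserted in the middle: `count` moves to a different offset.
    _fields_ = [
        ("text", ctypes.c_char_p),
        ("durations", ctypes.POINTER(ctypes.c_float)),
        ("count", ctypes.c_int32),
    ]


def offsets(struct_type):
    """Map each field name to its byte offset inside the struct."""
    return {name: getattr(struct_type, name).offset for name, _ in struct_type._fields_}
```

A binding compiled against `ResultV1` would still read `count` correctly from `ResultAppended`, but would read the wrong bytes from `ResultInserted`.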
-
- 27 Aug, 2025 2 commits
-
-
This PR fixes the uploading process for win32 libraries to Hugging Face by updating Windows OS detection and correcting the file copy destination path.
  - Replaces deprecated wmic command with PowerShell-based OS detection for better reliability
  - Adds fallback mechanism using cmd /c ver when PowerShell is unavailable
  - Corrects the destination path for win32 library archives to include version subdirectory
Fangjun Kuang authored
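The detection order described in the entry above (PowerShell first, then `cmd /c ver`) might look roughly like this. The exact commands used by the PR are not shown in this log, so the PowerShell query below is an assumption, as are the helper names `run_command` and `detect_windows_os`.

```python
import subprocess
from typing import Callable, List, Optional

# Commands tried in order. The PowerShell query is an assumed example;
# `cmd /c ver` is the fallback named in the PR description.
COMMANDS: List[List[str]] = [
    ["powershell", "-NoProfile", "-Command",
     "(Get-CimInstance Win32_OperatingSystem).Caption"],
    ["cmd", "/c", "ver"],
]


def run_command(argv: List[str]) -> Optional[str]:
    """Run a command and return its trimmed stdout, or None on any failure."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None  # executable missing, timed out, or could not be started
    output = done.stdout.strip()
    return output if done.returncode == 0 and output else None


def detect_windows_os(
    run: Callable[[List[str]], Optional[str]] = run_command,
) -> Optional[str]:
    """Return the output of the first command that succeeds, else None."""
    for argv in COMMANDS:
        output = run(argv)
        if output:
            return output
    return None
```

Taking the runner as a parameter keeps the fallback logic testable on machines that are not running Windows.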
- 26 Aug, 2025 4 commits
-
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
This PR simplifies the usage of the non-Android Java API by providing platform-specific JAR files that include native shared libraries, eliminating the need for users to manually manage native dependencies.
  - Refactored LibraryUtils.java to support multiple library loading methods including extracting from JAR resources
  - Added build infrastructure to create platform-specific native library JAR files
  - Introduced debug capabilities and improved error handling for library loading
Fangjun Kuang authored
-
Fangjun Kuang authored
-
- 25 Aug, 2025 8 commits
-
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
This PR adds support for two new Piper TTS (Text-to-Speech) models: an Indonesian model (id_ID-news_tts-medium) and a Hindi model (hi_IN-rohan-medium).
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
Fangjun Kuang authored
-
- 21 Aug, 2025 1 commit
-
-
Brad Murray authored
-