Commits · 75630b986b51b2cf481c517fc49a61cd9fc9f787 · xuning / sherpaonnx

14 May, 2024 4 commits
- Support adding punctuations to text for node-addon-api (#876) · 75630b98
  75630b98 Browse File
  
  Fangjun Kuang authored 2024-05-14 19:28:56 +0800
- Add audio tagging APIs for node-addon-api (#875) · d19f50b7
  d19f50b7 Browse File
  
  Fangjun Kuang authored 2024-05-14 17:32:30 +0800
- Add speaker identification APIs for node-addon-api (#874) · 388e6a98
  388e6a98 Browse File
  
  Fangjun Kuang authored 2024-05-14 13:28:50 +0800
- Refactor node-addon-api to remove duplicate. (#873) · 0895b648
  0895b648 Browse Directory
  
  Fangjun Kuang authored 2024-05-14 10:08:11 +0800
13 May, 2024 5 commits
- Add spoken language identification for node-addon-api (#872) · 939fdd94
  939fdd94 Browse File
  
  Fangjun Kuang authored 2024-05-13 20:26:11 +0800
- Add TTS for node-addon-api (#871) · 031134b4
  031134b4 Browse File
  
  Fangjun Kuang authored 2024-05-13 19:24:09 +0800
- fixing bug and compiler error (#870) · 740d7ae9 ...
  740d7ae9 Browse File
```
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
```
  Manix authored 2024-05-13 17:44:03 +0800
- Add non-streaming ASR APIs for node-addon-api (#868) · 697b9607
  697b9607 Browse File
  
  Fangjun Kuang authored 2024-05-13 16:03:34 +0800
- Add streaming CTC ASR APIs for node-addon-api (#867) · 384f96c4
  384f96c4 Browse File
  
  Fangjun Kuang authored 2024-05-13 11:58:25 +0800
12 May, 2024 2 commits
- Add Android APKs for NeMo CTC models. (#866) · db85b2c1
  db85b2c1 Browse Directory
  
  Fangjun Kuang authored 2024-05-12 14:58:36 +0800
- Fix node addon tests (#865) · 7322f4e0 ...
  7322f4e0 Browse File
```
* Install naudiodon2 manually.

It is needed only when using a microphone. The CI tests don't need it.
```
  Fangjun Kuang authored 2024-05-12 12:03:43 +0800
11 May, 2024 3 commits
- Add node-addon-api for VAD (#864) · eee5d8a1
  eee5d8a1 Browse File
  
  Fangjun Kuang authored 2024-05-11 20:58:23 +0800
- Add Speaker ID demo for C# (#862) · 677bc1da
  677bc1da Browse Directory
  
  Fangjun Kuang authored 2024-05-11 13:27:33 +0800
- Fix Python TTS examples for models using jieba. (#861) · a88b3bac
  a88b3bac Browse Directory
  
  Fangjun Kuang authored 2024-05-11 09:21:51 +0800
10 May, 2024 4 commits
- Add more streaming ASR methods for node-addon-api (#860) · 65f51614
  65f51614 Browse File
  
  Fangjun Kuang authored 2024-05-10 18:21:05 +0800
- Add C++ support for streaming NeMo CTC models. (#857) · 46e4e5b7
  46e4e5b7 Browse File
  
  Fangjun Kuang authored 2024-05-10 16:26:43 +0800
- Solve the issue of missing the last sentence with punctuation (#856) · 1eb60e87 ...
  1eb60e87 Browse Directory
```
Co-authored-by: Hao You <13182720519@sina.cn>
```
  yh646492956 authored 2024-05-10 15:41:42 +0800
- Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) · 17cd3a5f
  17cd3a5f Browse File
  
  Fangjun Kuang authored 2024-05-10 12:15:39 +0800
09 May, 2024 2 commits
- Add C++ support for non-streaming NeMo fast conformer hybrid transducer ctc (the ctc branch) (#848) · 5d8c35e4
  5d8c35e4 Browse File
  
  Fangjun Kuang authored 2024-05-09 15:32:22 +0800
- Export non-streaming NeMo faster conformer hybrid transducer and ctc to sherpa-onnx (#847) · 5ed3ec1c
  5ed3ec1c Browse File
  
  Fangjun Kuang authored 2024-05-09 13:59:47 +0800
08 May, 2024 2 commits
- Export NeMo FastConformer Hybrid Transducer Large Streaming to ONNX (#844) · 68b25abf
  68b25abf Browse Directory
  
  Fangjun Kuang authored 2024-05-08 19:07:49 +0800
- Export NeMo FastConformer Hybrid Transducer-CTC Large Streaming to ONNX. (#843) · a9f936e9
  a9f936e9 Browse Directory
  
  Fangjun Kuang authored 2024-05-08 12:33:46 +0800
07 May, 2024 2 commits
- Publish node-addon-api npm package for linux arm64 (#841) · dbaa26ff
  dbaa26ff Browse File
  
  Fangjun Kuang authored 2024-05-07 23:05:40 +0800
- Add links to pre-built APKs and pre-trained models to README. (#840) · d2e86b04
  d2e86b04 Browse File
  
  Fangjun Kuang authored 2024-05-07 12:28:42 +0800
06 May, 2024 3 commits
- Publish npm package with node-addon-api for Windows (#838) · 37a4135d
  37a4135d Browse File
  
  Fangjun Kuang authored 2024-05-06 16:21:29 +0800
- Upload two more 3d-speaker models (#837) · e1bb9288
  e1bb9288 Browse Directory
  
  Fangjun Kuang authored 2024-05-06 12:23:49 +0800
- Update 3dspeaker/export-onnx.py (#836) · 9c8255fd ...
  9c8255fd Browse File
```
Update to match the changes in infer_sv.py at 3D-speaker. 

Added 2 more supported models and "zh_en" language.
```
  chiiyeh authored 2024-05-06 12:10:35 +0800
04 May, 2024 1 commit
- Publish node-addon-api wrapper for sherpa-onnx as npm packages (#829) · 4f758e6c
  4f758e6c Browse File
  
  Fangjun Kuang authored 2024-05-04 13:27:39 +0800
03 May, 2024 1 commit
- Begin to add node-addon-api for sherpa-onnx (#826) · 2f9553d8
  2f9553d8 Browse Directory
  
  Fangjun Kuang authored 2024-05-03 14:47:40 +0800
01 May, 2024 1 commit
- Fix typos in JNI TTS (#824) · fcd60242
  fcd60242 Browse Directory
  
  Fangjun Kuang authored 2024-05-01 14:14:24 +0800
29 Apr, 2024 1 commit
- Add Java API for speaker identification (#822) · cff20762
  cff20762 Browse File
  
  Fangjun Kuang authored 2024-04-29 21:23:56 +0800
28 Apr, 2024 1 commit
- Add Java API for audio tagging (#820) · 88202f05
  88202f05 Browse File
  
  Fangjun Kuang authored 2024-04-28 22:26:04 +0800
26 Apr, 2024 8 commits
- Add Java and Kotlin API for punctuation models (#818) · 5407f880
  5407f880 Browse File
  
  Fangjun Kuang authored 2024-04-26 22:06:48 +0800
- Add Java API for spoken language identification with whisper multilingual models (#817) · db259862
  db259862 Browse Directory
  
  Fangjun Kuang authored 2024-04-26 19:05:39 +0800
- Fix a bug for offline paraformer (#816) · f2d074ae
  f2d074ae Browse File
  
  Fangjun Kuang authored 2024-04-26 16:40:42 +0800
- Fix C# to support Chinese tts models using jieba (#815) · 612002da
  612002da Browse Directory
  
  Fangjun Kuang authored 2024-04-26 11:50:07 +0800
- Fix building wheels for macOS (#814) · c693676d
  c693676d Browse File
  
  Fangjun Kuang authored 2024-04-26 10:05:39 +0800
- Adding temperature scaling on Joiner logits: (#789) · 2e45d327 ...
  2e45d327 Browse File
```
* Adding temperature scaling on Joiner logits:

- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
    - the BPE scores were rescaled with 0.2 (but then also incorrect words
      get high confidence, visually reasonable histograms are for 0.5 scale)
    - BPE->WORD score merging done by min(.) function
      (tried also prob-product, and also arithmetic, geometric, harmonic mean)

- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything was tuned on a very small set of 100 sentences with 813 words and 10.2% WER, using a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.

Temperature scaling was also added to the greedy search confidences.

* making `temperature_scale` configurable from outside
```
  Karel Vesely authored 2024-04-26 09:44:26 +0800
- Add Java API for text-to-speech (#811) · 15772d21
  15772d21 Browse File
  
  Fangjun Kuang authored 2024-04-26 09:26:39 +0800
- Add function 'tolowerUnicode' in sherpa-onnx-microphone (fix #791) (#812) · fa242992
  fa242992 Browse Directory
  
  Daniel Doña authored 2024-04-26 09:19:32 +0800
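
The message for commit 2e45d327 (#789) describes temperature scaling of the joiner logits (T = 2.0) and merging BPE-level confidences into word-level ones with a `min` function. The sketch below illustrates that idea in plain Python. It is not the sherpa-onnx implementation: the function names, the assumption that the logits are divided by T before the softmax, and the SentencePiece-style `▁` word-start marker are all assumptions made for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=2.0):
    """Softmax over logits divided by a temperature (assumed formulation)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidence(logits, temperature=2.0):
    """Confidence of the best token: its temperature-scaled posterior."""
    return max(softmax_with_temperature(logits, temperature))


def merge_word_confidences(tokens, confidences):
    """Merge BPE token confidences into word confidences using min().

    Hypothetical convention: a token starting with '▁' begins a new word.
    """
    words = []
    for token, conf in zip(tokens, confidences):
        if token.startswith("▁") or not words:
            words.append([token.lstrip("▁"), conf])
        else:
            words[-1][0] += token
            words[-1][1] = min(words[-1][1], conf)
    return [(word, conf) for word, conf in words]
```

With a temperature above 1.0 the posterior is flattened, so the confidence of the best token drops relative to the unscaled softmax. Taking the minimum over a word's tokens means one uncertain BPE piece lowers the confidence of the whole word.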