vosk | VOSK Speech Recognition Toolkit | Speech library
Community Discussions
Trending Discussions on vosk
QUESTION
I am using Vosk to compare a user's voice against a given text to read, and to print out an accuracy JSON. I am able to run Vosk separately via the terminal and get results, but when I try to run it through Flask I get the following error.
...ANSWER
Answered 2022-Mar-21 at 14:18
I suggest you consider using a virtual environment, so that package installation is constrained to a particular Python version instead of the system default.
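One way to see why the terminal and Flask disagree is to check which interpreter each is running under and where it would load vosk from. This stdlib-only diagnostic is a sketch; run it once from the terminal and once inside the Flask app and compare the output:

```python
import importlib.util
import sys

def locate(module_name: str):
    """Return the file a module would be loaded from, or None if not installed."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None

# If sys.executable differs between the two runs, Flask is using a
# different Python install than your terminal, so vosk is "missing".
print("interpreter:", sys.executable)
print("vosk found at:", locate("vosk"))
```

If the two interpreters differ, installing vosk into the virtual environment that Flask runs under resolves the import error.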
QUESTION
I have a nested list of objects called "words". It consists of objects of a class that holds conf (float), end (float), start (float), and word (string). I want to remove duplicate objects that have the same "word".
...ANSWER
Answered 2022-Mar-02 at 12:45
(First answer: Looking at this code, I guess nr_words is a list. Could you specify what nr_words represents? Is it the list of the 'already seen' words? I also see that you print nr.word, so I suppose nr_words is a list of Word objects. But the second for loop is iterating over the values of the nr_words list (Word objects), not its indexes. So when you compare the two Word objects on line 4, I think you should simply be using nr as the other argument for your compare() method, instead of nr_words[nr].)
EDIT:
Reply to your comment
nr_words is a kind of empty list, so that I can compare it against the dictionaries and append non-repeating words to nr_words. I also tried passing nr as you said, but got this error: AttributeError: 'str' object has no attribute 'word'
The error occurs because, when the two words are not the same, you append w.word, w.start and w.end to the nr_words list (a string and two floats respectively), instead of the Word object itself.
Try appending only the Word object, like so:
Corrected Code:
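The corrected snippet itself was not captured here, so the following is a self-contained sketch of the fix being described. The Word dataclass mirrors the fields named in the question, and "keep the first occurrence of each distinct word" is the assumed de-duplication rule:

```python
from dataclasses import dataclass

@dataclass
class Word:
    conf: float
    end: float
    start: float
    word: str

def dedupe(words: list) -> list:
    """Keep the first Word object for each distinct .word value."""
    seen = set()
    nr_words = []
    for w in words:
        if w.word not in seen:
            seen.add(w.word)
            nr_words.append(w)  # append the Word object itself, not its fields
    return nr_words
```

Appending the whole object (rather than w.word, w.start, w.end separately) is what keeps later comparisons like other.word working.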
QUESTION
I have tried to use VOSK but get this error:
...ANSWER
Answered 2022-Jan-29 at 18:43
Put the Model folder beside the src folder.
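A quick way to verify that layout before constructing the model is a small path check. The folder name "Model" follows the answer above; adjust both names to match your project:

```python
from pathlib import Path

def model_dir_ok(project_root: str, folder: str = "Model") -> bool:
    """Check that the model folder sits next to src, as the answer suggests."""
    root = Path(project_root)
    return (root / folder).is_dir() and (root / "src").is_dir()
```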
QUESTION
I'm trying to feed audio from an online communication app into the Vosk speech recognition API.
The audio comes in the form of a byte array with this audio format: PCM_SIGNED 48000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian.
In order to be able to process it with Vosk, it needs to be mono and little-endian.
This is my current attempt:
...ANSWER
Answered 2021-Sep-29 at 16:37
Signed PCM is certainly supported. The problem is that 48000 fps is not; I think the highest frame rate supported by Java directly is 44100.
As to what course of action to take, I'm not sure what to recommend. Maybe there are libraries that can be employed? It is certainly possible to do the conversions manually on the byte data directly, where you enforce the expected data formats.
I can write a bit more about the conversion process itself (assembling bytes into PCM, manipulating the PCM, creating bytes from PCM) if requested. Is Vosk expecting 48000 fps as well?
Going from stereo to mono is a matter of literally taking the sum of the left and right PCM values. It is common to add a step to ensure the range is not exceeded (-1 to 1 if the PCM is coded as normalized floats, -32768 to 32767 if coded as shorts).
The following code fragment is an example of taking a single PCM value (a signed float, normalized to the range -1 to 1) and generating two bytes (16 bits) in little-endian order. The array buffer is of type float and holds the PCM values; the array audioBytes is of type byte.
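The same conversion can be sketched in Python for clarity (the answer's buffer/audioBytes arrays are Java). The function name and the mix-by-averaging choice here are illustrative assumptions, not Vosk's API:

```python
import struct

def stereo_float_to_mono_le16(samples):
    """samples: iterable of (left, right) float pairs in [-1.0, 1.0].
    Returns mono 16-bit little-endian PCM bytes."""
    out = bytearray()
    for left, right in samples:
        mono = (left + right) / 2.0       # mix down; averaging avoids overflow
        mono = max(-1.0, min(1.0, mono))  # clamp to the valid normalized range
        value = int(mono * 32767)         # scale to signed 16-bit
        out += struct.pack("<h", value)   # '<h' = little-endian int16
    return bytes(out)
```

Averaging instead of summing makes the range check almost redundant, but the clamp is kept as the safety step the answer recommends.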
QUESTION
I'm writing code using Vosk (for offline speech recognition); in my string.xml I wrote a string-array:
...ANSWER
Answered 2021-Jun-14 at 12:54
Let us go through your code, specifically this block:
QUESTION
In my live phone speech recognition project, Python's asyncio and websockets modules are used to enable data exchange between client and server in asynchronous mode. The audio stream to be recognized comes to the client from inside a PBX channel (Asterisk PBX handles that) via a local wav file that accumulates all data from the answering of the call until the hangup event. While the conversation is going on, an async producer pushes chunks of the call recording (each no larger than 16 kB) to an asyncio queue, so that a consumer coroutine can write the data to a buffer before sending it to the recognition engine server (my pick is a Vosk instance with the Kaldi engine, designed to connect over a websocket interface). Once the buffer exceeds a specific capacity (for example 288 kB), the data should be flushed to recognition by the send function and returned (as a transcript of the speech) by recv. Real-time recognition matters here, so I need to guarantee that socket operations like recv will not halt both coroutines throughout the websocket session (they should be able to keep the queue-based data flow going until the hangup event). Let's take a look at the whole program; first of all there is a main where an event loop gets instantiated along with a couple of tasks:
ANSWER
Answered 2021-Mar-05 at 09:06
If I understand the issue correctly, you probably want to replace await self.do_recognition() with asyncio.create_task(self.do_recognition()) to make do_recognition execute in the background. If you need to support Python 3.6 and earlier, you can use loop.create_task(...) or asyncio.ensure_future(...), all of which do the same thing in this case.
When doing that you'll also need to extract the value of self._buffer and pass it to do_recognition as a parameter, so that it can send the buffer contents independently of the new data that arrives.
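The create_task pattern being suggested can be sketched as follows. The queue sentinel, the tiny threshold, and do_recognition's body are all stand-ins for the real websocket round trip:

```python
import asyncio

async def do_recognition(buffer: bytes) -> int:
    """Stand-in for the websocket send/recv round trip."""
    await asyncio.sleep(0)  # pretend to do network I/O
    return len(buffer)

async def consumer(queue: asyncio.Queue) -> list:
    buffer = bytearray()
    tasks = []
    while True:
        chunk = await queue.get()
        if chunk is None:        # sentinel standing in for the hangup event
            break
        buffer.extend(chunk)
        if len(buffer) >= 8:     # stand-in for the 288 kB threshold
            # Hand a *snapshot* of the buffer to a background task and keep
            # consuming; awaiting here would stall the queue-based data flow.
            tasks.append(asyncio.create_task(do_recognition(bytes(buffer))))
            buffer.clear()
    return await asyncio.gather(*tasks)  # collect transcripts at the end
```

Passing bytes(buffer) (a copy) into the task is the "extract the value of self._buffer" step: the background recognition works on a frozen snapshot while the consumer keeps filling a fresh buffer.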
Two notes unrelated to the question:
- The code is accessing internal implementation attributes of the queue, which should be avoided in production code because it can stop working at any point, even in a bugfix release of Python. Attributes that begin with _, like _finished and _unfinished_tasks, are not covered by backward compatibility guarantees and can be removed, renamed, or change meaning without notice.
- You can import CancelledError from the top-level asyncio package, which exposes it publicly. You don't need to refer to the internal concurrent.futures._base module, which just happens to be where the class is defined by the implementation.
QUESTION
I want to develop an application that does something whenever it recognizes a keyword. It needs to be in listening mode all the time, in the background too. I was exposed to this and this. I tried to run it, but it does not work when I am speaking; actually I read it still doesn't support my native language. Is that the reason? I want to know how it works. Does it do speech to text and save it in asset files? Does it run in the background? Does it use AI models? How does it behave when two apps need the mic resource in parallel? Noises? Does it work with the Neural Networks API? How can I start developing such a thing?
thanks!
...ANSWER
Answered 2020-Sep-13 at 10:04
It is great you tried Vosk offline speech recognition on Android; here are some answers to your questions:
actually I read it still doesn't support my native language.
If you are asking about Hebrew, we might support it in the future, and you can build it yourself.
is that the reason?
You didn't provide enough information to answer this; please explain a bit more what "it is not work" means.
I want to know how it works?
Extensive documentation on speech recognition is available in lectures, courses and books. You can find an introduction here, for example: https://www.youtube.com/watch?v=q67z7PTGRi8
does it do speech to text and save it in asset files?
It does speech to text, but it doesn't save results into assets; it just displays them. You can not modify assets, they are static.
does it run in the background?
Yes.
does it use AI models?
Sure.
how does it behave when two apps need the mic in parallel?
In Android it is not possible to record audio from two apps in parallel; the second one will be blocked.
noises?
It is robust to noise.
does it work with the Neural Networks API?
No, it is portable.
how can I start developing such a thing?
Get some basic understanding and start writing the code. If you have further questions you can ask them in the Telegram chat.
QUESTION
How do I implement and use the Vosk library in a Unity project? Please write steps 1, 2, 3... The Vosk library is here: https://github.com/alphacep/vosk-api
...ANSWER
Answered 2020-Aug-16 at 15:31
I actually did it on Mac OS X.
- Follow the instructions to compile Kaldi at https://alphacephei.com/vosk/install
- Follow the instructions to make the C# wrapper (same page)
- Create an Xcode project and make a bundle.
- Then add the bundle and the C# files to your Unity project.
QUESTION
Is there any code for converting audio (a wav file) to text using Kaldi or librosa? I have used the vosk library in Ubuntu by extracting the built-in model from https://github.com/alphacep/kaldi-android-demo/releases, but I want the proper way to do this on Windows using Kaldi/librosa and my own model.
...ANSWER
Answered 2020-Feb-26 at 15:20
On Windows you can install vosk from this repo:
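Once vosk is installed, a minimal wav-to-text script looks roughly like the following sketch. The model directory name is an assumption (point it at whatever model you downloaded), and extract_text is just a hypothetical helper around the JSON strings that KaldiRecognizer returns:

```python
import json
import wave

def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk result JSON string."""
    return json.loads(result_json).get("text", "")

def transcribe(wav_path: str, model_dir: str = "model") -> str:
    # Imported lazily so the module still loads without vosk installed.
    from vosk import Model, KaldiRecognizer

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
    pieces = []
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        if rec.AcceptWaveform(data):
            pieces.append(extract_text(rec.Result()))
    pieces.append(extract_text(rec.FinalResult()))
    return " ".join(p for p in pieces if p)
```

The wav file should be mono 16-bit PCM at a sample rate the model supports; otherwise convert it first, as discussed in the stereo/endianness question above.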
Community Discussions, Code Snippets contain sources that include Stack Exchange Network