vosk | VOSK Speech Recognition Toolkit | Speech library
Community Discussions
Trending Discussions on vosk
QUESTION
I am using Vosk to compare a user's voice against a given text to read, and to print out an accuracy JSON. I am able to run Vosk separately via the terminal and get results, but when I try to run it through Flask I get the following error.
...ANSWER
Answered 2022-Mar-21 at 14:18
I suggest you consider using a virtual environment, so that package installation is constrained to a particular Python version instead of the system default.
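One way to see why the terminal and Flask disagree is to check which interpreter each is running under and where it would load vosk from. This stdlib-only diagnostic is a sketch; run it once from the terminal and once inside the Flask app and compare the output:

```python
import importlib.util
import sys

def locate(module_name: str):
    """Return the file a module would be loaded from, or None if not installed."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None

# If sys.executable differs between the two runs, Flask is using a
# different Python install than your terminal, so vosk is "missing".
print("interpreter:", sys.executable)
print("vosk found at:", locate("vosk"))
```

If the two interpreters differ, installing vosk into the virtual environment that Flask runs under resolves the import error.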
QUESTION
I have a nested list of objects called "words". It consists of objects of a class that holds conf (float), end (float), start (float), and word (string). I want to remove duplicate objects that have the same "word".
...ANSWER
Answered 2022-Mar-02 at 12:45
(First answer: Looking at this code, I guess nr_words is a list. Could you specify what nr_words represents? Is it the list of the 'already seen' words? I also see that you print nr.word, so I suppose nr_words is a list of Word objects. But the second for loop is iterating over the values of the nr_words list (Word objects), not its indexes. So when you compare the two Word objects on line 4, I think you should simply be using nr as the other argument for your compare() method, instead of nr_words[nr].)
EDIT:
Reply to your comment
nr_words is a kind of empty list, so that I can compare it against the dictionaries and append non-repeating words to nr_words. I also tried passing nr as you said, but got this error: AttributeError: 'str' object has no attribute 'word'
The error occurs because, when the two words are not the same, you append w.word, w.start and w.end to the nr_words list (a string and two floats respectively), instead of the Word object itself.
Try appending only the Word object, like so:
Corrected Code:
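The corrected snippet itself was not captured here, so the following is a self-contained sketch of the fix being described. The Word dataclass mirrors the fields named in the question, and "keep the first occurrence of each distinct word" is the assumed de-duplication rule:

```python
from dataclasses import dataclass

@dataclass
class Word:
    conf: float
    end: float
    start: float
    word: str

def dedupe(words: list) -> list:
    """Keep the first Word object for each distinct .word value."""
    seen = set()
    nr_words = []
    for w in words:
        if w.word not in seen:
            seen.add(w.word)
            nr_words.append(w)  # append the Word object itself, not its fields
    return nr_words
```

Appending the whole object (rather than w.word, w.start, w.end separately) is what keeps later comparisons like other.word working.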
QUESTION
I have tried to use VOSK but get this error:
...ANSWER
Answered 2022-Jan-29 at 18:43
Put the Model folder beside the src folder.
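A quick way to verify that layout before constructing the model is a small path check. The folder name "Model" follows the answer above; adjust both names to match your project:

```python
from pathlib import Path

def model_dir_ok(project_root: str, folder: str = "Model") -> bool:
    """Check that the model folder sits next to src, as the answer suggests."""
    root = Path(project_root)
    return (root / folder).is_dir() and (root / "src").is_dir()
```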
QUESTION
I'm trying to feed audio from an online communication app into the Vosk speech recognition API.
The audio comes in the form of a byte array with this audio format: PCM_SIGNED 48000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian.
In order to be able to process it with Vosk, it needs to be mono and little-endian.
This is my current attempt:
...ANSWER
Answered 2021-Sep-29 at 16:37
Signed PCM is certainly supported. The problem is that 48000 fps is not; I think the highest frame rate supported by Java directly is 44100.
As to what course of action to take, I'm not sure what to recommend. Maybe there are libraries that can be employed? It is certainly possible to do the conversions manually on the byte data directly, where you enforce the expected data formats.
I can write a bit more about the conversion process itself (assembling bytes into PCM, manipulating the PCM, creating bytes from PCM) if requested. Is Vosk expecting 48000 fps as well?
Going from stereo to mono is a matter of literally taking the sum of the left and right PCM values. It is common to add a step to ensure the range is not exceeded (-1 to 1 if the PCM is coded as normalized floats, -32768 to 32767 if coded as shorts).
The following code fragment is an example of taking a single PCM value (a signed float, normalized to the range -1 to 1) and generating two bytes (16 bits) in little-endian order. The array buffer is of type float and holds the PCM values; the array audioBytes is of type byte.
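The same conversion can be sketched in Python for clarity (the answer's buffer/audioBytes arrays are Java). The function name and the mix-by-averaging choice here are illustrative assumptions, not Vosk's API:

```python
import struct

def stereo_float_to_mono_le16(samples):
    """samples: iterable of (left, right) float pairs in [-1.0, 1.0].
    Returns mono 16-bit little-endian PCM bytes."""
    out = bytearray()
    for left, right in samples:
        mono = (left + right) / 2.0       # mix down; averaging avoids overflow
        mono = max(-1.0, min(1.0, mono))  # clamp to the valid normalized range
        value = int(mono * 32767)         # scale to signed 16-bit
        out += struct.pack("<h", value)   # '<h' = little-endian int16
    return bytes(out)
```

Averaging instead of summing makes the range check almost redundant, but the clamp is kept as the safety step the answer recommends.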
QUESTION
I'm writing code using Vosk (for offline speech recognition); in my string.xml I wrote a string-array:
...ANSWER
Answered 2021-Jun-14 at 12:54
Let us go through your code, specifically this block:
QUESTION
In my live phone speech recognition project, Python's asyncio and websockets modules are used to enable data exchange between client and server in asynchronous mode. The audio stream to be recognized comes to the client from inside a PBX channel (Asterisk PBX handles that) via a local wav file that accumulates all data from the answering of the call until the hangup event. While the conversation is going on, an async producer pushes chunks of the call recording (each no larger than 16 kB) to an asyncio queue, so that a consumer coroutine can write the data to a buffer before sending it to the recognition engine server (my pick is a Vosk instance with the Kaldi engine, designed to connect over a websocket interface). Once the buffer exceeds a specific capacity (for example 288 kB), the data should be flushed to recognition by the send function and returned (as a transcript of the speech) by recv. Real-time recognition matters here, so I need to guarantee that socket operations like recv will not halt both coroutines throughout the websocket session (they should be able to keep the queue-based data flow going until the hangup event). Let's take a look at the whole program; first of all there is a main where an event loop gets instantiated along with a couple of tasks:
ANSWER
Answered 2021-Mar-05 at 09:06
If I understand the issue correctly, you probably want to replace await self.do_recognition() with asyncio.create_task(self.do_recognition()) to make do_recognition execute in the background. If you need to support Python 3.6 and earlier, you can use loop.create_task(...) or asyncio.ensure_future(...), all of which do the same thing in this case.
When doing that you'll also need to extract the value of self._buffer and pass it to do_recognition as a parameter, so that it can send the buffer contents independently of the new data that arrives.
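The create_task pattern being suggested can be sketched as follows. The queue sentinel, the tiny threshold, and do_recognition's body are all stand-ins for the real websocket round trip:

```python
import asyncio

async def do_recognition(buffer: bytes) -> int:
    """Stand-in for the websocket send/recv round trip."""
    await asyncio.sleep(0)  # pretend to do network I/O
    return len(buffer)

async def consumer(queue: asyncio.Queue) -> list:
    buffer = bytearray()
    tasks = []
    while True:
        chunk = await queue.get()
        if chunk is None:        # sentinel standing in for the hangup event
            break
        buffer.extend(chunk)
        if len(buffer) >= 8:     # stand-in for the 288 kB threshold
            # Hand a *snapshot* of the buffer to a background task and keep
            # consuming; awaiting here would stall the queue-based data flow.
            tasks.append(asyncio.create_task(do_recognition(bytes(buffer))))
            buffer.clear()
    return await asyncio.gather(*tasks)  # collect transcripts at the end
```

Passing bytes(buffer) (a copy) into the task is the "extract the value of self._buffer" step: the background recognition works on a frozen snapshot while the consumer keeps filling a fresh buffer.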
Two notes unrelated to the question:
- The code is accessing internal implementation attributes of the queue, which should be avoided in production code because it can stop working at any point, even in a bugfix release of Python. Attributes that begin with _, like _finished and _unfinished_tasks, are not covered by backward compatibility guarantees and can be removed, renamed, or change meaning without notice.
- You can import CancelledError from the top-level asyncio package, which exposes it publicly. You don't need to refer to the internal concurrent.futures._base module, which just happens to be where the class is defined by the implementation.
QUESTION
I want to develop an application that does something whenever it recognizes a keyword. It needs to be in listening mode all the time, in the background too. I was exposed to this and this. I tried to run it, but it does not work when I am speaking; actually I read it still doesn't support my native language. Is that the reason? I want to know how it works. Does it do speech to text and save it in asset files? Does it run in the background? Does it use AI models? How does it behave when two apps need the mic resource in parallel? Noises? Does it work with the Neural Networks API? How can I start developing such a thing?
thanks!
...ANSWER
Answered 2020-Sep-13 at 10:04
It is great you tried Vosk offline speech recognition on Android; here are some answers to your questions:
actually I read it still doesn't support my native language.
If you are asking about Hebrew, we might support it in the future, and you can build it yourself.
is that the reason?
You didn't provide enough information to answer this; please explain a bit more what "it is not work" means.
I want to know how it works?
Extensive documentation on speech recognition is available in lectures, courses and books. You can find an introduction here, for example: https://www.youtube.com/watch?v=q67z7PTGRi8
does it do speech to text and save it in asset files?
It does speech to text, but it doesn't save results into assets; it just displays them. You can not modify assets, they are static.
does it run in the background?
Yes.
does it use AI models?
Sure.
how does it behave when two apps need the mic in parallel?
In Android it is not possible to record audio from two apps in parallel; the second one will be blocked.
noises?
It is robust to noise.
does it work with the Neural Networks API?
No, it is portable.
how can I start developing such a thing?
Get some basic understanding and start writing the code. If you have further questions you can ask them in the Telegram chat.
QUESTION
How do I implement and use the Vosk library in a Unity project? Please write steps 1, 2, 3... The Vosk library is here: https://github.com/alphacep/vosk-api
...ANSWER
Answered 2020-Aug-16 at 15:31
I actually did it on Mac OS X.
- Follow the instructions to compile Kaldi at https://alphacephei.com/vosk/install
- Follow the instructions to make the C# wrapper (same page)
- Create an Xcode project and make a bundle.
- Then add the bundle and the C# files to your Unity project.
QUESTION
Is there any code for converting audio (a wav file) to text using Kaldi or librosa? I have used the vosk library in Ubuntu by extracting the built-in model from https://github.com/alphacep/kaldi-android-demo/releases, but I want the proper way to do this on Windows using Kaldi/librosa and my own model.
...ANSWER
Answered 2020-Feb-26 at 15:20
On Windows you can install vosk from this repo:
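Once vosk is installed, a minimal wav-to-text script looks roughly like the following sketch. The model directory name is an assumption (point it at whatever model you downloaded), and extract_text is just a hypothetical helper around the JSON strings that KaldiRecognizer returns:

```python
import json
import wave

def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk result JSON string."""
    return json.loads(result_json).get("text", "")

def transcribe(wav_path: str, model_dir: str = "model") -> str:
    # Imported lazily so the module still loads without vosk installed.
    from vosk import Model, KaldiRecognizer

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
    pieces = []
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        if rec.AcceptWaveform(data):
            pieces.append(extract_text(rec.Result()))
    pieces.append(extract_text(rec.FinalResult()))
    return " ".join(p for p in pieces if p)
```

The wav file should be mono 16-bit PCM at a sample rate the model supports; otherwise convert it first, as discussed in the stereo/endianness question above.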
Community Discussions, Code Snippets contain sources that include Stack Exchange Network