streamz | A swiss-army-knife of a Stream2 stream | Reactive Programming library
kandi X-RAY | streamz Summary
The native stream.Transform does not provide concurrent operations out of the box, or multiple incoming pipes going into a single transform. Streamz is a lightweight wrapper around the Transform stream.
Community Discussions
Trending Discussions on streamz
QUESTION
I am trying to figure out a correct way of processing streaming data using streamz. My streaming data is loaded using websocket-client, after which I do this:
ANSWER
Answered 2021-Mar-22 at 06:03
I don't think websocket-client provides an async API, so it's blocking the event loop. You should use an async websocket client, such as the one Tornado provides:
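A minimal sketch of that approach (not the snippet from the original answer; the websocket URL and the print sink are placeholders): it reads messages with Tornado's async client and feeds them into a streamz pipeline without blocking the event loop.

from streamz import Stream
from tornado.ioloop import IOLoop
from tornado.websocket import websocket_connect

async def main():
    # Build the pipeline on the same event loop the websocket client runs on
    source = Stream(asynchronous=True)
    source.sink(print)  # placeholder for the real processing steps

    conn = await websocket_connect("wss://example.com/feed")  # placeholder URL
    while True:
        msg = await conn.read_message()
        if msg is None:  # the server closed the connection
            break
        await source.emit(msg)

IOLoop().run_sync(main)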
QUESTION
This is my first time asking something here, so I hope I am asking the following question the "correct way". If not, please let me know and I will give more information.
I am using one Python script to read and write 4000 Hz of serial data to a CSV file.
The structure of the CSV file is as follows: (this example shows the beginning of the file)
...ANSWER
Answered 2020-Dec-21 at 12:49
For the Googlers: I could not find a way to achieve my goal as described in the question.
However, if you are trying to plot live data coming in at high speed over serial comms (4000 Hz in my case), I recommend designing your application as a single program with multiple processes.
The problem in my particular case was that when I tried to plot and compute the incoming data simultaneously in the same thread/task/process, my serial receive rate dropped to 100 Hz instead of 4 kHz. With multiprocessing, passing data between the processes using the quick_queue module, I could resolve the problem.
I ended up with a program that receives data from a Teensy via serial communication at 4 kHz; the incoming data is buffered into blocks of 4000 samples, each block is pushed to the plotting process, and additionally each block is written to a CSV file in a separate thread.
Best, S
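A rough sketch of the multi-process layout described in that answer (not the answerer's code; the serial port, baud rate, CSV path and block handling are assumptions, and a standard multiprocessing.Queue stands in for the quick_queue module mentioned above):

import csv
import multiprocessing as mp
import threading

BLOCK_SIZE = 4000  # one block per second at a 4 kHz sample rate

def write_block(block):
    # Append one block to the CSV file in a background thread
    with open("samples.csv", "a", newline="") as f:
        csv.writer(f).writerows([sample] for sample in block)

def acquire(queue):
    # Read from the serial port, buffer into blocks, hand blocks off
    import serial  # pyserial
    port = serial.Serial("/dev/ttyACM0", 115200)  # placeholder port and baud rate
    block = []
    while True:
        block.append(port.readline().decode().strip())
        if len(block) == BLOCK_SIZE:
            queue.put(block)  # send the block to the plotting process
            threading.Thread(target=write_block, args=(block,)).start()
            block = []

def plot(queue):
    # Runs in its own process so plotting never slows down acquisition
    while True:
        block = queue.get()
        print(f"got a block of {len(block)} samples")  # placeholder for real plotting

if __name__ == "__main__":
    q = mp.Queue()
    mp.Process(target=plot, args=(q,), daemon=True).start()
    acquire(q)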
QUESTION
I have a pandas dataframe that includes timestamps, id, products, price, and more than 50 other columns.
I'd like to convert this dataframe to a streaming dataframe. For example, every 10 seconds I'd like to receive 10 rows (or 1 row), then the next 10 rows (or 1 row), and so on until the dataframe ends.
I had a look at the streamz library but couldn't find a proper function for this.
In this way, I am planning to apply some visualisation and do some functional aggregations or further analysis.
...ANSWER
Answered 2020-Oct-06 at 14:40
Previously I have gotten around a similar problem by using pd.date_range() to create times with the desired interval, then slicing the original dataframe by the times in the range. For example:
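A minimal sketch of that idea (not the original example; the toy dataframe, the timestamp column name and the 10-second interval are assumptions):

import time
import pandas as pd

# A toy frame standing in for the real 50+ column dataframe
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=100, freq="1s"),
    "price": range(100),
})

# Boundaries every 10 seconds spanning the frame's whole time range
edges = pd.date_range(df["timestamp"].min(),
                      df["timestamp"].max() + pd.Timedelta("10s"),
                      freq="10s")

for start, end in zip(edges[:-1], edges[1:]):
    chunk = df[(df["timestamp"] >= start) & (df["timestamp"] < end)]
    print(chunk)    # replace with emitting into a streamz Stream, plotting, aggregating, ...
    time.sleep(10)  # pretend the next batch only "arrives" after 10 seconds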
QUESTION
Imports:
...ANSWER
Answered 2020-Feb-07 at 11:29
"Why is that?": because the Dask distributed scheduler (which executes the stream mapper and sink functions) and your Python script run in different processes. When the "with" block context ends, your Dask Client is closed and execution shuts down before the items emitted to the stream are able to reach the sink function.
"Is there a nice way to otherwise check if a Stream still contains elements being processed?": not that I am aware of. However, if the behaviour you want is (I'm just guessing here) the parallel processing of a bunch of items, then Streamz is not what you should be using; vanilla Dask should suffice.
QUESTION
I am unable to convert a Streamz stream, generated using a Kafka source, to a Dask stream. PFB the code:
...ANSWER
Answered 2019-Jul-16 at 14:07
The Kafka source, if not otherwise instructed, will start its own event loop in a thread. The call to Client() also does this. To pass the loop from the one to the other, you can do:
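One possible arrangement along those lines (a sketch, not the original answer's code; the topic name, broker address and consumer settings are placeholders): create the Dask client first and hand its event loop to the Kafka source so both share a single loop.

from dask.distributed import Client
from streamz import Stream
from streamz.dask import DaskStream  # importing streamz.dask registers .scatter()/.gather()

consumer_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "streamz-demo",             # placeholder consumer group
    "auto.offset.reset": "latest",
}

client = Client()  # starts (or attaches to) an event loop

# Give the Kafka source the client's loop instead of letting it start its own
source = Stream.from_kafka_batched("my-topic", consumer_conf,
                                   poll_interval="1s", loop=client.loop)

# Scatter batches to the Dask cluster, do some work there, gather the results back
source.scatter().map(len).gather().sink(print)

source.start()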
QUESTION
Based on the streamz documentation, one could leverage a dask distributed cluster in the following way:
...ANSWER
Answered 2018-Oct-03 at 06:35
The basic rule in Dask is: if there is a distributed client defined, use it for any Dask computations. If there is more than one distributed client, use the most recently created one that is still alive.
Streamz does not explicitly let you choose which client to use when you .scatter(); it uses dask.distributed.default_client() to pick one. You may wish to raise an issue with them to allow a client= keyword. The workflow doesn't even fit a context-based approach. For now, if you wanted to have multiple streamz pipelines working simultaneously with data on different Dask clusters, you would probably have to manipulate the state of dask.distributed.client._global_clients.
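A small illustration of that default-client rule (not from the original thread; it uses two in-process clients purely for demonstration):

from dask.distributed import Client, default_client

client_a = Client(processes=False)  # first "cluster", in-process for demo purposes
client_b = Client(processes=False)  # second "cluster", created later

# The most recently created client that is still alive is the default,
# which is what streamz's .scatter() ends up using via default_client()
assert default_client() is client_b

client_b.close()
assert default_client() is client_a  # once it is closed, the older client is picked again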
QUESTION
I have specified URLs with some of the below patterns.
- streamz/abc
- streamz/search/xyz
- streamz/abc/123
For that, I created a router like below.
...ANSWER
Answered 2017-Dec-04 at 04:40
I ran into this error as well. It's caused by a bug in RxJS 5.5.3. I've changed the RxJS dependency to be as follows:
QUESTION
It works fine when the source topic partition count = 1. If I bump up the partitions to any value > 1, I see the below error. This applies to both the low-level API as well as the DSL API. Any pointers? What could be missing?
...ANSWER
Answered 2017-Feb-20 at 20:01
It's an operational issue. Kafka Streams does not allow you to change the number of input topic partitions during its "lifetime".
If you stop a running Kafka Streams application, change the number of input topic partitions, and restart your app, it will break (with the error you see above). It is tricky to fix this for production use cases, and it is highly recommended not to change the number of input topic partitions (cf. the comment below). For POCs/demos it's not difficult to fix, though.
In order to fix this, you should reset your application using Kafka's application reset tool:
- http://docs.confluent.io/current/streams/developer-guide.html#application-reset-tool
- https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/
Using the application reset tool has the disadvantage that it wipes out your whole application state. Thus, in order to get your application into the same state as before, you need to reprocess the whole input topic from the beginning. This is of course only possible if all input data is still available and nothing was deleted by brokers applying the topic retention time/size policy.
Furthermore, you should note that adding partitions to input topics changes the topic's partitioning scheme (by default, hash-based partitioning by key). Because Kafka Streams assumes that input topics are correctly partitioned by key, if you use the reset tool and reprocess all data, you might get wrong results, as "old" data is partitioned differently than "new" data (i.e., data written after adding the new partitions). For production use cases, you would need to read all data from your original topic and write it into a new topic (with an increased number of partitions) to get your data partitioned correctly (of course, this step might change the ordering of records with different keys, which usually should not be an issue; just wanted to mention it). Afterwards you can use the new topic as the input topic for your Streams app.
This repartitioning step can also be done easily within your Streams application by using the through("new_topic_with_more_partitions") operator directly after reading the original topic and before doing any actual processing.
In general, however, it is recommended to over-partition your topics for production use cases, so that you never need to change the number of partitions later on. The overhead of over-partitioning is rather small and saves you a lot of hassle later on. This is a general recommendation when working with Kafka; it's not limited to Streams use cases.
One more remark:
Some people might suggest increasing the number of partitions of Kafka Streams' internal topics manually. This would be a hack and is not recommended, for several reasons:
- It might be tricky to figure out what the right number is, as it depends on various factors (it is a Streams internal implementation detail).
- You also face the problem of breaking the partitioning scheme, as described in the paragraph above. Thus, your application would most likely end up in an inconsistent state.
In order to avoid an inconsistent application state, Streams does not delete any internal topics or change the number of partitions of internal topics automatically; instead it fails with the error message you reported. This ensures that the user is aware of all implications of doing the "cleanup" manually.
Btw: for the upcoming Kafka 0.10.2 this error message has been improved: https://github.com/apache/kafka/blob/0.10.2/streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java#L100-L103
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported