kinesis-sql | Kinesis Connector for Structured Streaming
Community Discussions
QUESTION
I am creating a DataFrame from a Kafka topic using Spark Streaming, and I want to write that DataFrame to a Kinesis stream. I understand there is no official API for this as of now. There are several libraries available on the internet, but unfortunately none of them worked for me. Spark version: 2.2, Scala: 2.11.
I tried building a jar from https://github.com/awslabs/kinesis-kafka-connector, but I am getting errors due to conflicting package names between that jar and the Spark API. Please help.
Here is the code for others: ...

ANSWER
Answered 2019-Jul-09 at 16:05

Kafka Connect is a service to which you can POST your connector specifications (Kinesis in this case), and it then takes care of running the connector. It also supports quite a few transformations while processing the records. Kafka Connect plugins are not intended to be used with Spark applications.
If your use case requires you to do some business logic while processing the records, then you could go with either Spark Streaming or Structured Streaming approach.
If you want to take a Spark-based approach, here are the two options I can think of:

Use Structured Streaming. You could use a Structured Streaming connector for Kinesis; you can find one here. There may be others too, but this is the only stable, open-source connector I am aware of. You can find an example of using Kinesis as a sink here.

Use the Kinesis Producer Library (KPL) or the aws-java-sdk-kinesis library to publish records from your Spark Streaming application. Using KPL is the preferred approach here. You could use mapPartitions, create a Kinesis client per partition, and publish the records using these libraries. There are plenty of examples in the AWS docs for both libraries.
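The second option can be sketched roughly as below. This is a minimal, untested sketch, not the answer's own code: it assumes the AWS SDK v1 (aws-java-sdk-kinesis) and a DStream of (key, value) string pairs from Kafka; the region and stream name are hypothetical placeholders. The answer mentions mapPartitions; since publishing is a pure side effect, foreachPartition is the equivalent choice here.

```scala
import java.nio.ByteBuffer
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder
import com.amazonaws.services.kinesis.model.PutRecordRequest

// Assumes `stream` is a DStream[(String, String)] of (key, value) pairs.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Create one client per partition, on the executor, so the
    // non-serializable client is never shipped from the driver.
    val client = AmazonKinesisClientBuilder.standard()
      .withRegion("us-east-1")            // hypothetical region
      .build()
    records.foreach { case (key, value) =>
      val request = new PutRecordRequest()
        .withStreamName("my-stream")      // hypothetical stream name
        .withPartitionKey(key)
        .withData(ByteBuffer.wrap(value.getBytes("UTF-8")))
      client.putRecord(request)
    }
    client.shutdown()
  }
}
```

Calling putRecord per record is the simplest form; for throughput you would batch with putRecords, or let KPL handle batching and retries for you.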
QUESTION
We have a Spark Streaming application. The architecture is as follows:

Kinesis to Spark to Kafka.

The Spark application uses qubole/kinesis-sql for Structured Streaming from Kinesis. The data is aggregated and then pushed to Kafka.

Our use case demands a delay of 4 minutes before pushing to Kafka. Windowing is done with a 2-minute window and a 4-minute watermark.
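The Kinesis-to-Spark leg of such a pipeline typically looks roughly like the following. This is a sketch only: the option names follow the qubole/kinesis-sql README as I recall them, and the stream name and endpoint are placeholders, not values from the question.

```scala
// Sketch: reading from Kinesis with the qubole/kinesis-sql connector.
val kinesisDF = spark.readStream
  .format("kinesis")
  .option("streamName", "my-stream")                                  // placeholder
  .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com")   // placeholder
  .option("startingposition", "TRIM_HORIZON")
  .load()
```

The loaded DataFrame carries the raw record payload as bytes, which is then decoded and aggregated before being written to Kafka.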
...

ANSWER
Answered 2019-Jun-06 at 08:28

Change your output mode from update to append (the default option). The update output mode writes all updated rows to the sink, so whether or not you use a watermark will not matter. However, with the append mode any writes will need to wait until the watermark is crossed, which is exactly what you want:

Append mode uses watermark to drop old aggregation state. But the output of a windowed aggregation is delayed the late threshold specified in withWatermark() as by the modes semantics, rows can be added to the Result Table only once after they are finalized (i.e. after watermark is crossed).
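Putting the advice together, the aggregation and sink stage might look like this sketch. The column names (eventTime, amount), the broker address, topic, and checkpoint path are hypothetical placeholders; the windows and watermark match the question. The Kafka sink expects a string or binary value column, hence the to_json step.

```scala
import org.apache.spark.sql.functions.{col, struct, sum, to_json, window}

// `parsed` is the decoded stream read from Kinesis.
val aggregated = parsed
  .withWatermark("eventTime", "4 minutes")          // late threshold: 4 minutes
  .groupBy(window(col("eventTime"), "2 minutes"))   // 2-minute windows
  .agg(sum(col("amount")).as("total"))

aggregated
  .select(to_json(struct(col("*"))).as("value"))    // Kafka sink needs a `value` column
  .writeStream
  .outputMode("append")   // rows are emitted only once the watermark passes the window end
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("topic", "aggregates")                    // placeholder
  .option("checkpointLocation", "/tmp/checkpoints") // placeholder; required by the Kafka sink
  .start()
```

With append mode, a 2-minute window's result reaches Kafka only after the watermark (event time minus 4 minutes) moves past the window's end, which produces the delay the use case asks for.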
Community Discussions, Code Snippets contain sources that include Stack Exchange Network