kinesis | A Node.js stream implementation of Amazon's Kinesis
kandi X-RAY | kinesis Summary
A Node.js stream implementation of Amazon's Kinesis. Allows the consumer to pump data directly into (and out of) a Kinesis stream. This makes it trivial to set up Kinesis as a logging sink with Bunyan, or any other logging library. For setting up a local Kinesis instance (e.g. for testing), check out Kinesalite.
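As a quick illustration of the Bunyan use case, here is a minimal sketch, assuming the library's stream() helper and an already-created Kinesis stream; the stream name, region and logger name are placeholders, and the exact chunk format the duplex stream expects should be checked against the module's README:

var bunyan = require('bunyan');
var kinesis = require('kinesis');

// Duplex stream backed by a Kinesis stream (placeholder name/region);
// credentials are picked up from the environment.
var kinesisStream = kinesis.stream({ name: 'app-logs', region: 'us-east-1' });

// Point a Bunyan logger at it so every log line is pumped into Kinesis.
var log = bunyan.createLogger({
  name: 'my-service',
  streams: [{ stream: kinesisStream }]
});

log.info({ requestId: 'abc123' }, 'hello from the Kinesis sink');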
Top functions reviewed by kandi - BETA
- Executes a request.
- Implements the default retry policy.
- Makes an HTTP request.
- Resolves options.
- Retrieves the initial retry count for a given request.
- Creates a KinesisStream instance.
- Loads credentials.
- Lists streams.
- Compares two arrays.
- Handles a put request.
kinesis Key Features
kinesis Examples and Code Snippets
var AWS = require('aws-sdk');
var firehoser = require('firehoser');

AWS.config.update({
  accessKeyId: 'hardcoded-credentials',
  secretAccessKey: 'are-not-a-good-idea'
});

let firehose = new firehoser.DeliveryStream('my_delivery_stream_name');
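The snippet above stops after constructing the delivery stream. Assuming firehoser's DeliveryStream exposes a promise-returning putRecord (treat the method name and shape as an assumption, not something confirmed by this page), sending data would look roughly like:

// Assumed firehoser API: putRecord() returning a Promise.
firehose.putRecord({ userId: '42', action: 'login' })
  .then(function () { console.log('record delivered'); })
  .catch(function (err) { console.error('delivery failed', err); });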
async function ensureStream() {
  const kinesis = new AWS.Kinesis({
    endpoint: `${process.env.LAMBDA_KINESIS_HOST}:${process.env.LAMBDA_KINESIS_PORT}`,
    region: process.env.LAMBDA_REGION,
    apiVersion: '2013-12-02',
    sslEnabled: false
  })
  // ...
}
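The snippet above only constructs the client. A typical ensureStream helper then checks whether the stream exists and creates it if not; a minimal sketch using an aws-sdk v2 Kinesis client like the one above and a placeholder stream name:

async function ensureStreamExists(kinesis, streamName) {
  try {
    // Succeeds if the stream is already there.
    await kinesis.describeStream({ StreamName: streamName }).promise();
  } catch (err) {
    if (err.code !== 'ResourceNotFoundException') throw err;
    // Stream is missing: create it and wait until it becomes ACTIVE.
    await kinesis.createStream({ StreamName: streamName, ShardCount: 1 }).promise();
    await kinesis.waitFor('streamExists', { StreamName: streamName }).promise();
  }
}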
var client = new AWS.Kinesis({
  region: 'us-east-1',
  params: { StreamName: 'my-stream' }
});
// see below for options
var readable = require('kinesis-readable')(client, options);
readable
  // 'data' events will trigger for a set of records in the stream
  .on('data', function (records) {
    console.log(records);
  });
aws kinesis put-record --stream-name GregorSamsa --partition-key gregor-samsa --data "whatever" --region us-west-2 --profile aws-credentials
{
"ShardId": "shardId-000000000000",
"SequenceNumber": "495720689064411518388566079791487
Community Discussions
Trending Discussions on kinesis
QUESTION
I am trying to create a table (150 rows, 165 columns) in which:
- Each row is the name of a Pokemon (original Pokemon, 150)
- Each column is the name of an "attack" that any of these Pokemon can learn (first generation)
- Each element is either "1" or "0", indicating if that Pokemon can learn that "attack" (e.g. 1 = yes, 0 = no)
I was able to manually create this table in R:
Here are all the names:
...ANSWER
Answered 2022-Apr-04 at 22:59 Here is a solution that takes the list of URLs to the webpages of interest, collects the moves from each table, and creates a dataframe with the "1"s.
Then the individual tables are combined into the final answer.
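The answer's actual code is R (and is omitted above); purely to illustrate the shape of the computation in this page's language, here is the same "one row per Pokemon, one 0/1 column per move" idea in Node.js with made-up data:

// Hypothetical input: each Pokemon mapped to the moves it can learn.
const canLearn = {
  Bulbasaur: ['Tackle', 'Vine Whip'],
  Charmander: ['Scratch', 'Ember'],
  Squirtle: ['Tackle', 'Water Gun']
};

// Columns = union of all moves; each row holds 1/0 membership flags.
const moves = [...new Set(Object.values(canLearn).flat())];
const table = Object.entries(canLearn).map(([name, learned]) => {
  const row = { Pokemon: name };
  for (const move of moves) row[move] = learned.includes(move) ? 1 : 0;
  return row;
});

console.table(table);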
QUESTION
I'm doing some event sourcing using DMS, and I need to include the database name in the metadata that is sent from some PostgreSQL databases to Kinesis (as the target). I got as far as https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Tablesettings.html
but I have run out of ideas. Adding a tag with the database name to the message would probably work too, but I can't find precise documentation on how to do that.
...ANSWER
Answered 2022-Mar-17 at 12:59 One possible route is to add a prefix to the schema or rename the schema. Just add the database name with a delimiter so you can easily parse it out.
Take a look at the first example on https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.html
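A hypothetical sketch of such a transformation rule, shown here as a JavaScript object (in the DMS task it is plain JSON inside the table mappings); the rule id, schema selector and prefix value are placeholders, with the prefix being the source database name plus a delimiter you can parse out on the Kinesis side:

const addDbNamePrefix = {
  'rule-type': 'transformation',
  'rule-id': '2',
  'rule-name': 'prefix-schema-with-dbname',
  'rule-target': 'schema',
  'object-locator': { 'schema-name': '%' },
  'rule-action': 'add-prefix',
  'value': 'mydatabase__'   // placeholder: source db name + delimiter
};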
QUESTION
I'm using one of the Docker images of EMR on EKS (emr-6.5.0:20211119) and investigating how to work with Kafka using Spark Structured Streaming (pyspark). As per the integration guide, I run a Python script as follows.
...ANSWER
Answered 2022-Mar-07 at 21:10 You would use --jars to refer to jars on the local filesystem, in place of --packages.
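Roughly, with placeholder jar paths and versions that would have to match the Spark build inside the image:

spark-submit \
  --jars /opt/jars/spark-sql-kafka-0-10_2.12-3.1.2.jar,/opt/jars/kafka-clients-2.8.0.jar \
  my_structured_streaming_job.py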
QUESTION
We have an Apache Flink application which processes events:
- The application uses event time characteristics
- The application shards (keyBy) events based on the sessionId field
- The application has windowing with a 1 minute tumbling window
- The windowing is specified by a reduce and a process function, so for each session we will have 1 computed record
- The application emits the data into a Postgres sink
Application:
- It is hosted in AWS via Kinesis Data Analytics (KDA)
- It is running in 5 different regions
- The exact same code is running in each region
Database:
- It is hosted in AWS via RDS (currently it is a PostgreSQL)
- It is located in one region (with a read replica in a different region)
Because we are using event time characteristics with a 1 minute tumbling window, all regions' sinks emit their records at nearly the same time.
What we want to achieve is to add an artificial delay between the window and sink operators to postpone sink emission.
Flink App | Offset | Window 1 | Sink 1st run | Window 2 | Sink 2nd run
#1        | 0      | 60       | 60           | 120      | 120
#2        | 12     | 60       | 72           | 120      | 132
#3        | 24     | 60       | 84           | 120      | 144
#4        | 36     | 60       | 96           | 120      | 156
#5        | 48     | 60       | 108          | 120      | 168
Not working work-around
We have thought that we can add some sleep to the evictor's evictBefore
like this
ANSWER
Answered 2022-Mar-07 at 16:03 You could use TumblingEventTimeWindows.of(Time size, Time offset, WindowStagger windowStagger) with WindowStagger.RANDOM.
See https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/windowing/assigners/WindowStagger.html for documentation.
QUESTION
I'm not seeing how an AWS Kinesis Firehose lambda can send update and delete requests to ElasticSearch (AWS OpenSearch service).
Elasticsearch document APIs provides for CRUD operations: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html
The examples I've found deal with the Create case, but don't show how to do delete or update requests.
https://aws.amazon.com/blogs/big-data/ingest-streaming-data-into-amazon-elasticsearch-service-within-the-privacy-of-your-vpc-with-amazon-kinesis-data-firehose/
https://github.com/amazon-archives/serverless-app-examples/blob/master/python/kinesis-firehose-process-record-python/lambda_function.py
The output format in the examples does not show a way to specify create, update or delete requests:
ANSWER
Answered 2022-Mar-03 at 04:20 Firehose uses a Lambda function to transform records before they are delivered to the destination (in your case OpenSearch/ES), so the function can only modify the structure of the data; it can't be used to influence CRUD actions. Firehose can only insert records into a specific index. If you need a simple option to remove records from an ES index after a certain period of time, have a look at the "Index rotation" option when specifying the destination for your Firehose stream.
If you want to use CRUD actions with ES and keep using Firehose, I would suggest sending records to an S3 bucket in raw format and then triggering a Lambda function on the object-upload event that performs a CRUD action depending on fields in your payload (a rough sketch follows the links below).
A good example of performing CRUD actions against ES from Lambda: https://github.com/chankh/ddb-elasticsearch/blob/master/src/lambda_function.py
This particular example is built to send data from DynamoDB Streams into ES, but it should be a good starting point for you.
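A rough Node.js sketch of that S3-triggered approach, assuming the uploaded objects contain newline-delimited JSON with action, id and payload fields, a placeholder index name, and the @opensearch-project/opensearch client (authentication/request signing against the AWS-managed domain is omitted):

const AWS = require('aws-sdk');
const { Client } = require('@opensearch-project/opensearch');

const s3 = new AWS.S3();
const es = new Client({ node: process.env.ES_ENDPOINT }); // placeholder endpoint

exports.handler = async (event) => {
  for (const record of event.Records) {
    // Fetch the uploaded object that triggered this invocation.
    const obj = await s3.getObject({
      Bucket: record.s3.bucket.name,
      Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
    }).promise();

    for (const line of obj.Body.toString().split('\n').filter(Boolean)) {
      const { action, id, payload } = JSON.parse(line);
      if (action === 'delete') {
        await es.delete({ index: 'my-index', id });                         // placeholder index
      } else if (action === 'update') {
        await es.update({ index: 'my-index', id, body: { doc: payload } });
      } else {
        await es.index({ index: 'my-index', id, body: payload });
      }
    }
  }
};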
QUESTION
I'm having a problem understanding how to preserve the order of events when consuming records from a Kinesis stream with Flink. Our setup looks like this:
- Kinesis stream with 8 shards
- Sharding key is the userId of the user that produced the event
In Flink, we use the Table API to consume the Kinesis stream, do some processing and write the events to a (custom) synchronous HTTP sink. The desired outcome would be that each shard's processing subtask writes the events to the sink one after the other, waiting for the sink to return before writing the next event. In order to test that, we made the sink functions randomly do a Thread.sleep() for a few seconds before returning. Looking at the log output, we can now see this:
ANSWER
Answered 2022-Feb-16 at 21:33 Given your requirements, the only way I can see to do this would be to bring all of the results for each user together so that they are written by the same instance of the sink.
Perhaps it would work to rewrite this as one large join (or union) on the user-id that you sort by timestamp. Or you might convert the results of the SQL queries into datastreams that you key by the user-id, and then implement some buffering and sorting in your custom sink.
QUESTION
I am evaluating different streaming/messaging services for use as an event bus. One of the dimensions I am considering is the ordering guarantee provided by each of the services. Two of the options I am exploring are AWS Kinesis and Kafka, and at a high level it looks like they both provide similar ordering guarantees, where records are guaranteed to be consumable in the same order they were published only within that shard/partition.
It seems that AWS Kinesis APIs expose the ids of the parent shard(s) such that Consumer Groups using KCL can ensure records with the same partition key can be consumed in the order they were published (assuming a single threaded publisher) even if shards are being split and merged.
My question is, does Kafka provide any similar functionality such that records published with a specific key can be consumed in order even if partitions are added while messages are being published? From my reading, my understanding is that partition selection (if you are specifying keys with your records) behaves along the lines of HASH(key) % PARTITION_COUNT. So, if additional partitions are added, the partition where all messages with a specific key will be published may (and I've proven it does locally) change. Simultaneously, the Group Coordinator/Leader will reassign partition ownership among Consumers in Consumer Groups receiving records from that topic. But, after reassignment, there will be records (potentially unconsumed records) with the same key found in two different partitions. So, at the Consumer Group level, is there no way to ensure that the unconsumed records with the same key, now found in different partitions, will be consumed in the order they were published?
I have very little experience with both these services, so my understanding may be flawed. Any advice is appreciated!
...ANSWER
Answered 2022-Jan-31 at 22:21 My understanding was correct (as confirmed by @OneCricketeer and the documentation). Here is the relevant section of the documentation:
Although it’s possible to increase the number of partitions over time, one has to be careful if messages are produced with keys. When publishing a keyed message, Kafka deterministically maps the message to a partition based on the hash of the key. This provides a guarantee that messages with the same key are always routed to the same partition. This guarantee can be important for certain applications since messages within a partition are always delivered in order to the consumer. If the number of partitions changes, such a guarantee may no longer hold. To avoid this situation, a common practice is to over-partition a bit. Basically, you determine the number of partitions based on a future target throughput, say for one or two years later. Initially, you can just have a small Kafka cluster based on your current throughput. Over time, you can add more brokers to the cluster and proportionally move a subset of the existing partitions to the new brokers (which can be done online). This way, you can keep up with the throughput growth without breaking the semantics in the application when keys are used.
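A tiny Node.js illustration of the point above; this is not Kafka's actual murmur2 partitioner, just hash(key) % partitionCount with a toy hash, to show that adding partitions can move a key:

function toyHash(key) {
  let h = 0;
  for (const c of key) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h;
}

const key = 'user-42';
console.log('8 partitions  ->', toyHash(key) % 8);
console.log('12 partitions ->', toyHash(key) % 12); // usually a different partition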
QUESTION
I have a Destination Stream that looks as follows, for example:
ANSWER
Answered 2022-Jan-31 at 06:13 SQL - Array Aggregate
QUESTION
I am trying this in the AWS Lambda console. I have run npm install @aws-sdk/client-kinesis in my terminal, zipped the result, and created a Lambda layer which has client-kinesis.
If I use the following, it works!
...ANSWER
Answered 2022-Jan-25 at 13:27 In order to make this work in the Lambda console, you have to:
- Use Node.js version 14 for your Lambda runtime
- Add "type": "module" to your package.json file in order to tell Node to use ES modules instead of traditional CommonJS syntax (see the sketch below)
- If there is a performance impact, it should be minimal enough not to worry about; but you will have to test the compatibility of the modules that the Lambda uses.
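A minimal sketch of such an ES-module handler using the layer's @aws-sdk/client-kinesis; the stream name and partition key are placeholders, and the file must be treated as an ES module (via "type": "module" or an .mjs extension):

import { KinesisClient, PutRecordCommand } from '@aws-sdk/client-kinesis';

const client = new KinesisClient({ region: process.env.AWS_REGION });

export const handler = async () => {
  const result = await client.send(new PutRecordCommand({
    StreamName: 'my-stream',            // placeholder
    PartitionKey: 'example-partition',  // placeholder
    Data: Buffer.from(JSON.stringify({ hello: 'kinesis' }))
  }));
  return result.SequenceNumber;
};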
QUESTION
I am using DynamoDB and I'd like to enable a stream to process any data change in the DynamoDB table. Looking at the stream options, there are two choices: Amazon Kinesis Data Streams and DynamoDB Streams. From the docs, both handle data changes from the DynamoDB table, but I am not sure what the main difference between using these two is.
ANSWER
Answered 2021-Nov-01 at 07:34 There are quite a few differences, which are listed in:
A few notable ones are that DynamoDB Streams, unlike Kinesis Data Streams for DynamoDB, guarantees no duplicates, its record retention time is only 24 hours, and there are throughput capacity limits.
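For reference, each option is enabled with a single aws-sdk v2 call; a sketch with placeholder table and stream names (in practice you would pick one of the two):

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({ region: 'us-east-1' });

async function enableStreaming() {
  // Option 1: classic DynamoDB Streams via the table's stream specification.
  await dynamodb.updateTable({
    TableName: 'my-table', // placeholder
    StreamSpecification: { StreamEnabled: true, StreamViewType: 'NEW_AND_OLD_IMAGES' }
  }).promise();

  // Option 2: Kinesis Data Streams for DynamoDB, pointed at an existing Kinesis stream.
  await dynamodb.enableKinesisStreamingDestination({
    TableName: 'my-table',                                               // placeholder
    StreamArn: 'arn:aws:kinesis:us-east-1:123456789012:stream/my-stream' // placeholder
  }).promise();
}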
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install kinesis
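Assuming the module is published on npm under the same name:

npm install kinesis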
Support