s3hook | Transparent Client-side S3 Request Signing | HTTP Client library
kandi X-RAY | s3hook Summary
Transparent Client-side S3 Request Signing
Community Discussions
Trending Discussions on s3hook
QUESTION
Previously, I was using the python_callable parameter of the TriggerDagRunOperator to dynamically alter the dag_run_obj payload that is passed to the newly triggered DAG.
Since its removal in Airflow 2.0.0 (Pull Req: https://github.com/apache/airflow/pull/6317), is there a way to do this, without creating a custom TriggerDagRunOperator?
For context, here is the flow of my code:
...ANSWER
Answered 2021-Jun-11 at 19:20
The TriggerDagRunOperator now takes a conf parameter, to which a dictionary can be provided as the conf object for the DagRun. Here is more information on triggering DAGs, which you may find helpful as well.
EDIT
Since you need to execute a function to determine which DAG to trigger and do not want to create a custom TriggerDagRunOperator, you could execute intakeFile() in a PythonOperator (or use the @task decorator with the TaskFlow API) and use the return value as the conf argument in the TriggerDagRunOperator. As part of Airflow 2.0, return values are automatically pushed to XCom in many operators, the PythonOperator included.
Here is the general idea:
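The snippet that originally followed is not included in this excerpt. Below is a hedged sketch of that idea, assuming an Airflow 2.x release in which conf is a templated field of TriggerDagRunOperator (so the XCom value can be resolved into it); the DAG ids and the intake_file body are placeholders:

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator


@dag(start_date=datetime(2021, 6, 1), schedule_interval=None, catchup=False)
def trigger_with_dynamic_conf():

    @task
    def intake_file():
        # Stand-in for the original intakeFile(); the returned dict is pushed
        # to XCom automatically and resolved into conf at runtime.
        return {"filename": "example.csv"}

    TriggerDagRunOperator(
        task_id="trigger_downstream",
        trigger_dag_id="downstream_dag",  # hypothetical target DAG id
        conf=intake_file(),               # XComArg from the @task above
    )


trigger_with_dynamic_conf()
```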
QUESTION
I'm creating the class below, which is based on the S3CopyObjectOperator, but I have to copy all the files from an S3 directory and save them to another directory, then delete the files.
But I need the file names from the directory I'm copying from. So let's say the Copy Source is:
...ANSWER
Answered 2021-Apr-02 at 18:06
S3 is an object store, and the "path" is really part of the name. You can think of it as a prefix to the base file name.
Assuming you have the destination prefix you want to append to the filename, you can build the destination key for each s3 key you found.
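As an illustration (a hedged sketch rather than the original poster's operator, assuming the Airflow 2 Amazon provider package), the copy-then-delete step might look like this; the bucket, prefixes, and function name are placeholders:

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def copy_prefix_then_delete(bucket, source_prefix, dest_prefix, aws_conn_id="aws_default"):
    """Copy every object under source_prefix to dest_prefix, then delete the originals."""
    hook = S3Hook(aws_conn_id=aws_conn_id)
    keys = hook.list_keys(bucket_name=bucket, prefix=source_prefix) or []
    for key in keys:
        # Keep the base file name, swap only the prefix.
        dest_key = dest_prefix + key[len(source_prefix):]
        hook.copy_object(
            source_bucket_key=key,
            dest_bucket_key=dest_key,
            source_bucket_name=bucket,
            dest_bucket_name=bucket,
        )
    if keys:
        hook.delete_objects(bucket=bucket, keys=keys)
```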
QUESTION
I need to make the html_content dynamic for a custom email operator, as the html_content differs between jobs. I also need values such as rows and filename to be dynamic.
The example below is one of the email bodies:
...ANSWER
Answered 2020-Jun-16 at 08:16
Airflow supports Jinja templating in operators. It is built into the BaseOperator and controlled by the template_fields and template_ext fields of the operator, e.g.:
QUESTION
I'm using S3Hook in my task to download files from an S3 bucket on DigitalOcean Spaces. Here is an example of credentials which work perfectly with boto3 but cause errors when used in S3Hook:
...ANSWER
Answered 2020-Jun-09 at 07:10
Moving the host variable to Extra did the trick for me.
For some reason, Airflow is unable to establish the connection for a custom S3 host (one different from AWS, like DigitalOcean) if it is not in the Extra vars.
Also, region_name can be removed from Extra in a case like mine.
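For illustration only (not part of the original answer), this is roughly what such a connection contains once the endpoint is moved into Extra; the conn_id, endpoint URL, and key placeholders are hypothetical, and in practice the connection is usually created in the Airflow UI (Admin -> Connections):

```python
import json

from airflow.models.connection import Connection

do_spaces = Connection(
    conn_id="do_spaces",
    conn_type="s3",
    extra=json.dumps({
        "host": "https://nyc3.digitaloceanspaces.com",  # non-AWS endpoint lives in Extra
        "aws_access_key_id": "<SPACES_ACCESS_KEY>",
        "aws_secret_access_key": "<SPACES_SECRET_KEY>",
    }),
)

# URI form, usable as an AIRFLOW_CONN_DO_SPACES environment variable.
print(do_spaces.get_uri())
```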
QUESTION
I'm learning Airflow and I'm trying to understand how connections work.
I have a first dag with the following code:
...ANSWER
Answered 2020-May-25 at 19:07
Connections are usually created using the UI or CLI as described here and stored by Airflow in the database backend. The operators and the respective hooks then take a connection ID as an argument and use it to retrieve the usernames, passwords, etc. for those connections.
In your case, I suspect you created a connection with the ID aws_credentials using the UI or CLI. So, when you pass its ID to S3Hook, it successfully retrieves the credentials (from the database, not from the Connection object that you created).
But you did not create a connection with the ID redshift; therefore, AwsHook complains that it is not defined. You have to create that connection as described in the documentation first.
Note: The reason for not defining connections in the DAG code is that the DAG code is usually stored in a version control system (e.g., Git). And it would be a security risk to store credentials there.
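A hedged sketch of that pattern, assuming the Airflow 2 Amazon provider and that the aws_credentials and redshift connections already exist in Admin -> Connections (the bucket name is a placeholder):

```python
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def list_keys():
    # Only the connection ID appears in DAG code; the credentials themselves
    # are looked up in Airflow's metadata database.
    s3 = S3Hook(aws_conn_id="aws_credentials")
    for key in s3.list_keys(bucket_name="my-example-bucket") or []:
        print(key)


def get_redshift_credentials():
    # AwsHook from the question corresponds to AwsBaseHook in the provider package.
    aws = AwsBaseHook(aws_conn_id="redshift", client_type="redshift")
    return aws.get_credentials()
```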
QUESTION
I am trying to move S3 files from a "non-deleting" bucket (meaning I can't delete the files) to GCS using Airflow. I cannot be guaranteed that new files will be there every day, but I must check for new files every day.
My problem is the dynamic creation of subdags. If there ARE files, I need subdags. If there are NOT files, I don't need subdags. My problem is the upstream/downstream settings. In my code, it does detect files, but does not kick off the subdags as they are supposed to. I'm missing something.
Here's my code:
...ANSWER
Answered 2020-Feb-25 at 03:48
Below is the recommended way to create a dynamic DAG or sub-DAG in Airflow; there are other ways as well, but I guess this would be largely applicable to your problem.
First, create a file (yaml/csv) which includes the list of all S3 files and locations. In your case you have written a function to store them in a list; I would say store them in a separate yaml file instead, load it at run time in the Airflow env, and then create the DAGs.
Below is a sample yaml file:
dynamicDagConfigFile.yaml
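The sample yaml itself is not reproduced in this excerpt. The following is a hedged sketch of the overall pattern: a config file is loaded when the DAG file is parsed and one task is created per entry. The file path, config structure, DAG name, and task logic are all assumptions:

```python
from datetime import datetime

import yaml

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical config, e.g. {"s3_files": [{"name": "a", "key": "incoming/a.csv"}, ...]}
with open("/usr/local/airflow/dags/config/dynamicDagConfigFile.yaml") as f:
    config = yaml.safe_load(f)


def transfer(key, **_):
    print(f"would copy s3 object {key} to GCS here")  # placeholder for the real transfer


with DAG("s3_to_gcs_dynamic",
         start_date=datetime(2020, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    for entry in config.get("s3_files", []):
        PythonOperator(
            task_id=f"transfer_{entry['name']}",
            python_callable=transfer,
            op_kwargs={"key": entry["key"]},
        )
```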
QUESTION
I've read the documentation for creating an Airflow Connection via an environment variable, and I am using Airflow v1.10.6 with Python 3.5 on Debian 9.
The linked documentation above shows an example S3 connection of s3://accesskey:secretkey@S3
From that, I defined the following environment variable:
AIRFLOW_CONN_AWS_S3=s3://#MY_ACCESS_KEY#:#MY_SECRET_ACCESS_KEY#@S3
And the following function
...ANSWER
Answered 2020-Jan-10 at 14:45
Found the issue: s3://accesskey:secretkey@S3 is the correct format. The problem was that my aws_secret_access_key had a special character in it and had to be URL-encoded. That fixed everything.
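A small hedged sketch of that fix: percent-encode both parts of the credential pair before building the connection URI, so special characters in the secret survive URI parsing (the values shown are placeholders):

```python
from urllib.parse import quote_plus

access_key = "<MY_ACCESS_KEY>"
secret_key = "<MY_SECRET_ACCESS_KEY/WITH+SPECIAL=CHARS>"

# Percent-encode both parts so the URI parser does not choke on special characters.
conn_uri = f"s3://{quote_plus(access_key)}:{quote_plus(secret_key)}@S3"
print(f"AIRFLOW_CONN_AWS_S3={conn_uri}")
```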
QUESTION
I am using docker-compose to set up a scalable Airflow cluster. I based my approach on this Dockerfile: https://hub.docker.com/r/puckel/docker-airflow/
My problem is getting the logs set up to write to / read from S3. When a DAG has completed, I get an error like this:
...ANSWER
Answered 2017-Jun-28 at 07:33
You need to set up the S3 connection through the Airflow UI. For this, go to the Admin -> Connections tab in the Airflow UI and create a new row for your S3 connection.
An example configuration would be:
Conn Id: my_conn_S3
Conn Type: S3
Extra: {"aws_access_key_id":"your_aws_key_id", "aws_secret_access_key": "your_aws_secret_key"}
QUESTION
I am using the Airflow EMR Operators to create an AWS EMR Cluster that runs a Jar file contained in S3 and then writes the output back to S3. It seems to be able to run the job using the Jar file from S3, but I cannot get it to write the output to S3. I am able to get it to write the output to S3 when running it as an AWS EMR CLI Bash command, but I need to do it using the Airflow EMR Operators. I have the S3 output directory set both in the Airflow step config and in the environment config in the Jar file and still cannot get the Operators to write to it.
Here is the code I have for my Airflow DAG
...ANSWER
Answered 2019-Sep-12 at 16:21
I believe that I just solved my problem. After really digging deep into all the local Airflow logs and the S3 EMR logs, I found a Hadoop memory exception, so I increased the number of cores to run the EMR on, and it seems to work now.
QUESTION
I'm still in the process of deploying Airflow and I've already felt the need to merge operators together. The most common use case would be coupling an operator with the corresponding sensor. For instance, one might want to chain together the EmrStepOperator and EmrStepSensor.
I'm creating my DAGs programmatically, and the biggest one of those contains 150+ (identical) branches, each performing the same series of operations on different bits of data (tables). Therefore, clubbing together tasks that make up a single logical step in my DAG would be of great help.
Here are 2 contending examples from my project to give motivation for my argument.
1. Deleting data from S3 path and then writing new data
This step comprises 2 operators:
- DeleteS3PathOperator: extends from BaseOperator and uses S3Hook
- HadoopDistcpOperator: extends from SSHOperator
2. Conditionally performing MSCK REPAIR on a Hive table
This step contains 4 operators:
- BranchPythonOperator: checks whether the Hive table is partitioned
- MsckRepairOperator: extends from HiveOperator and performs MSCK REPAIR on the (partitioned) table
- Dummy(Branch)Operator: makes up the alternate branching path to MsckRepairOperator (for non-partitioned tables)
- Dummy(Join)Operator: makes up the join step for both branches
Using operators in isolation certainly offers smaller modules and more fine-grained logging / debugging, but in large DAGs, reducing the clutter might be desirable. From my current understanding, there are 2 ways to chain operators together:
- Hooks: write the actual processing logic in hooks and then use as many hooks as you want within a single operator (certainly the better way, in my opinion)
- SubDagOperator: a risky and controversial way of doing things; additionally, the naming convention for SubDagOperator makes me frown
My questions are:
- Should operators be composed at all or is it better to have discrete steps?
- Any pitfalls, improvements in above approaches?
- Any other ways to combine operators together?
- In taxonomy of Airflow, is the primary motive of Hooks same as above, or do they serve some other purposes too?
UPDATE-1
3. Multiple Inheritance
While this is a Python feature rather than something Airflow-specific, it's worthwhile to point out that multiple inheritance can come in handy for combining the functionality of operators. QuboleCheckOperator, for instance, is already written using it. However, in the past I tried this approach to fuse EmrCreateJobFlowOperator and EmrJobFlowSensor, but at the time I ran into issues with the @apply_defaults decorator and abandoned the idea.
ANSWER
Answered 2018-Nov-14 at 22:05
I have combined various hooks to create a single operator based on my needs. A simple example: I clubbed the GCS delete, copy, list, and get_size hook methods together to create a single operator called GcsDataValidationOperator. A rule of thumb would be to aim for idempotency, i.e. if you run it multiple times it should produce the same result.
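As a rough illustration (a hedged sketch, not the author's actual implementation), an operator like the GcsDataValidationOperator mentioned above might chain several GCSHook calls inside a single execute(); the parameters and the archive logic are assumptions:

```python
from airflow.models.baseoperator import BaseOperator
from airflow.providers.google.cloud.hooks.gcs import GCSHook


class GcsDataValidationOperator(BaseOperator):

    def __init__(self, bucket, prefix, archive_prefix,
                 gcp_conn_id="google_cloud_default", **kwargs):
        super().__init__(**kwargs)
        self.bucket = bucket
        self.prefix = prefix
        self.archive_prefix = archive_prefix
        self.gcp_conn_id = gcp_conn_id

    def execute(self, context):
        hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
        # list -> get_size -> copy -> delete, all through one hook, in one operator.
        for obj in hook.list(self.bucket, prefix=self.prefix):
            size = hook.get_size(self.bucket, obj)
            self.log.info("Validated %s (%s bytes)", obj, size)
            hook.copy(self.bucket, obj,
                      destination_bucket=self.bucket,
                      destination_object=self.archive_prefix + obj.split("/")[-1])
            hook.delete(self.bucket, obj)
```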
Should operators be composed at all or is it better to have discrete steps?
The only pitfall is maintainability: sometimes when the hooks change in the master branch, you will need to update all your operators manually if there are any breaking changes.
Any pitfalls, improvements in above approaches?
You can use PythonOperator and call the built-in hooks via the .execute method, but it would still mean a lot of details in the DAG file. Hence, I would still go for a new-operator approach.
Any other ways to combine operators together?
Hooks are just interfaces to external platforms and databases like Hive, GCS, etc., and form building blocks for operators. This allows the creation of new operators. Also, this means you can customize templated fields, add Slack notifications at each granular step inside your new operator, and have your own logging details.
In taxonomy of Airflow, is the primary motive of Hooks same as above, or do they serve some other purposes too?
FWIW: I am a PMC member and a contributor of the Airflow project.
Community Discussions and Code Snippets contain sources from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install s3hook
Production: s3hook.min.js, 16KB (5KB gzipped)