PyAthena | Python DB API 2.0 (PEP 249) client for Amazon Athena | AWS library
kandi X-RAY | PyAthena Summary
PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
Top functions reviewed by kandi - BETA
- Execute an operation
- Find the query id for the query
- List query executions
- Builds a start query execution context
- Generate CREATE TABLE statement
- Escape a comment
- Get bucket count
- Prepare columns for create_columns
- Get the specification for a column
- Read the data as a Pandas DataFrame
- Execute given operation
- Fetch all rows from the database
- Fetch all rows
- Execute the given operation
- Fetch multiple rows from the database
- Executes the given operation with executemany
- Convert a Pandas DataFrame to parquet format
- Assume role
- Format a query string
- Collects the results from the query
- Fetch multiple rows
- Convert a schema element into a tuple
- Collects the results of a query
- Get session token
- Run a pyathena pandas query
- Create connection arguments
- Run a PyAthena SQL query
- Run a pyathena query
PyAthena Key Features
PyAthena Examples and Code Snippets
Community Discussions
Trending Discussions on PyAthena
QUESTION
I want to use SQLAlchemy to read databases that are not mapped to objects (it needs to access DBs unknown at development time). One of the functionalities is to read the column names of different tables. Therefore I wrote this connector:
MyConnector.py
...ANSWER
Answered 2021-Jul-19 at 11:53
Too bad I cannot give anyone a better answer than that I suspect something with the dependencies was messed up... I wanted to try different versions of SQLAlchemy to write a bug report with the above-described behavior. Therefore I changed my venv a couple of times via these commands:
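Setting the dependency issue aside, the underlying task of listing column names of tables that are not mapped to objects can be done with SQLAlchemy's runtime inspection API. A minimal sketch, with placeholder region, database name, and staging bucket:

from sqlalchemy import create_engine, inspect

# Placeholder connection string; substitute your own region, database,
# and URL-encoded S3 staging directory.
engine = create_engine(
    "awsathena+rest://@athena.us-east-1.amazonaws.com:443/"
    "mydatabase?s3_staging_dir=s3%3A%2F%2Fmy-staging-bucket%2F"
)

inspector = inspect(engine)
for table_name in inspector.get_table_names():
    # get_columns returns one dict per column; "name" holds the column name
    columns = [col["name"] for col in inspector.get_columns(table_name)]
    print(table_name, columns)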
QUESTION
I want to show the relations between tables in a database stored in Amazon Web Services. My database name is news. From this answer, I ran this Python code in Amazon SageMaker:
ANSWER
Answered 2021-Mar-15 at 15:55
There is no such table as INFORMATION_SCHEMA.TABLE_CONSTRAINTS in awsdatacatalog.
Also, Amazon Athena doesn't support primary keys or foreign keys.
Here is a list of what it supports when creating a table:
https://docs.aws.amazon.com/athena/latest/ug/create-table.html
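Since Athena exposes no key constraints, relations between tables cannot be queried; what you can list is tables and their columns via information_schema. A minimal sketch with PyAthena (the staging bucket and region are placeholders):

from pyathena import connect

conn = connect(
    s3_staging_dir="s3://my-staging-bucket/results/",  # hypothetical bucket
    region_name="us-east-1",
)
cursor = conn.cursor()
cursor.execute(
    "SELECT table_name, column_name, data_type "
    "FROM information_schema.columns "
    "WHERE table_schema = 'news'"
)
for row in cursor.fetchall():
    print(row)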
QUESTION
I run a query from the AWS Athena console and it takes 10s. The same query run from SageMaker using PyAthena takes 155s. Is PyAthena slowing it down, or is the data transfer from Athena to SageMaker that time-consuming?
What could I do to speed this up?
...ANSWER
Answered 2020-Dec-03 at 21:04
I just figured out a way of boosting the queries. Before, I was trying:
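One common way to get this kind of speed-up, offered here as an assumption rather than the poster's confirmed fix: PyAthena's PandasCursor downloads the query's CSV result directly from S3 instead of paging through the GetQueryResults API, which is much faster for large result sets. A minimal sketch:

from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

cursor = connect(
    s3_staging_dir="s3://my-staging-bucket/results/",  # hypothetical bucket
    region_name="us-east-1",
    cursor_class=PandasCursor,
).cursor()

# The result set is read straight from the CSV in S3 into a DataFrame.
df = cursor.execute("SELECT * FROM my_table").as_pandas()  # hypothetical table
print(len(df))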
QUESTION
I tried to run a Glue job in python-shell by adding external dependencies (like pyathena, pytest, etc.) as Python egg/whl files in the job configuration, as described in the AWS documentation: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html.
The Glue job is configured in a VPC with no internet access, and its execution resulted in the error below.
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to pypi.org timed out. (connect timeout=15)')'
I even tried modifying my python script with the below code
...ANSWER
Answered 2020-Sep-10 at 11:28
Refer to this doc, which has detailed steps for packaging a Python library. Also make sure that your VPC has an S3 endpoint, as traffic will not leave the AWS network when you run a Glue job inside a VPC.
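A hedged sketch of the resulting setup: build the dependencies into a wheel, upload it to S3, and point the python-shell job at it via --extra-py-files so pip never needs to reach pypi.org. All names, ARNs, and paths below are placeholders:

import boto3

glue = boto3.client("glue", region_name="us-east-1")
glue.create_job(
    Name="my-python-shell-job",                        # hypothetical name
    Role="arn:aws:iam::123456789012:role/MyGlueRole",  # hypothetical role
    Command={
        "Name": "pythonshell",
        "PythonVersion": "3",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # hypothetical
    },
    DefaultArguments={
        # Comma-separated S3 paths to .egg/.whl files holding the dependencies
        "--extra-py-files": "s3://my-bucket/deps/dependencies.whl",
    },
)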
QUESTION
Which one is faster, pyathena or boto3, for querying AWS Athena schemas from a Python script?
Currently I am using pyathena to query Athena schemas, but it's quite slow. I know boto3 is another option, but before starting I need some expert advice.
...ANSWER
Answered 2020-Oct-10 at 09:02
Looking at the dependencies for PyAthena, you can see that it actually has a dependency on boto3.
Unless PyAthena has added a lot of overhead to its library, which is unlikely, the best performance improvements you're likely to see will depend on how you're using Athena itself.
There are many performance improvements you can make; Amazon published a blog named Top 10 Performance Tuning Tips for Amazon Athena which will help improve the performance of your queries.
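For comparison, here is the bare boto3 flow, roughly the same calls PyAthena makes under the hood, which is why switching clients rarely changes query time. The database, table, and output location are placeholders:

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Start the query; Athena writes the result as a CSV to the output location.
qid = athena.start_query_execution(
    QueryString="SELECT * FROM my_table LIMIT 10",      # hypothetical table
    QueryExecutionContext={"Database": "my_database"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-bucket/results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]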
QUESTION
I'm using PyAthena to run basic queries:
...ANSWER
Answered 2020-Sep-02 at 17:47
OK, once I learned that the filename isn't random but rather is Athena's query ID, I was able to do a better search and find a solution. Using the object I've already created above:
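A minimal sketch of the idea, assuming a PyAthena cursor like the one created above: the result object in S3 is named after the query execution ID, so it can be located and, for example, cleaned up directly:

import boto3

cursor.execute("SELECT * FROM my_table LIMIT 10")  # hypothetical table
print(cursor.query_id)         # the Athena query execution ID
print(cursor.output_location)  # s3://<staging-dir>/<query_id>.csv

# The result file can be addressed directly, e.g. to delete it:
bucket, key = cursor.output_location.replace("s3://", "").split("/", 1)
boto3.client("s3").delete_object(Bucket=bucket, Key=key)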
QUESTION
My requirement is to use a Python script to read data from an AWS Glue database into a dataframe. When I researched, I found the library "awswrangler". I'm using the below code to connect and read data:
...ANSWER
Answered 2020-Aug-27 at 06:53
Use the following Python code to get the data you are looking for.
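A minimal sketch of the awswrangler approach (the database and table names are placeholders): Athena runs the query against the Glue catalog and the result lands in a pandas DataFrame.

import awswrangler as wr

# Query a table registered in the Glue Data Catalog via Athena.
df = wr.athena.read_sql_query(
    "SELECT * FROM my_table LIMIT 100",  # hypothetical table
    database="my_glue_database",         # hypothetical database
)
print(df.head())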
QUESTION
I have an Athena query that I run every day from my local Ubuntu machine. It runs fine most times.
...ANSWER
Answered 2020-Jun-08 at 20:34
You are calling the function get_athena_data and passing its return value to the function retry, not the function itself.
Try it this way: retry(get_athena_data).
(UPDATED) Now passing some args:
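The usual pattern, sketched here as an assumption since the updated snippet is not shown: bind the arguments with functools.partial (or a lambda) so that a callable, not a result, is handed to retry. The retry helper below is illustrative, and the argument name query is hypothetical:

import functools
import time

def retry(func, attempts=3, delay=5):
    # Illustrative retry helper: call func, re-trying on any exception.
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Wrong: retry(get_athena_data(query))   -> calls the function immediately
# Right: bind the argument and pass the callable itself
# result = retry(functools.partial(get_athena_data, query))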
QUESTION
I am trying to create my own custom SageMaker framework that runs a custom Python script to train an ML model using the entry_point parameter.
Following the Python SDK documentation (https://sagemaker.readthedocs.io/en/stable/estimators.html), I wrote the simplest code to run a training job just to see how it behaves and how the SageMaker framework works.
My problem is that I don't know how to properly build my Docker container in order to run the entry_point script.
I added a train.py script into the container that only logs the folder and file paths as well as the variables in the container's environment.
I was able to run the training job, but I couldn't find any reference to the entry_point script, either in the environment variables or in the files inside the container.
Here is the code I used:
- Custom Sagemaker Framework Class:
ANSWER
Answered 2020-May-25 at 19:39
The SageMaker team created a Python package, sagemaker-training, to install in your Docker image so that your custom container can handle external entry_point scripts.
See here for an example using CatBoost that does what you want to do :)
https://github.com/aws-samples/sagemaker-byo-catboost-container-demo
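A minimal sketch of the container side, assuming the training script is named train.py; the SAGEMAKER_PROGRAM variable tells the toolkit which script to run as the default entry point:

FROM python:3.8

# Install the SageMaker training toolkit so the container can receive and
# run an external entry_point script passed by the Estimator.
RUN pip install sagemaker-training

# Copy a default training script to the location the toolkit expects.
COPY train.py /opt/ml/code/train.py

# Name the script the toolkit runs as the entry point.
ENV SAGEMAKER_PROGRAM train.py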
QUESTION
I am using the pyathena library to query schemas and store the result in a pandas dataframe. I have a list which contains at least 30,000 items,
e.g. l1 = [1, 2, 3, 4, ..., 29999, 30000]
Now I want to pass these list items into a SQL query. Since I cannot pass all 30,000 list items at once, I divided the list into 30 chunks and pass each chunk in a loop, as shown below.
Note: I tried dividing it into fewer chunks, but 1,000 items per chunk seems the best option.
...ANSWER
Answered 2020-May-01 at 15:21
Try this:
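A minimal sketch of what such a chunked loop can look like (the staging bucket, table, and column names are placeholders): run one IN (...) query per 1,000-item chunk and concatenate the partial dataframes at the end.

import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir="s3://my-staging-bucket/results/",  # hypothetical bucket
    region_name="us-east-1",
)

l1 = list(range(1, 30001))
# Split the 30,000 ids into 30 chunks of 1,000.
chunks = [l1[i:i + 1000] for i in range(0, len(l1), 1000)]

frames = []
for chunk in chunks:
    in_list = ",".join(str(x) for x in chunk)
    sql = f"SELECT * FROM my_table WHERE id IN ({in_list})"  # hypothetical
    frames.append(pd.read_sql(sql, conn))

df = pd.concat(frames, ignore_index=True)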
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install PyAthena
You can use PyAthena like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
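The standard installation is from PyPI, ideally inside a virtual environment:

pip install PyAthena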