PyAthena | Python DB API 2.0 (PEP 249) client for Amazon Athena | AWS library
kandi X-RAY | PyAthena Summary
PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
Top functions reviewed by kandi - BETA
- Execute an operation
- Find the query id for the query
- List query executions
- Builds a start query execution context
- Generate CREATE TABLE statement
- Escape a comment
- Get bucket count
- Prepare columns for create_columns
- Get the specification for a column
- Read the data as a Pandas DataFrame
- Execute given operation
- Fetch all rows from the database
- Fetch all rows
- Execute the given operation
- Fetch multiple rows from the database
- Executes the given operation with executemany
- Convert a Pandas DataFrame to parquet format
- Assume role
- Format a query string
- Collects the results from the query
- Fetch multiple rows
- Convert a schema element into a tuple
- Collects the results of a query
- Get session token
- Run a pyathena pandas query
- Create connection arguments
- Run a PyAthena SQL query
- Run a pyathena query
PyAthena Key Features
PyAthena Examples and Code Snippets
Community Discussions
Trending Discussions on PyAthena
QUESTION
I want to use SQLAlchemy to read databases that are not mapped to objects (it needs to access DBs unknown at development time). One of the functionalities is to read the column names of different tables. Therefore I wrote this connector:
MyConnector.py
...ANSWER
Answered 2021-Jul-19 at 11:53
Too bad I cannot give anyone a better answer than that I suspect something with the dependencies was messed up... I wanted to try different versions of SQLAlchemy to write a bug report with the above-described behavior. Therefore I changed my venv a couple of times via these commands:
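Setting the dependency issue aside, the underlying task of listing column names of tables that are not mapped to objects can be done with SQLAlchemy's runtime inspection API. A minimal sketch, with placeholder region, database name, and staging bucket:

from sqlalchemy import create_engine, inspect

# Placeholder connection string; substitute your own region, database,
# and URL-encoded S3 staging directory.
engine = create_engine(
    "awsathena+rest://@athena.us-east-1.amazonaws.com:443/"
    "mydatabase?s3_staging_dir=s3%3A%2F%2Fmy-staging-bucket%2F"
)

inspector = inspect(engine)
for table_name in inspector.get_table_names():
    # get_columns returns one dict per column; "name" holds the column name
    columns = [col["name"] for col in inspector.get_columns(table_name)]
    print(table_name, columns)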
QUESTION
I want to show the relations between tables in a database stored in Amazon Web Services. My database name is news. From this answer, I ran this Python code in Amazon SageMaker:
ANSWER
Answered 2021-Mar-15 at 15:55
There is no such table as INFORMATION_SCHEMA.TABLE_CONSTRAINTS in awsdatacatalog.
Also, Amazon Athena doesn't support primary keys or foreign keys.
Here is a list of what it supports when creating a table:
https://docs.aws.amazon.com/athena/latest/ug/create-table.html
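Since Athena exposes no key constraints, relations between tables cannot be queried; what you can list is tables and their columns via information_schema. A minimal sketch with PyAthena (the staging bucket and region are placeholders):

from pyathena import connect

conn = connect(
    s3_staging_dir="s3://my-staging-bucket/results/",  # hypothetical bucket
    region_name="us-east-1",
)
cursor = conn.cursor()
cursor.execute(
    "SELECT table_name, column_name, data_type "
    "FROM information_schema.columns "
    "WHERE table_schema = 'news'"
)
for row in cursor.fetchall():
    print(row)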
QUESTION
I run a query from the AWS Athena console and it takes 10s. The same query run from SageMaker using PyAthena takes 155s. Is PyAthena slowing it down, or is the data transfer from Athena to SageMaker that time-consuming?
What could I do to speed this up?
...ANSWER
Answered 2020-Dec-03 at 21:04
I just figured out a way of boosting the queries. Before, I was trying:
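One common way to get this kind of speed-up, offered here as an assumption rather than the poster's confirmed fix: PyAthena's PandasCursor downloads the query's CSV result directly from S3 instead of paging through the GetQueryResults API, which is much faster for large result sets. A minimal sketch:

from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

cursor = connect(
    s3_staging_dir="s3://my-staging-bucket/results/",  # hypothetical bucket
    region_name="us-east-1",
    cursor_class=PandasCursor,
).cursor()

# The result set is read straight from the CSV in S3 into a DataFrame.
df = cursor.execute("SELECT * FROM my_table").as_pandas()  # hypothetical table
print(len(df))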
QUESTION
I tried to run a Glue job in python-shell by adding external dependencies (like pyathena, pytest, etc.) as Python egg/whl files in the job configuration, as described in the AWS documentation: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html.
The Glue job is configured in a VPC with no internet access, and its execution resulted in the error below.
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to pypi.org timed out. (connect timeout=15)')'
I even tried modifying my python script with the below code
...ANSWER
Answered 2020-Sep-10 at 11:28
Refer to this doc, which has detailed steps for packaging a Python library. Also make sure that your VPC has an S3 endpoint, as traffic will not leave the AWS network when you run a Glue job inside a VPC.
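A hedged sketch of the resulting setup: build the dependencies into a wheel, upload it to S3, and point the python-shell job at it via --extra-py-files so pip never needs to reach pypi.org. All names, ARNs, and paths below are placeholders:

import boto3

glue = boto3.client("glue", region_name="us-east-1")
glue.create_job(
    Name="my-python-shell-job",                        # hypothetical name
    Role="arn:aws:iam::123456789012:role/MyGlueRole",  # hypothetical role
    Command={
        "Name": "pythonshell",
        "PythonVersion": "3",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # hypothetical
    },
    DefaultArguments={
        # Comma-separated S3 paths to .egg/.whl files holding the dependencies
        "--extra-py-files": "s3://my-bucket/deps/dependencies.whl",
    },
)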
QUESTION
Which one is faster, pyathena or boto3, for querying AWS Athena schemas from a Python script?
Currently I am using pyathena to query Athena schemas, but it's quite slow. I know boto3 is another option, but before starting I need some expert advice.
...ANSWER
Answered 2020-Oct-10 at 09:02
Looking at the dependencies for PyAthena, you can see that it actually has a dependency on boto3.
Unless PyAthena has added a lot of overhead to its library, which is unlikely, the best performance improvements you're likely to see will depend on how you're using Athena itself.
There are many performance improvements you can make; Amazon published a blog named Top 10 Performance Tuning Tips for Amazon Athena which will help improve the performance of your queries.
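For comparison, here is the bare boto3 flow, roughly the same calls PyAthena makes under the hood, which is why switching clients rarely changes query time. The database, table, and output location are placeholders:

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Start the query; Athena writes the result as a CSV to the output location.
qid = athena.start_query_execution(
    QueryString="SELECT * FROM my_table LIMIT 10",      # hypothetical table
    QueryExecutionContext={"Database": "my_database"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-bucket/results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]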
QUESTION
I'm using PyAthena to run basic queries:
...ANSWER
Answered 2020-Sep-02 at 17:47
OK, once I learned that the filename isn't random but rather is Athena's query ID, I was able to do a better search and find a solution. Using the object I've already created above:
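A minimal sketch of the idea, assuming a PyAthena cursor like the one created above: the result object in S3 is named after the query execution ID, so it can be located and, for example, cleaned up directly:

import boto3

cursor.execute("SELECT * FROM my_table LIMIT 10")  # hypothetical table
print(cursor.query_id)         # the Athena query execution ID
print(cursor.output_location)  # s3://<staging-dir>/<query_id>.csv

# The result file can be addressed directly, e.g. to delete it:
bucket, key = cursor.output_location.replace("s3://", "").split("/", 1)
boto3.client("s3").delete_object(Bucket=bucket, Key=key)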
QUESTION
My requirement is to use a Python script to read data from an AWS Glue database into a dataframe. When I researched, I found the library "awswrangler". I'm using the below code to connect and read data:
...ANSWER
Answered 2020-Aug-27 at 06:53
Use the following Python code to get the data you are looking for.
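A minimal sketch of the awswrangler approach (the database and table names are placeholders): Athena runs the query against the Glue catalog and the result lands in a pandas DataFrame.

import awswrangler as wr

# Query a table registered in the Glue Data Catalog via Athena.
df = wr.athena.read_sql_query(
    "SELECT * FROM my_table LIMIT 100",  # hypothetical table
    database="my_glue_database",         # hypothetical database
)
print(df.head())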
QUESTION
I have an Athena query that I run every day from my local Ubuntu machine. It runs fine most times.
...ANSWER
Answered 2020-Jun-08 at 20:34
You are calling the function get_athena_data and passing its return value to the function retry, not the function itself.
Try it this way: retry(get_athena_data).
(UPDATED) Now passing some args:
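The usual pattern, sketched here as an assumption since the updated snippet is not shown: bind the arguments with functools.partial (or a lambda) so that a callable, not a result, is handed to retry. The retry helper below is illustrative, and the argument name query is hypothetical:

import functools
import time

def retry(func, attempts=3, delay=5):
    # Illustrative retry helper: call func, re-trying on any exception.
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Wrong: retry(get_athena_data(query))   -> calls the function immediately
# Right: bind the argument and pass the callable itself
# result = retry(functools.partial(get_athena_data, query))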
QUESTION
I am trying to create my own custom SageMaker framework that runs a custom Python script to train an ML model using the entry_point parameter.
Following the Python SDK documentation (https://sagemaker.readthedocs.io/en/stable/estimators.html), I wrote the simplest code to run a training job just to see how it behaves and how the SageMaker framework works.
My problem is that I don't know how to properly build my Docker container in order to run the entry_point script.
I added a train.py script into the container that only logs the folder and file paths as well as the variables in the container's environment.
I was able to run the training job, but I couldn't find any reference to the entry_point script, either in the environment variables or in the files inside the container.
Here is the code I used:
- Custom Sagemaker Framework Class:
ANSWER
Answered 2020-May-25 at 19:39
The SageMaker team created a Python package, sagemaker-training, to install in your Docker image so that your custom container can handle external entry_point scripts.
See here for an example using CatBoost that does what you want to do :)
https://github.com/aws-samples/sagemaker-byo-catboost-container-demo
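A minimal sketch of the container side, assuming the training script is named train.py; the SAGEMAKER_PROGRAM variable tells the toolkit which script to run as the default entry point:

FROM python:3.8

# Install the SageMaker training toolkit so the container can receive and
# run an external entry_point script passed by the Estimator.
RUN pip install sagemaker-training

# Copy a default training script to the location the toolkit expects.
COPY train.py /opt/ml/code/train.py

# Name the script the toolkit runs as the entry point.
ENV SAGEMAKER_PROGRAM train.py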
QUESTION
I am using the pyathena library to query schemas and store the result in a pandas dataframe. I have a list which contains at least 30,000 items,
e.g. l1 = [1, 2, 3, 4, ..., 29999, 30000]
Now I want to pass these list items into a SQL query. Since I cannot pass all 30,000 list items at once, I divided the list into 30 chunks and pass each chunk in a loop, as shown below.
Note: I tried dividing it into fewer chunks, but 1,000 items per chunk seems the best option.
...ANSWER
Answered 2020-May-01 at 15:21
Try this:
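A minimal sketch of what such a chunked loop can look like (the staging bucket, table, and column names are placeholders): run one IN (...) query per 1,000-item chunk and concatenate the partial dataframes at the end.

import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir="s3://my-staging-bucket/results/",  # hypothetical bucket
    region_name="us-east-1",
)

l1 = list(range(1, 30001))
# Split the 30,000 ids into 30 chunks of 1,000.
chunks = [l1[i:i + 1000] for i in range(0, len(l1), 1000)]

frames = []
for chunk in chunks:
    in_list = ",".join(str(x) for x in chunk)
    sql = f"SELECT * FROM my_table WHERE id IN ({in_list})"  # hypothetical
    frames.append(pd.read_sql(sql, conn))

df = pd.concat(frames, ignore_index=True)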
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install PyAthena
You can use PyAthena like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
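The standard installation is from PyPI, ideally inside a virtual environment:

pip install PyAthena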