s3select | s3select makes s3 select querying API | SQL Database library
kandi X-RAY | s3select Summary
s3select makes querying with the S3 Select API much easier and faster.
Community Discussions
Trending Discussions on s3select
QUESTION
I'm using Spark 2.4.5 running on AWS EMR 5.30.0 with r5.4xlarge instances (16 vCores, 128 GiB memory, EBS-only storage, 256 GiB EBS storage): 1 master, 1 core, and 30 task nodes.
I launched Spark Thrift Server on the master node, and it's the only job running on the cluster.
...ANSWER
Answered 2020-Jul-06 at 21:21
The problem was having only one core instance: the logs were saved in HDFS, so this instance became a bottleneck. I added another core instance, and it's going much better now.
Another solution could be to save the logs to S3/S3A instead of HDFS by changing those parameters in spark-defaults.conf (make sure they are changed in the UI config too), but this might require adding some JAR files to work.
QUESTION
Setup: latest (5.29) AWS EMR, Spark, 1 master and 1 node.
Step 1. I used S3 Select to parse a file and collect all the file keys to pull from S3. Step 2. I use PySpark to iterate over the keys in a loop and do the following:
spark.read.format("s3selectCSV").load(key).limit(superhighvalue).show(superhighvalue)
It took me x minutes.
When I increase the cluster to 1 master and 6 nodes, I don't see any difference in time. It appears that I am not using the additional core nodes.
Everything else config-wise is the out-of-the-box default; I am not setting anything.
So my question is: does cluster size matter when reading and inspecting (say, logging or printing) data from S3 using EMR and Spark?
...ANSWER
Answered 2020-Feb-04 at 06:41
A few things to keep in mind:
- Are you sure the number of executors actually increased when you added nodes? You can also set it when you run spark-submit with --num-executors 6. More nodes don't necessarily mean more executors are spun up.
- Next, what is the size of the CSV file? If it's only around 1 MB, you will not see much difference. Make sure to test with at least 3-4 GB.
QUESTION
I have a strange issue when querying from Presto (AWS EMR). I was using Presto 0.194 and everything was OK, but after I upgraded to 0.224 I cannot run my queries. I'm using LDAP authentication for Presto and file-based authorization for Hive with an authorization.json file. I'm using the same JSON file that was working fine in the old version. Any help would be highly appreciated.
Error: Query 20191005_104119_00006_3snge failed: Access Denied: View owner 'username' cannot create view that selects from ...
config.properties:
...ANSWER
Answered 2019-Oct-05 at 13:17
Error: Query 20191005_104119_00006_3snge failed: Access Denied: View owner 'username' cannot create view that selects from ...
This means that username does not have the GRANT_SELECT privilege on a particular table or tables.
The particular change that affects you went into the 0.199 release: https://github.com/prestosql/presto/commit/6ed1ed88083baef1d29171364297631962adf05d This was a bug fix (creating a view should require different privileges), so it is intentional, although inconvenient, that the change did not maintain backward compatibility.
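The rule lookup can be sketched in Python to check an authorization.json before upgrading. The sketch rests on these assumptions:
- The file uses the Hive connector's file-based authorization format, with a top-level "tables" list.
- Each entry carries optional user, schema, and table regexes (defaulting to ".*") and a privileges list.
- The first matching rule wins.
- A view owner needs GRANT_SELECT on the base tables, as the answer states.

The helper names are hypothetical, and this is not Presto's own implementation.

```python
import json
import re
from typing import Any, Dict, List


def load_table_rules(path: str) -> List[Dict[str, Any]]:
    # ASSUMPTION: table rules live under a top-level "tables" key.
    with open(path) as f:
        return json.load(f).get("tables", [])


def table_privileges(rules: List[Dict[str, Any]], user: str, schema: str, table: str) -> List[str]:
    """Return the privileges of the first rule whose regexes all fully match."""
    for rule in rules:
        if (re.fullmatch(rule.get("user", ".*"), user)
                and re.fullmatch(rule.get("schema", ".*"), schema)
                and re.fullmatch(rule.get("table", ".*"), table)):
            return list(rule.get("privileges", []))
    return []  # no matching rule means no privileges


def can_create_view_over(rules: List[Dict[str, Any]], user: str, schema: str, table: str) -> bool:
    # SELECT alone is not enough for a view owner; GRANT_SELECT is required.
    return "GRANT_SELECT" in table_privileges(rules, user, schema, table)
```

Because the first match wins, a broad rule placed early in the list can shadow a later rule that does grant GRANT_SELECT. Rule order is therefore worth checking as well as rule content.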
BTW, for one-time troubleshooting-style questions that are unlikely to benefit the SO community, I recommend using the #troubleshooting channel on the Presto Community Slack.
QUESTION
I am using S3 Select to read a CSV file from an S3 bucket and output the result as CSV. In the output I only see the rows, not the headers. How do I get output with the headers included?
...ANSWER
Answered 2018-Jun-14 at 02:49
Amazon S3 Select will not output headers.
In your code, you could just include a print command to output the headers before looping through the results.
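Here is a minimal sketch of that approach with boto3. It assumes the caller already knows the column names, for example the columns named in the SELECT list. select_object_content returns an event stream in response["Payload"], and its "Records" events carry the CSV bytes. The hypothetical helper below writes the header row first and then the records. It accepts any iterable of events, so it can be exercised without AWS.

```python
import csv
import io
from typing import Dict, Iterable, List


def select_output_with_header(events: Iterable[Dict], header: List[str]) -> str:
    """Prepend a CSV header row to S3 Select output, which omits headers."""
    buf = io.StringIO()
    # Write the header first: the "print the headers before looping" step.
    csv.writer(buf, lineterminator="\n").writerow(header)
    chunks = []
    for event in events:
        # Only "Records" events carry data; "Stats", "End", etc. are skipped.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join the bytes before decoding, since a multi-byte UTF-8 character
    # may be split across two Records events.
    buf.write(b"".join(chunks).decode("utf-8"))
    return buf.getvalue()
```

With a real client, resp = boto3.client("s3").select_object_content(...) followed by print(select_output_with_header(resp["Payload"], ["id", "name"]), end="") prints the header and then the rows. Here ["id", "name"] stands in for your own column names.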
QUESTION
I'm trying to retrieve data from an S3 object using the S3 Select feature, as below:
boto3 version: 1.7.59
...ANSWER
Answered 2018-Jul-18 at 20:20
Looks like the SQL expression you're passing is invalid:
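The failing expression itself is not shown, so the following is only a sketch of a well-formed request for comparison. It assumes a CSV object with a header row. The keyword names are those of boto3's S3 client select_object_content. The bucket, key, column names, and the build_select_request helper are hypothetical. In S3 Select SQL the object is referred to as S3Object in the FROM clause. With FileHeaderInfo set to "USE", columns can be referenced by their header names.

```python
from typing import Any, Dict


def build_select_request(bucket: str, key: str, expression: str,
                         file_header_info: str = "USE") -> Dict[str, Any]:
    """Keyword arguments for boto3's S3 client select_object_content."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # "USE" lets the query refer to columns by their header names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": file_header_info},
                               "CompressionType": "NONE"},
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical example: the FROM clause names the object as S3Object.
EXAMPLE = build_select_request(
    "my-bucket", "data.csv",
    "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'",
)
```

The request would then be sent with boto3.client("s3").select_object_content(**EXAMPLE). Comparing its Expression and serialization settings with those in your own call can help narrow down what is invalid.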
The Community Discussions and Code Snippets sections contain content sourced from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported