presto | Fast Wilcoxon and auROC | Machine Learning library

by immunogenomics | Jupyter Notebook | Version: Current | License: No License

kandi X-RAY | presto Summary

presto is a Jupyter Notebook library typically used in Artificial Intelligence, Machine Learning, and Pytorch applications. presto has no bugs and it has low support. However, presto has 1 vulnerability. You can download it from GitHub.

Presto performs a fast Wilcoxon rank sum test and auROC analysis. Latest benchmark ran 1 million observations, 1K features, and 10 groups in 16 seconds (sparse input) and 85 seconds (dense input).
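
To make the link between the two statistics concrete, here is a minimal pure-Python sketch. It is not presto's implementation: the function names `average_ranks` and `rank_sum_auroc` and the sample values are made up for illustration. It relies on the standard identity that the auROC equals the Mann-Whitney U statistic (obtained from the Wilcoxon rank sum) divided by the product of the two group sizes.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_sum_auroc(in_group, out_group):
    """Return (U statistic of in_group, auROC) for one feature (hypothetical helper)."""
    n1, n2 = len(in_group), len(out_group)
    ranks = average_ranks(list(in_group) + list(out_group))
    rank_sum = sum(ranks[:n1])            # Wilcoxon rank sum of the in-group
    u = rank_sum - n1 * (n1 + 1) / 2      # Mann-Whitney U
    return u, u / (n1 * n2)               # auROC = U / (n1 * n2)


# Toy data, invented for this example.
u_stat, auc = rank_sum_auroc([3.0, 5.0, 6.0], [1.0, 2.0, 5.0])
```

For the toy data, the in-group wins 7.5 of the 9 pairwise comparisons (a tie counts as half), so U is 7.5 and the auROC is about 0.83.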

Support

presto has a low active ecosystem.

It has 94 star(s) with 23 fork(s). There are 14 watchers for this library.

It had no major release in the last 6 months.

There are 12 open issues and 6 have been closed. On average issues are closed in 63 days. There are 2 open pull requests and 0 closed requests.

It has a neutral sentiment in the developer community.

The latest version of presto is current.

Quality

presto has no bugs reported.

Security

presto has 1 vulnerability issue reported (0 critical, 1 high, 0 medium, 0 low).

License

presto does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

presto releases are not available. You will need to build from source code and install.

Installation instructions, examples and code snippets are available.

Community Discussions

Trending Discussions on presto

Presto sql function date_parse fails for specific date (1960-01-01)

Error while converting timestamp string with timezone (+0000) to Timestamp in Presto

I am getting the following error , line 25:8: Column 'flag1' cannot be resolved

RDBMS Resource Usage when Using PrestoDB

data distribution with spark sql

Syntactical use of WITH in SQL for CTEs and Properties

In Presto SQL how to create a map of array values and its count

Error Code PINOT_UNABLE_TO_FIND_BROKER :No valid brokers found

What is meant by "query data where it lives" with Presto?

Combine row aggregate data with individual rows

QUESTION

Presto sql function date_parse fails for specific date (1960-01-01)

Asked 2021-Jun-15 at 18:56

How to resolve this presto sql error for date_parse('1960-01-01', '%Y-%m-%d')

This function works fine for other dates.

...

ANSWER

Answered 2021-Jun-15 at 18:56

This is due to a long-standing issue with how Presto models timestamps. Long story short, the implementation of timestamps is not compliant with the SQL specification and it incorrectly attempts to treat them as "point in time" or "instant" values and interpret them within a time zone specification. For some dates and time zone rules, the values are undefined due to daylight savings transitions, etc.

This was fixed in recent versions of Trino (formerly known as Presto SQL), so you may want to update.

By the way, you can convert a varchar to a date using the date() function or by casting the value to date:

Source https://stackoverflow.com/questions/67991582

QUESTION

Error while converting timestamp string with timezone (+0000) to Timestamp in Presto

Asked 2021-Jun-14 at 20:20

I am trying to convert a timestamp string to a timestamp with date_parse, but I keep getting an error. Any suggestions? I am working with Presto SQL. I also referred to http://teradata.github.io/presto/docs/127t/functions/datetime.html, but couldn't find anything that can deal with +0000, i.e. the time zone.

I tried:

...

ANSWER

Answered 2021-Jun-14 at 20:20

This works!:

Source https://stackoverflow.com/questions/67959546

QUESTION

I am getting the following error , line 25:8: Column 'flag1' cannot be resolved

Asked 2021-Jun-08 at 15:15

I wrote the following query in Presto, which gave the error: line 25:8: Column 'flag1' cannot be resolved. The flag condition has to be incorporated. I had run a similar query on Redshift without any issue.

...

ANSWER

Answered 2021-Jun-08 at 06:11

Consider changing WHERE flag1 = 'New' to WHERE date_diff ('day',fod,dt) <= 28

Source https://stackoverflow.com/questions/67882222

QUESTION

RDBMS Resource Usage when Using PrestoDB

Asked 2021-Jun-01 at 07:45

When we query a MySQL database using Presto, does that mean we are still using MySQL's resources, such as CPU or RAM? Thank you.

...

ANSWER

Answered 2021-Jun-01 at 07:45

This passage from the FAQ answers your question:

If the Presto process is mostly idle, this means that Presto can not retrieve data fast enough from the HDFS data node. This could be caused by network or disk bandwidth or CPU on the data node.

As you can see, Presto is a separate process that waits for the database to send the requested data, so the database has to "work" as normal.

Source https://stackoverflow.com/questions/67771753

QUESTION

data distribution with spark sql

Asked 2021-May-31 at 09:13

I'm quite new to spark SQL. I struggle to combine operations properly. What I want can be a bit tricky:

WHAT I HAVE

From values :

...

ANSWER

Answered 2021-May-31 at 09:06

You can create the map using group by and map_from_entries:

Source https://stackoverflow.com/questions/67770658

QUESTION

Syntactical use of WITH in SQL for CTEs and Properties

Asked 2021-May-22 at 23:00

I know two uses of WITH in SQL:

To signify a CTE (Common Table Expression) clause, creating a temporary table for use in the present query, and
To dictate properties in a CTAS (CREATE TABLE AS) statement, e.g. Presto, AWS Athena, Cloudera, etc.

However, in reading long queries, I have on several occasions had difficulty immediately telling these two uses apart, and I have always wondered whether it would have made more sense to use another word for one of the two, to improve readability and avoid ambiguity.

So my question is: are these two uses related somehow? Do they stem from some common root?

...

ANSWER

Answered 2021-May-22 at 23:00

They are not related at all. In a CTE, WITH is a syntactic construct similar to a subquery. In a CREATE TABLE AS statement, it is used for a different purpose: setting properties.

An analogy might be the BY in GROUP BY and ORDER BY, or the AND used for BETWEEN and as a stand-alone boolean operator. They just happen to have the same name.

Source https://stackoverflow.com/questions/67654785

QUESTION

In Presto SQL how to create a map of array values and its count

Asked 2021-May-21 at 16:16

If I have a table like below, how do I get the map of the count of unique values in the arrays in column2?

ID  Column1  Column2
1   10       [a, a, b, c]
2   12       [a, a, a]

I would like something like the below:

ID  Column1  Column2
1   10       {a: 2, b: 1, c: 1}
2   12       {a: 3}

I tried to use Presto's histogram for this. But it is an aggregate function that requires GROUP BY. I need to use the histogram for each row and not the entire table.

For example,

...

ANSWER

Answered 2021-May-21 at 16:16

You can use unnest to expand your array into a column and then use histogram over this new column:
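
As a rough Python analogue of that idea (this is not Presto SQL and not the answer's query), the sketch below first "unnests" each array into one (ID, element) pair per element and then counts elements per ID. The `rows` structure and variable names are assumptions chosen to mirror the example table from the question.

```python
from collections import Counter, defaultdict

# Sample rows mirroring the table in the question.
rows = [
    {"ID": 1, "Column1": 10, "Column2": ["a", "a", "b", "c"]},
    {"ID": 2, "Column1": 12, "Column2": ["a", "a", "a"]},
]

# Step 1, the "unnest": one (ID, element) pair per array element.
exploded = [(row["ID"], value) for row in rows for value in row["Column2"]]

# Step 2, the "histogram grouped by ID": count elements within each ID.
counts_by_id = defaultdict(Counter)
for row_id, value in exploded:
    counts_by_id[row_id][value] += 1

# Step 3: attach the per-row map back to the other columns.
histograms = [
    {"ID": row["ID"], "Column1": row["Column1"], "Column2": dict(counts_by_id[row["ID"]])}
    for row in rows
]
```

Grouping by the row identifier is what turns the table-wide aggregate into a per-row result.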

Source https://stackoverflow.com/questions/67626227

QUESTION

Error Code PINOT_UNABLE_TO_FIND_BROKER :No valid brokers found

Asked 2021-May-20 at 04:13

I am trying to query pinot table data using presto, below are my configuration details.

...

ANSWER

Answered 2021-May-20 at 04:13

Update: This is because the connector does not support mixed case table names. Mixed case column names are supported. There is a pull request to add support for mixed case table names: https://github.com/trinodb/trino/pull/7630

Source https://stackoverflow.com/questions/67603729

QUESTION

What is meant by "query data where it lives" with Presto?

Asked 2021-May-18 at 20:31

I saw this on a Presto tutorial and it says the benefit is "query data where it lives".

What is meant by that? I'd love a comparison to the traditional v. Presto version of things.

edit: adding context by linking to quote on homepage

https://prestodb.io/ under "What can it Do?"

...

ANSWER

Answered 2021-May-18 at 20:28

TL;DR: Query the data where it lives is a quick way of saying you don't need to move the data from other databases into one database in order to run queries across all of your data. In other words, Presto can act as a hub to query multiple databases, and perform further processing on the data using standard ANSI SQL.

One use case that I ran into at my last company was that we needed a standard way to access data from our Elasticsearch cluster and our data lake (Hive/HDFS) and to combine those two data sources. The only difference is that we used Trino rather than Presto, since Trino is the fork that the creators of Presto now maintain. The example still applies to both.

Elasticsearch stores data in an Apache Lucene index and is really only accessible through Elasticsearch clients which derive from the Elasticsearch query DSL.

Hive's data is generally stored in an open file format (ORC, JSON, AVRO, or Parquet), and resides in a distributed filesystem like HDFS or S3 cloud storage solutions. You query it via HiveQL which is kind of like SQL but a special dialect.

We had to write and maintain a lot of code to interface with both of these systems, especially to maintain the models that queried each of them. Countless issues and bugs came from maintaining this code and keeping the queries against both systems aligned. For example, take a look at this Elasticsearch query versus the HiveQL equivalent.

Source https://stackoverflow.com/questions/67591269

QUESTION

Combine row aggregate data with individual rows

Asked 2021-May-18 at 11:22

I have a table looking like below

base_data

session_id  event_type    player_guess  correct_answer
1           guess         'python'      NULL
1           guess         'javascript'  NULL
1           guess         'scala'       NULL
1           all_answered  NULL          ['python','javascript','hadoop']
2           guess         'triangle'    NULL
2           guess         'square'      NULL
2           all_answered  NULL          ['triangle','square']

I am trying to get a new column called was_guess_correct, defined as follows:

...

ANSWER

Answered 2021-May-18 at 11:22

You can use window functions to get the correct answers on each row. Then how you manage the result depends on the type of the column. If it is a string, you can just use like:
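
The general shape of that approach can be sketched in Python (not Presto SQL, and not the answer's query). The definition of was_guess_correct is elided in the question, so this sketch assumes a guess is correct when it appears in the correct_answer list of its session's all_answered row; the function name `add_was_guess_correct` is hypothetical.

```python
# Rows mirroring base_data: (session_id, event_type, player_guess, correct_answer).
base_data = [
    (1, "guess", "python", None),
    (1, "guess", "javascript", None),
    (1, "guess", "scala", None),
    (1, "all_answered", None, ["python", "javascript", "hadoop"]),
    (2, "guess", "triangle", None),
    (2, "guess", "square", None),
    (2, "all_answered", None, ["triangle", "square"]),
]


def add_was_guess_correct(rows):
    """Append an assumed was_guess_correct flag to every row."""
    # Window-like step: make each session's answer list available to all its rows.
    answers_by_session = {}
    for session_id, event_type, _guess, correct in rows:
        if event_type == "all_answered":
            answers_by_session[session_id] = correct

    result = []
    for session_id, event_type, guess, correct in rows:
        if event_type == "guess":
            # Assumption: correct means the guess is in the session's answer list.
            flag = guess in answers_by_session.get(session_id, [])
        else:
            flag = None  # the all_answered row itself carries no guess
        result.append((session_id, event_type, guess, correct, flag))
    return result


flagged = add_was_guess_correct(base_data)
```

The first loop plays the role of the window function partitioned by session_id: it carries the aggregate row's value onto every individual row of the same session.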

Source https://stackoverflow.com/questions/67581334

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install presto

We are working on getting presto onto CRAN. For now, install presto directly from GitHub.

Support

For new features, suggestions, and bug reports, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
