kandi has reviewed trino and discovered the below as its top functions. This is intended to give you an instant insight into trino's implemented functionality and help you decide if it suits your requirements.
Open the File menu and select Project Structure
In the SDKs section, ensure that JDK 11 is selected (create one if none exist)
In the Project section, ensure the Project language level is set to 11
Building Trino
./mvnw clean install -DskipTests
Running the CLI
client/trino-cli/target/trino-cli-*-executable.jar
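Once built, the executable JAR can be run directly. A minimal invocation against a local coordinator might look like the following; the server address, catalog, and schema are illustrative assumptions, not values from this page:
client/trino-cli/target/trino-cli-*-executable.jar --server localhost:8080 --catalog hive --schema default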
Don't want to double count in Filtered Aggregation
-- sample data
WITH dataset (last_purchase_timestamp) AS (
VALUES (timestamp '2022-03-02 1:20:00'),
(timestamp '2022-03-01 1:30:00'),
(timestamp '2022-02-28 1:24:03'),
(timestamp '2022-02-02 21:22:26')
)
-- query
select count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) total_active_p30,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day) total_active_p60,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day) total_active_p90
from dataset
-----------------------
select count(*) filter (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day)
           as total_active_p30,
       count(*) filter (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day
           and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day)
           as total_active_p60,
       count(*) filter (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day
           and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day)
           as total_active_p90
from dataset
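Note that count_if(x) and count(*) FILTER (WHERE x) are interchangeable here: both count only the rows where the condition holds, so it is the non-overlapping bounds, not the choice of aggregate, that prevent double counting.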
RPM installation of Trino throws a Python dependency error
$ sudo rpm -i --nodeps trino-server-rpm-368.rpm
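The --nodeps flag only skips RPM's dependency check; the Trino launcher scripts still require Python to be installed on the host at runtime.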
Presto SQL - Transforming array(BingTile) into geometry
-- sample data
WITH dataset (city_id, store_id, latitude, longitude, radius) AS (
    VALUES (12345, 'store_01', 36.1234, 31.1234, 3.11),
           (12345, 'store_02', 36.5678, 31.5678, 2.52)
)
--query
select city_id,
       store_id,
       geometry_union(
           transform(bingTiles_around, t -> bing_tile_polygon(t))
       )
from (
    select city_id,
           store_id,
           latitude,
           longitude,
           radius,
           bing_tiles_around(latitude, longitude, 10, radius) as bingTiles_around
    from dataset
)
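Note that bing_tiles_around takes the radius in kilometers, and the zoom level (10 here) controls how coarse the covering tiles are; geometry_union then merges the tile polygons into a single geometry approximating each store's circle.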
Does Trino (formerly Presto) INSERT work with CTEs?
INSERT INTO my_destination_table
with my_CTE as
(SELECT a,b,c
FROM my_source_table
WHERE <some conditions to apply>)
SELECT a, b, c
FROM my_CTE;
Does Trino implement a function like regexp_split_to_table()?
select s.str as original_str, u.str as exploded_value
from (select 'one,two,,,three' as str) s
cross join unnest(regexp_split(s.str, ',+')) as u(str)
original_str exploded_value
one,two,,,three one
one,two,,,three two
one,two,,,three three
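An alternative sketch, not from the original answer: split on single commas and drop the empty strings with filter, which avoids the regular expression entirely:
select s.str as original_str, u.str as exploded_value
from (select 'one,two,,,three' as str) s
cross join unnest(filter(split(s.str, ','), x -> x <> '')) as u(str)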
window function for moving average
df['date'] = pd.to_datetime(df['date'])
df['my_average'] = (df.groupby('customer_id')
.apply(lambda d: d.rolling('30D', on='date')['price'].mean())
.reset_index(level=0, drop=True)
.astype(int)
)
customer_id date price my_average
0 cust_1 2020-10-10 100 100
2 cust_1 2020-10-15 200 150
3 cust_1 2020-10-16 240 180
5 cust_1 2020-12-25 140 140
1 cust_2 2020-10-10 15 15
4 cust_2 2020-12-20 25 25
6 cust_2 2021-01-01 5 15
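The accepted answer is pandas, but since the question asks about window functions, here is a rough Trino equivalent, assuming a hypothetical purchases table with the same columns; RANGE frames with interval offsets require a reasonably recent Trino version:
select customer_id,
       "date",
       price,
       cast(avg(price) over (
           partition by customer_id
           order by "date"
           range between interval '30' day preceding and current row
       ) as integer) as my_average
from purchases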
Combine Consecutive Rows for given index values in Pandas DataFrame
df['Serial No.'] = df['Serial No.'].bfill().ffill()
df['Total'] = df['Total'].astype(str).replace('nan', np.nan)
df_out = df.groupby('Serial No.', as_index=False).agg(lambda x: ''.join(x.dropna()))
df_out['Total'] = df_out['Total'].replace('', np.nan, regex=True).astype(float)
print(df_out)
Serial No. Name Type Total
0 1.0 Easter Multiple 19.0
1 2.0 Costeri Roundabout 16.0
2 3.0 Zhiop Tee 16.0
3 4.0 Nesss Cross 10.0
4 5.0 Uoar Lhahara Tee 10.0
5 6.0 Trino Nishra(KX) Tee 9.0
6 7.0 Old-FX Box Cross 8.0
7 8.0 Gardeners Roundabout 8.0
8 9.0 Max Detter Roundabout 7.0
9 10.0 Others (Asynco,D+ E,etc) Cross 7.0
AWS Athena (Trino SQL) Convert birthdate string (mm/dd/yy) to date -- need twentieth century
select case when
parse_datetime(birthdate, 'MM/dd/yy') > current_timestamp then
parse_datetime(birthdate, 'MM/dd/yy') - interval '100' year
else parse_datetime(birthdate, 'MM/dd/yy')
end as birthdate
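A self-contained way to try the pivot behavior, with made-up sample birthdates; any two-digit year that parse_datetime places in the future gets pushed back a century:
-- sample data
WITH dataset (birthdate) AS (
    VALUES '03/15/63', '10/04/01'
)
-- query
select case when
    parse_datetime(birthdate, 'MM/dd/yy') > current_timestamp then
    parse_datetime(birthdate, 'MM/dd/yy') - interval '100' year
    else parse_datetime(birthdate, 'MM/dd/yy')
end as birthdate
from dataset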
Export non-varchar data to CSV table using Trino (formerly PrestoDB)
CREATE TABLE region_csv
WITH (format='CSV')
AS SELECT CAST(regionkey AS varchar), CAST(name AS varchar), CAST(comment AS varchar)
FROM region_orc
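The casts are needed because the CSV format in the Hive connector only supports varchar columns. The TEXTFILE alternative below keeps the original column types, but as the raw file contents show, it does not quote fields, so embedded commas and newlines break the file as CSV.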
-----------------------
CREATE TABLE hive.test.region (
regionkey bigint,
name varchar(25),
comment varchar(152)
)
WITH (
format = 'TEXTFILE',
textfile_field_separator = ','
);
INSERT INTO hive.test.region VALUES (
1,
'A "quote", with comma',
'The comment contains a newline
in it');
1,"A ""quote"", with comma","The comment contains a newline
in it"
1,A "quote", with comma,The comment contains a newline
in it
MongoTimeoutException: Error While Using MongoDB with Trino
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.MongoCredential;
import com.mongodb.ServerAddress;
import java.util.Collections;
import org.bson.Document;

public class MongoSession
{
    public static void main(String[] args)
    {
        // Connect to the same seed host that the Trino catalog points at
        ServerAddress seed = new ServerAddress("127.0.0.1:27017");
        MongoCredential credential = MongoCredential.createCredential("user", "database", "password".toCharArray());
        MongoClient client = new MongoClient(seed, Collections.singletonList(credential), MongoClientOptions.builder().build());

        // Fail fast if the server is unreachable or the credentials are wrong
        client.getDatabase("database").runCommand(new Document("ping", 1));

        // List the collections the connector would expose as tables
        for (String name : client.getDatabase("database").listCollectionNames()) {
            System.out.println(name);
        }
    }
}
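If this standalone check succeeds but Trino still reports MongoTimeoutException, compare the values against the catalog configuration. A rough sketch of etc/catalog/mongodb.properties for a Trino version of this era, with placeholder host and credentials, might be:
connector.name=mongodb
mongodb.seeds=127.0.0.1:27017
mongodb.credentials=user:password@database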
QUESTION
Don't want to double count in Filtered Aggregation
Asked 2022-Mar-29 at 19:38
Sample Data:
shopper_id | last_purchase_timestamp | active_p30 | active_p60 | active_over_p90
-----------|-------------------------|------------|------------|----------------
         1 | 2022-03-02 1:20:00      | TRUE       | TRUE       | TRUE
         2 | 2022-03-01 1:30:00      | TRUE       | TRUE       | TRUE
         3 | 2022-02-28 1:24:03      | TRUE       | TRUE       | TRUE
         4 | 2022-02-02 21:22:26     | FALSE      | TRUE       | TRUE
I want to count whether each shopper was active (that is, made their last purchase) in the last 30 days (counting back from March 5th), the last 60 days, and so on.
My goal is to find how many shoppers bought their last item in the last 30 days, how many in the last 60 days, etc., without double counting any shopper.
What I've attempted:
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day)
    AS total_active_p30,
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day)
    AS total_active_p60,
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day)
    AS total_active_p90
Results:
total_active_p30 | total_active_p60 | total_active_p90
-----------------|------------------|-----------------
               3 |                4 |                4
However this is causing it to double count. How can I prevent it from double counting? The total number of counts should be 4.
My ideal output would be:
total_active_p30 | total_active_p60 | total_active_p90
-----------------|------------------|-----------------
               3 |                1 |                0
Thanks in advance everyone! I'm using Trino!
ANSWER
Answered 2022-Mar-29 at 19:02
Add both upper and lower bounds to each filter so the ranges do not intersect. Something along these lines:
-- sample data
WITH dataset (last_purchase_timestamp) AS (
VALUES (timestamp '2022-03-02 1:20:00'),
(timestamp '2022-03-01 1:30:00'),
(timestamp '2022-02-28 1:24:03'),
(timestamp '2022-02-02 21:22:26')
)
-- query
select count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) total_active_p30,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day) total_active_p60,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day) total_active_p90
from dataset
Output:
total_active_p30 | total_active_p60 | total_active_p90
-----------------|------------------|-----------------
               3 |                1 |                0
Community discussions and code snippets on this page include sources from the Stack Exchange Network.