Support
Quality
Security
License
Reuse
kandi has reviewed lap and discovered the below as its top functions. This is intended to give you an instant insight into lap implemented functionality, and help decide if they suit your requirements.
Get all kandi verified functions for this library.
Get all kandi verified functions for this library.
Linear Assignment Problem solver (LAPJV/LAPMOD).
QUESTION
Construct non-overlapping datetime record (start, end datetime) dataframe
Asked 2022-Mar-22 at 13:56I need to create a dataframe removing over-lapping start
and end
datetimes for multiple ids
. I will use the start
and end
datetimes to aggregate values in a high frequency pandas dataframe, so I need to remove those overlapping datetimes in mst_df
.
import pandas as pd
#Proxy reference dataframe
master = [['site a', '2021-07-08 00:00:00', '2021-07-08 10:56:00'],
['site a', '2021-07-08 06:00:00', '2021-07-08 12:00:00'], #slightly overlapping
['site a', '2021-07-08 17:36:00', '2021-07-09 11:40:00'],
['site a', '2021-07-08 18:00:00', '2021-07-09 11:40:00'], #overlapping
['site a', '2021-07-09 00:00:00', '2021-07-09 05:40:00'], #overlapping
['site b', '2021-07-08 00:00:00', '2021-07-08 10:24:00'],
['site b', '2021-07-08 06:00:00', '2021-07-08 10:24:00'], #overlapping
['site b', '2021-07-08 17:32:00', '2021-07-09 11:12:00'],
['site b', '2021-07-08 18:00:00', '2021-07-09 11:12:00'], #overlapping
['site b', '2021-07-09 00:00:00', '2021-07-09 13:00:00']] #slightly overlapping
mst_df = pd.DataFrame(master, columns = ['id', 'start', 'end'])
mst_df['start'] = pd.to_datetime(mst_df['start'], infer_datetime_format=True)
mst_df['end'] = pd.to_datetime(mst_df['end'], infer_datetime_format=True)
Desired DataFrame:
id start end
site a 2021-07-08 00:00:00 2021-07-08 12:00:00
site a 2021-07-08 17:36:00 2021-07-09 11:40:00
site b 2021-07-08 00:00:00 2021-07-08 10:24:00
site b 2021-07-08 17:32:00 2021-07-09 13:00:00
ANSWER
Answered 2022-Mar-22 at 13:56I don't know if pandas
has special function for this. It has Interval.overlaping() to check if two ranges overlaps (and it works even with datetime
) but I don't see function to merge these two ranges so it still needs own code for merging. Fortunately it is easy.
Rows are sorted by start
so rows are not overlaping when previous_end < next_start
and I use it in for
-loop.
But first I group by site
to work with every site separatelly.
Next I get first row (as previous
) and run loop with other rows (as next)
and check previous_end < next_start
.
If it is True
then I can put previous
on list of results and get next
as previous
to work with rest of rows.
If it is False
then I create new range from both rows and use it to work with rest of rows.
Finally I add previous
to list.
After processing all groups I convert all to DataFrame.
import pandas as pd
#Proxy reference dataframe
master = [
['site a', '2021-07-08 00:00:00', '2021-07-08 10:56:00'],
['site a', '2021-07-08 06:00:00', '2021-07-08 12:00:00'], # slightly overlapping
['site a', '2021-07-08 17:36:00', '2021-07-09 11:40:00'],
['site a', '2021-07-08 18:00:00', '2021-07-09 11:40:00'], # overlapping
['site a', '2021-07-09 00:00:00', '2021-07-09 05:40:00'], # overlapping
['site b', '2021-07-08 00:00:00', '2021-07-08 10:24:00'],
['site b', '2021-07-08 06:00:00', '2021-07-08 10:24:00'], # overlapping
['site b', '2021-07-08 17:32:00', '2021-07-09 11:12:00'],
['site b', '2021-07-08 18:00:00', '2021-07-09 11:12:00'], # overlapping
['site b', '2021-07-09 00:00:00', '2021-07-09 13:00:00'] # slightly overlapping
]
mst_df = pd.DataFrame(master, columns = ['id', 'start', 'end'])
mst_df['start'] = pd.to_datetime(mst_df['start'], infer_datetime_format=True)
mst_df['end'] = pd.to_datetime(mst_df['end'], infer_datetime_format=True)
result = []
for val, group in mst_df.groupby('id'):
# get first
prev = group.iloc[0]
for idx, item in group[1:].iterrows():
if prev['end'] < item['start']:
# not overlapping - put previous to results and use next as previous
result.append(prev)
prev = item
else:
# overlappig - create on range start, end
prev['start'] = min(prev['start'], item['start'])
prev['end'] = max(prev['end'], item['end'])
# add when there is no next item
result.append(prev)
print(pd.DataFrame(result))
Result:
id start end
0 site a 2021-07-08 00:00:00 2021-07-08 12:00:00
2 site a 2021-07-08 17:36:00 2021-07-09 11:40:00
5 site b 2021-07-08 00:00:00 2021-07-08 10:24:00
7 site b 2021-07-08 17:32:00 2021-07-09 13:00:00
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
No vulnerabilities reported
Save this library and start creating your kit
See Similar Libraries in
Save this library and start creating your kit
Open Weaver – Develop Applications Faster with Open Source