Explore all Data Science open source software, libraries, packages, source code, cloud functions and APIs.

Popular New Releases in Data Science

pandas: Pandas 1.4.1
OpenRefine: OpenRefine v3.5.2
nteract: v0.28.0: Coy Copernicus
imbalanced-learn: Version 0.9.0
knowledge-repo: v0.9.1

Top Authors in Data Science

 1    25 Libraries    1048
 2     9 Libraries     477
 3     4 Libraries      41
 4     4 Libraries     320
 5     4 Libraries     139
 6     4 Libraries      48
 7     4 Libraries     464
 8     4 Libraries     215
 9     4 Libraries     480
10     3 Libraries       7

Trending Kits in Data Science

simple-data-analysis

Movie Recommendation System with Pandas

Have you ever wondered how Netflix recommends movies based on the ones you've already watched, or how suggestions like "Frequently Bought Together" appear on an e-commerce website? These suggestions may look simple, but a statistical method is used to predict them. Such systems are called recommendation engines, recommendation systems, or recommender systems.

A recommender system is one of the best-known applications of data science and machine learning. It uses a statistical algorithm to predict users' ratings for a specific item, based on the similarity between items or the similarity between the users who previously rated those items. The assumption is that similar users will rate a group of items similarly.

The kandi kit provides a fully deployable Movie Recommendation System with Pandas. Source code is included so that you can customize it for your requirements.

Download 1-Click Installer: https://github.com/kandi1clickkits/MovieRecommender/raw/main/kit_installer.zip
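The similarity idea described above can be sketched in a few lines of pandas. This is only a minimal illustration, not the kit's actual code: the ratings table, the `similar_movies` helper, and the choice of Pearson correlation between rating columns are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy ratings (user, movie, rating); not taken from the kit.
ratings = pd.DataFrame({
    "user":   ["u1", "u1", "u1", "u2", "u2", "u2",
               "u3", "u3", "u3", "u4", "u4", "u4"],
    "movie":  ["A", "B", "C"] * 4,
    "rating": [5, 4, 1, 4, 5, 2, 1, 2, 5, 2, 1, 4],
})

# Rows are users, columns are movies, cells are ratings.
matrix = ratings.pivot_table(index="user", columns="movie", values="rating")


def similar_movies(matrix: pd.DataFrame, title: str) -> pd.Series:
    """Rank the other movies by how strongly their ratings correlate with `title`."""
    scores = matrix.corrwith(matrix[title])  # Pearson correlation, column by column
    return scores.drop(title).sort_values(ascending=False)


# Movies rated like "A" by the same users come first.
print(similar_movies(matrix, "A"))
```

In this toy data, users who liked movie A also tended to like B and to dislike C, so B ranks above C. A real system would also need to handle sparse ratings and movies with very few raters.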

Trending Discussions on Data Science

    Pandas merge multiple dataframes on one temporal index, with latest value from all others
    C# Using class specific member references that child "base" calls respect
    Python script to repeatedly read a sensor errors
    renv + venv + jupyterlab + IRkernel: will it blend?
    What does runif() mean when used inside if_else()?
    Create new boolean fields based on specific bigrams appearing in a tokenized pandas dataframe
    Webpage starts zoomed out on mobile devices
    Do random functions such as sample work when I deploy a Shiny App?
    How can I check a confusion_matrix after fine-tuning with custom datasets?
    How to rewrite this deprecated expression using do and "by", with "groupby" (Julia)

QUESTION

Pandas merge multiple dataframes on one temporal index, with latest value from all others

Asked 2022-Apr-16 at 03:35

I'm merging some dataframes which have a time index.

import pandas as pd
df1 = pd.DataFrame(['a', 'b', 'c'],
    columns=pd.MultiIndex.from_product([['target'], ['key']]),
    index = [
        '2022-04-15 20:20:20.000000',
        '2022-04-15 20:20:21.000000',
        '2022-04-15 20:20:22.000000'],)
df2 = pd.DataFrame(['a2', 'b2', 'c2', 'd2', 'e2'],
    columns=pd.MultiIndex.from_product([['feature2'], ['keys']]),
    index = [
        '2022-04-15 20:20:20.100000',
        '2022-04-15 20:20:20.500000',
        '2022-04-15 20:20:20.900000',
        '2022-04-15 20:20:21.000000',
        '2022-04-15 20:20:21.100000',],)
df3 = pd.DataFrame(['a3', 'b3', 'c3', 'd3', 'e3'],
    columns=pd.MultiIndex.from_product([['feature3'], ['keys']]),
    index = [
        '2022-04-15 20:20:19.000000',
        '2022-04-15 20:20:19.200000',
        '2022-04-15 20:20:20.000000',
        '2022-04-15 20:20:20.200000',
        '2022-04-15 20:20:23.100000',],)

Then I use this merge procedure:

def merge(dfs:list[pd.DataFrame], targetColumn:'str|tuple[str]'):
    from functools import reduce
    if len(dfs) == 0:
        return None
    if len(dfs) == 1:
        return dfs[0]
    for df in dfs:
        df.index = pd.to_datetime(df.index)
    merged = reduce(
        lambda left, right: pd.merge(
            left,
            right,
            how='outer',
            left_index=True,
            right_index=True),
        dfs)
    for col in merged.columns:
        if col != targetColumn:
            merged[col] = merged[col].fillna(method='ffill')
    return merged[merged[targetColumn].notna()]

like this:

merged = merge([df1, df2, df3], targetColumn=('target', 'key'))

which produces this:

[Image: target production]

And it all works great. The problem is efficiency: notice that in the merge() procedure I use reduce and an outer merge to join the dataframes together. This can make a HUGE interim dataframe, which then gets filtered down. But what if my PC doesn't have enough RAM to hold that huge dataframe in memory? That's the problem I'm trying to avoid.

I'm wondering if there's a way to avoid expanding the data out into a huge dataframe while merging.
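To make the concern concrete, here is a rough sketch of how the interim frame grows. The three frames and their sizes are hypothetical, chosen so that no two timestamps coincide. It only counts rows and is not a memory benchmark.

```python
import pandas as pd
from functools import reduce

start = pd.Timestamp("2022-04-15 20:20:20")

# Hypothetical sizes: a sparse target and two dense feature streams.
target = pd.DataFrame(
    {"target": range(10)},
    index=pd.date_range(start, periods=10, freq=pd.Timedelta(seconds=1)),
)
feature_a = pd.DataFrame(
    {"a": range(1000)},
    index=pd.date_range(start + pd.Timedelta(milliseconds=1),
                        periods=1000, freq=pd.Timedelta(milliseconds=10)),
)
feature_b = pd.DataFrame(
    {"b": range(1000)},
    index=pd.date_range(start + pd.Timedelta(milliseconds=3),
                        periods=1000, freq=pd.Timedelta(milliseconds=10)),
)

# Same outer-merge pattern as in merge(): one row per distinct timestamp.
interim = reduce(
    lambda left, right: pd.merge(
        left, right, how="outer", left_index=True, right_index=True),
    [target, feature_a, feature_b],
)

print(len(target), len(interim))
```

Because the outer merge keeps every distinct timestamp from every input, the interim frame has as many rows as the union of all indexes, even though only the 10 target rows survive the final filter.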

Of course a regular old merge isn't sufficient because it only merges on exactly matching indexes rather than the latest temporal index before the target variable's observation:

df1.merge(df2, how='left', left_index=True, right_index=True)

[Image: efficient but bad merge]

Has this kind of thing been solved efficiently? Seems like a common data science issue, since no one wants to leak future information into their models, and everyone has various inputs to merge together...

ANSWER

Answered 2022-Apr-16 at 00:45

You're in luck: pandas.merge_asof does exactly what you need!

We use the default direction='backward' argument:

A โ€œbackwardโ€ search selects the last row in the right DataFrame whose โ€˜onโ€™ key is less than or equal to the leftโ€™s key.

Using your three example DataFrames:

import pandas as pd
from functools import reduce

# Convert all indexes to datetime
for df in [df1, df2, df3]:
    df.index = pd.to_datetime(df.index)

# Perform as-of merges
res = reduce(lambda left, right:
             pd.merge_asof(left, right, left_index=True, right_index=True),
             [df1, df2, df3])

print(res)

                    target feature2 feature3
                       key     keys     keys
2022-04-15 20:20:20      a      NaN       c3
2022-04-15 20:20:21      b       d2       d3
2022-04-15 20:20:22      c       e2       d3
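Since the question mentions not leaking future information, two other `pandas.merge_asof` parameters may be worth a look: `allow_exact_matches` and `tolerance`. The sketch below is not part of the original answer. It reuses the question's timestamps with flat column names, and the frame names `target` and `feature` and the 500 ms tolerance are assumptions for illustration.

```python
import pandas as pd

target = pd.DataFrame(
    {"target": ["a", "b", "c"]},
    index=pd.to_datetime([
        "2022-04-15 20:20:20.000",
        "2022-04-15 20:20:21.000",
        "2022-04-15 20:20:22.000",
    ]),
)
feature = pd.DataFrame(
    {"feature2": ["a2", "b2", "c2", "d2", "e2"]},
    index=pd.to_datetime([
        "2022-04-15 20:20:20.100",
        "2022-04-15 20:20:20.500",
        "2022-04-15 20:20:20.900",
        "2022-04-15 20:20:21.000",
        "2022-04-15 20:20:21.100",
    ]),
)

# Default behaviour: backward search, rows with an equal timestamp may match.
default = pd.merge_asof(target, feature, left_index=True, right_index=True)

# Only strictly earlier feature rows may match (no same-instant values).
strict = pd.merge_asof(target, feature, left_index=True, right_index=True,
                       allow_exact_matches=False)

# Ignore feature rows that are more than 500 ms older than the target row.
fresh = pd.merge_asof(target, feature, left_index=True, right_index=True,
                      tolerance=pd.Timedelta("500ms"))

print(default, strict, fresh, sep="\n\n")
```

Whether a feature stamped at exactly the target's time counts as "known in advance" depends on how the data was recorded, so the strict variant is a modelling choice rather than a general recommendation.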

Source https://stackoverflow.com/questions/71889742

Community Discussions contain sources that include Stack Exchange Network
