Explore all Big Data open source software, libraries, packages, source code, cloud functions and APIs.

Trending Kits in Big Data

Disease Predictor

Disease prediction estimates a patient's health status by applying data mining and machine learning techniques to patient treatment history, symptoms, and diagnoses, in support of personalized healthcare services. This kandi kit predicts the probability of disease. It uses pandas to load the datasets and visualize the data, NumPy to implement the algorithm, and sklearn-pandas to build the model. The project uses pandas and scikit-learn to create a model that predicts whether or not a patient has a disease based on their demographics and lab results. It also uses Jupyter Notebook to write code interactively, so you can see how the model performs when you change parameters such as the number of features or the amount of training data. The kit provides a fully deployable Disease Predictor, with source code included so that you can customize it for your requirements.

1-Click Installer: https://github.com/kandikits/Disease_Detector/raw/main/kit_installer.zip

Flight Fare Prediction

Flight fare prediction is useful for travel agencies: it gives them an idea of future fare trends, which they can pass on to their customers, and helps them decide whether or not to book flights for their clients. Estimating flight prices from the conditions present at a particular time is an interesting project because it involves data analysis, machine learning and data science. The kit uses NumPy for scientific computing with Python; it provides tools such as linear algebra, Fourier transforms, statistical functions, and random number generation. pandas provides fast and flexible data structures for working with structured (tabular) data sets. joblib provides tools for lightweight pipelining in Python, such as result caching and simple parallel execution. The kit provides a fully deployable Flight Fare Prediction model, with source code included so that you can customize it for your requirements.

1-Click Installer: https://github.com/kandikits/Flight-Fare-Prediction/raw/main/kit_installer.zip

Trending Discussions on Big Data

    Visualise missing values in a time series heatmap
    The minimum number of rows that sum to a given number in Python
    pandas fill NA but not all based on recent past record
    Delete and replace Nan values with mean of the rows in pandas dataframe
    How to decode column value from rare label by matching column names
    How do I copy a big database table to another in ABAP?
    Remove all rows between two sentinel strings in a column using pandas (but not the sentinel strings)
    Faster for loop with only if in python
    determine the range of a value using a look up table
    How to use multiprocessing in a chronical order?

QUESTION

Visualise missing values in a time series heatmap

Asked 2022-Mar-28 at 19:27

I am really new to big data analysis. Let's say I have a large dataset with the following features. I want to visualise the percentage of missing values (None values) of the fuel parameter for every id in a specific hour. I want to draw a chart where the x-axis is the time series (time column), the y-axis is the 'id', and the colour indicates the percentage of missing fuel values. I grouped the data based on 'id' and 'hour'.

I don't know how to visualise missing values in a good way for all ids. For example, if the percentage of missing fuel values for a specific id in a specific hour is 100%, then the colour for that time and that 'id' can be gray. If the percentage is 50%, the colour can be light green. If it is 0%, the colour can be dark green. The colour must be based on the percentage of missing fuel values, after grouping by id and time.

     id    time                   fuel
0    1     2022-02-26 19:08:33    100
2    1     2022-02-26 20:09:35    None
3    2     2022-02-26 21:09:35    70
4    3     2022-02-26 21:10:55    60
5    4     2022-02-26 21:10:55    None
6    5     2022-02-26 22:12:43    50
7    6     2022-02-26 23:10:50    None

So for example, in the following code I computed the percentage of the missing value for every hour for specific id:

df.set_index('ts').groupby(['id', pd.Grouper(freq='H')])['fuell'].apply(lambda x: x.isnull().mean() * 100)

Is there any solution?

ANSWER

Answered 2022-Mar-25 at 09:39

There is no single right answer for visualising missing values; I guess it depends on your uses and habits.

But first, to make it work, we need to preprocess your dataframe and make it analyzable, i.e. make sure its dtypes are right.

First let's build our data:

import pandas as pd
from io import StringIO

# Fields are tab-separated; the tabs are written as \t escapes so they survive copy and paste
csvfile = StringIO(
"""id\ttime\tfuel
1\t2022-02-26 19:08:33\t100
2\t2022-02-26 19:09:35\t70
3\t2022-02-26 19:10:55\t60
4\t2022-02-26 20:10:55\tNone
5\t2022-02-26 21:12:43\t50
6\t2022-02-26 22:10:50\tNone""")
df = pd.read_csv(csvfile, sep = '\t', engine='python')

df
Out[65]: 
   id                 time  fuel
0   1  2022-02-26 19:08:33   100
1   2  2022-02-26 19:09:35    70
2   3  2022-02-26 19:10:55    60
3   4  2022-02-26 20:10:55  None
4   5  2022-02-26 21:12:43    50
5   6  2022-02-26 22:10:50  None

At this stage almost all the data in our dataframe is stored as strings; you need to convert fuel and time to non-object dtypes.

df.dtypes
Out[66]: 
id       int64
time    object
fuel    object
dtype: object

Time should be converted to datetime, id to int and fuel to float. None should be converted to np.nan for numeric values, which requires a float dtype.

With a map, we can easily change all 'None' values into np.nan. I won't go deeper here, but for simplicity's sake I'll use a custom subclass of dict with a __missing__ implementation, so that any value not in the mapping is returned unchanged.

import numpy as np

# The timestamps use dashes, so the format string uses dashes too
df.time = pd.to_datetime(df.time, format = "%Y-%m-%d %H:%M:%S")

# dict subclass whose lookups fall back to the key itself when it is absent
class dict_with_missing(dict):
    def __missing__(self, key):
        return key

map_dict = dict_with_missing({'None' : np.nan})
df.fuel = df.fuel.map(map_dict).astype(np.float32)
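To see why the __missing__ trick works, here is a small standalone sketch (not part of the original answer). As far as I know, Series.map uses a dict subclass's __missing__ as the default for keys it cannot find, instead of turning them into NaN. The class name DictWithMissing and the three sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


class DictWithMissing(dict):
    """dict that returns the looked-up key itself when the key is absent."""

    def __missing__(self, key):
        return key


# Only the string 'None' is remapped; every other value passes through unchanged.
mapping = DictWithMissing({"None": np.nan})

fuel = pd.Series(["100", "None", "70"])  # hypothetical sample values

# A plain dict would turn '100' and '70' into NaN as well, because they are not keys.
plain = fuel.map({"None": np.nan})

# With __missing__, unknown values survive and can then be cast to float.
converted = fuel.map(mapping).astype(np.float32)

print(plain.isna().sum())
print(converted.tolist())
```

The plain-dict version loses every value, which is why the answer reaches for the subclass.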

Then we have a clean dataframe :

df
Out[68]: 
   id                time   fuel
0   1 2022-02-26 19:08:33  100.0
1   2 2022-02-26 19:09:35   70.0
2   3 2022-02-26 19:10:55   60.0
3   4 2022-02-26 20:10:55    NaN
4   5 2022-02-26 21:12:43   50.0
5   6 2022-02-26 22:10:50    NaN

df.dtypes
Out[69]: 
id               int64
time    datetime64[ns]
fuel           float32
dtype: object
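As an alternative sketch (an assumption on my side, not what the answer above does), the same kind of clean frame can probably be reached at load time: read_csv accepts na_values to mark extra strings as missing and parse_dates to convert a column to datetime. For a column that is already loaded, pd.to_numeric with errors="coerce" turns anything non-numeric into NaN. Recent pandas versions may already treat the string None as missing by default; passing it explicitly keeps the behaviour independent of the version. The three sample rows and the names raw, df_alt and fuel_alt are illustrative.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated sample in the same layout as the answer's data.
raw = StringIO(
    "id\ttime\tfuel\n"
    "1\t2022-02-26 19:08:33\t100\n"
    "4\t2022-02-26 20:10:55\tNone\n"
    "5\t2022-02-26 21:12:43\t50\n"
)

# 'None' becomes NaN while parsing, and 'time' is parsed as datetime.
df_alt = pd.read_csv(raw, sep="\t", na_values=["None"], parse_dates=["time"])
print(df_alt.dtypes)

# For a column that was already read in as strings:
fuel_alt = pd.to_numeric(pd.Series(["100", "None", "70"]), errors="coerce")
print(fuel_alt.tolist())
```

Note that this gives float64 rather than float32 for fuel; add an astype call if the smaller dtype matters.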

Then, you can easily use bar, matrix or heatmap from the missingno module

import missingno as msno

msno.bar(df)
msno.matrix(df, sparkline=False)
msno.heatmap(df, cmap="RdYlGn")

A side note: heatmap is not very useful here, since it compares columns that have missing values (it shows how the missingness of one column correlates with that of another), and you only have one column with missing values. For a bigger dataframe (around 5 or 6 columns with missing values) it can be useful.
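For the per-id, per-hour view the question asks for, one possible approach (a sketch, not part of the original answer) is to compute the missing percentage for each id and hour, pivot it so that ids are rows and hours are columns, and draw the result with matplotlib's imshow. The frame below re-types the question's sample rows. The helper names missing_pct_by_hour and plot_missing_heatmap, the colormap stops, and the white colour for cells without any readings are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# The question's sample rows, with None already converted to NaN.
df_demo = pd.DataFrame(
    {
        "id": [1, 1, 2, 3, 4, 5, 6],
        "time": pd.to_datetime(
            [
                "2022-02-26 19:08:33",
                "2022-02-26 20:09:35",
                "2022-02-26 21:09:35",
                "2022-02-26 21:10:55",
                "2022-02-26 21:10:55",
                "2022-02-26 22:12:43",
                "2022-02-26 23:10:50",
            ]
        ),
        "fuel": [100, np.nan, 70, 60, np.nan, 50, np.nan],
    }
)


def missing_pct_by_hour(frame):
    """Rows: id, columns: hour bucket, values: % of fuel readings missing."""
    hour = frame["time"].dt.floor("h")
    pct = frame["fuel"].isna().groupby([frame["id"], hour]).mean() * 100
    return pct.unstack("time")


def plot_missing_heatmap(pct):
    """Draw the pivot: 0% dark green, 50% light green, 100% gray."""
    import matplotlib.pyplot as plt
    from matplotlib.colors import LinearSegmentedColormap

    cmap = LinearSegmentedColormap.from_list(
        "missing", ["darkgreen", "lightgreen", "gray"]
    )
    cmap.set_bad("white")  # id/hour cells with no readings at all

    fig, ax = plt.subplots()
    im = ax.imshow(pct.to_numpy(), aspect="auto", cmap=cmap, vmin=0, vmax=100)
    ax.set_xticks(range(pct.shape[1]))
    ax.set_xticklabels(
        [t.strftime("%m-%d %H:00") for t in pct.columns], rotation=45, ha="right"
    )
    ax.set_yticks(range(pct.shape[0]))
    ax.set_yticklabels([str(i) for i in pct.index])
    ax.set_xlabel("hour")
    ax.set_ylabel("id")
    fig.colorbar(im, ax=ax, label="missing fuel (%)")
    return fig


pct = missing_pct_by_hour(df_demo)
print(pct)
# To render the chart: plot_missing_heatmap(pct).savefig("missing_fuel.png")
```

With many ids the pivot gets tall, so it may be worth plotting a subset of ids or a coarser time bucket at a time.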

For a quick and dirty check, you can also print the number of missing values (np.nan, in pandas/numpy terms):

df.isna().sum()
Out[72]: 
id      0
time    0
fuel    2
dtype: int64
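Since the question is about percentages rather than counts, a small variation may help: the mean of isna() gives the fraction of missing values per column. This is a minimal sketch on a re-typed copy of the cleaned values above; the names df_demo and pct_missing are illustrative.

```python
import numpy as np
import pandas as pd

# Re-typed copy of the cleaned id and fuel columns.
df_demo = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "fuel": [100.0, 70.0, 60.0, np.nan, 50.0, np.nan],
    }
)

# isna() yields booleans; their mean is the share of missing values per column.
pct_missing = df_demo.isna().mean() * 100
print(pct_missing.round(1))
```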

Source https://stackoverflow.com/questions/71610279

Community Discussions contain sources that include Stack Exchange Network
