How to Calculate Summary Statistics for a Pandas DataFrame

by ganesh Updated: Jan 24, 2023

Solution Kit

Summary statistics are measures that summarize or describe a set of observations. For a pandas DataFrame, they summarize the data held in its columns.

A pandas DataFrame's describe() method generates several summary statistics at once. It returns a new DataFrame with a row for each statistic and a column for each numeric column of the original data.
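As a minimal sketch of the shape of that result, the following rebuilds the sample data used later in this article (the variable name summary is only illustrative) and inspects the index and columns of what describe() returns:

```python
import pandas as pd

# Same sample data as the example shown later in this article.
df = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2],
    "col2": [50, 40, 11, 10, 25],
    "col3": [3, 3, 3, 4, 4],
})

summary = df.describe()

# Rows are the statistics; columns are the numeric columns of df.
print(list(summary.index))
print(list(summary.columns))

# Because the result is itself a DataFrame, one value can be looked up by label.
print(summary.loc["mean", "col2"])
```

Since the summary is an ordinary DataFrame, it can be sliced, transposed, or saved like any other.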

describe(): By default, describe() computes the count, mean, standard deviation, minimum, maximum, and quartiles (25%, 50%, and 75%) of each numeric column. You can select which column data types to include in the summary with the include parameter and which to omit with the exclude parameter.
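The include and exclude parameters can be sketched as follows. The DataFrame here is hypothetical, made up only to have both numeric and text columns; it is not part of the original solution.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data, invented for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "temp": [4.5, 19.0, 6.0, 31.5],
    "visits": [10, 3, 7, 12],
})

# Default: only the numeric columns are summarized.
numeric_only = df.describe()

# exclude=[np.number] drops the numeric columns, leaving the text column,
# which is summarized with count, unique, top, and freq.
non_numeric = df.describe(exclude=[np.number])

# include="all" summarizes every column; statistics that do not apply
# to a column are filled with NaN.
everything = df.describe(include="all")

print(numeric_only)
print(non_numeric)
print(everything)
```

Both parameters filter by data type rather than by column name; to summarize specific columns by name, select them first, for example df[["temp"]].describe().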

Other methods, including mean(), median(), min(), max(), and std(), return individual summary statistics for the data.
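A short sketch of these individual methods, using the same sample data as the example in this article. The agg() call at the end is an addition not shown in the original solution; it is one way to collect several of these statistics into a single table.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2],
    "col2": [50, 40, 11, 10, 25],
    "col3": [3, 3, 3, 4, 4],
})

# Each method returns a Series with one value per column.
col_means = df.mean()
col_medians = df.median()
col_min = df.min()
col_max = df.max()
col_std = df.std()  # sample standard deviation (ddof=1) by default

# Several statistics at once: one row per statistic, one column per column of df.
stats = df.agg(["mean", "median", "min", "max", "std"])
print(stats)
```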

Here is how to calculate summary statistics for a Pandas DataFrame:

Fig 1: Preview of the output you will get when you run this code in a Jupyter notebook

Code

In this solution, we use the describe() method of the Pandas library.

Add statistics of a DataFrame in new columns

Python | Lines of Code: 37 | License: Strong Copyleft (CC BY-SA 4.0)

df.describe()

>>> df
   col1  col2  col3
0     1    50     3
1     1    40     3
2     1    11     3
3     2    10     4
4     2    25     4
>>> df.describe()
           col1       col2      col3
count  5.000000   5.000000  5.000000
mean   1.400000  27.200000  3.400000
std    0.547723  17.655028  0.547723
min    1.000000  10.000000  3.000000
25%    1.000000  11.000000  3.000000
50%    1.000000  25.000000  3.000000
75%    2.000000  40.000000  4.000000
max    2.000000  50.000000  4.000000

# Standard deviation of each column
df.std(axis=0)

# Standard deviation of each row
df.std(axis=1)

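# Compute row-wise statistics from the original columns only, so that
# columns added here (such as F_mean) do not feed into later statistics
cols = ['col1', 'col2', 'col3']
df['F_mean'] = df[cols].mean(axis=1)
df['F_std'] = df[cols].std(axis=1)
df['F_min'] = df[cols].min(axis=1)
df['F_max'] = df[cols].max(axis=1)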

>>> df.describe().loc[['count','mean', 'std', 'min', 'max']]
           col1       col2      col3
count  5.000000   5.000000  5.000000
mean   1.400000  27.200000  3.400000
std    0.547723  17.655028  0.547723
min    1.000000  10.000000  3.000000
max    2.000000  50.000000  4.000000
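The snippets above assume that df already exists. A self-contained sketch that can be run as a single cell might look like the following; the name value_cols is introduced here for illustration. Restricting each row-wise call to value_cols keeps a newly added column such as F_mean out of the statistics computed after it.

```python
import pandas as pd

# Rebuild the sample DataFrame shown in the listing above.
df = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2],
    "col2": [50, 40, 11, 10, 25],
    "col3": [3, 3, 3, 4, 4],
})

# Illustrative name: the columns the statistics are computed from.
value_cols = ["col1", "col2", "col3"]

# axis=0 gives one value per column; axis=1 gives one value per row.
col_std = df[value_cols].std(axis=0)
row_std = df[value_cols].std(axis=1)

# Add row-wise statistics as new columns, always reading from value_cols.
df["F_mean"] = df[value_cols].mean(axis=1)
df["F_std"] = df[value_cols].std(axis=1)
df["F_min"] = df[value_cols].min(axis=1)
df["F_max"] = df[value_cols].max(axis=1)

# Keep only some rows of the describe() output by selecting index labels.
selected = df[value_cols].describe().loc[["count", "mean", "std", "min", "max"]]

print(df)
print(selected)
```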

Copy the code above and paste it into a cell of a Jupyter notebook.
Run the cell to compute the summary statistics for the Pandas DataFrame.

I hope you found this useful. I have added links to the dependent libraries and version information in the following sections.

I found this code snippet by searching for "pandas summary statistics" in kandi. You can try any such use case!

Dependent Libraries

pandas by pandas-dev

Python

38689

Version: v2.0.2

License: Permissive (BSD-3-Clause)

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

If you do not have Pandas, which is required to run this code, you can install it by clicking the link above and following the installation instructions from the GitHub or PyPI links on the Pandas page in kandi.

You can search for any dependent library on kandi like Pandas.

Environment Tested

I tested this solution in the following versions. Be mindful of changes when working with other versions.

The solution is created in Python 3.7.
The solution is tested on Pandas version 1.3.1.
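To compare your own environment with the versions listed above, a quick check along these lines may help. It only prints what is installed; the variable names are illustrative.

```python
import sys

import pandas as pd

# Report the interpreter and pandas versions in the current environment.
python_version = sys.version.split()[0]
pandas_version = pd.__version__

print("Python:", python_version)
print("pandas:", pandas_version)
```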

Using this solution, we can create summary statistics of a DataFrame using the Pandas library in Python in a few simple steps. This process also facilitates an easy-to-use, hassle-free way to create a hands-on working version of code that helps us summarize data in Pandas.

Support

For any support on kandi solution kits, please use the chat.
For further learning resources, visit the Open Weaver Community learning page.

Open Weaver – Develop Applications Faster with Open Source