
PythonDataScienceHandbook | Python Data Science Handbook: full text | Machine Learning library

by jakevdp | Jupyter Notebook | Version: Current | License: Non-SPDX

kandi X-RAY | PythonDataScienceHandbook Summary

PythonDataScienceHandbook is a Jupyter Notebook library typically used in Artificial Intelligence, Machine Learning, NumPy, Jupyter, and Pandas applications. PythonDataScienceHandbook has no bugs or reported vulnerabilities and has medium support. However, it has a Non-SPDX license. You can download it from GitHub.
The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases. The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists. See Index.ipynb for an index of the notebooks available to accompany the text.
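As a quick, minimal sanity check that the core stack the book relies on is installed (a sketch only; version numbers will vary by environment):

```python
# Check that each core library the handbook uses imports cleanly,
# and report its version (values are environment-specific).
import numpy
import pandas
import matplotlib
import sklearn

for mod in (numpy, pandas, matplotlib, sklearn):
    print(mod.__name__, mod.__version__)
```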

Support

  • PythonDataScienceHandbook has a medium active ecosystem.
  • It has 32215 star(s) with 14452 fork(s). There are 1737 watchers for this library.
  • It had no major release in the last 12 months.
  • There are 83 open issues and 64 have been closed. On average, issues are closed in 28 days. There are 89 open pull requests and 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of PythonDataScienceHandbook is current.

Quality

  • PythonDataScienceHandbook has 0 bugs and 0 code smells.

Security

  • PythonDataScienceHandbook has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
  • PythonDataScienceHandbook code analysis shows 0 unresolved vulnerabilities.
  • There are 0 security hotspots that need review.

License

  • PythonDataScienceHandbook has a Non-SPDX License.
  • A Non-SPDX license can be an open-source license that is simply not SPDX-compliant, or a non-open-source license; review it closely before use.

Reuse

  • PythonDataScienceHandbook releases are not available. You will need to build from source code and install.
  • Installation instructions are not available. Examples and code snippets are available.
  • It has 1401 lines of code, 27 functions and 24 files.
  • It has high code complexity. Code complexity directly impacts maintainability of the code.

PythonDataScienceHandbook Key Features

Python Data Science Handbook: full text in Jupyter Notebooks

Software

$ conda install --file requirements.txt
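If you prefer pip over conda, an equivalent install (assuming the same requirements.txt at the repository root) would be:

```shell
# pip-based alternative to the conda command above
pip install -r requirements.txt
```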

Grouping by Multi-Indices of both Row and Column

health_data.stack().mean(level='year')  # `level=` was removed in pandas 2.0; use .stack().groupby(level='year').mean()

subject Bob     Guido   Sue
year
2013    28.4    40.400  34.15
2014    43.2    38.025  41.10

health_data.stack().groupby('year').describe()
subject Bob                                     Guido                       Sue
count   mean    std min 25% 50% 75% max count   mean    ... 75% max count   mean    std min 25% 50% 75% max
year
2013    4.0 28.4    11.580443   13.0    22.75   31.3    36.95   38.0    4.0 40.400  ... 42.500  50.0    4.0 34.15   4.297674    30.0    30.75   33.95   37.35   38.7
2014    4.0 43.2    7.566593    36.4    37.15   42.2    48.25   52.0    4.0 38.025  ... 39.875  44.0    4.0 41.10   12.961996   28.0    35.65   38.70   44.15   59.0
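The health_data frame used above comes from the handbook and is not defined in the snippet. The sketch below reconstructs a similarly shaped frame (the numbers are invented, so the aggregates differ from the output shown) and computes the same per-year means with an API that also works on pandas >= 2.0:

```python
import numpy as np
import pandas as pd

# A frame shaped like the handbook's health_data:
# row MultiIndex (year, visit), column MultiIndex (subject, type).
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
rng = np.random.RandomState(42)
health_data = pd.DataFrame(rng.randint(20, 50, (4, 6)),
                           index=index, columns=columns)

# stack() moves the 'type' column level into the rows; grouping on the
# 'year' index level then averages over visit and type per subject.
yearly_means = health_data.stack().groupby(level='year').mean()
print(yearly_means)
```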

heatmap simple customized binary legend

import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import seaborn as sns
import pandas as pd

df = pd.DataFrame({'A': {1: False, 2: False, 3: False, 4: True, 5: True, 6: True, 7: False, 8: False},
                   'B': {1: False, 2: False, 3: True, 4: True, 5: False, 6: True, 7: True, 8: False},
                   'C': {1: False, 2: True, 3: False, 4: False, 5: False, 6: False, 7: True, 8: True}})

fig, ax = plt.subplots(figsize=(3, 3))
cmap = sns.mpl_palette("Set2", 2)
sns.heatmap(data=df, cmap=cmap, cbar=False)
plt.xticks(rotation=90, fontsize=10)
plt.yticks(rotation=0, fontsize=10)

legend_handles = [Patch(color=cmap[True], label='Missing Value'),  # red
                  Patch(color=cmap[False], label='Non Missing Value')]  # green
plt.legend(handles=legend_handles, ncol=2, bbox_to_anchor=[0.5, 1.02], loc='lower center', fontsize=8, handlelength=.8)
plt.tight_layout()
plt.show()

how to create a tidy x-axis of datetime indexes of a data for my plot

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(date, price, label="Price")  # `date` and `price` come from the asker's data
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))

# Equivalent call when plotting on an embedded canvas (e.g. in a Qt widget):
self.canvas.axes.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
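A self-contained version of the same idea, with synthetic dates and prices standing in for the asker's undefined `date` and `price` (the data and the `AutoDateLocator` choice are assumptions made here):

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the snippet's `date` and `price`
date = pd.date_range('2022-01-01', periods=30, freq='D')
price = np.linspace(100, 130, 30)

fig, ax = plt.subplots()
ax.plot(date, price, label='Price')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
ax.xaxis.set_major_locator(mdates.AutoDateLocator())  # sensible tick spacing
fig.autofmt_xdate()  # rotate labels so they don't overlap
fig.savefig('price.png')
```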

How Pandas is doing groupby for below scenario

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
L = [0, 1, 0, 1, 2, 0]
print(df)
  key  data1  data2
0   A      0      5 <-0
1   B      1      0 <-1
2   C      2      3 <-0
3   A      3      3 <-1
4   B      4      7 <-2
5   C      5      9 <-0

data1 for 0 is 0 + 2 + 5 = 7
data2 for 0 is 5 + 3 + 9 = 17

data1 for 1 is 1 + 3 = 4
data2 for 1 is 0 + 3 = 3

data1 for 2 is 4
data2 for 2 is 7

print(df.groupby(L).sum())
   data1  data2
0      7     17
1      4      3
2      4      7
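groupby accepts any array-like or mapping aligned with the index, not just a list. A small sketch (the 'even'/'odd' labels are invented for illustration) grouping the same frame by a dict keyed on index labels:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)})

# A list assigns group labels by position; a dict does the same by index label.
mapping = {0: 'even', 1: 'odd', 2: 'even', 3: 'odd', 4: 'even', 5: 'odd'}
out = df.groupby(mapping)[['data1', 'data2']].sum()
print(out)
```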

I'm learning from Python Data Science Handbook and got different graph with same code. What's wrong?

import matplotlib.pyplot as plt
from sklearn.manifold import Isomap
from sklearn.datasets import load_digits

digits = load_digits()


fig, axs = plt.subplots(5,5, figsize=(16,9), sharex=True, sharey=True)
for ax in axs.flat:
    iso = Isomap(n_components=2)
    iso.fit(digits.data)
    data_projected = iso.transform(digits.data)
    im = ax.scatter(data_projected[:, 0], data_projected[:, 1], c=digits.target,
                    s=4, 
            edgecolor='none', alpha=0.5,
            norm=plt.Normalize(-.5, 9.5),
            cmap=plt.cm.get_cmap('tab10', 10))

fig.colorbar(im, label='digit label', ax=axs, ticks=range(10))


plt.show()
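Note that the loop above refits the same Isomap once per subplot, which is slow and can produce slightly different embeddings each run. A sketch that fits once and reuses the projection (the 200-sample subset and n_neighbors=10 are choices made here just to keep the example fast, not part of the original answer):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

digits = load_digits()
X = digits.data[:200]  # small subset just to keep this sketch quick

# Fit once and reuse the 2D projection instead of refitting per subplot
iso = Isomap(n_components=2, n_neighbors=10)
data_projected = iso.fit_transform(X)
print(data_projected.shape)  # (200, 2)
```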

Pandas time series-specific operations in Altair

# Load the data
# !curl -o FremontBridge.csv https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD

import pandas as pd
data = pd.read_csv('FremontBridge.csv', parse_dates=['Date'])
data.columns = ['Date', 'Total', 'East', 'West']
df = data.iloc[:24 * 365]  # limit to first year of data

# Draw the chart
import altair as alt
alt.data_transformers.enable('data_server')  # handle larger datasets

alt.Chart(df).mark_line().transform_fold(
    ['Total', 'East', 'West'],
).encode(
    x='hours(Date):T',
    y='sum(value):Q',
    color='key:N'
)

alt.Chart(df).transform_fold(
    ['Total', 'East', 'West']
).mark_line().encode(
    x='hours(Date):T',
    y='sum(value):Q',
    color='key:N',
    facet=alt.Facet('day(Date):O', columns=4)
).properties(width=200, height=150)
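Altair's x='hours(Date)' with y='sum(value)' is the same aggregation as a pandas groupby on the hour. A self-contained sketch with synthetic data standing in for FremontBridge.csv (the synthetic values are assumptions):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for FremontBridge.csv (real data needs the download above)
idx = pd.date_range('2019-01-01', periods=24 * 7, freq='h')
df = pd.DataFrame({'Date': idx,
                   'East': np.arange(24 * 7),
                   'West': np.arange(24 * 7)[::-1]})
df['Total'] = df['East'] + df['West']

# Equivalent of Altair's x='hours(Date)' + y='sum(value)' aggregation
hourly = df.groupby(df['Date'].dt.hour)[['Total', 'East', 'West']].sum()
print(hourly.shape)  # (24, 3)
```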

Can we plot image data in Altair?

import altair as alt
import pandas as pd

source = pd.DataFrame.from_records([
      {"x": 0.5, "y": 0.5, "img": "https://vega.github.io/vega-datasets/data/ffox.png"},
      {"x": 1.5, "y": 1.5, "img": "https://vega.github.io/vega-datasets/data/gimp.png"},
      {"x": 2.5, "y": 2.5, "img": "https://vega.github.io/vega-datasets/data/7zip.png"}
])

alt.Chart(source).mark_image(
    width=50,
    height=50
).encode(
    x='x',
    y='y',
    url='img'
)

import altair as alt
import pandas as pd
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)

data = pd.DataFrame({
    'image': list(faces.images[:12])  # list of 2D arrays
})

alt.Chart(data).transform_window(
    index='count()'           # number each of the images
).transform_flatten(
    ['image']                 # extract rows from each image
).transform_window(
    row='count()',            # number the rows...
    groupby=['index']         # ...within each image
).transform_flatten(
    ['image']                 # extract the values from each row
).transform_window(
    column='count()',         # number the columns...
    groupby=['index', 'row']  # ...within each row & image
).mark_rect().encode(
    alt.X('column:O', axis=None),
    alt.Y('row:O', axis=None),
    alt.Color('image:Q',
        scale=alt.Scale(scheme=alt.SchemeParams('greys', extent=[1, 0])),
        legend=None
    ),
    alt.Facet('index:N', columns=4)
).properties(
    width=100,
    height=120
)
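The chained transform_window/transform_flatten calls above just turn each 2D image into long-form (index, row, column, value) records. The same reshaping can be done in pandas before handing the data to Altair; a sketch using a tiny synthetic image in place of the LFW faces:

```python
import numpy as np
import pandas as pd

# Tiny synthetic "images" stand in for the LFW faces used above
images = [np.arange(6).reshape(2, 3), np.arange(6, 12).reshape(2, 3)]

# Long-form equivalent of the Altair transform chain:
# one record per (image index, pixel row, pixel column, value)
records = [
    {'index': i, 'row': r, 'column': c, 'image': float(img[r, c])}
    for i, img in enumerate(images)
    for r in range(img.shape[0])
    for c in range(img.shape[1])
]
long_df = pd.DataFrame(records)
print(len(long_df))  # 12
```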

Community Discussions

Trending Discussions on PythonDataScienceHandbook
  • Grouping by Multi-Indices of both Row and Column
  • heatmap simple customized binary legend
  • how to create a tidy x-axis of datetime indexes of a data for my plot
  • Is it possible to make a contour plot using ALTAIR 4.1 in python?
  • How Pandas is doing groupby for below scenario
  • I'm learning from Python Data Science Handbook and got different graph with same code. What's wrong?
  • Pandas time series-specific operations in Altair
  • Run a Jupyter notebook directly online (without downloading it locally)
  • Can we plot image data in Altair?
  • Interpretation of method plt.fill_between()?

QUESTION

Grouping by Multi-Indices of both Row and Column

Asked 2022-Jan-30 at 14:34

I have created a table using Pandas following material from here.

The table created makes use of multi-indices for both columns and rows.

I am trying to compute the descriptive statistics for each year and subject: displaying, for instance, the mean of 2013 for Bob, for Guido, and for Sue, for all subjects and all years. The mean for Bob would consider the means for HR and Temp. Note: the types being the same here is a coincidence; other subjects not included in the screenshot have varying types.

The closest I have come to a solution is df.groupby(level=0, axis=0).describe(). This grouped the data by year; however, it did not also group by subject.

ANSWER

Answered 2022-Jan-30 at 14:34

Providing links to external websites is discouraged, as they may change or disappear at any time outside of Stack Overflow's control.

Having said that, the link provides most of the tools you need to answer your questions. More specifically, a combination of stack and mean should give you what you specifically asked about:

health_data.stack().mean(level = 'year')

produces


subject Bob     Guido   Sue
year            
2013    28.4    40.400  34.15
2014    43.2    38.025  41.10

or more generally

health_data.stack().groupby('year').describe()

produces a long dataframe with the statistics grouped by year, for each subject:

subject Bob                                     Guido                       Sue
count   mean    std min 25% 50% 75% max count   mean    ... 75% max count   mean    std min 25% 50% 75% max
year                                                                                    
2013    4.0 28.4    11.580443   13.0    22.75   31.3    36.95   38.0    4.0 40.400  ... 42.500  50.0    4.0 34.15   4.297674    30.0    30.75   33.95   37.35   38.7
2014    4.0 43.2    7.566593    36.4    37.15   42.2    48.25   52.0    4.0 38.025  ... 39.875  44.0    4.0 41.10   12.961996   28.0    35.65   38.70   44.15   59.0

Source https://stackoverflow.com/questions/70901238

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install PythonDataScienceHandbook

You can download it from GitHub.

Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.


  • © 2022 Open Weaver Inc.