pandas by pandas-dev | Python | Version: 1.5.2 | License: BSD-3-Clause
Easy handling of missing data (represented as NaN, NA, or NaT) in floating point as well as non-floating point data
Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
Make it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
Intuitive merging and joining data sets
Flexible reshaping and pivoting of data sets
Hierarchical labeling of axes (possible to have multiple labels per tick)
Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving/loading data from the ultrafast HDF5 format
Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging
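A short illustrative example (not from the page above; the data is made up) showing a few of the features listed: missing-data handling, automatic label alignment, and split-apply-combine with groupby.

import numpy as np
import pandas as pd

# Missing data: NaN participates naturally in computations
s = pd.Series([1.0, np.nan, 3.0])
print(s.sum())           # 4.0 -- NaN is skipped by default

# Automatic alignment: arithmetic aligns on labels, not position
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
print(a + b)             # x and z become NaN, y is 22

# Split-apply-combine with groupby
df = pd.DataFrame({"store": ["A", "A", "B"], "sales": [3, 5, 7]})
print(df.groupby("store")["sales"].sum())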
QUESTION
Installing scipy and scikit-learn on apple m1
Asked 2022-Mar-22 at 06:21
The installation of the following packages on the M1 chip works fine for me: numpy 1.21.1, pandas 1.3.0, torch 1.9.0 and a few others. They also seem to work properly while testing them. However, when I try to install scipy or scikit-learn via pip, this error appears:
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy which use PEP 517 and cannot be installed directly
Why should NumPy be built again when I already have the latest version from pip installed?
Every previous installation was done using python3.9 -m pip install ...
on macOS 11.3.1 with the Apple M1 chip.
Maybe somebody knows how to deal with this error, or whether it's just a matter of time.
ANSWER
Answered 2021-Aug-02 at 14:33
Please see this note from scikit-learn about installing on Apple Silicon M1 hardware:
The recently introduced macos/arm64 platform (sometimes also known as macos/aarch64) requires the open source community to upgrade the build configuration and automation to properly support it. At the time of writing (January 2021), the only way to get a working installation of scikit-learn on this hardware is to install scikit-learn and its dependencies from the conda-forge distribution, for instance using the miniforge installers:
https://github.com/conda-forge/miniforge
The following issue tracks progress on making it possible to install scikit-learn from PyPI with pip:
QUESTION
Error while downloading the requirements using pip install (setup command: use_2to3 is invalid.)
Asked 2022-Mar-05 at 07:13
Versions: pip 21.2.4, Python 3.6
The command:
pip install -r requirements.txt
The content of my requirements.txt:
mongoengine==0.19.1
numpy==1.16.2
pylint
pandas==1.1.5
fawkes
The command is failing with this error:
ERROR: Command errored out with exit status 1:
command: /Users/*/Desktop/ml/*/venv/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-install-soh30mel/mongoengine_89e68f8427244f1bb3215b22f77a619c/setup.py'"'"'; __file__='"'"'/private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-install-soh30mel/mongoengine_89e68f8427244f1bb3215b22f77a619c/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-pip-egg-info-97994d6e
cwd: /private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-install-soh30mel/mongoengine_89e68f8427244f1bb3215b22f77a619c/
Complete output (1 lines):
error in mongoengine setup command: use_2to3 is invalid.
----------------------------------------
WARNING: Discarding https://*/pypi/packages/mongoengine-0.19.1.tar.gz#md5=68e613009f6466239158821a102ac084 (from https://*/pypi/simple/mongoengine/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement mongoengine==0.19.1 (from versions: 0.15.0, 0.19.1)
ERROR: No matching distribution found for mongoengine==0.19.1
ANSWER
Answered 2021-Nov-19 at 13:30
It looks like setuptools>=58 breaks support for use_2to3:
So you should pin setuptools to setuptools<58, or avoid using packages with use_2to3 in their setup parameters.
I was having the same problem with pip==19.3.1.
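As an addition (not part of the original answer), a quick way to confirm which setuptools version the failing environment is actually using; use_2to3 support was removed in setuptools 58:

import setuptools

# Anything >= 58 will reject packages whose setup() still passes use_2to3=True.
print(setuptools.__version__)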
QUESTION
Mapping complex JSON to Pandas Dataframe
Asked 2022-Feb-25 at 13:57
Background
I have a complex nested JSON object which I am trying to unpack into a pandas df in a very specific way.
JSON Object
This is an extract, containing randomized data, of the JSON object. It shows the hierarchy (including children) for one family ('Falconer Family'); the full JSON object contains hundreds of families.
{
"meta": {
"columns": [{
"key": "value",
"display_name": "Adjusted Value (No Div, USD)",
"output_type": "Number",
"currency": "USD"
},
{
"key": "time_weighted_return",
"display_name": "Current Quarter TWR (USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "time_weighted_return_2",
"display_name": "YTD TWR (USD)",
"output_type": "Percent",
"currency": "USD"
},
{
"key": "_custom_twr_audit_note_911328",
"display_name": "TWR Audit Note",
"output_type": "Word"
}
],
"groupings": [{
"key": "_custom_name_747205",
"display_name": "* Reporting Client Name"
},
{
"key": "_custom_new_entity_group_453577",
"display_name": "NEW Entity Group"
},
{
"key": "_custom_level_2_624287",
"display_name": "* Level 2"
},
{
"key": "legal_entity",
"display_name": "Legal Entity"
}
]
},
"data": {
"type": "portfolio_views",
"attributes": {
"total": {
"name": "Total",
"columns": {
"time_weighted_return": -0.046732301295604683,
"time_weighted_return_2": -0.046732301295604683,
"_custom_twr_audit_note_911328": null,
"value": 23132492.905107163
},
"children": [{
"name": "Falconer Family",
"grouping": "_custom_name_747205",
"columns": {
"time_weighted_return": -0.046732301295604683,
"time_weighted_return_2": -0.046732301295604683,
"_custom_twr_audit_note_911328": null,
"value": 23132492.905107163
},
"children": [{
"name": "Wealth Bucket A",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.045960317420568164,
"time_weighted_return_2": -0.045960317420568164,
"_custom_twr_audit_note_911328": null,
"value": 13264448.506587159
},
"children": [{
"name": "Asset Class A",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": 0.000003434094574039648,
"time_weighted_return_2": 0.000003434094574039648,
"_custom_twr_audit_note_911328": null,
"value": 3337.99
},
"children": [{
"entity_id": 10604454,
"name": "HUDJ Trust",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": 0.000003434094574039648,
"time_weighted_return_2": 0.000003434094574039648,
"_custom_twr_audit_note_911328": null,
"value": 3337.99
},
"children": []
}]
},
{
"name": "Asset Class B",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.025871339096964152,
"time_weighted_return_2": -0.025871339096964152,
"_custom_twr_audit_note_911328": null,
"value": 1017004.7192636987
},
"children": [{
"entity_id": 10604454,
"name": "HUDG Trust",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.025871339096964152,
"time_weighted_return_2": -0.025871339096964152,
"_custom_twr_audit_note_911328": null,
"value": 1017004.7192636987
},
"children": []
}]
},
{
"name": "Asset Class C",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.030370376329670656,
"time_weighted_return_2": -0.030370376329670656,
"_custom_twr_audit_note_911328": null,
"value": 231142.67772000004
},
"children": [{
"entity_id": 10604454,
"name": "HKDJ Trust",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.030370376329670656,
"time_weighted_return_2": -0.030370376329670656,
"_custom_twr_audit_note_911328": null,
"value": 231142.67772000004
},
"children": []
}]
},
{
"name": "Asset Class D",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.05382756475465478,
"time_weighted_return_2": -0.05382756475465478,
"_custom_twr_audit_note_911328": null,
"value": 9791282.570000006
},
"children": [{
"entity_id": 10604454,
"name": "HUDW Trust",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.05382756475465478,
"time_weighted_return_2": -0.05382756475465478,
"_custom_twr_audit_note_911328": null,
"value": 9791282.570000006
},
"children": []
}]
},
{
"name": "Asset Class E",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.01351630404081805,
"time_weighted_return_2": -0.01351630404081805,
"_custom_twr_audit_note_911328": null,
"value": 2153366.6396034593
},
"children": [{
"entity_id": 10604454,
"name": "HJDJ Trust",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.01351630404081805,
"time_weighted_return_2": -0.01351630404081805,
"_custom_twr_audit_note_911328": null,
"value": 2153366.6396034593
},
"children": []
}]
},
{
"name": "Asset Class F",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.002298190175237247,
"time_weighted_return_2": -0.002298190175237247,
"_custom_twr_audit_note_911328": null,
"value": 68313.90999999999
},
"children": [{
"entity_id": 10604454,
"name": "HADJ Trust",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.002298190175237247,
"time_weighted_return_2": -0.002298190175237247,
"_custom_twr_audit_note_911328": null,
"value": 68313.90999999999
},
"children": []
}]
}
]
},
{
"name": "Wealth Bucket B",
"grouping": "_custom_new_entity_group_453577",
"columns": {
"time_weighted_return": -0.04769870075659244,
"time_weighted_return_2": -0.04769870075659244,
"_custom_twr_audit_note_911328": null,
"value": 9868044.398519998
},
"children": [{
"name": "Asset Class A",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": 0.000028632718065191298,
"time_weighted_return_2": 0.000028632718065191298,
"_custom_twr_audit_note_911328": null,
"value": 10234.94
},
"children": [{
"entity_id": 10868778,
"name": "2012 Desc Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": 0.0000282679297198829,
"time_weighted_return_2": 0.0000282679297198829,
"_custom_twr_audit_note_911328": null,
"value": 244.28
},
"children": []
},
{
"entity_id": 10643052,
"name": "2013 Irrev Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": 0.000049373572795108345,
"time_weighted_return_2": 0.000049373572795108345,
"_custom_twr_audit_note_911328": null,
"value": 5081.08
},
"children": []
},
{
"entity_id": 10598341,
"name": "Cht 11th Tr HBO Shirley",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": 0.000006609603754315074,
"time_weighted_return_2": 0.000006609603754315074,
"_custom_twr_audit_note_911328": null,
"value": 1523.62
},
"children": []
},
{
"entity_id": 10598337,
"name": "Cht 11th Tr HBO Hannah",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": 0.000010999769004760296,
"time_weighted_return_2": 0.000010999769004760296,
"_custom_twr_audit_note_911328": null,
"value": 1828.9
},
"children": []
},
{
"entity_id": 10598334,
"name": "Cht 11th Tr HBO Lau",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": 0.000006466673995619843,
"time_weighted_return_2": 0.000006466673995619843,
"_custom_twr_audit_note_911328": null,
"value": 1557.06
},
"children": []
}
]
},
{
"name": "Asset Class B",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.024645947842438676,
"time_weighted_return_2": -0.024645947842438676,
"_custom_twr_audit_note_911328": null,
"value": 674052.31962
},
"children": [{
"entity_id": 10868778,
"name": "2012 Desc Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.043304004172576405,
"time_weighted_return_2": -0.043304004172576405,
"_custom_twr_audit_note_911328": null,
"value": 52800.96
},
"children": []
},
{
"entity_id": 10643052,
"name": "2013 Irrev Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.022408434778798836,
"time_weighted_return_2": -0.022408434778798836,
"_custom_twr_audit_note_911328": null,
"value": 599594.11962
},
"children": []
},
{
"entity_id": 10598341,
"name": "Cht 11th Tr HBO Shirley",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.039799855483646174,
"time_weighted_return_2": -0.039799855483646174,
"_custom_twr_audit_note_911328": null,
"value": 7219.08
},
"children": []
},
{
"entity_id": 10598337,
"name": "Cht 11th Tr HBO Hannah",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.039799855483646174,
"time_weighted_return_2": -0.039799855483646174,
"_custom_twr_audit_note_911328": null,
"value": 7219.08
},
"children": []
},
{
"entity_id": 10598334,
"name": "Cht 11th Tr HBO Lau",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.039799855483646174,
"time_weighted_return_2": -0.039799855483646174,
"_custom_twr_audit_note_911328": null,
"value": 7219.08
},
"children": []
}
]
},
{
"name": "Asset Class C",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.03037038746301135,
"time_weighted_return_2": -0.03037038746301135,
"_custom_twr_audit_note_911328": null,
"value": 114472.69744
},
"children": [{
"entity_id": 10868778,
"name": "2012 Desc Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.030370390035505124,
"time_weighted_return_2": -0.030370390035505124,
"_custom_twr_audit_note_911328": null,
"value": 114472.68744000001
},
"children": []
},
{
"entity_id": 10643052,
"name": "2013 Irrev Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": 0,
"time_weighted_return_2": 0,
"_custom_twr_audit_note_911328": null,
"value": 0.01
},
"children": []
}
]
},
{
"name": "Asset Class D",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.06604362523792162,
"time_weighted_return_2": -0.06604362523792162,
"_custom_twr_audit_note_911328": null,
"value": 5722529.229999997
},
"children": [{
"entity_id": 10868778,
"name": "2012 Desc Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.06154960593668424,
"time_weighted_return_2": -0.06154960593668424,
"_custom_twr_audit_note_911328": null,
"value": 1191838.9399999995
},
"children": []
},
{
"entity_id": 10643052,
"name": "2013 Irrev Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.06750460387418267,
"time_weighted_return_2": -0.06750460387418267,
"_custom_twr_audit_note_911328": null,
"value": 4416618.520000002
},
"children": []
},
{
"entity_id": 10598341,
"name": "Cht 11th Tr HBO Shirley",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.05604507809250081,
"time_weighted_return_2": -0.05604507809250081,
"_custom_twr_audit_note_911328": null,
"value": 38190.33
},
"children": []
},
{
"entity_id": 10598337,
"name": "Cht 11th Tr HBO Hannah",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.05604507809250081,
"time_weighted_return_2": -0.05604507809250081,
"_custom_twr_audit_note_911328": null,
"value": 37940.72
},
"children": []
},
{
"entity_id": 10598334,
"name": "Cht 11th Tr HBO Lau",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.05604507809250081,
"time_weighted_return_2": -0.05604507809250081,
"_custom_twr_audit_note_911328": null,
"value": 37940.72
},
"children": []
}
]
},
{
"name": "Asset Class E",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.017118805423322003,
"time_weighted_return_2": -0.017118805423322003,
"_custom_twr_audit_note_911328": null,
"value": 3148495.0914600003
},
"children": [{
"entity_id": 10868778,
"name": "2012 Desc Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.015251157805867277,
"time_weighted_return_2": -0.015251157805867277,
"_custom_twr_audit_note_911328": null,
"value": 800493.06146
},
"children": []
},
{
"entity_id": 10643052,
"name": "2013 Irrev Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.01739609576880241,
"time_weighted_return_2": -0.01739609576880241,
"_custom_twr_audit_note_911328": null,
"value": 2215511.2700000005
},
"children": []
},
{
"entity_id": 10598341,
"name": "Cht 11th Tr HBO Shirley",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.02085132265594647,
"time_weighted_return_2": -0.02085132265594647,
"_custom_twr_audit_note_911328": null,
"value": 44031.21
},
"children": []
},
{
"entity_id": 10598337,
"name": "Cht 11th Tr HBO Hannah",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.02089393244695803,
"time_weighted_return_2": -0.02089393244695803,
"_custom_twr_audit_note_911328": null,
"value": 44394.159999999996
},
"children": []
},
{
"entity_id": 10598334,
"name": "Cht 11th Tr HBO Lau",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.020607507059866248,
"time_weighted_return_2": -0.020607507059866248,
"_custom_twr_audit_note_911328": null,
"value": 44065.39000000001
},
"children": []
}
]
},
{
"name": "Asset Class F",
"grouping": "_custom_level_2_624287",
"columns": {
"time_weighted_return": -0.0014710489231547497,
"time_weighted_return_2": -0.0014710489231547497,
"_custom_twr_audit_note_911328": null,
"value": 198260.12
},
"children": [{
"entity_id": 10868778,
"name": "2012 Desc Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.0014477244560456848,
"time_weighted_return_2": -0.0014477244560456848,
"_custom_twr_audit_note_911328": null,
"value": 44612.33
},
"children": []
},
{
"entity_id": 10643052,
"name": "2013 Irrev Tr HBO Thalia",
"grouping": "legal_entity",
"columns": {
"time_weighted_return": -0.001477821083437858,
"time_weighted_return_2": -0.001477821083437858,
"_custom_twr_audit_note_911328": null,
"value": 153647.78999999998
},
"children": []
}
]
}
]
}
]
}]
}
},
"included": []
}
}
Notes on JSON Object extract
- data: the values here can be ignored; they are aggregated values for the underlying children.
- meta / columns: contains the column header values I want to use for each applicable child's columns key:pair values.
- meta / groupings: can be ignored.
- children hierarchy: there are 4 levels of children, which can be identified by their name as follows:
  1. family name (i.e., 'Falconer Family')
  2. wealth bucket name (e.g., 'Wealth Bucket A')
  3. asset class name (e.g., 'Asset Class A')
  4. legal entity name (e.g., 'HUDJ Trust')
Target Output
This is an extract of the target df structure I am trying to achieve:
| portfolio | name | entity_id | Adjusted Value (No Div, USD) | Current Quarter TWR (USD) | YTD TWR (USD) | TWR Audit Note |
|---|---|---|---|---|---|---|
| Falconer Family | Falconer Family | | 23132492.90510712 | -0.046732301295604683 | -0.046732301295604683 | None |
| Falconer Family | Wealth Bucket A | | 13264448.506587146 | -0.045960317420568164 | -0.045960317420568164 | None |
| Falconer Family | Asset Class A | | 3337.99 | 0.000003434094574039648 | 0.000003434094574039648 | None |
| Falconer Family | HUDJ Trust | 10604454 | 3337.99 | 0.000003434094574039648 | 0.000003434094574039648 | None |
| Falconer Family | Asset Class B | | 1017004.7192636987 | -0.025871339096964152 | -0.025871339096964152 | None |
| Falconer Family | HUDG Trust | 10604454 | 1017004.7192636987 | -0.025871339096964152 | -0.025871339096964152 | None |
| Falconer Family | Asset Class C | | 231142.67772000004 | -0.030370376329670656 | -0.030370376329670656 | None |
| Falconer Family | HKDJ Trust | 10604454 | 231142.67772000004 | -0.030370376329670656 | -0.030370376329670656 | None |
| Falconer Family | Asset Class D | | 9791282.570000006 | -0.05382756475465478 | -0.05382756475465478 | None |
| Falconer Family | HUDW Trust | 10604454 | 9791282.570000006 | -0.05382756475465478 | -0.05382756475465478 | None |
Notes on Target Output
- The portfolio column should contain the top-level children name value (the family name), e.g., 'Falconer Family'.
- The name column should contain the name value from each respective children entry.
- For the lowest-level children, the entity_id value should be mapped to the entity_id column.
- All children have identical time_weighted_return, time_weighted_return_2 and value columns, which should be mapped respectively.
- The children _custom_twr_audit_note_911328 values are currently blank, but will be utilized in the future.
Current Output
My main issue is that, as you can see, I have only been able to tap into the 1st [Family] and 2nd [Wealth Bucket] children levels. This leaves me missing the 3rd [Asset Class] and 4th [Fund]:
|   | portfolio | name | Adjusted Value (No Div, USD) | Current Quarter TWR (USD) | YTD TWR (USD) | TWR Audit Note |
|---|---|---|---|---|---|---|
| 0 | Falconer Family | Falconer Family | 2.313249e+07 | -0.046732 | -0.046732 | None |
| 1 | Falconer Family | Wealth Bucket A | 1.326445e+07 | -0.045960 | -0.045960 | None |
| 2 | Falconer Family | Wealth Bucket B | 9.868044e+06 | -0.047699 | -0.047699 | None |
Current code
This is a function which gets me the correct df formatting; however, I haven't found a solution for returning all children rather than only the top level:
import itertools
import json

import pandas as pd

# Function to read API response / JSON Object
def response_writer():
    with open('api_response_2022-02-13.json') as f:
        api_response = json.load(f)
    return api_response

# Function to unpack JSON response into pandas dataframe.
def unpack_response():
    while True:
        try:
            api_response = response_writer()
            portfolio_views_children = api_response['data']['attributes']['total']['children']
            portfolios = []
            for portfolio in portfolio_views_children:
                entity_columns = []
                # include portfolio itself within an iterable so the total is the header
                for entity in itertools.chain([portfolio], portfolio["children"]):
                    entity_data = entity["columns"].copy()  # don't mutate original response
                    entity_data["portfolio"] = portfolio["name"]  # from outer
                    entity_data["name"] = entity["name"]
                    entity_columns.append(entity_data)
                df = pd.DataFrame(entity_columns)
                portfolios.append(df)
            # combine dataframes
            df = pd.concat(portfolios)
            # reorder and rename
            column_ordering = {"portfolio": "portfolio", "name": "name"}
            column_ordering.update({c["key"]: c["display_name"] for c in api_response["meta"]["columns"]})
            df = df[column_ordering.keys()]  # beware: un-named cols will be dropped
            df = df.rename(columns=column_ordering)
            break
        except KeyError:
            print("-----------------------------------\n", "API TIMEOUT ERROR: TRY AGAIN...", "\n-----------------------------------\n")
    return df

unpack_response()
Help
In short, I am looking for advice on how I can tap into the remaining children levels by enhancing the existing code. While I have taken time to fully explain my problem, please ask if anything isn't clear. Please note that the JSON may have multiple families, so the solution or advice offered must account for this.
ANSWER
Answered 2022-Feb-16 at 06:41
I think this gets you pretty close; you might just need to adjust the various name columns and drop the extra data (I kept the grouping column).
The main idea is to recursively use pd.json_normalize with pd.concat for all available children levels.
EDIT: Put everything into a single function and added a section to collapse the name columns like the expected output.
import pandas as pd  # assumed available, as in the question

def process_json(api_response):
    def get_column_values(df):
        return pd.concat([df, pd.json_normalize(df.pop('columns')).set_axis(df.index)], axis=1)

    def expand_children(df):
        if len(df.index) > 1:
            df['children'] = df['children'].fillna('').apply(lambda x: None if len(x) == 0 else x)
        df_children = df.pop('children').dropna().explode()
        if len(df_children.index) == 0:  # return df if no children to append
            return df.index.names, df
        df_children = pd.json_normalize(df_children, max_level=0).set_axis(df_children.index).set_index('name', append=True)
        df_children = get_column_values(df_children)
        idx_names = list(df_children.index.names)
        idx_names[-1] = idx_names[-1] + '_' + str(len(idx_names))
        df[idx_names[-1]] = None
        return idx_names, pd.concat([df.set_index(idx_names[-1], append=True), df_children], axis=0)

    columns_dict = pd.DataFrame(api_response['meta']['columns']).set_index('key').to_dict(orient='index')  # save column definitions
    df = pd.DataFrame(api_response['data']['attributes']['total']['children']).set_index('name')  # get initial dataframe
    df = get_column_values(df)  # get columns for initial level

    # expand children
    while 'children' in df.columns:
        idx_names, df = expand_children(df)

    # reorder/replace column headers and sort index
    df = (df.loc[:, [x for x in df.columns if x not in columns_dict.keys()] + list(columns_dict.keys())]
            .rename(columns={k: v['display_name'] for k, v in columns_dict.items()})
            .sort_index(na_position='first').reset_index())

    # collapse "name" columns (careful of potential duplicate rows)
    for col in idx_names[::-1]:
        df[idx_names[-1]] = df[idx_names[-1]].fillna(df[col])
    df = df.rename(columns={'name': 'portfolio', idx_names[-1]: 'name'}).drop(columns=idx_names[1:-1])

    return df
Since the other answer uses iterrows, which usually isn't advised, I figured a quick time comparison was worthwhile.
process_json(api_response)
54.2 ms ± 7.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
unpack_response(api_response) # iterrows
84.3 ms ± 9.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
QUESTION
AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks'>
Asked 2022-Feb-25 at 13:18
I was using PySpark on AWS EMR (4 r5.xlarge as 4 workers, each with one executor and 4 cores), and I got AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks'>. Below is a snippet of the code that threw this error:
# Imports inferred from the snippet's usage (the original post omitted them);
# `spark` (the SparkSession) and `df` (the Spark DataFrame) are assumed to be
# defined earlier in the job.
import sqlite3

import numpy as np
import pandas as pd
from pyspark.sql.functions import col, udf
from uszipcode import SearchEngine

search = SearchEngine(db_file_dir="/tmp/db")
conn = sqlite3.connect("/tmp/db/simple_db.sqlite")
pdf_ = pd.read_sql_query('''select zipcode, lat, lng,
    bounds_west, bounds_east, bounds_north, bounds_south from
    simple_zipcode''', conn)
brd_pdf = spark.sparkContext.broadcast(pdf_)
conn.close()

@udf('string')
def get_zip_b(lat, lng):
    pdf = brd_pdf.value
    out = pdf[(np.array(pdf["bounds_north"]) >= lat) &
              (np.array(pdf["bounds_south"]) <= lat) &
              (np.array(pdf['bounds_west']) <= lng) &
              (np.array(pdf['bounds_east']) >= lng)]
    if len(out):
        min_index = np.argmin((np.array(out["lat"]) - lat)**2 + (np.array(out["lng"]) - lng)**2)
        zip_ = str(out["zipcode"].iloc[min_index])
    else:
        zip_ = 'bad'
    return zip_

df = df.withColumn('zipcode', get_zip_b(col("latitude"), col("longitude")))
Below is the traceback, where line 102, in get_zip_b, refers to pdf = brd_pdf.value:
21/08/02 06:18:19 WARN TaskSetManager: Lost task 12.0 in stage 7.0 (TID 1814, ip-10-22-17-94.pclc0.merkle.local, executor 6): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/worker.py", line 605, in main
process()
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/worker.py", line 597, in process
serializer.dump_stream(out_iter, outfile)
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/serializers.py", line 223, in dump_stream
self.serializer.dump_stream(self._batched(iterator), stream)
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/serializers.py", line 141, in dump_stream
for obj in iterator:
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/serializers.py", line 212, in _batched
for item in iterator:
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/worker.py", line 450, in mapper
result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/worker.py", line 450, in <genexpr>
result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/worker.py", line 90, in <lambda>
return lambda *a: f(*a)
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/util.py", line 121, in wrapper
return f(*args, **kwargs)
File "/mnt/var/lib/hadoop/steps/s-1IBFS0SYWA19Z/Mobile_ID_process_center.py", line 102, in get_zip_b
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/broadcast.py", line 146, in value
self._value = self.load_from_path(self._path)
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/broadcast.py", line 123, in load_from_path
return self.load(f)
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/broadcast.py", line 129, in load
return pickle.load(file)
AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/mnt/miniconda/lib/python3.9/site-packages/pandas/core/internals/blocks.py'>
Some observations and thought process:
1. After doing some searching online, the AttributeError in PySpark seems to be caused by mismatched pandas versions between the driver and the workers.
2. But I ran the same code on two different datasets: one worked without any errors, the other didn't, which seems strange and nondeterministic, and suggests the error may not be caused by mismatched pandas versions; otherwise neither dataset would have succeeded.
3. I then ran the same code on the successful dataset again, but this time with different Spark configurations: setting spark.driver.memory from 2048M to 4192m, and it threw the AttributeError.
4. In conclusion, I think the AttributeError has something to do with the driver. But I can't tell from the error message how they are related, or how to fix it: AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks'>.
ANSWER
Answered 2021-Aug-26 at 14:53
I had the same error with pandas 1.3.2 on the server and 1.2 on my client. Downgrading pandas to 1.2 solved the problem.
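As an addition (not part of the original answer), a small diagnostic sketch that compares the driver's pandas version with whatever is installed on the executors, since a mismatch between the two is what produces this unpickling error. It assumes a live spark session, as in the question.

import pandas as pd

# pandas version on the driver
print("driver pandas:", pd.__version__)

def partition_pandas_version(_):
    # Imported inside the function so it resolves against the executor's Python environment
    import pandas as executor_pd
    yield executor_pd.__version__

# pandas versions seen on the executors (one probe per partition)
executor_versions = set(
    spark.sparkContext.parallelize(range(8), 8)
         .mapPartitions(partition_pandas_version)
         .collect()
)
print("executor pandas:", executor_versions)
# If these differ (e.g. 1.3.x vs 1.2.x), align them before broadcasting DataFrames.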
QUESTION
How to update pandas DataFrame.drop() for Future Warning - all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only
Asked 2022-Feb-13 at 19:56
The following code:
df = df.drop('market', 1)
generates the warning:
FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only
market is the column we want to drop, and we pass 1 as the second parameter for axis (0 for index, 1 for columns, so we pass 1).
How can we change this line of code now so that it is not a problem in a future version of pandas, i.e. to resolve the warning message now?
ANSWER
Answered 2022-Feb-13 at 19:56
From the documentation, pandas.DataFrame.drop has the following parameters:
Parameters
labels: single label or list-like Index or column labels to drop.
axis: {0 or ‘index’, 1 or ‘columns’}, default 0 Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
index: single label or list-like Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columns: single label or list-like Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
level: int or level name, optional For MultiIndex, level from which the labels will be removed.
inplace: bool, default False If False, return a copy. Otherwise, do operation inplace and return None.
errors: {‘ignore’, ‘raise’}, default ‘raise’ If ‘ignore’, suppress error and only existing labels are dropped.
Moving forward, only labels (the first parameter) can be positional.
So, for this example, the drop code should be as follows:
df = df.drop('market', axis=1)
or (more legibly) with columns:
df = df.drop(columns='market')
QUESTION
Cannot set up a conda environment with python 3.10
Asked 2022-Jan-31 at 10:35
I am trying to set up a conda environment with Python 3.10 installed. For some reason, no install commands for additional packages are working. For example, if I run conda install pandas, I get the error:
PackagesNotFoundError: The following packages are not available from current channels:
- python=3.1
conda install -c conda-forge pandas doesn't work either. Not sure what the problem is.
ANSWER
Answered 2021-Oct-08 at 08:42
That's a bug in conda; you can read more about it here: https://github.com/conda/conda/issues/10969
Right now there is a PR to fix it, but it's not in a released version yet. For now, just stick with:
conda install python=3.9
QUESTION
ImportError: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic'
Asked 2022-Jan-12 at 23:01
I have this output:
[Pandas-profiling] ImportError: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic'
when trying to import pandas-profiling in this fashion:
from pandas_profiling import ProfileReport
It seems to import pandas-profiling correctly but struggles when it comes to interfacing with pandas itself. Both libraries are currently up to date through conda. It doesn't seem to match any of the common problems associated with pandas-profiling as per their documentation, and I can't seem to locate a more general solution for importing the name ABCIndexClass.
Thanks
ANSWER
Answered 2021-Aug-09 at 19:19
Pandas v1.3 renamed ABCIndexClass to ABCIndex. The visions dependency of the pandas-profiling package hasn't caught up yet, and so throws an error when it can't find ABCIndexClass. Downgrading pandas to the 1.2.x series will resolve the issue.
Alternatively, you can just wait for the visions package to be updated.
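As an addition (not part of the original answer), a quick sanity check can confirm whether the installed pandas is new enough to have dropped ABCIndexClass, the name the visions dependency still expects:

import pandas as pd
from packaging import version

if version.parse(pd.__version__) >= version.parse("1.3.0"):
    print(f"pandas {pd.__version__}: ABCIndexClass was renamed to ABCIndex; "
          "expect the import error until visions/pandas-profiling catch up, "
          "or downgrade to the pandas 1.2.x series.")
else:
    print(f"pandas {pd.__version__}: should import cleanly with pandas-profiling.")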
QUESTION
Merge two pandas DataFrame based on partial match
Asked 2022-Jan-06 at 00:54
Two DataFrames have city names that are not formatted the same way. I'd like to do a left outer join and pull the geo field for all partial string matches between the City field in both DataFrames.
import pandas as pd
df1 = pd.DataFrame({
'City': ['San Francisco, CA','Oakland, CA'],
'Val': [1,2]
})
df2 = pd.DataFrame({
'City': ['San Francisco-Oakland, CA','Salinas, CA'],
'Geo': ['geo1','geo2']
})
Expected DataFrame upon join:
City Val Geo
San Francisco, CA 1 geo1
Oakland, CA 2 geo1
ANSWER
Answered 2021-Sep-12 at 20:24
This should do the job: string matching with Levenshtein distance.
pip install thefuzz[speedup]

import pandas as pd
import numpy as np
from thefuzz import process

def fuzzy_match(
    a: pd.DataFrame, b: pd.DataFrame, col: str, limit: int = 5, thresh: int = 80
):
    """use fuzzy matching to join on column"""
    s = b[col].tolist()

    matches = a[col].apply(lambda x: process.extract(x, s, limit=limit))
    matches = pd.DataFrame(np.concatenate(matches), columns=["match", "score"])

    # join other columns in b to matches
    to_join = (
        pd.merge(left=b, right=matches, how="right", left_on="City", right_on="match")
        .set_index(  # create an index that represents the matching row in df a, you can drop this when `limit=1`
            np.array(
                list(
                    np.repeat(i, limit if limit < len(b) else len(b))
                    for i in range(len(a))
                )
            ).flatten()
        )
        .drop(columns=["match"])
        .astype({"score": "int16"})
    )
    print(f"\t the index here represents the row in dataframe a on which to join")
    print(to_join)

    res = pd.merge(
        left=a, right=to_join, left_index=True, right_index=True, suffixes=("", "_b")
    )

    # return only the highest match or you can just set the limit to 1
    # and remove this
    df = res.reset_index()
    df = df.iloc[df.groupby(by="index")["score"].idxmax()].reset_index(drop=True)
    return df.drop(columns=["City_b", "score", "index"])

def test(df):
    expected = pd.DataFrame(
        {
            "City": ["San Francisco, CA", "Oakland, CA"],
            "Val": [1, 2],
            "Geo": ["geo1", "geo1"],
        }
    )
    print(f'{"expected":-^70}')
    print(expected)
    print(f'{"res":-^70}')
    print(df)
    assert expected.equals(df)

if __name__ == "__main__":
    a = pd.DataFrame({"City": ["San Francisco, CA", "Oakland, CA"], "Val": [1, 2]})
    b = pd.DataFrame(
        {"City": ["San Francisco-Oakland, CA", "Salinas, CA"], "Geo": ["geo1", "geo2"]}
    )
    print(f'\n\n{"fuzzy match":-^70}')
    res = fuzzy_match(a, b, col="City")
    test(res)
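As a possible simpler variant (not from the original answer), if only the single best match per row is needed, thefuzz's process.extractOne can be used directly. This sketch assumes the same df1/df2 from the question and an 80-point score threshold:

import pandas as pd
from thefuzz import process

df1 = pd.DataFrame({'City': ['San Francisco, CA', 'Oakland, CA'], 'Val': [1, 2]})
df2 = pd.DataFrame({'City': ['San Francisco-Oakland, CA', 'Salinas, CA'], 'Geo': ['geo1', 'geo2']})

def best_match(city, choices, thresh=80):
    # Returns the best-scoring candidate, or None if it scores below the threshold
    match, score = process.extractOne(city, choices)
    return match if score >= thresh else None

df1['match'] = df1['City'].apply(lambda c: best_match(c, df2['City'].tolist()))
out = (df1.merge(df2, left_on='match', right_on='City', how='left', suffixes=('', '_b'))
          .drop(columns=['match', 'City_b']))
print(out)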
QUESTION
Create a new column in a Pandas DataFrame from existing column names
Asked 2021-Nov-15 at 00:22
I want to deconstruct a pandas DataFrame, using the column headers as a new data column, and create a list with all combinations of the row index and columns. Easier to show than explain:
import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2,3,4],[5,6,7],[8,9,10]]
df = pd.DataFrame(values, index=index_col, columns=cols)
From this DataFrame I wish to get the following list:
[['store1', 'January', 2],
['store1', 'February', 3],
['store1', 'March', 4],
['store2', 'January', 5],
['store2', 'February', 6],
['store2', 'March', 7],
['store3', 'January', 8],
['store3', 'February', 9],
['store3', 'March', 10]]
Is there a convenient way to do this?
ANSWER
Answered 2021-Nov-09 at 23:58
The structure that you want your data in is very messy, so this is probably the best method given the data you want.
# Results
res = []

# Nested loop: first over the index labels, then over the columns
for i in range(len(index_col)):
    for j in range(len(cols)):
        # Format of data
        res.append([index_col[i], cols[j], values[i][j]])

# Return results
print(res)
# return res  # only valid if the above is wrapped inside a function
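As an addition (not part of the original answer), the same result can be produced without explicit loops by stacking the DataFrame:

# stack() turns the columns into an inner index level, yielding one value per
# (store, month) pair; iterating its items gives exactly the requested triples.
res = [[store, month, int(value)] for (store, month), value in df.stack().items()]
print(res)
# [['store1', 'January', 2], ['store1', 'February', 3], ..., ['store3', 'March', 10]]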
QUESTION
After conda update, python kernel crashes when matplotlib is used
Asked 2021-Nov-06 at 19:03
I have created this simple env with conda:
conda create -n test python=3.8.5 pandas scipy numpy matplotlib seaborn jupyterlab
The following code in jupyter lab crashes the kernel:
import matplotlib.pyplot as plt
plt.subplot()
I don't face the problem on Linux; the problem occurs on Windows 10.
There are no errors on the jupyter lab console (where I started the server), and I have no idea where to investigate.
ANSWER
Answered 2021-Nov-06 at 19:03
Update:
- The pkgs/main channel for conda has reverted to using freetype 2.10.4 for Windows, per main / packages / freetype.
- Use conda list freetype to check the version: freetype != 2.11.0.
- Run conda update --all (providing your default channel isn't changed in the .condarc config file). This applies if conda or freetype has been updated since Oct 27, 2021.
- Otherwise, open the Anaconda prompt and downgrade freetype 2.11.0 in any affected environment:
conda install freetype=2.10.4
- The issue affects matplotlib in any IDE, as well as anything that plots through it, such as pandas.DataFrame.plot and seaborn.
Original answer:
- The issue seems to come with an update to conda, released Friday, Oct 29.
- After conda update --all, there's an issue with anything related to matplotlib in any IDE (not just Jupyter). I saw it in JupyterLab, PyCharm, and python from the command prompt; PyCharm ends with Process finished with exit code -1073741819.
- If I do conda update --all in (base), then any plot API that uses matplotlib (e.g. seaborn and pandas.DataFrame.plot) kills the kernel in any environment. If I don't update (base), then my other environments worked.
- This happened with python 3.8.12 and python 3.9.7.
- From the conda revision log: prior to conda update --all this environment was working, but after the updates, plotting with matplotlib crashes the python kernel.
2021-10-31 10:47:22 (rev 3)
bokeh {2.3.3 (defaults/win-64) -> 2.4.1 (defaults/win-64)}
click {8.0.1 (defaults/noarch) -> 8.0.3 (defaults/noarch)}
filelock {3.0.12 (defaults/noarch) -> 3.3.1 (defaults/noarch)}
freetype {2.10.4 (defaults/win-64) -> 2.11.0 (defaults/win-64)}
imagecodecs {2021.6.8 (defaults/win-64) -> 2021.8.26 (defaults/win-64)}
joblib {1.0.1 (defaults/noarch) -> 1.1.0 (defaults/noarch)}
lerc {2.2.1 (defaults/win-64) -> 3.0 (defaults/win-64)}
more-itertools {8.8.0 (defaults/noarch) -> 8.10.0 (defaults/noarch)}
pyopenssl {20.0.1 (defaults/noarch) -> 21.0.0 (defaults/noarch)}
scikit-learn {0.24.2 (defaults/win-64) -> 1.0.1 (defaults/win-64)}
statsmodels {0.12.2 (defaults/win-64) -> 0.13.0 (defaults/win-64)}
sympy {1.8 (defaults/win-64) -> 1.9 (defaults/win-64)}
tqdm {4.62.2 (defaults/noarch) -> 4.62.3 (defaults/noarch)}
xlwings {0.24.7 (defaults/win-64) -> 0.24.9 (defaults/win-64)}
- Downgrading freetype from 2.11.0 to 2.10.4 resolved the issue and made the environment work with matplotlib.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
No vulnerabilities reported
PyPI: pip install pandas
HTTPS: https://github.com/pandas-dev/pandas.git
CLI: gh repo clone pandas-dev/pandas
SSH: git@github.com:pandas-dev/pandas.git