by statsmodels | Python | Version: 0.13.5 | License: BSD-3-Clause
Statsmodels: statistical modeling and econometrics in Python
QUESTION
How to install Python statsmodels on Apple M1?
Asked 2022-Mar-22 at 15:53

I cannot figure out how to install statsmodels on my M1 machine. After following the instructions in similar threads about scipy and numpy issues on M1, I am able to install those, but I cannot install statsmodels.
Statsmodels issues were also raised here, but remain unresolved: https://github.com/scipy/scipy/issues/13409
python --version: Python 3.8.9
pip --version: pip 21.3.1
The command pip install statsmodels==0.13.1 leads to the error message:
ERROR: Could not find a version that satisfies the requirement statsmodels==0.13.1
Has anyone managed to install it?
Thank you!
ANSWER
Answered 2021-Dec-29 at 19:24

Have you tried installing it from conda-forge? The package page shows that there's an osx-arm64 version available.
You can install Miniforge (or Mambaforge, if you prefer) for Apple Silicon platforms from the conda-forge GitHub repo. Then just follow the installation instructions and create an environment to install statsmodels into.
Unfortunately, I don't have an M1 machine so I can't test if it's working.
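As a concrete sketch of those steps (commands assume Miniforge is already installed; the environment name stats-env and the Python version are illustrative choices, not requirements):

```shell
# Create an environment from conda-forge and install statsmodels into it
conda create -n stats-env -c conda-forge python=3.9 statsmodels
conda activate stats-env
# Verify the install by importing the package
python -c "import statsmodels; print(statsmodels.__version__)"
```

On an M1 machine, Miniforge's conda resolves osx-arm64 builds from conda-forge by default, which is why this route works where PyPI wheels were missing at the time.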
QUESTION
How to install local package with conda
Asked 2022-Feb-05 at 04:16

I have a local Python project called jive that I would like to use in another project. My current method of using jive in other projects is to activate the conda env for the project, then move to my jive directory and use python setup.py install. This works fine, and when I use conda list, I see everything installed in the env including jive, with a note that jive was installed using pip.

But what I really want is to do this with full conda: when I want to use jive in another project, I want to just put jive in that project's environment.yml.
So I did the following:

- Created a meta.yaml so I could use conda-build to build jive locally.
- Ran conda build ., which builds from the jive source as expected.
- Added jive to the other project's environment.yml, and added 'local' to the list of channels.

When I activate the environment and use conda list, it lists all the dependencies including jive, as desired. But when I open a Python interpreter, I cannot import jive; it says there is no such package. (If I use python setup.py install, I can import it.)

How can I fix the build/install so that this works?
Here is the meta.yaml, which lives in the jive project's top-level directory:
package:
  name: jive
  version: "0.2.1"

source:
  path: .

build:
  script: python -m pip install --no-deps --ignore-installed .

requirements:
  host:
    - python>=3.5
    - pip
    - setuptools
  run:
    - python>=3.5
    - numpy
    - pandas
    - scipy
    - seaborn
    - matplotlib
    - scikit-learn
    - statsmodels
    - joblib
    - bokeh

test:
  imports:
    - jive
And here is the output of conda build .:
No numpy version specified in conda_build_config.yaml. Falling back to default numpy value of 1.16
Adding in variants from internal_defaults
Adding in variants from /Users/thomaskeefe/.conda/conda_build_config.yaml
Attempting to finalize metadata for jive
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
BUILD START: ['jive-0.2.1-py310_0.tar.bz2']
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: /opt/miniconda3/conda-bld/jive_1642185595622/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla
The following NEW packages will be INSTALLED:
bzip2: 1.0.8-h1de35cc_0
ca-certificates: 2021.10.26-hecd8cb5_2
certifi: 2021.5.30-py310hecd8cb5_0
libcxx: 12.0.0-h2f01273_0
libffi: 3.3-hb1e8313_2
ncurses: 6.3-hca72f7f_2
openssl: 1.1.1m-hca72f7f_0
pip: 21.2.4-py310hecd8cb5_0
python: 3.10.0-hdfd78df_3
readline: 8.1.2-hca72f7f_1
setuptools: 58.0.4-py310hecd8cb5_0
sqlite: 3.37.0-h707629a_0
tk: 8.6.11-h7bc2e8c_0
tzdata: 2021e-hda174b7_0
wheel: 0.37.1-pyhd3eb1b0_0
xz: 5.2.5-h1de35cc_0
zlib: 1.2.11-h4dc903c_4
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Copying /Users/thomaskeefe/Documents/py_jive to /opt/miniconda3/conda-bld/jive_1642185595622/work/
source tree in: /opt/miniconda3/conda-bld/jive_1642185595622/work
export PREFIX=/opt/miniconda3/conda-bld/jive_1642185595622/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla
export BUILD_PREFIX=/opt/miniconda3/conda-bld/jive_1642185595622/_build_env
export SRC_DIR=/opt/miniconda3/conda-bld/jive_1642185595622/work
Processing $SRC_DIR
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Building wheels for collected packages: jive
Building wheel for jive (setup.py): started
Building wheel for jive (setup.py): finished with status 'done'
Created wheel for jive: filename=jive-0.2.1-py3-none-any.whl size=46071 sha256=b312955cb2fd917bc4e684a575407b884190680f2dddad7fcb9ac25e5b290fc9
Stored in directory: /private/tmp/pip-ephem-wheel-cache-rbpkt2an/wheels/15/68/82/4ed7cd246fbc4c72cf764b425a03230247589bd2394a7e457b
Successfully built jive
Installing collected packages: jive
Successfully installed jive-0.2.1
Resource usage statistics from building jive:
Process count: 3
CPU time: Sys=0:00:00.3, User=0:00:00.5
Memory: 53.7M
Disk usage: 50.4K
Time elapsed: 0:00:06.1
Packaging jive
Packaging jive-0.2.1-py310_0
compiling .pyc files...
number of files: 70
Fixing permissions
INFO :: Time taken to mark (prefix)
0 replacements in 0 files was 0.06 seconds
TEST START: /opt/miniconda3/conda-bld/osx-64/jive-0.2.1-py310_0.tar.bz2
Adding in variants from /var/folders/dd/t85p2jdn3sd11bsdnl7th6p00000gn/T/tmp4o3im7d1/info/recipe/conda_build_config.yaml
Renaming work directory '/opt/miniconda3/conda-bld/jive_1642185595622/work' to '/opt/miniconda3/conda-bld/jive_1642185595622/work_moved_jive-0.2.1-py310_0_osx-64'
shutil.move(work)=/opt/miniconda3/conda-bld/jive_1642185595622/work, dest=/opt/miniconda3/conda-bld/jive_1642185595622/work_moved_jive-0.2.1-py310_0_osx-64)
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol
The following NEW packages will be INSTALLED:
blas: 1.0-mkl
bokeh: 2.4.2-py39hecd8cb5_0
bottleneck: 1.3.2-py39he3068b8_1
brotli: 1.0.9-hb1e8313_2
ca-certificates: 2021.10.26-hecd8cb5_2
certifi: 2021.10.8-py39hecd8cb5_2
cycler: 0.11.0-pyhd3eb1b0_0
fonttools: 4.25.0-pyhd3eb1b0_0
freetype: 2.11.0-hd8bbffd_0
giflib: 5.2.1-haf1e3a3_0
intel-openmp: 2021.4.0-hecd8cb5_3538
jinja2: 3.0.2-pyhd3eb1b0_0
jive: 0.2.1-py310_0 local
joblib: 1.1.0-pyhd3eb1b0_0
jpeg: 9d-h9ed2024_0
kiwisolver: 1.3.1-py39h23ab428_0
lcms2: 2.12-hf1fd2bf_0
libcxx: 12.0.0-h2f01273_0
libffi: 3.3-hb1e8313_2
libgfortran: 3.0.1-h93005f0_2
libpng: 1.6.37-ha441bb4_0
libtiff: 4.2.0-h87d7836_0
libwebp: 1.2.0-hacca55c_0
libwebp-base: 1.2.0-h9ed2024_0
llvm-openmp: 12.0.0-h0dcd299_1
lz4-c: 1.9.3-h23ab428_1
markupsafe: 2.0.1-py39h9ed2024_0
matplotlib: 3.5.0-py39hecd8cb5_0
matplotlib-base: 3.5.0-py39h4f681db_0
mkl: 2021.4.0-hecd8cb5_637
mkl-service: 2.4.0-py39h9ed2024_0
mkl_fft: 1.3.1-py39h4ab4a9b_0
mkl_random: 1.2.2-py39hb2f4e1b_0
munkres: 1.1.4-py_0
ncurses: 6.3-hca72f7f_2
numexpr: 2.8.1-py39h2e5f0a9_0
numpy: 1.21.2-py39h4b4dc7a_0
numpy-base: 1.21.2-py39he0bd621_0
olefile: 0.46-pyhd3eb1b0_0
openssl: 1.1.1m-hca72f7f_0
packaging: 21.3-pyhd3eb1b0_0
pandas: 1.3.5-py39h743cdd8_0
patsy: 0.5.2-py39hecd8cb5_0
pillow: 8.4.0-py39h98e4679_0
pip: 21.2.4-py39hecd8cb5_0
pyparsing: 3.0.4-pyhd3eb1b0_0
python: 3.9.7-h88f2d9e_1
python-dateutil: 2.8.2-pyhd3eb1b0_0
pytz: 2021.3-pyhd3eb1b0_0
pyyaml: 6.0-py39hca72f7f_1
readline: 8.1.2-hca72f7f_1
scikit-learn: 1.0.2-py39hae1ba45_0
scipy: 1.7.3-py39h8c7af03_0
seaborn: 0.11.2-pyhd3eb1b0_0
setuptools: 58.0.4-py39hecd8cb5_0
six: 1.16.0-pyhd3eb1b0_0
sqlite: 3.37.0-h707629a_0
statsmodels: 0.13.0-py39hca72f7f_0
threadpoolctl: 2.2.0-pyh0d69192_0
tk: 8.6.11-h7bc2e8c_0
tornado: 6.1-py39h9ed2024_0
typing_extensions: 3.10.0.2-pyh06a4308_0
tzdata: 2021e-hda174b7_0
wheel: 0.37.1-pyhd3eb1b0_0
xz: 5.2.5-h1de35cc_0
yaml: 0.2.5-haf1e3a3_0
zlib: 1.2.11-h4dc903c_4
zstd: 1.4.9-h322a384_0
Preparing transaction: ...working... done
Verifying transaction: ...working...
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::intel-openmp-2021.4.0-hecd8cb5_3538, defaults/osx-64::llvm-openmp-12.0.0-h0dcd299_1
path: 'lib/libiomp5.dylib'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'bin/webpinfo'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'bin/webpmux'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'include/webp/decode.h'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'include/webp/encode.h'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'include/webp/mux.h'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'include/webp/mux_types.h'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'include/webp/types.h'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebp.7.dylib'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebp.a'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebp.dylib'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebpdecoder.3.dylib'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebpdecoder.a'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebpdecoder.dylib'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebpmux.3.dylib'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebpmux.a'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/libwebpmux.dylib'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/pkgconfig/libwebp.pc'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/pkgconfig/libwebpdecoder.pc'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'lib/pkgconfig/libwebpmux.pc'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'share/man/man1/cwebp.1'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'share/man/man1/dwebp.1'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'share/man/man1/webpinfo.1'
ClobberWarning: This transaction has incompatible packages due to a shared path.
packages: defaults/osx-64::libwebp-base-1.2.0-h9ed2024_0, defaults/osx-64::libwebp-1.2.0-hacca55c_0
path: 'share/man/man1/webpmux.1'
done
Executing transaction: ...working...
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/llvm-openmp-12.0.0-h0dcd299_1/lib/libiomp5.dylib
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libiomp5.dylib
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/bin/webpinfo
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/bin/webpinfo
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/bin/webpmux
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/bin/webpmux
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/include/webp/decode.h
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include/webp/decode.h
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/include/webp/encode.h
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include/webp/encode.h
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/include/webp/mux.h
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include/webp/mux.h
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/include/webp/mux_types.h
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include/webp/mux_types.h
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/include/webp/types.h
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include/webp/types.h
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebp.7.dylib
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebp.7.dylib
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebp.a
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebp.a
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebp.dylib
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebp.dylib
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebpdecoder.3.dylib
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebpdecoder.3.dylib
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebpdecoder.a
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebpdecoder.a
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebpdecoder.dylib
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebpdecoder.dylib
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebpmux.3.dylib
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebpmux.3.dylib
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebpmux.a
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebpmux.a
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/lib/libwebpmux.dylib
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libwebpmux.dylib
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/.condatmp/1018f8ab-87a7-4fa8-a41c-4c14cc77cfff
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/pkgconfig/libwebp.pc
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/.condatmp/e3701fae-f2cd-44e9-9dc6-c71f499cd2c2
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/pkgconfig/libwebpdecoder.pc
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/.condatmp/0f4bcf50-01e5-404d-b1a4-8a87d45c22c5
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/pkgconfig/libwebpmux.pc
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/share/man/man1/cwebp.1
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/share/man/man1/cwebp.1
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/share/man/man1/dwebp.1
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/share/man/man1/dwebp.1
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/share/man/man1/webpinfo.1
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/share/man/man1/webpinfo.1
ClobberWarning: Conda was asked to clobber an existing path.
source path: /opt/miniconda3/pkgs/libwebp-1.2.0-hacca55c_0/share/man/man1/webpmux.1
target path: /opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/share/man/man1/webpmux.1
Installed package of scikit-learn can be accelerated using scikit-learn-intelex.
More details are available here: https://intel.github.io/scikit-learn-intelex
For example:
$ conda install scikit-learn-intelex
$ python -m sklearnex my_application.py
done
export PREFIX=/opt/miniconda3/conda-bld/jive_1642185595622/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol
export SRC_DIR=/opt/miniconda3/conda-bld/jive_1642185595622/test_tmp
Traceback (most recent call last):
File "/opt/miniconda3/conda-bld/jive_1642185595622/test_tmp/run_test.py", line 2, in <module>
import jive
ModuleNotFoundError: No module named 'jive'
import: 'jive'
Tests failed for jive-0.2.1-py310_0.tar.bz2 - moving package to /opt/miniconda3/conda-bld/broken
TESTS FAILED: jive-0.2.1-py310_0.tar.bz2
EDIT: I added a test: section to the meta.yaml, as merv suggested.
ANSWER
Answered 2022-Feb-05 at 04:16

The immediate error is that the build is generating a Python 3.10 version, but when testing, Conda doesn't recognize any constraint on the Python version and creates a Python 3.9 environment.

I think the main issue is that python >=3.5 is only a valid constraint when doing noarch builds, which this is not. That is, once a package builds with a given Python version, the version must be constrained to exactly that version (up through minor). So, in this case, the package is built with Python 3.10, but it reports in its metadata that it is compatible with all versions of Python 3.5+, which simply isn't true, because Conda Python packages install the modules into Python-version-specific site-packages (e.g., lib/python3.10/site-packages/jive).
Typically, Python versions are controlled either by the --python argument given to conda build or by a matrix supplied by the conda_build_config.yaml file (see the documentation on "Build variants").

Try adjusting the meta.yaml to something like:
package:
  name: jive
  version: "0.2.1"

source:
  path: .

build:
  script: python -m pip install --no-deps --ignore-installed .

requirements:
  host:
    - python
    - pip
    - setuptools
  run:
    - python
    - numpy
    - pandas
    - scipy
    - seaborn
    - matplotlib
    - scikit-learn
    - statsmodels
    - joblib
    - bokeh
If you want to use it in a Python 3.9 environment, then run conda build --python 3.9 . instead.
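Alternatively, the same pinning can come from a build matrix: a conda_build_config.yaml placed next to the recipe lets a single conda build . produce one package per listed Python version (a sketch; the versions shown are just examples):

```yaml
# conda_build_config.yaml, in the same directory as meta.yaml
python:
  - "3.9"
  - "3.10"
```

With an unpinned `- python` in the host and run requirements, conda-build substitutes each variant in turn and constrains the resulting packages accordingly.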
QUESTION
Including parameters in state space model from statsmodels
Asked 2022-Jan-03 at 16:00

Building up the model from a previous post, and the helpful answer, I've subclassed MLEModel to encapsulate the model. I'd like to allow for two parameters q1 and q2 so that the state noise covariance matrix is generalized as in Sarkka (2013)'s example 4.3 (terms re-arranged for my convention).

I thought I would accomplish this with the update method below, but I'm running into problems with the fit method, as it returns a UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('complex128') to dtype('float64') with casting rule 'same_kind'. What am I missing here?
import numpy as np
import pandas as pd
import scipy.linalg as linalg
import statsmodels.api as sm


class Tracker2D(sm.tsa.statespace.MLEModel):
    """Position tracker in two dimensions with four states"""

    start_params = [1.0, 1.0]
    param_names = ["q1", "q2"]

    def __init__(self, endog):
        super(Tracker2D, self).__init__(endog, k_states=4)
        self.endog = endog
        self._state_names = ["x1", "dx1/dt",
                             "x3", "dx3/dt"]
        # dt: sampling rate; s: standard deviation of the process noise
        # common to both dimensions
        dt, s = 0.1, 0.5
        # dynamic model matrices A and Q
        A2d = [[1, dt],
               [0, 1]]
        A = linalg.block_diag(A2d, A2d)
        Q2d = [[dt ** 3 / 3, dt ** 2 / 2],
               [dt ** 2 / 2, dt]]
        Q = linalg.block_diag(Q2d, Q2d)
        # measurement model matrices H and R
        H = np.array([[1, 0, 0, 0],
                      [0, 0, 1, 0]])
        R = s ** 2 * np.eye(2)
        self["design"] = H
        self["obs_cov"] = R
        self["transition"] = A
        self["selection"] = np.eye(4)
        self["state_cov"] = Q

    def update(self, params, **kwargs):
        self["state_cov", :2, :2] *= params[0]
        self["state_cov", 2:, 2:] *= params[1]


# Initialization
m0 = np.array([[0, 1, 0, -1]]).T  # state column vector
P0 = np.eye(4)  # process covariance matrix

# With object Y below being the simulated measurements in the downloadable
# data file from the previous post
with open("measurements_2d.npy", "rb") as f:
    Y = np.load(f)

tracker2D = Tracker2D(pd.DataFrame(Y.T))
tracker2D.initialize_known((tracker2D["transition"] @ m0.flatten()),
                           (tracker2D["transition"] @ P0 @
                            tracker2D["transition"].T +
                            tracker2D["state_cov"]))
# Below throws the error
tracker2D.fit()
ANSWER
Answered 2022-Jan-03 at 16:00The error message you are receiving is about trying to set a complex value in a dtype=float matrix. You would get the same error from:
A = np.eye(2)
A *= 1.0j
The error is showing up in:
def update(self, params, **kwargs):
self["state_cov", :2, :2] *= params[0]
self["state_cov", 2:, 2:] *= params[1]
because you are modifying the "state_cov" in place. When params
is a complex vector but the existing "state_cov" matrix has dtype float, then the error will occur. Statsmodels will set the parameter vector to be complex when computing the standard errors of the parameters, because it uses complex step differentiation.
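The complex-step trick that makes params complex can be sketched in a few lines (a generic illustration of the technique, not the statsmodels internals): evaluating the function at x + ih and taking the imaginary part recovers the derivative without subtractive cancellation.

```python
# Complex-step differentiation: f'(x) ~= Im(f(x + i*h)) / h for tiny h.
# This is why statsmodels evaluates the model with a complex parameter
# vector, and why in-place ops on a float "state_cov" then fail.
def f(x):
    return x ** 3 + 2 * x

h = 1e-20
x0 = 1.5
deriv = (f(x0 + 1j * h)).imag / h
# exact derivative is 3*x0**2 + 2 = 8.75
assert abs(deriv - 8.75) < 1e-9
```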
You could use something like
def update(self, params, **kwargs):
self["state_cov", :2, :2] = params[0] * self["state_cov", :2, :2]
self["state_cov", 2:, 2:] = params[1] * self["state_cov", 2:, 2:]
Although I should point out that this will not give you what I think you want, because it will modify the "state_cov" based on whatever it previously was. I think instead, you want something like:
class Tracker2D(sm.tsa.statespace.MLEModel):
"""Position tracker in two dimensions with four states
"""
start_params = [1.0, 1.0]
param_names = ["q1", "q2"]
def __init__(self, endog):
super(Tracker2D, self).__init__(endog, k_states=4)
self.endog = endog
self._state_names = ["x1", "dx1/dt",
"x3", "dx3/dt"]
# dt: sampling rate; s = standard deviation of the process noise
# common to both dimensions
dt, s = 0.1, 0.5
# dynamic model matrices A and Q
A2d = [[1, dt],
[0, 1]]
A = linalg.block_diag(A2d, A2d)
Q2d = [[dt ** 3 / 3, dt ** 2 / 2],
[dt ** 2 / 2, dt]]
# First we save the base Q matrix
self.Q = linalg.block_diag(Q2d, Q2d)
# measurement model matrices H and R
H = np.array([[1, 0, 0, 0],
[0, 0, 1, 0]])
R = s ** 2 * np.eye(2)
self["design"] = H
self["obs_cov"] = R
self["transition"] = A
self["selection"] = np.eye(4)
self["state_cov"] = self.Q.copy()
def update(self, params, **kwargs):
# Now update the state cov based on the original Q
# matrix, and set entire blocks of the matrix, rather
# than modifying it in-place.
self["state_cov", :2, :2] = params[0] * self.Q[:2, :2]
self["state_cov", 2:, 2:] = params[1] * self.Q[2:, 2:]
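The reason for keeping self.Q can be seen in a plain-Python stand-in (toy numbers, not the statsmodels API): in-place scaling compounds across successive update calls, while scaling a saved base matrix does not.

```python
# Stand-in for one block of the base state_cov matrix.
Q = [[2.0]]

# In-place scaling: each call multiplies the *current* value.
cov = [row[:] for row in Q]
for params in ([2.0], [3.0]):      # two successive update() calls
    cov[0][0] *= params[0]
assert cov[0][0] == 12.0           # 2 * 2 * 3: the scalings compound

# Scaling from the saved base Q: each call starts fresh.
cov = [row[:] for row in Q]
for params in ([2.0], [3.0]):
    cov[0][0] = params[0] * Q[0][0]
assert cov[0][0] == 6.0            # 3 * 2: only the last params matter
```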
QUESTION
what does {sys.executable} do in jupyter notebook?
Asked 2021-Dec-14 at 03:37I bought a book which comes with a Jupyter notebook. In the first chapter, it asks me to install the required libraries using {sys.executable} -m, which I have never seen before. What do {sys.executable} and -m do? Also, why use --user at the end?
Typically, I just use ! pip install numpy==1.19.2
Can anyone help me understand it? Thank you!
import sys
!{sys.executable} -m pip install numpy==1.19.2 --user
!{sys.executable} -m pip install scipy==1.6.2 --user
!{sys.executable} -m pip install tensorflow==2.4.0 --user
!{sys.executable} -m pip install tensorflow-probability==0.11.0 --user
!{sys.executable} -m pip install scikit-learn==0.24.1 --user
!{sys.executable} -m pip install statsmodels==0.12.2 --user
!{sys.executable} -m pip install ta --user
ANSWER
Answered 2021-Dec-14 at 03:35sys.executable refers to the Python interpreter for the current system. It comes in handy when using virtual environments and having several interpreters on the same machine.
The -m option loads and executes a module as a script, in this case pip.
The --user flag is an option for pip install; see this answer describing its use.
Finally, !{sys.executable} is Jupyter-specific syntax: ! executes a shell command in a cell, and {} substitutes in the value of the Python expression sys.executable.
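To see concretely what gets substituted, here is a minimal sketch (the package pin is just the book's example):

```python
import sys

# sys.executable is the absolute path of the interpreter running this
# code; building the pip command around it guarantees packages are
# installed into the same environment as the notebook kernel.
print(sys.executable)

# The notebook line "!{sys.executable} -m pip install numpy==1.19.2 --user"
# is equivalent to running this argument list in a shell:
cmd = [sys.executable, "-m", "pip", "install", "numpy==1.19.2", "--user"]
```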
QUESTION
LogisticRegression from sk_learn and smf.logit() from statsmodels.formula.api return different results
Asked 2021-Nov-28 at 14:38I am trying to calculate the variance of the coefficients for logistic regression using the bootstrap, and I am using scikit-learn and statsmodels to compare results. I am using the Default dataset from the ISLR website, which can be found in the zip folder here or here as a plain csv file. I am using the following code to perform the bootstrap:
Import the Dataset and create the response variable
default_df = pd.read_csv("./Machine-Learning-Books-With-Python/Introduction to Statistical Learning/data/default.csv")
default_df['default_01'] = np.where(default_df.default == 'Yes', 1, 0)
Next, I am defining the boot function which will take care of the random sampling for my dataset:
def boot(X, bootSample_size=None):
'''
Sampling observations from a dataframe
Parameters
------------
X : pandas dataframe
Data to be resampled
bootSample_size: int, optional
Dimension of the bootstrapped samples
Returns
------------
bootSample_X : pandas dataframe
Resampled data
Examples
----------
To resample data from the X dataframe:
>> boot(X)
The resampled data will have length equal to len(X).
To resample data from the X dataframe in order to have length 5:
>> boot(X,5)
References
------------
http://nbviewer.jupyter.org/gist/aflaxman/6871948
'''
#assign default size if non-specified
if bootSample_size is None:
bootSample_size = len(X)
#create random integers to use as indices for bootstrap sample based on original data
bootSample_i = (np.random.rand(bootSample_size)*len(X)).astype(int)
bootSample_i = np.array(bootSample_i)
bootSample_X = X.iloc[bootSample_i]
return bootSample_X
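The indexing idea in boot can be illustrated with plain-Python stand-ins (toy data, not the Default dataset): draw n indices with replacement, then select rows by position.

```python
import random

random.seed(0)
X = list(range(10))               # stand-in for the dataframe rows
idx = [random.randrange(len(X)) for _ in range(len(X))]
sample = [X[i] for i in idx]      # like X.iloc[bootSample_i]

assert len(sample) == len(X)      # same size as the original
assert set(sample) <= set(X)      # only original rows, some repeated
```

With numpy, np.random.randint(0, len(X), size=len(X)) produces the same kind of index array more directly than scaling np.random.rand and casting to int.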
Finally, I define two functions that will perform the logistic regression and extract the parameters:
Using statsmodels
def boot_fn(data):
lr = smf.logit(formula='default_01 ~ income + balance', data=data).fit(disp=0)
return lr.params
Using scikit-learn
def boot_fn2(data):
X = data[['income', 'balance']]
y = data.default_01
logit = LogisticRegression(C = 1e9)
logit.fit(X, y)
return logit.coef_
Finally the loop to run the functions 100 times and store the results:
coef_sk = []
coef_sm = []
for _ in np.arange(100):
data = boot(default_df)
coef_sk.append(boot_fn2(data))
coef_sm.append(boot_fn(data))
Taking the mean for both coef_sk (scikit-learn) and coef_sm (statsmodels) I see that the one generated using statsmodels is much closer to the real value and also for different runs the scikit-learn coefficients appear to diverge quite a lot from the actual value. Could you please explain why this happens? I find it very confusing since I would expect that for the same datasets, results should be the same (at least marginally different). However, in this case, results differ a lot, which leads me to believe that there is something wrong with the way I am running the sk-learn version. Would appreciate any help and I would be more than happy to provide additional clarifications.
ANSWER
Answered 2021-Nov-28 at 14:38Although you set the C parameter to be high to minimize the regularization, sklearn by default uses the lbfgs solver to find your optimal parameters, while statsmodels uses newton.
You can try doing this to get similar coefficients:
def boot_fn2(data):
X = data[['income', 'balance']]
y = data.default_01
logit = LogisticRegression(penalty="none",max_iter=1000,solver = "newton-cg")
logit.fit(X, y)
return logit.coef_
If I run this with the above function:
coef_sk = []
coef_sm = []
for _ in np.arange(50):
data = boot(default_df)
coef_sk.append(boot_fn2(data))
coef_sm.append(boot_fn(data))
You will instantly see that it throws a lot of warnings about being unable to converge:
LineSearchWarning: The line search algorithm did not converge
Although the coefficients are similar now, it points to a larger issue with your dataset, similar to this question
np.array(coef_sm)[:,1:].mean(axis=0)
array([2.14570133e-05, 5.68280785e-03])
np.array(coef_sk).mean(axis=0)
array([[2.14352318e-05, 5.68116402e-03]])
Your independent variables are quite large in magnitude, and this poses a problem for the optimization methods available in sklearn. You can scale two of your independent variables down if you want to interpret the coefficients:
default_df[['balance','income']] = default_df[['balance','income']]/100
Otherwise, it's always good practice to scale your independent variables first and then apply the regression:
from sklearn.preprocessing import StandardScaler
default_df[['balance','income']] = StandardScaler().fit_transform(default_df[['balance','income']])
def boot_fn(data):
lr = smf.logit(formula='default_01 ~ income + balance', data=data).fit(disp=0)
return lr.params
def boot_fn2(data):
X = data[['income', 'balance']]
y = data.default_01
logit = LogisticRegression(penalty="none")
logit.fit(X, y)
return logit.coef_
coef_sk = []
coef_sm = []
for _ in np.arange(50):
data = boot(default_df)
#print(data.default_01.mean())
coef_sk.append(boot_fn2(data))
coef_sm.append(boot_fn(data))
Now you'll see the coefficients are similar:
np.array(coef_sm)[:,1:].mean(axis=0)
array([0.26517582, 2.71598194])
np.array(coef_sk).mean(axis=0)
array([[0.26517504, 2.71598548]])
QUESTION
What is Julia's equivalent ggplot code of R's?
Asked 2021-Nov-23 at 15:47I would like to plot a sophisticated graph in Julia. The code below is the Julia version, using ggplot through RCall.
using CairoMakie, DataFrames, Effects, GLM, StatsModels, StableRNGs, RCall
@rlibrary ggplot2
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
sex=repeat(["male", "female"], inner=8),
weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
mod_uncentered = lm(@formula(weight ~ 1 + sex * age), growthdata)
refgrid = copy(growthdata)
filter!(refgrid) do row
return mod(row.age, 2) == (row.sex == "male")
end
effects!(refgrid, mod_uncentered)
refgrid[!, :lower] = @. refgrid.weight - 1.96 * refgrid.err
refgrid[!, :upper] = @. refgrid.weight + 1.96 * refgrid.err
df= refgrid
ggplot(df, aes(x=:age, y=:weight, group = :sex, shape= :sex, linetype=:sex)) +
geom_point(position=position_dodge(width=0.15)) +
geom_ribbon(aes(ymin=:lower, ymax=:upper), fill="gray", alpha=0.5)+
geom_line(position=position_dodge(width=0.15)) +
ylab("Weight")+ xlab("Age")+
theme_classic()
However, I would like to modify this graph a bit more. For example, I would like to change the scale of the y axis, the colors of the ribbon, add some error bars, and also change the text size of the legend, and so on. Since I am new to Julia, I am not succeeding in finding the equivalent code for these modifications. Could someone help me translate the R ggplot code below into Julia?
t1= filter(df, sex=="male") %>% slice_max(df$weight)
ggplot(df, aes(age, weight, group = sex, shape= sex, linetype=sex,fill=sex, colour=sex)) +
geom_line(position=position_dodge(width=0.15)) +
geom_point(position=position_dodge(width=0.15)) +
geom_errorbar(aes(ymin = lower, ymax = upper),width = 0.1,
linetype = "solid",position=position_dodge(width=0.15))+
geom_ribbon(aes(ymin = lower, ymax = upper, fill = sex, colour = sex), alpha = 0.2) +
geom_text(data = t1, aes(age, weight, label = round(weight, 1)), hjust = -0.25, size=7,show_guide = FALSE) +
scale_y_continuous(limits = c(70, 150), breaks = seq(80, 140, by = 20))+
theme_classic()+
scale_colour_manual(values = c("orange", "blue")) +
guides(color = guide_legend(override.aes = list(linetype = c('dotted', 'dashed'))),
linetype = "none")+
xlab("Age")+ ylab("Average marginal effects") + ggtitle("Title") +
theme(
axis.title.y = element_text(color="Black", size=28, face="bold", hjust = 0.9),
axis.text.y = element_text(face="bold", color="black", size=16),
plot.title = element_text(hjust = 0.5, color="Black", size=28, face="bold"),
legend.title = element_text(color = "Black", size = 13),
legend.text = element_text(color = "Black", size = 16),
legend.position="bottom",
axis.text.x = element_text(face="bold", color="black", size=11),
strip.text = element_text(face= "bold", size=15)
)
ANSWER
Answered 2021-Nov-23 at 15:47I used Vega-Lite (https://github.com/queryverse/VegaLite.jl) which is also grounded in the "Grammar of Graphics", and LinearRegression (https://github.com/ericqu/LinearRegression.jl) which provides similar features as GLM, although I think it is possible to get comparable results with the other plotting and linear regression packages. Nevertheless, I hope that this gives you a starting point.
using LinearRegression: Distributions, DataFrames, CategoricalArrays
using DataFrames, StatsModels, LinearRegression
using VegaLite
growthdata = DataFrame(; age=[13:20; 13:20],
sex=categorical(repeat(["male", "female"], inner=8), compress=true),
weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(16))
lm = regress(@formula(weight ~ 1 + sex * age), growthdata)
results = predict_in_sample(lm, growthdata, req_stats="all")
fp = select(results, [:age, :weight, :sex, :uclp, :lclp, :predicted]) |> @vlplot() +
@vlplot(
mark = :errorband, color = :sex,
y = { field = :uclp, type = :quantitative, title="Average marginal effects"},
y2 = { field = :lclp, type = :quantitative },
x = {:age, type = :quantitative} ) +
@vlplot(
mark = :line, color = :sex,
x = {:age, type = :quantitative},
y = {:predicted, type = :quantitative}) +
@vlplot(
:point, color=:sex ,
x = {:age, type = :quantitative, axis = {grid = false}, scale = {zero = false}},
y = {:weight, type = :quantitative, axis = {grid = false}, scale = {zero = false}},
title = "Title", width = 400 , height = 400
)
which gives:
You can change the style of the elements by changing the "config" as indicated here (https://www.queryverse.org/VegaLite.jl/stable/gettingstarted/tutorial/#Config-1).
As the Julia VegaLite.jl package is a wrapper around Vega-Lite, additional documentation can be found on the Vega-Lite website (https://vega.github.io/vega-lite/)
QUESTION
how to predict using statsmodels.formula.api logit
Asked 2021-Nov-04 at 15:25I have the following problem. I would like to do an in-sample prediction using logit from statsmodels.formula.api.
See my code:
import statsmodels.formula.api as smf
model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
Until now everything's fine. But I would like to do in-sample prediction using my model:
yhat5 = model5_logit.predict(params=["dep", "var1", "var2", "var3"])
This gives the error ValueError: data type must provide an itemsize.
When I try:
yhat5 = model5_logit.predict(params="dep ~ var1 + var2 + var3")
I got another error: numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U69')) -> None
How can I do an in-sample forecast for the Logit model using statsmodels.formula.api?
This did not help me: How to predict new values using statsmodels.formula.api (python)
ANSWER
Answered 2021-Nov-04 at 15:25Using an example dataset:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
from sklearn.datasets import make_classification
X,y = make_classification(n_features=3,n_informative=2,n_redundant=1)
model_data = pd.DataFrame(X,columns = ['var1','var2','var3'])
model_data['dep'] = y
Fit the model (which I don't see in your code):
import statsmodels.formula.api as smf
model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
res = model_logit.fit()
You can get the in sample predictions (in probabilities) and the predicted label :
in_sample = pd.DataFrame({'prob':res.predict()})
in_sample['pred_label'] = (in_sample['prob']>0.5).astype(int)
in_sample.head()
prob pred_label
0 0.005401 0
1 0.911056 1
2 0.990406 1
3 0.412332 0
4 0.983642 1
And we check against the actual label :
pd.crosstab(in_sample['pred_label'],model_data['dep'])
dep 0 1
pred_label
0 46 2
1 4 48
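The pred_label column above is just the predicted probability thresholded at 0.5, which can be reproduced by hand from the printed values:

```python
# Probabilities copied from the in_sample.head() output above.
probs = [0.005401, 0.911056, 0.990406, 0.412332, 0.983642]
pred_label = [int(p > 0.5) for p in probs]
assert pred_label == [0, 1, 1, 0, 1]
```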
QUESTION
How to make Jupyter notebook python help function output colorful?
Asked 2021-Sep-23 at 09:24I am new to Jupyter notebook and trying to view help for some functions. For example, when I print the help of statsmodels.OLS, I get the following plain black-and-white help.
Are there any python modules that colorize/beautify the help output?
For example:
If there are no such modules, what would be a starting point for colorizing the parameters and the python code?
The example output of help
is given below:
ANSWER
Answered 2021-Sep-23 at 09:24You can try to beautify the help using the rich library (in Jupyter, you can install it using the command !pip install rich).
In particular, you could study its inspect function.
For example, with the following code:
from rich import inspect
inspect(sm.OLS, help=True)
QUESTION
RandomizedSearchCV: All estimators failed to fit
Asked 2021-Sep-07 at 19:33I am currently working on the "French Motor Claims Datasets freMTPL2freq" Kaggle competition (https://www.kaggle.com/floser/french-motor-claims-datasets-fremtpl2freq). Unfortunately I get a "NotFittedError: All estimators failed to fit" error whenever I am using RandomizedSearchCV and I cannot figure out why that is. Any help is much appreciated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import scipy.stats as stats
from matplotlib import pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import mean_poisson_deviance
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import VotingRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.metrics import mean_gamma_deviance
from xgboost import XGBRegressor
data_freq = pd.read_csv('freMTPL2freq.csv')
data_freq['Area'] = data_freq['Area'].str.replace('\'','')
data_freq['VehBrand'] = data_freq['VehBrand'].str.replace('\'','')
data_freq['VehGas'] = data_freq['VehGas'].str.replace('\'','')
data_freq['Region'] = data_freq['Region'].str.replace('\'','')
data_freq['frequency'] = data_freq['ClaimNb'] / data_freq['Exposure']
y = data_freq['frequency']
X = data_freq.drop(['frequency', 'ClaimNb', 'IDpol'], axis = 1)
X_train, X_val, y_train, y_val = train_test_split(X,y, test_size=0.2, shuffle = True, random_state = 42)
pt_columns = ['VehPower', 'VehAge', 'DrivAge', 'BonusMalus', 'Density']
cat_columns = ['Area', 'Region', 'VehBrand', 'VehGas']
ct = ColumnTransformer([('pt', 'passthrough', pt_columns),
('ohe', OneHotEncoder(), cat_columns)])
pipe_xgbr = Pipeline([('cf_trans', ct),
('ssc', StandardScaler(with_mean = False)),
('xgb_regressor', XGBRegressor())
])
param = {'xgb_regressor__n_estimators':[3, 5],
'xgb_regressor__max_depth':[3, 5, 7],
'xgb_regressor__learning_rate':[0.1, 0.5],
'xgb_regressor__colsample_bytree':[0.5, 0.8],
'xgb_regressor__subsample':[0.5, 0.8]
}
rscv = RandomizedSearchCV(pipe_xgbr, param_distributions = param, n_iter = 2, scoring = mean_squared_error, n_jobs = -1, cv = 5, error_score = 'raise')
rscv.fit(X_train, y_train, xgbr_regressor__sample_weight = X_train['Exposure'])
The first five rows of the original dataframe data_freq look like this:
IDpol ClaimNb Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.0 1 0.10 D 5 0 55 50 B12 Regular 1217 R82
1 3.0 1 0.77 D 5 0 55 50 B12 Regular 1217 R82
2 5.0 1 0.75 B 6 2 52 50 B12 Diesel 54 R22
3 10.0 1 0.09 B 7 0 46 50 B12 Diesel 76 R72
4 11.0 1 0.84 B 7 0 46 50 B12 Diesel 76 R72
The error I get is as follows:
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
r = call_item()
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 608, in __call__
return self.func(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in __call__
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 598, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\pipeline.py", line 340, in fit
fit_params_steps = self._check_fit_params(**fit_params)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\pipeline.py", line 261, in _check_fit_params
fit_params_steps[step][param] = pval
KeyError: 'xgbr_regressor'
"""
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-68-0c1886d1e985> in <module>
----> 1 rscv.fit(X_train, y_train, xgbr_regressor__sample_weight = X_train['Exposure'])
2 #pipe_xgbr.fit(X_train, y_train)
3 #X_train.describe(include = 'all')
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1633 evaluate_candidates(ParameterSampler(
1634 self.param_distributions, self.n_iter,
-> 1635 random_state=self.random_state))
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
807 (split_idx, (train, test)) in product(
808 enumerate(candidate_params),
--> 809 enumerate(cv.split(X, y, groups))))
810
811 if len(out) < 1:
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time
~\anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
560 AsyncResults.get from multiprocessing."""
561 try:
--> 562 return future.result(timeout=timeout)
563 except LokyTimeoutError:
564 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
433 raise CancelledError()
434 elif self._state == FINISHED:
--> 435 return self.__get_result()
436 else:
437 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
KeyError: 'xgbr_regressor'
I also tried running fit without the sample_weight parameter. In this case the error changes to:
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
r = call_item()
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 608, in __call__
return self.func(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in __call__
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
for func, args, kwargs in self.items]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 625, in _fit_and_score
test_scores = _score(estimator, X_test, y_test, scorer, error_score)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
scores = scorer(estimator, X_test, y_test)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 74, in inner_f
return f(**kwargs)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\metrics\_regression.py", line 336, in mean_squared_error
y_true, y_pred, multioutput)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\metrics\_regression.py", line 88, in _check_reg_targets
check_consistent_length(y_true, y_pred)
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 316, in check_consistent_length
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 316, in <listcomp>
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 249, in _num_samples
raise TypeError(message)
TypeError: Expected sequence or array-like, got <class 'sklearn.pipeline.Pipeline'>
"""
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-69-a9be9cc5df4a> in <module>
----> 1 rscv.fit(X_train, y_train)#, xgbr_regressor__sample_weight = X_train['Exposure'])
2 #pipe_xgbr.fit(X_train, y_train)
3 #X_train.describe(include = 'all')
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1633 evaluate_candidates(ParameterSampler(
1634 self.param_distributions, self.n_iter,
-> 1635 random_state=self.random_state))
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
807 (split_idx, (train, test)) in product(
808 enumerate(candidate_params),
--> 809 enumerate(cv.split(X, y, groups))))
810
811 if len(out) < 1:
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time
~\anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
560 AsyncResults.get from multiprocessing."""
561 try:
--> 562 return future.result(timeout=timeout)
563 except LokyTimeoutError:
564 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
433 raise CancelledError()
434 elif self._state == FINISHED:
--> 435 return self.__get_result()
436 else:
437 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
TypeError: Expected sequence or array-like, got <class 'sklearn.pipeline.Pipeline'>
When setting verbose = 10 and n_jobs = 1 the following error message shows up:
Fitting 5 folds for each of 2 candidates, totalling 10 fits
[CV 1/5; 1/2] START xgb_regressor__colsample_bytree=0.5, xgb_regressor__learning_rate=0.5, xgb_regressor__max_depth=5, xgb_regressor__n_estimators=5, xgb_regressor__subsample=0.5
C:\Users\Jan\anaconda3\lib\site-packages\sklearn\utils\validation.py:72: FutureWarning: Pass sample_weight=406477 1.0
393150 0.0
252885 0.0
260652 0.0
661256 0.0
...
154663 0.0
398414 0.0
42890 0.0
640774 0.0
114446 0.0
Name: frequency, Length: 108482, dtype: float64 as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
"will result in an error", FutureWarning)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-84-74435f74c470> in <module>
----> 1 rscv.fit(X_train, y_train, xgb_regressor__sample_weight = X_train['Exposure'])
2 #pipe_xgbr.fit(X_train, y_train)
3 #X_train.describe(include = 'all')
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1633 evaluate_candidates(ParameterSampler(
1634 self.param_distributions, self.n_iter,
-> 1635 random_state=self.random_state))
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
807 (split_idx, (train, test)) in product(
808 enumerate(candidate_params),
--> 809 enumerate(cv.split(X, y, groups))))
810
811 if len(out) < 1:
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1002 # remaining jobs.
1003 self._iterating = False
-> 1004 if self.dispatch_one_batch(iterator):
1005 self._iterating = self._original_iterator is not None
1006
~\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
833 return False
834 else:
--> 835 self._dispatch(tasks)
836 return True
837
~\anaconda3\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
752 with self._lock:
753 job_idx = len(self._jobs)
--> 754 job = self._backend.apply_async(batch, callback=cb)
755 # A job can complete so quickly than its callback is
756 # called before we get here, causing self._jobs to
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
207 def apply_async(self, func, callback=None):
208 """Schedule a func to be run"""
--> 209 result = ImmediateResult(func)
210 if callback:
211 callback(result)
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
588 # Don't delay the application, to avoid keeping the input
589 # arguments in memory
--> 590 self.results = batch()
591
592 def get(self):
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~\anaconda3\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~\anaconda3\lib\site-packages\sklearn\utils\fixes.py in __call__(self, *args, **kwargs)
220 def __call__(self, *args, **kwargs):
221 with config_context(**self.config):
--> 222 return self.function(*args, **kwargs)
~\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, split_progress, candidate_progress, error_score)
623
624 fit_time = time.time() - start_time
--> 625 test_scores = _score(estimator, X_test, y_test, scorer, error_score)
626 score_time = time.time() - start_time - fit_time
627 if return_train_score:
~\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _score(estimator, X_test, y_test, scorer, error_score)
685 scores = scorer(estimator, X_test)
686 else:
--> 687 scores = scorer(estimator, X_test, y_test)
688 except Exception:
689 if error_score == 'raise':
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
72 "will result in an error", FutureWarning)
73 kwargs.update(zip(sig.parameters, args))
---> 74 return f(**kwargs)
75 return inner_f
76
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
334 """
335 y_type, y_true, y_pred, multioutput = _check_reg_targets(
--> 336 y_true, y_pred, multioutput)
337 check_consistent_length(y_true, y_pred, sample_weight)
338 output_errors = np.average((y_true - y_pred) ** 2, axis=0,
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
86 the dtype argument passed to check_array.
87 """
---> 88 check_consistent_length(y_true, y_pred)
89 y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
90 y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
314 """
315
--> 316 lengths = [_num_samples(X) for X in arrays if X is not None]
317 uniques = np.unique(lengths)
318 if len(uniques) > 1:
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in <listcomp>(.0)
314 """
315
--> 316 lengths = [_num_samples(X) for X in arrays if X is not None]
317 uniques = np.unique(lengths)
318 if len(uniques) > 1:
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _num_samples(x)
247 if hasattr(x, 'fit') and callable(x.fit):
248 # Don't get num_samples from an ensembles length!
--> 249 raise TypeError(message)
250
251 if not hasattr(x, '__len__') and not hasattr(x, 'shape'):
TypeError: Expected sequence or array-like, got <class 'sklearn.pipeline.Pipeline'>
ANSWER
Answered 2021-Sep-06 at 14:32 According to your error message, KeyError: 'xgbr_regressor', the code can't find the key xgbr_regressor in your Pipeline. In your pipeline you defined the step as xgb_regressor:
pipe_xgbr = Pipeline(
[('cf_trans', ct),
('ssc', StandardScaler(with_mean = False)),
('xgb_regressor', XGBRegressor())])
But when you try to fit, you reference it as xgbr_regressor, which is why the KeyError is thrown:
rscv.fit(X_train, y_train, xgbr_regressor__sample_weight=X_train['Exposure'])
Therefore, change xgbr_regressor__sample_weight to xgb_regressor__sample_weight in the line above, and this should eliminate the error.
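The naming rule is easy to see in a minimal pipeline (illustrative only — Ridge stands in for XGBRegressor so the snippet needs nothing beyond scikit-learn): fit parameters are routed as <step name>__<parameter>, and the prefix must match the step name exactly.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# step named 'xgb_regressor', matching the question's pipeline
pipe = Pipeline([('ssc', StandardScaler()),
                 ('xgb_regressor', Ridge())])

X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
w = np.ones(len(y))

# fit parameters are routed as '<step name>__<param>'; the prefix must
# match the step name, so 'xgb_regressor__...' is delivered to Ridge.fit,
# while a misspelled prefix like 'xgbr_regressor__...' raises a KeyError
pipe.fit(X, y, xgb_regressor__sample_weight=w)
print(pipe.predict(X[:3]))
```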
QUESTION
OLS fit for python with coefficient error and transformed target
Asked 2021-Aug-12 at 22:23 There seem to be two methods for OLS fits in Python: the sklearn one and the statsmodels one. I have a preference for the statsmodels one because it gives the error on the coefficients via the summary() function. However, I would like to use the TransformedTargetRegressor from sklearn to log my target. It would seem that I need to choose between getting the error on my fit coefficients in statsmodels and being able to transform my target in sklearn. Is there a good way to do both of these at the same time in either system?
In stats model it would be done like this
import statsmodels.api as sm
X = sm.add_constant(X)
ols = sm.OLS(y, X)
ols_result = ols.fit()
print(ols_result.summary())
To return the fit with the coefficients and the error on them
For sklearn you can use the TransformedTargetRegressor
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
regr = TransformedTargetRegressor(regressor=LinearRegression(), func=np.log1p, inverse_func=np.expm1)
regr.fit(X, y)
print('Coefficients: \n', regr.regressor_.coef_)  # coefficients live on the fitted inner regressor
But there is no way to get the error on the coefficients without calculating them yourself. Is there a good way to get the best of both worlds?
EDIT
I found a good example for the special case I care about here
ANSWER
Answered 2021-Aug-06 at 09:15 In short, scikit-learn cannot calculate coefficient standard errors for you. However, if you opt to use it, you can calculate the errors yourself. In the question Python scikit learn Linear Model Parameter Standard Error, @grisaitis provided a great answer explaining the main concepts behind it.
If you only want a plug-and-play function that works with scikit-learn, you can use this:
import numpy as np

def get_coef_std_errors(reg: 'sklearn.linear_model.LinearRegression',
                        y_true: 'np.ndarray', X: 'np.ndarray'):
    """Calculate the standard errors of the coefficients of a fitted
    linear regression.

    Parameters
    ----------
    reg : sklearn.linear_model.LinearRegression
        LinearRegression object which has been fitted
    y_true : np.ndarray
        array containing the target variable
    X : np.ndarray
        array containing the features used in the regression

    Returns
    -------
    beta_std
        Standard errors of the regression coefficients
    """
    y_pred = reg.predict(X)        # get predictions
    errors = y_true - y_pred       # calculate residuals
    sigma_sq_hat = np.var(errors)  # estimate residual variance
    sigma_beta_hat = sigma_sq_hat * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diagonal(sigma_beta_hat))  # diagonal holds variances
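As a quick self-contained check of that formula (numpy only, synthetic data — all names here are illustrative), the standard errors it produces shrink as 1/sqrt(n), and the point estimates land within a few standard errors of the true coefficients. statsmodels' ols_result.bse computes the same quantity, up to a degrees-of-freedom correction in the variance estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS point estimates via the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# standard errors using the same formula as get_coef_std_errors
resid = y - X @ beta_hat
sigma_sq_hat = np.var(resid)
cov_beta = sigma_sq_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))

print(beta_hat)  # close to [1.0, 2.0]
print(se)        # roughly 0.5 / sqrt(500), i.e. ~0.02 each
```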
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
PyPI
pip install statsmodels
HTTPS
https://github.com/statsmodels/statsmodels.git
CLI
gh repo clone statsmodels/statsmodels
SSH
git@github.com:statsmodels/statsmodels.git