scikit-learn | scikit-learn: machine learning in Python | Machine Learning library

by scikit-learn | Python | Version: 1.2.2 | License: BSD-3-Clause

kandi X-RAY | scikit-learn Summary

scikit-learn is a Python library typically used in Institutions, Learning, Education, Artificial Intelligence, Machine Learning, and Pandas applications. scikit-learn has no bugs, it has a build file available, it has a permissive license, and it has high support. However, scikit-learn has 1 vulnerability. You can install it using 'pip install scikit-learn' or download it from GitHub or PyPI.
scikit-learn: machine learning in Python

            Support

              scikit-learn has a highly active ecosystem.
              It has 54,382 stars and 24,306 forks. There are 2,154 watchers for this library.
              There were 3 major releases in the last 6 months.
              There are 1,580 open issues and 8,567 have been closed. On average, issues are closed in 230 days. There are 615 open pull requests and 0 closed requests.
              It has a positive sentiment in the developer community.
              The latest version of scikit-learn is 1.2.2

            Quality

              scikit-learn has 0 bugs and 0 code smells.

            Security

              scikit-learn has 1 vulnerability issue reported (1 critical, 0 high, 0 medium, 0 low).
              scikit-learn code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              scikit-learn is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              scikit-learn releases are available to install and integrate.
              A deployable package is available on PyPI.
              A build file is available. You can build the component from source.
              scikit-learn saves you 147,220 person-hours of effort in developing the same functionality from scratch.
              It has 176,758 lines of code, 9,057 functions, and 897 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed scikit-learn and identified the following as its top functions. This is intended to give you instant insight into the functionality scikit-learn implements and to help you decide whether it suits your requirements.
            • Linear Grammarization problem.
            • Linear path solver.
            • Logistic regression.
            • Compute a dictionary of learning statistics for a given dataset.
            • Local embedding.
            • Plot the image.
            • Plot the partial dependence of the estimator.
            • Check if an array is valid.
            • r Enet Path Method.
            • Calculate partial dependence of the estimator.

            scikit-learn Key Features

            scikit-learn: machine learning in Python

            scikit-learn Examples and Code Snippets

            Text classification with Scikit-Learn-Create a model script
            Python | Lines of Code: 121 | License: Permissive (Apache-2.0)
            import pickle
            import os
            import numpy as np
            from sklearn.linear_model import LogisticRegression
            from sklearn.feature_extraction.text import TfidfVectorizer
            from sklearn.pipeline import make_pipeline
            from label_studio_ml.model import LabelStudioMLBas  
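The preview above shows only the imports of the model script. As a minimal sketch of the scikit-learn part those imports point to (TfidfVectorizer and LogisticRegression chained with make_pipeline, then persisted with pickle), assuming a hypothetical toy corpus in place of the Label Studio annotations the real script trains on:

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; the original script trains on Label Studio annotations instead.
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "really happy with it",
    "awful quality, do not buy",
]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF features feed directly into a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Persist and reload the fitted pipeline with pickle, as the snippet's imports suggest.
restored = pickle.loads(pickle.dumps(model))
print(restored.predict(["works great"]))
```

Pickling the whole pipeline keeps the fitted vocabulary and the classifier together, so the reloaded object accepts raw strings.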
            Tree ensemble example (XGBoost/LightGBM/CatBoost/scikit-learn/pyspark models)
            Jupyter Notebook | Lines of Code: 23 | License: Permissive (MIT)
            import xgboost
            import shap
            # train an XGBoost model
            X, y =
            model = xgboost.XGBRegressor().fit(X, y)
            # explain the model's predictions using SHAP
            # (same syntax works for LightGBM, CatBoost, scikit-learn, transformers, Spark,   
            Returns a scikit-learn scaler.
            Python | Lines of Code: 15 | License: No License
            def get_scaler(env):
              # return scikit-learn scaler object to scale the states
              # Note: you could also populate the replay buffer here
              states = []
              for _ in range(env.n_step):
                action = np.random.choice(env.action_space)
                state, reward, do  
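The function above is cut off mid-loop, but its comments describe the idea: collect states, then return a scaler fitted on them. A minimal sketch of that fitting step with StandardScaler, assuming a hypothetical array of random states in place of stepping the snippet's env object:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for states collected by stepping an environment with random actions.
rng = np.random.default_rng(0)
states = rng.normal(loc=100.0, scale=20.0, size=(500, 3))

# Learn per-feature mean and standard deviation from the collected states.
scaler = StandardScaler()
scaler.fit(states)

# Later, each new state is scaled with the statistics learned above.
scaled = scaler.transform(states[:5])
print(scaled.round(2))
```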
            How to choose LinearSVC instead of SVC if kernel=linear in param_grid?
            Python | Lines of Code: 17 | License: Strong Copyleft (CC BY-SA 4.0)
            from sklearn.dummy import DummyClassifier
            from sklearn.model_selection import GridSearchCV
            from sklearn.pipeline import Pipeline
            search_spaces = [
                {'svm': [SVC(kernel='rbf')],
                 'svm__gamma': ('scale', 'auto'),
                 'svm__C': (0.1,
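The search_spaces list above is truncated. A sketch of the pattern it sets up: a pipeline whose "svm" step is a placeholder (the imported DummyClassifier), plus a list of grids that each swap in a different estimator, so the linear case uses LinearSVC while the RBF case uses SVC. The StandardScaler step, the C values, and the synthetic data are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=300, random_state=0)

# The 'svm' step is a placeholder; each grid below replaces it with a real estimator.
pipe = Pipeline([("scale", StandardScaler()), ("svm", DummyClassifier())])

search_spaces = [
    # Non-linear kernel: use SVC and tune gamma as well as C.
    {"svm": [SVC(kernel="rbf")], "svm__gamma": ["scale", "auto"], "svm__C": [0.1, 1, 10]},
    # Linear case: use LinearSVC (liblinear) instead of SVC(kernel="linear").
    {"svm": [LinearSVC(dual=False)], "svm__C": [0.1, 1, 10]},
]

search = GridSearchCV(pipe, search_spaces, cv=3).fit(X, y)
print(type(search.best_params_["svm"]).__name__, search.best_score_)
```

Because a list of dicts is searched grid by grid, gamma is only tried with SVC and never passed to LinearSVC.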
            for loop to print logistic regression stats summary | statsmodels
            Python | Lines of Code: 11 | License: Strong Copyleft (CC BY-SA 4.0)
            def opportunites():
                indep = ['AGE', 'S0287', 'T0080', 'SALARY', 'T0329', 'T0333', 'T0159', 'T0165', 'EXPER', 'T0356']
                for i in indep:
                    model = smf.logit(f'LEAVER ~ {i} ', data = df).fit()
            Adding text labels to a plotly scatter plot for a subset of points
            Python | Lines of Code: 33 | License: Strong Copyleft (CC BY-SA 4.0)
            import pandas as pd
            from sklearn import linear_model
            import as px
            df =
            ## use linear model to determine outliers by residual
            X = df["total_bill"].values.reshape(-1, 1)
            y = df["tip"].values
            regr = linear_mod
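The snippet's approach is to fit a linear model and pick the points to label by their residuals. A NumPy-only sketch of that selection step, assuming synthetic data standing in for the total_bill and tip columns (the snippet's data source is elided) and a 2-standard-deviation cutoff chosen purely for illustration. Plotting is left out:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical data standing in for the snippet's df["total_bill"] and df["tip"] columns.
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=200)
tip = 0.15 * total_bill + rng.normal(0, 1, size=200)

# Use a linear model to determine outliers by residual, as in the snippet.
X = total_bill.reshape(-1, 1)
y = tip
regr = linear_model.LinearRegression().fit(X, y)
residual = y - regr.predict(X)

# Flag points whose absolute residual exceeds 2 standard deviations (illustrative cutoff).
is_outlier = np.abs(residual) > 2 * residual.std()

# Per-point text: the rounded residual for flagged points, empty strings elsewhere.
labels = np.where(is_outlier, np.round(residual, 2).astype(str), "")
print(is_outlier.sum(), "points would get a text label")
```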
            Get boundary coordinates for clusters created from sklearn Gaussian Mixture
            Python | Lines of Code: 3 | License: Strong Copyleft (CC BY-SA 4.0)
            X = df_1[['weights', 'percentiles']].to_numpy()
            prediction = gm_model.predict(X)
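The snippet only assigns points to mixture components. One way to turn those assignments into boundary coordinates is to take the convex hull of each cluster's points with scipy.spatial.ConvexHull. This is a swapped-in technique, not necessarily what the original answer did, and a hull encloses a cluster's points rather than tracing the model's decision boundary. make_blobs data is a hypothetical stand-in for df_1:

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D data standing in for df_1[['weights', 'percentiles']].
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gm_model = GaussianMixture(n_components=3, random_state=0).fit(X)
prediction = gm_model.predict(X)

# For each predicted cluster, keep the hull vertices as its boundary polygon.
boundaries = {}
for label in np.unique(prediction):
    points = X[prediction == label]
    hull = ConvexHull(points)
    # For 2-D hulls, hull.vertices lists the indices in counterclockwise order.
    boundaries[label] = points[hull.vertices]

for label, vertices in boundaries.items():
    print(label, vertices.shape)
```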
            How to get the ROC curve of a neural network?
            Python | Lines of Code: 20 | License: Strong Copyleft (CC BY-SA 4.0)
            _, pred = torch.max(output, dim=1)
            probabilities = output[:, 1]
            import torch.nn.functional as F
            probabilities = F.softmax(output, dim=1)[:, 1]
            y_score = probabilit
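The key point of the snippet is to pass a continuous score (the softmax probability of class 1) to the ROC computation rather than the argmax prediction. A NumPy-only sketch, assuming hypothetical logits in place of the torch output tensor; roc_curve and roc_auc_score come from sklearn.metrics:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical raw network outputs (logits) for 2 classes, shape (n_samples, 2).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
logits = rng.normal(size=(200, 2))
logits[:, 1] += 1.5 * y_true  # make class-1 logits larger for true positives

# Numerically stable softmax over the class dimension; keep P(class 1) as the score.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probabilities = exp / exp.sum(axis=1, keepdims=True)
y_score = probabilities[:, 1]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))
```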
            Python `import A.B` does not work but `from A import B` works
            Python | Lines of Code: 10 | License: Strong Copyleft (CC BY-SA 4.0)
            >>> import sklearn
            >>> sklearn.preprocessing.normalize
            >>> import sklearn.preprocessing
            >>> sklearn.preprocessing.normalize()
            from sklearn.preprocessing 
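A minimal illustration of the two import forms that work reliably: `import sklearn.preprocessing` loads the submodule and binds the top-level name `sklearn`, so the dotted attribute access works afterwards, while `from sklearn.preprocessing import normalize` binds the function name directly:

```python
import numpy as np
import sklearn.preprocessing  # loads the submodule and binds the top-level name "sklearn"
from sklearn.preprocessing import normalize  # binds the function name directly

X = np.array([[3.0, 4.0], [1.0, 0.0]])

# Both names refer to the same function; rows are scaled to unit l2-norm by default.
result = sklearn.preprocessing.normalize(X)
assert np.allclose(result, normalize(X))
print(result)
```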

            Community Discussions


            Installing scipy and scikit-learn on Apple M1
            Asked 2022-Mar-22 at 06:21

            Installing the following packages on the M1 chip works fine for me: NumPy 1.21.1, pandas 1.3.0, torch 1.9.0, and a few others. They also seem to work properly when I test them. However, when I try to install scipy or scikit-learn via pip, this error appears:

            ERROR: Failed building wheel for numpy

            Failed to build numpy

            ERROR: Could not build wheels for numpy which use PEP 517 and cannot be installed directly

            Why should NumPy be built again when I already have the latest version from pip installed?

            Every previous installation was done using python3.9 -m pip install ... on macOS 11.3.1 with the Apple M1 chip.

            Maybe somebody knows how to deal with this error, or whether it's just a matter of time.



            Answered 2021-Aug-02 at 14:33

            Please see this note of scikit-learn about

            Installing on Apple Silicon M1 hardware

            The recently introduced macos/arm64 platform (sometimes also known as macos/aarch64) requires the open source community to upgrade the build configuration and automation to properly support it.

            At the time of writing (January 2021), the only way to get a working installation of scikit-learn on this hardware is to install scikit-learn and its dependencies from the conda-forge distribution, for instance using the miniforge installers:


            The following issue tracks progress on making it possible to install scikit-learn from PyPI with pip:




            negative values for mean squared errors in sae package for R
            Asked 2022-Feb-25 at 14:28

            I have been using the "sae" package for R to compute small area estimates with spatial Fay-Herriot (SFH) models. Using different distance matrices, I occasionally obtained negative values of the Mean Squared Error (MSE).

            The following link may reference a similar behavior:

            scikit-learn cross validation, negative values with mean squared error

            In any case here is a working example:



            Answered 2022-Feb-25 at 14:28

            I'm pretty sure that this is due to bias correction that generally takes place when you have MSE. You can read about the formula for bias correction that is used in the references they provided in ?sae::meanSFH. In one of the articles, they provided a case study where the average MSE is negative. (I found this in Molina et al., 2009. They identify the bias correction in a few places, but it's very clear on pp. 452-453.)

            You can visualize the errors and see how very close they are to zero.



            Colab: (0) UNIMPLEMENTED: DNN library is not found
            Asked 2022-Feb-08 at 19:27

            I have a pretrained object detection model (Google Colab + TensorFlow) inside Google Colab, and I run it two or three times per week on new images. Everything was fine for the last year until this week. Now when I try to run the model, I get this message:



            Answered 2022-Feb-07 at 09:19

            The same thing happened to me last Friday. I think it has something to do with the CUDA installation in Google Colab, but I don't know the exact reason.



            How to install local package with conda
            Asked 2022-Feb-05 at 04:16

            I have a local Python project called jive that I would like to use in another project. My current method of using jive in other projects is to activate the conda env for the project, then move to my jive directory and use python install. This works fine, and when I use conda list, I see everything installed in the env, including jive, with a note that jive was installed using pip.

            But what I really want is to do this with full conda. When I want to use jive in another project, I want to just put jive in that projects environment.yml.

            So I did the following:

            1. write a simple meta.yaml so I could use conda-build to build jive locally
            2. build jive with conda build .
            3. I looked at the tarball that was produced and it does indeed contain the jive source as expected
            4. In my other project, add jive to the dependencies in environment.yml, and add 'local' to the list of channels.
            5. create a conda env using that environment.yml.

            When I activate the environment and use conda list, it lists all the dependencies including jive, as desired. But when I open the Python interpreter, I cannot import jive; it says there is no such package. (If I use python install, I can import it.) How can I fix the build/install so that this works?

            Here is the meta.yaml, which lives in the jive project top level directory:



            Answered 2022-Feb-05 at 04:16

            The immediate error is that the build is generating a Python 3.10 version, but when testing Conda doesn't recognize any constraint on the Python version, and creates a Python 3.9 environment.

            I think the main issue is that python >=3.5 is only a valid constraint for noarch builds, which this is not. That is, once a package is built with a given Python version, the version must be constrained to exactly that version (up through minor). In this case, the package is built with Python 3.10, but its metadata reports that it is compatible with all versions of Python 3.5+, which simply isn't true, because Conda Python packages install the modules into a Python-version-specific site-packages (e.g., lib/python3.10/site-packages/jive).

            Typically, Python versions are controlled by either the --python argument given to conda-build or a matrix supplied by the conda_build_config.yaml file (see documentation on "Build variants").

            Try adjusting the meta.yaml to something like



            Cannot find conda info. Please verify your conda installation on EMR
            Asked 2022-Feb-05 at 00:17

            I am trying to install conda on EMR; my bootstrap script is below. It looks like conda is getting installed, but it is not being added to the PATH environment variable. When I manually update the $PATH variable on the EMR master node, it can find conda. I want to use conda in Zeppelin.

            I also tried adding a config like the one below when launching my EMR instance; however, I still get the error mentioned below.



            Answered 2022-Feb-05 at 00:17

            I got conda working by modifying the script as below; the EMR Python versions were colliding with the conda version:



            Updating Python sklearn Lasso(normalize=True) to Use Pipeline
            Asked 2021-Dec-28 at 10:34

            I am new to Python. I am trying to practice basic regularization by following along with a DataCamp exercise using this CSV:



            Answered 2021-Nov-24 at 09:45

            When you set Lasso(..., normalize=True), the normalization is different from that in StandardScaler(): it divides by the l2-norm instead of the standard deviation. If you read the help page:

            normalize bool, default=False This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

            Deprecated since version 1.0: normalize was deprecated in version 1.0 and will be removed in 1.2.

            It is also touched upon in this post. Since normalize is deprecated, I think it's better to just use StandardScaler. You can see the results are reproducible as long as you scale the data in the same way:
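A minimal sketch of that equivalence, assuming synthetic data from make_regression and an illustrative alpha. The removed normalize=True is emulated by hand (center each column, then divide by its l2-norm). The pipeline reproduces it with StandardScaler and alpha multiplied by sqrt(n_samples). This works because StandardScaler divides by the standard deviation, which equals the centered column's l2-norm divided by sqrt(n_samples):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)
n_samples = X.shape[0]
alpha = 0.1  # illustrative value, as if previously used with Lasso(normalize=True)

# Emulate the removed normalize=True: center each column, divide by its l2-norm.
X_centered = X - X.mean(axis=0)
l2_norms = np.linalg.norm(X_centered, axis=0)
legacy = Lasso(alpha=alpha, tol=1e-10, max_iter=100_000).fit(X_centered / l2_norms, y)
legacy_coef = legacy.coef_ / l2_norms  # map back to the original feature scale

# StandardScaler divides by std = l2_norm / sqrt(n), so alpha must grow by sqrt(n).
pipe = make_pipeline(
    StandardScaler(),
    Lasso(alpha=alpha * np.sqrt(n_samples), tol=1e-10, max_iter=100_000),
)
pipe.fit(X, y)
scaler = pipe.named_steps["standardscaler"]
pipe_coef = pipe.named_steps["lasso"].coef_ / scaler.scale_

print(np.allclose(legacy_coef, pipe_coef, rtol=1e-4, atol=1e-4))
```

The tight tol and large max_iter only make the two solvers converge closely enough to compare coefficients.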



            Can't deploy streamlit app on
            Asked 2021-Dec-25 at 14:42

            I am working with a simple ML model with streamlit. It runs fine on my local machine inside conda environment, but it shows Error installing requirements when I try to deploy it on
            The error message is the following:



            Answered 2021-Dec-25 at 14:42

            Streamlit Share runs the app in a Linux environment, so pywin32 is not available; it is only for Windows.

            Delete pywin32 from the requirements file, and also delete pywinpty==1.1.6 for the same reason.

            After deleting these requirements re-deploy your app and it will work.



            Sklearn: Calibrate a multi-label classification with CalibratedClassifierCV
            Asked 2021-Dec-18 at 17:38

            I have built a number of sklearn classifier models to perform multi-label classification and I would like to calibrate their predict_proba outputs so that I can obtain confidence scores. I would also like to use metrics such as sklearn.metrics.recall_score to evaluate them.

            I have 4 labels to predict and the true labels are multi-hot encoded (e.g. [0, 1, 1, 1]). As a result, CalibratedClassifierCV does not directly accept my data:



            Answered 2021-Dec-17 at 15:33

            In your example, you're using a DecisionTreeClassifier, which by default supports targets of shape (n, m) where m > 1.

            However, if you want the marginal probability of each class as the result, use OneVsRestClassifier.

            Note that CalibratedClassifierCV expects a 1d target, so the "trick" is to extend it to multi-label classification with MultiOutputClassifier.

            Full Example
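A minimal sketch of that approach, assuming synthetic data from make_multilabel_classification (4 labels, multi-hot targets like [0, 1, 1, 1]) and a depth-limited DecisionTreeClassifier as the base estimator. MultiOutputClassifier fits one calibrated copy per label column:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's data: 4 labels, multi-hot encoded targets.
X, Y = make_multilabel_classification(n_samples=500, n_features=10, n_classes=4, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# CalibratedClassifierCV expects a 1-d target, so wrap it: one calibrated model per label.
calibrated = MultiOutputClassifier(
    CalibratedClassifierCV(DecisionTreeClassifier(max_depth=5, random_state=0), cv=3)
)
calibrated.fit(X_train, Y_train)

# predict_proba returns one (n_samples, 2) array per label; column 1 is P(label present).
proba_per_label = np.column_stack([p[:, 1] for p in calibrated.predict_proba(X_test)])
Y_pred = calibrated.predict(X_test)

print(proba_per_label.shape)
print(recall_score(Y_test, Y_pred, average="macro"))
```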



            understanding sklearn calibratedClassifierCV
            Asked 2021-Dec-03 at 13:03

            Hi all, I am having trouble understanding how to use the output of sklearn.calibration.CalibratedClassifierCV.

            I have calibrated my binary classifier using this method, and results are greatly improved. However I am not sure how to interpret the results. sklearn guide states that, after calibration,

            the output of predict_proba method can be directly interpreted as a confidence level. For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approximately 80% actually belong to the positive class.

            Now I would like to reduce false positive by applying a cutoff at .6 for the model to predict label True. Without the calibration, I would have simply used my_model.predict_proba() > .6. However, it seems that after calibration the meaning of predict_proba has changed, so I am not sure if I can do that anymore.

            From a quick testing it seems that predict and predict_proba follow the same logic I would expect before calibration. The output of:



            Answered 2021-Dec-03 at 13:03

            In my view, you can still use predict_proba() after calibration to apply a different cutoff.

            What happens within class CalibratedClassifierCV (as you noticed) is effectively that the output of predict() is based on the output of predict_proba() (see here for reference), i.e. np.argmax(self.predict_proba(X), axis=1) == self.predict(X).

            On the other side, for the non-calibrated classifier that you're passing to CalibratedClassifierCV (depending on whether it is a probabilistic classifier or not) the above equality may or may not hold (e.g. it does not for an SVC() classifier - see here, for instance, for some other details on this).
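A small sketch of both points, assuming synthetic binary data and LinearSVC as the uncalibrated base estimator (chosen because it has no predict_proba of its own). For the calibrated wrapper, predict() matches the argmax of predict_proba(). A stricter cutoff such as 0.6 on the positive-class probability can then only flag as many positives as the default, or fewer:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, random_state=0)

# LinearSVC only provides decision_function; calibration adds predict_proba.
clf = CalibratedClassifierCV(LinearSVC(dual=False), cv=3).fit(X, y)

proba = clf.predict_proba(X)
# For the calibrated wrapper, predict() is the argmax of predict_proba().
assert (clf.classes_[np.argmax(proba, axis=1)] == clf.predict(X)).all()

# Apply a stricter cutoff on P(positive) to reduce false positives.
cutoff = 0.6
y_pred_strict = (proba[:, 1] > cutoff).astype(int)
print(y_pred_strict.sum(), clf.predict(X).sum())
```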



            Meaning of `penalty` and `loss` in LinearSVC
            Asked 2021-Nov-18 at 18:08

            Anti-closing preamble: I have read the question "difference between penalty and loss parameters in Sklearn LinearSVC library" but I find the answer there not to be specific enough. Therefore, I’m reformulating the question:

            I am familiar with SVM theory and I'm experimenting with the LinearSVC class in Python. However, the documentation is not quite clear about the meaning of the penalty and loss parameters. I reckon that loss refers to the penalty for points violating the margin (usually denoted by the Greek letter xi or zeta in the objective function), while penalty is the norm of the vector determining the class boundary, usually denoted by w. Can anyone confirm or deny this?

            If my guess is right, then penalty = 'l1' would lead to minimisation of the L1-norm of the vector w, like in LASSO regression. How does this relate to the maximum-margin idea of the SVM? Can anyone point me to a publication regarding this question? In the original paper describing LIBLINEAR I could not find any reference to L1 penalty.

            Also, if my guess is right, why doesn't LinearSVC support the combination of penalty='l2' and loss='hinge' (the standard combination in SVC) when dual=False? When trying it, I get the

            ValueError: Unsupported set of arguments



            Answered 2021-Nov-18 at 18:08

            Though very late, I'll try to give my answer. According to the doc, here's the primal optimization problem considered for LinearSVC: $\min_{w, b} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \max(0, 1 - y_i (w^T \phi(x_i) + b))$, with $\phi$ being the identity, given that LinearSVC only solves linear problems.

            Effectively, this is just one of the possible problems that LinearSVC admits (it is the L2-regularized, L1-loss problem in the terms of the LIBLINEAR paper) and not the default one (which is L2-regularized, L2-loss). The LIBLINEAR paper gives a more general formulation of what is referred to as loss in Chapter 2, then elaborates on what is referred to as penalty in the Appendix (A2+A4).

            Basically, it states that LIBLINEAR is meant to solve the unconstrained optimization problem $\min_{w} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi(w; x_i, y_i)$ with different loss functions $\xi(w; x, y)$ (which are hinge and squared_hinge). The default setting of the model in LIBLINEAR does not consider the bias term, which is why you won't see any reference to $b$ from now on (there are many posts on SO about this).

            • $\xi(w; x_i, y_i) = \max(0, 1 - y_i w^T x_i)$, hinge or L1-loss
            • $\xi(w; x_i, y_i) = \max(0, 1 - y_i w^T x_i)^2$, squared_hinge or L2-loss.

            As for the penalty, it determines which norm of the vector w is used. The appendix elaborates on the different problems:

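            • L2-regularized, L1-loss (penalty='l2', loss='hinge'): $\min_{w} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \max(0, 1 - y_i w^T x_i)$
            • L2-regularized, L2-loss (penalty='l2', loss='squared_hinge'), default in LinearSVC: $\min_{w} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \max(0, 1 - y_i w^T x_i)^2$
            • L1-regularized, L2-loss (penalty='l1', loss='squared_hinge'): $\min_{w} \|w\|_1 + C \sum_{i=1}^{n} \max(0, 1 - y_i w^T x_i)^2$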

            However, as stated in the documentation, LinearSVC does not support the combination of penalty='l1' and loss='hinge'. As far as I can see, the paper does not specify why, but I found a possible answer here (in the answer by Arun Iyer).

            Finally, the combination of penalty='l2', loss='hinge', dual=False is indeed not supported, as specified here (it is simply not implemented in LIBLINEAR) or here. I'm not sure whether that's the reason, but from Appendix B onwards the LIBLINEAR paper specifies the optimization problem that is solved (which, in the case of L2-regularized, L1-loss, seems to be the dual).
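A quick empirical check of which penalty/loss/dual combinations LinearSVC accepts, assuming synthetic data. Unsupported combinations raise ValueError when fit() is called. max_iter and the warning filter only keep convergence warnings out of the output:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

combos = [
    ("l2", "hinge", True),
    ("l2", "hinge", False),
    ("l2", "squared_hinge", True),
    ("l2", "squared_hinge", False),
    ("l1", "squared_hinge", False),
    ("l1", "hinge", False),
]

results = {}
for penalty, loss, dual in combos:
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # e.g. ConvergenceWarning
            LinearSVC(penalty=penalty, loss=loss, dual=dual, max_iter=5000).fit(X, y)
        results[(penalty, loss, dual)] = "supported"
    except ValueError:
        # liblinear has no solver for this combination.
        results[(penalty, loss, dual)] = "ValueError"

for combo, status in results.items():
    print(combo, status)
```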

            For a theoretical discussion of SVC problems in general, I found that chapter really useful; it shows how minimizing the norm of w relates to the idea of the maximum margin.


            Community Discussions, Code Snippets contain sources that include Stack Exchange Network


            No vulnerabilities reported

            Install scikit-learn

            You can install it using 'pip install scikit-learn' or download it from GitHub or PyPI.
            You can use scikit-learn like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.


            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the community page, Stack Overflow.

          • PyPI

            pip install scikit-learn

          • Clone with the GitHub CLI

            gh repo clone scikit-learn/scikit-learn
