imbalanced-learn | A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

 by scikit-learn-contrib | Version: 0.11.0 | License: MIT

kandi X-RAY | imbalanced-learn Summary

imbalanced-learn is a Python library typically used in institutional, education, data science, and pandas applications. It has no reported bugs or vulnerabilities, provides a build file, is released under a permissive license, and has medium support. You can install it with 'pip install imbalanced-learn' or download it from GitHub or PyPI.

            Support

              imbalanced-learn has a moderately active ecosystem.
              It has 6346 stars, 1236 forks, and 144 watchers.
              There was 1 major release in the last 6 months.
              There are 46 open issues and 497 closed issues; on average, issues are closed in 109 days. There are 19 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of imbalanced-learn is 0.11.0.

            Quality

              imbalanced-learn has 0 bugs and 0 code smells.

            Security

              imbalanced-learn has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              imbalanced-learn code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              imbalanced-learn is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              imbalanced-learn releases are available to install and integrate.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              imbalanced-learn saves you an estimated 6003 person-hours of effort compared with developing the same functionality from scratch.
              It has 13560 lines of code, 626 functions and 139 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed imbalanced-learn and identified the functions below as its top functions. This is intended to give you quick insight into the functionality implemented in imbalanced-learn and to help you decide whether it suits your requirements.
            • Create a classification report
            • Compute sensitivity, specificity, and support
            • Decorator to make an index balanced scoring
            • Calculate the geometric mean score
            • Generate an imbalanced dataset
            • Construct a sampling strategy dictionary
            • Check the sampling strategy
            • Calculate sampling strategy
            • Check if the given classifier has the correct classification
            • Simulate an imbalanced dataset
            • Plot scatter plot
            • Default sampling strategy
            • Return the sampling strategy for the given sampling type
            • Fits a balanced Batch model
            • Make a marginal plot
            • Plot the decision function
            • Returns the sampling strategy for each class
            • Return a function to resolve linkcode
            • Checks the sampler to fit the sampler
            • Create a classification dataset
            • Returns the sampling strategy for the given sampling type
            • Import keras module
            • Parametrize estimator
            • Check whether samples are in danger or are noise
            • Decorator to make a scoring function
            • Calculate the geometric mean score
            • Fetch datasets from Zenodo

            imbalanced-learn Examples and Code Snippets

            Usage: EvoPreprocess as part of the pipeline (from imbalanced-learn)
            Python | Lines of Code: 37 | License: Strong Copyleft (GPL-3.0)
            from imblearn.pipeline import Pipeline
            from sklearn.datasets import load_breast_cancer
            from sklearn.metrics import accuracy_score
            from sklearn.model_selection import train_test_split
            from sklearn.tree import DecisionTreeClassifier
            from evopreprocess.  
            imbalanced-learn - plot comparison over sampling
            Python | Lines of Code: 172 | License: Permissive (MIT License)
            Compare over-sampling samplers
            The following example aims to make a qualitative comparison between the
            different over-sampling algorithms available in the imbalanced-learn package.
            imbalanced-learn - plot comparison under sampling
            Python | Lines of Code: 166 | License: Permissive (MIT License)
            Compare under-sampling samplers
            The following example aims to make a qualitative comparison between the
            different under-sampling algorithms available in the imbalanced-learn package.
            imbalanced-learn - plot impact imbalanced classes
            Python | Lines of Code: 146 | License: Permissive (MIT License)
            Fitting model on imbalanced datasets and how to fight bias
            This example illustrates the problem induced by learning on datasets  

            Community Discussions


            Get error: unexpected keyword argument 'random_state' when using TomekLinks
            Asked 2022-Mar-24 at 14:44

            My code is:



            Answered 2022-Mar-24 at 14:44

            I think you're looking at the wrong documentation. That page is for version 0.3.0-dev; I checked, and this parameter has been deprecated in the newer version 0.9.0.

            Also, according to the documentation, it seems you have to specify it in the make_classification function instead, as below:



            feature importance bagging classifier and column names
            Asked 2022-Mar-19 at 12:08

            I have already read these two posts:

            Please don't mark this as a duplicate.

            I am trying to get the feature names from a bagging classifier (which does not have inbuilt feature importance).

            I have the sample data and code below, based on the related posts linked above.



            Answered 2022-Mar-19 at 12:08

            You could call the load_iris function without any parameters; this way, the function returns a Bunch object (a dictionary-like object) with several attributes. The most relevant ones for your use case are the feature matrix and bunch.feature_names.
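One common workaround, sketched here as an assumption rather than as the answerer's code, is to average the feature_importances_ of the fitted base trees, mapping each tree's importances back through estimators_features_ because each tree may see the columns in a different order. The iris data and the DecisionTreeClassifier base estimator are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bunch = load_iris()  # no arguments: returns a Bunch with .data and .feature_names
X, y = bunch.data, bunch.target
feature_names = bunch.feature_names

# Base estimator passed positionally (works with both base_estimator= and estimator= versions)
clf = BaggingClassifier(DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0)
clf.fit(X, y)

importances = np.zeros(X.shape[1])
for tree, feats in zip(clf.estimators_, clf.estimators_features_):
    # tree.feature_importances_[j] refers to original column feats[j]
    importances[feats] += tree.feature_importances_
importances /= len(clf.estimators_)

ranking = sorted(zip(feature_names, importances), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The indexing above assumes the default bootstrap_features=False, so no column index is repeated within a single tree's feature set.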



            TypeError: Encoders require their input to be uniformly strings or numbers. Got ['int', 'str']
            Asked 2022-Feb-20 at 14:24

            I have already read the posts here, here, and here. Please don't mark this as a duplicate.

            I am working on a binary classification problem where my dataset has categorical and numerical columns.

            However, some of the categorical columns have a mix of numeric and string values. Nonetheless, they only indicate the category name.

            For instance, I have a column called biz_category which has values like A,B,C,4,5 etc.

            I guess the error below is thrown because of values like 4 and 5.

            Therefore, I tried the code below to convert them to the category datatype, but it still doesn't work.



            Answered 2022-Feb-20 at 14:22
            Cause of the problem

            SMOTE requires the values in each categorical/numerical column to have a uniform datatype. Essentially, you cannot have mixed datatypes in any column; in your case, that is the biz_category column. Also, merely casting the column to the categorical type does not guarantee that the values in that column will have a uniform datatype.

            Possible solution

            One possible solution is to re-encode the values in the columns that have mixed data types, for example with LabelEncoder. However, I think in your case simply changing the dtype to string would also work.



            Can't install imbalanced-learn in an Azure ML environment
            Asked 2022-Feb-15 at 21:26

            I have an Azure ML Workspace which comes by default with some pre-installed packages.

            I tried to install



            Answered 2022-Feb-15 at 14:23

            scikit-learn 1.0.1 and later require Python >= 3.7, but you are using Python 3.6. You need to either upgrade Python or downgrade imbalanced-learn. imbalanced-learn 0.8.1 allows Python 3.6, so pinning that version should work.



            Download UNIX python wheel in windows
            Asked 2022-Feb-09 at 12:20

            I want to download a UNIX Python wheel on my Windows PC so that I can later install it on a UNIX server that has no internet access.

            I tried



            Answered 2022-Feb-09 at 12:20


            Cannot find conda info. Please verify your conda installation on EMR
            Asked 2022-Feb-05 at 00:17

            I am trying to install conda on EMR; my bootstrap script is below. It looks like conda is installed, but it is not added to the PATH environment variable. When I manually update $PATH on the EMR master node, conda is found. I want to use conda in Zeppelin.

            I also tried adding the config to the configuration, as below, when launching my EMR instance; however, I still get the error shown below.



            Answered 2022-Feb-05 at 00:17

            I got conda working by modifying the script as below; the EMR Python versions were colliding with the conda version:



            A problem in using AIF360 metrics in my code
            Asked 2022-Jan-29 at 15:28

            I am trying to run AI Fairness 360 metrics on scikit-learn (imbalanced-learn) algorithms, but I have a problem with my code. When I apply scikit-learn (imbalanced-learn) algorithms like SMOTE, they return a NumPy array, while AI Fairness 360 preprocessing methods return a BinaryLabelDataset, and the metrics expect a BinaryLabelDataset object. I am stuck on how to convert my arrays to a BinaryLabelDataset so that I can use the metrics.

            My preprocessing algorithm needs to receive X and Y, so I split the dataset into X and Y before calling the SMOTE method. Before SMOTE, the dataset was standard_dataset and the metrics worked on it; the problem appeared after I applied SMOTE, because it converts the data to a NumPy array.

            I got the following error after running the code :



            Answered 2021-Sep-21 at 17:34

            You are correct that the problem is with y_pred. You can concatenate it to X_test, transform the result into a StandardDataset object, and then pass that object to BinaryLabelDatasetMetric. The resulting object has methods for calculating different fairness metrics. I do not know what your dataset looks like, but here is a complete reproducible example that you can adapt to your dataset.



            Solving conda environment stuck
            Asked 2021-Dec-22 at 18:02

            I'm trying to install conda environment using the command:



            Answered 2021-Dec-22 at 18:02

            This solves fine, but it is indeed a complex solve, mainly due to:

            • underspecification
            • lack of modularization

            This particular environment specification ends up installing well over 300 packages, and not a single one of them is constrained by the specification. That is a huge SAT problem to solve, and Conda will struggle with it. Mamba will solve it faster, but providing additional constraints can vastly reduce the solution space.

            At minimum, specify a Python version (major.minor), such as python=3.9. This is the single most effective constraint.

            Beyond that, putting minimum version requirements on central packages (those that are dependencies of others), such as NumPy, can help.

            Lack of Modularization

            I assume the name "devenv" means this is a development environment, so I get that one wants all these tools immediately at hand. However, Conda environment activation is simple, and most IDE tooling these days (Spyder, VSCode, Jupyter) encourages separating the infrastructure from the execution kernel. Being more thoughtful about how environments (emphasis on the plural) are organized and work together can go a long way toward a sustainable and painless data science workflow.

            The environment at hand has multiple red flags in my book:

            • conda-build should be in base and only in base
            • snakemake should be in a dedicated environment
            • notebook (i.e., Jupyter) should be in a dedicated environment, co-installed with nb_conda_kernels; all that kernel environments need is ipykernel

            I'd probably also keep the linting/formatting packages separate, but that's less of an issue. The real killer, though, is snakemake: it's a massive piece of infrastructure, and I'd strongly encourage keeping it separate.



            MultiPoint(df['geometry']) raises a KeyError even though the key exists. KeyError: 13 (geopandas)
            Asked 2021-Oct-11 at 14:51

            data source:

            I tried looking for a similar problem, but I can't find an answer, and the error message doesn't help much. I'm kind of frustrated at this point. Thanks for the help. I'm calculating the closest distance from a point.



            Answered 2021-Oct-11 at 14:21

            geopandas 0.10.1

            • I noticed that your data is on Kaggle, so I started by sourcing it from there
            • There is really only one issue: the shapely.geometry.MultiPoint() constructor does not work with a filtered Series. Pass it a NumPy array instead and it works.
            • The full code is below; I randomly selected a point to serve as gpdPoint
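A small, self-contained sketch of the fix, using a hypothetical pandas Series of shapely points in place of the Kaggle data and df['geometry']. Filtering keeps the original index labels, so integer lookups inside MultiPoint can fail with a KeyError; converting to a NumPy array makes indexing positional.

```python
import pandas as pd
from shapely.geometry import MultiPoint, Point
from shapely.ops import nearest_points

# Hypothetical stand-in for df['geometry']
geometry = pd.Series([Point(0, 0), Point(1, 1), Point(5, 5), Point(2, 3)])

# After filtering, the index labels are 1, 2, 3: label 0 no longer exists
filtered = geometry[geometry.apply(lambda p: p.x > 0)]

# MultiPoint(filtered) may raise KeyError because integer lookups on a Series
# are label-based; a NumPy array is indexed by position instead
candidates = MultiPoint(filtered.to_numpy())

query = Point(4, 4)
# nearest_points returns (point on query, nearest point on candidates)
_, nearest = nearest_points(query, candidates)
print(nearest, query.distance(nearest))
```

The same pattern applies to a GeoSeries column: filter it, call .to_numpy() (or .values), and build the MultiPoint from the resulting array.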



            getting scikit-learn version warning using yml environment
            Asked 2021-Aug-13 at 16:38

            I want to deploy a machine learning model, and I have the environment YAML file and the model pickle file. When I add scikit-learn=0.23.2 to the dependencies, conda automatically uninstalls that scikit-learn version and installs scikit-learn 0.24.2. As a result, I get the following warning when I load the pickle file:

            UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.2 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.

            Here is the environment:



            Answered 2021-Aug-13 at 16:38

            Whatever is in the pip: section of a Conda environment YAML is installed after the Conda environment is created, using the pip install -U command. The -U flag gives pip permission to upgrade any existing packages if that is necessary to install the specified packages. In this particular case, the version of imblearn must be incompatible with the scikit-learn version you have selected.

            Remove imblearn

            Technically, you should be using imbalanced-learn, not imblearn, as stated in the package description. That also means you don't even need to install from PyPI, since imbalanced-learn is available through Conda Forge.

            If you require having scikit-learn=0.23 then you must use imbalanced-learn=0.7. This should be under the regular dependencies, not in the pip: section.
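To catch this kind of silent upgrade before unpickling a model, one option is a small guard that compares installed versions against the pins you expect. This is a hedged sketch using only the standard library (importlib.metadata, Python 3.8+); the REQUIRED pins mirror the pairing described above, and check_pins is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pins mirroring the answer: scikit-learn 0.23 pairs with imbalanced-learn 0.7
REQUIRED = {"scikit-learn": "0.23", "imbalanced-learn": "0.7"}


def check_pins(required):
    """Return {distribution: installed_version_or_None} for every pin whose
    installed major.minor version differs from the required one."""
    mismatches = {}
    for dist, wanted in required.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None
        if installed is None or ".".join(installed.split(".")[:2]) != wanted:
            mismatches[dist] = installed
    return mismatches
```

A typical use is to call check_pins(REQUIRED) right before pickle.load and raise an error if the returned dictionary is not empty, rather than relying on the UserWarning that scikit-learn emits after the fact.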


            Community Discussions and Code Snippets contain content from sources that include the Stack Exchange Network.


            No vulnerabilities reported

            Install imbalanced-learn

            You can install it with 'pip install imbalanced-learn' or download it from GitHub or PyPI.
            You can use imbalanced-learn like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system Python.
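The virtual-environment recommendation can be scripted with the standard library. This is a minimal sketch, not official installation tooling: create_env, env_python, and pip_install are hypothetical helper names, and the interpreter path follows the usual venv layout (bin/ on POSIX, Scripts\ on Windows).

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir):
    """Path of the interpreter inside a venv created by the venv module."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir, with_pip=True):
    """Create an isolated environment so installs don't touch the system Python."""
    venv.create(env_dir, with_pip=with_pip)
    return env_python(env_dir)


def pip_install(env_dir, *packages):
    """Upgrade pip, setuptools, and wheel, then install packages into the venv."""
    python = str(env_python(env_dir))
    subprocess.run(
        [python, "-m", "pip", "install", "--upgrade", "pip", "setuptools", "wheel"],
        check=True,
    )
    subprocess.run([python, "-m", "pip", "install", *packages], check=True)
```

For example, create_env("imblearn-env") followed by pip_install("imblearn-env", "imbalanced-learn") would set up an isolated environment with the package installed; both steps need network access for pip.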


            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
          • PyPI

            pip install imbalanced-learn

          • CLI

            gh repo clone scikit-learn-contrib/imbalanced-learn
