
Getting started with Predictive Analysis

by Sri Balaji J

Scikit-learn (sklearn) is a widely used, robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent interface.
Features of scikit-learn are:

  • Supervised learning algorithms − Almost all the popular supervised learning algorithms, such as Linear Regression, Support Vector Machine (SVM), and Decision Tree, are part of scikit-learn.
  • Unsupervised learning algorithms − It also includes popular unsupervised learning algorithms, from clustering, factor analysis, and PCA (Principal Component Analysis) to unsupervised neural networks.
  • Clustering − Used for grouping unlabeled data.
  • Dimensionality reduction − Used for reducing the number of attributes in data, which helps with summarisation, visualisation, and feature selection.
  • Ensemble methods − As the name suggests, used for combining the predictions of multiple supervised models.
  • Feature extraction − Used to extract features from data, defining the attributes in image and text data.
  • Feature selection − Used to identify useful attributes for building supervised models.

Some libraries apart from scikit-learn are:


Classification

In classification, the output variable must be a discrete value. The task of the classification algorithm is to map the input value (x) to the discrete output variable (y).
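As a minimal sketch of this mapping from inputs to discrete labels, the snippet below trains a scikit-learn decision tree on the bundled iris dataset. The choice of dataset, classifier, and split parameters is an assumption made for illustration only; any other scikit-learn classifier could be used the same way.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# x: four flower measurements per sample; y: one of three species (0, 1, 2).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Learn the mapping from x to the discrete label y.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)          # predicted class labels
accuracy = clf.score(X_test, y_test)  # fraction of correct predictions
print("Predicted labels:", y_pred[:5])
print("Test accuracy:", accuracy)
```

Every predicted value is one of the known class labels, which is what distinguishes classification from regression.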


Regression

In regression, the output variable must be a continuous, real-valued quantity. The task of the regression algorithm is to map the input value (x) to the continuous output variable (y).
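The following sketch fits a linear regression to synthetic data. The data-generating rule y = 3x + 2 with a little noise is an assumption chosen for illustration, so that the recovered coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative assumption): y = 3x + 2 plus small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Learn the mapping from x to the continuous output y.
reg = LinearRegression()
reg.fit(X, y)

# Predict a real-valued output for a new input.
y_new = reg.predict(np.array([[4.0]]))
print("Slope:", reg.coef_[0])
print("Intercept:", reg.intercept_)
print("Prediction for x = 4:", y_new[0])
```

The fitted slope and intercept should land close to 3 and 2, and the prediction is a real number rather than a class label.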


Clustering

Clustering is a way of grouping data points into clusters of similar points. Objects that resemble one another stay in the same group, which has little or no similarity to the other groups.
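A small sketch of this idea using k-means, one of several clustering algorithms in scikit-learn. The two synthetic groups of points and the choice of two clusters are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points (synthetic, for illustration).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# No labels are given; k-means assigns each point to one of two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster of first point:", labels[0])
print("Cluster of last point:", labels[-1])
print("Cluster centres:", kmeans.cluster_centers_)
```

Points drawn from the same group end up with the same cluster label, even though the algorithm never saw which group they came from.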

Dimensionality reduction

It is a way of converting a higher-dimensional dataset into a lower-dimensional one while ensuring that it still conveys similar information.
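One way to sketch this is with PCA, which scikit-learn provides as `sklearn.decomposition.PCA`. The iris dataset and the target of two components are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Each iris sample has four measurements (four dimensions).
X, _ = load_iris(return_X_y=True)

# Project the data onto its two main directions of variation.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Share of the original variance kept by the two new dimensions.
retained = pca.explained_variance_ratio_.sum()
print("Original shape:", X.shape)
print("Reduced shape:", X_2d.shape)
print("Variance retained:", retained)
```

The reduced dataset has half as many columns but keeps most of the variance, which is the sense in which it "provides similar information".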

Model selection

Model selection is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset.
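A minimal sketch of model selection, assuming cross-validated accuracy is the selection criterion. The three candidate models and the iris dataset are illustrative choices, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models to choose from (illustrative selection).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Score every candidate with 5-fold cross-validation on the training data.
mean_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Keep the candidate with the best mean score and refit it on all the data.
best_name = max(mean_scores, key=mean_scores.get)
final_model = candidates[best_name].fit(X, y)

for name, score in mean_scores.items():
    print(name, round(score, 3))
print("Selected model:", best_name)
```

Cross-validation is used here so that each candidate is judged on data it was not fitted on; `GridSearchCV` applies the same idea to the hyperparameters of a single model.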


Preprocessing

Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first, crucial step in creating a machine learning model.
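The sketch below shows two common preprocessing steps, filling in a missing value and standardising the feature scales. The tiny raw array and the choice of these two steps are assumptions for illustration; real data usually needs steps chosen to fit it.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Raw data (made up): one missing value and columns on very different scales.
X_raw = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 400.0],
])

# Step 1 replaces missing values with the column mean.
# Step 2 rescales each column to zero mean and unit variance.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_ready = preprocess.fit_transform(X_raw)

print(X_ready)
```

After this step the data has no missing entries and every column is on a comparable scale, so it can be passed to a model's `fit` method.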

  • © 2022 Open Weaver Inc.