How to perform one-class SVM using scikit-learn Python

by shivanisanju03 Updated: May 9, 2023

Solution Kit

One-Class Support Vector Machine (SVM) is an outlier detection method. It works like a standard Support Vector Machine, except that the training data contains only one class. The model learns a boundary around the available data and classifies any new observation that falls outside this boundary as an outlier. One-class classification methods fall into three broad families: density estimation, boundary estimation, and reconstruction-based methods. One-class SVM is a boundary method. It is an unsupervised algorithm that learns a decision function for novelty detection, that is, for classifying new data as similar to or different from the training set. The idea of novelty detection is to detect rare events.
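As a minimal sketch of this idea, the snippet below fits OneClassSVM on a synthetic 2-D cluster and then labels new points. The data, the nu value, and the gamma value are assumptions chosen for illustration and are not part of the kit.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# "Normal" training data: a tight synthetic 2-D cluster around the origin.
X_train = 0.3 * rng.standard_normal((200, 2))

# New observations: five points near the cluster and five far away from it.
X_near = 0.3 * rng.standard_normal((5, 2))
X_far = rng.uniform(low=4.0, high=6.0, size=(5, 2))

# nu and gamma are illustrative values, not tuned recommendations.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=2.0)
model.fit(X_train)

# predict returns +1 for points inside the learned boundary and -1 outside it.
pred_near = model.predict(X_near)
pred_far = model.predict(X_far)
print("near the training data:", pred_near)
print("far from the training data:", pred_far)
```

Points far from the training cluster fall outside the boundary and are labeled -1. Points near the cluster are usually labeled +1, although a few may be rejected depending on nu.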

One-class classification (OCC), also called unary classification, involves fitting a model to "normal" data and then predicting whether new data is normal or an outlier/anomaly. There are two different approaches to One-Class SVM. The first, the Schölkopf method, detects novelty by separating the data points from the origin in feature space with a hyperplane and maximizing the distance from that hyperplane to the origin. The second, Support Vector Data Description (SVDD) by Tax and Duin, encloses the training data in a bounding hypersphere instead. Anomaly detection covers both outlier detection, which is unsupervised, and novelty detection, which is semi-supervised. Novelties/anomalies can form a dense cluster as long as they lie in a low-density region of the training data.

Anomaly detection aims to identify outliers that do not belong to some target class. Besides One-Class SVM, algorithms you can use in scikit-learn are Isolation Forest, Local Outlier Factor (LOF), and Robust Covariance. Unlike the others, LOF has no predict method to apply to new data when it is used for outlier detection, so no decision boundary can be drawn for it in that mode.
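The three alternatives can be compared on the same data. The sketch below assumes a synthetic data set and a contamination value of 0.05, both invented for illustration. It also shows that LOF only gains a predict method when it is created with novelty=True.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic data: 200 inliers around the origin plus 10 scattered outliers.
inliers = 0.5 * rng.standard_normal((200, 2))
outliers = rng.uniform(low=-6.0, high=6.0, size=(10, 2))
X = np.vstack([inliers, outliers])

contamination = 0.05  # assumed share of outliers; in practice this is a guess

detectors = {
    "Isolation Forest": IsolationForest(contamination=contamination, random_state=0),
    "Robust Covariance": EllipticEnvelope(contamination=contamination, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
}

labels = {}
for name, detector in detectors.items():
    # fit_predict labels the training data itself: +1 inlier, -1 outlier.
    labels[name] = detector.fit_predict(X)
    print(name, "flagged", int((labels[name] == -1).sum()), "points")

# In outlier-detection mode (the default), LOF cannot score unseen data.
lof_outlier_mode = detectors["Local Outlier Factor"]
print("LOF exposes predict:", hasattr(lof_outlier_mode, "predict"))

# With novelty=True, LOF is fitted on clean data and can then label new points.
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(inliers)
print(lof_novelty.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))
```

Robust Covariance is available in scikit-learn as EllipticEnvelope, which assumes roughly Gaussian inliers.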

One-Class SVM needs a kernel function. The most used kernel function is the Gaussian radial basis function (RBF) kernel, whose kernel coefficient (gamma in scikit-learn) sets the bandwidth. However, there is no exact formula or algorithm for setting this bandwidth parameter, so it is usually tuned empirically. A linear kernel gives a straight line (a flat hyperplane) as the decision boundary.
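Since the bandwidth has to be chosen by hand, it can help to look at how the fitted model changes with gamma. The sketch below uses synthetic data, and the numeric gamma values and nu=0.1 are arbitrary assumptions that only illustrate the trend.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))  # synthetic training data

# gamma="scale" uses 1 / (n_features * X.var()); gamma="auto" uses 1 / n_features.
# The numeric values are arbitrary and only show the trend.
results = {}
for gamma in ["scale", "auto", 0.01, 1.0, 100.0]:
    model = OneClassSVM(kernel="rbf", gamma=gamma, nu=0.1).fit(X)
    results[gamma] = {
        "support_vectors": int(model.support_vectors_.shape[0]),
        "flagged": int((model.predict(X) == -1).sum()),
    }
    print(gamma, results[gamma])

# A linear kernel has no bandwidth to tune; its boundary is a hyperplane.
linear_model = OneClassSVM(kernel="linear", nu=0.1).fit(X)
print("linear coef_ shape:", linear_model.coef_.shape)
```

A larger gamma makes the boundary follow individual training points more closely, which typically shows up as a larger number of support vectors.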

A very good and widely used library for SVM classification is LibSVM, which also provides a MATLAB interface. It supports one-class SVM according to the Schölkopf method, and scikit-learn's OneClassSVM is built on it. A method for SVDD based on the Tax and Duin algorithm is also available in the LibSVM tools.

In this kit, we will see how to perform one-class SVM in Python using scikit-learn. The class OneClassSVM (in sklearn.svm) implements a One-Class SVM used in outlier detection. Both outlier detection and novelty detection are used for anomaly detection, where the interest is in detecting abnormal or unusual observations. Outlier detection is also known as unsupervised anomaly detection, and novelty detection as semi-supervised anomaly detection.

The two settings treat dense clusters differently. In outlier detection, the outliers/anomalies cannot form a dense cluster, because the available estimators assume that anomalies are located in low-density regions. In novelty detection, novelties/anomalies may form a dense cluster, as long as they are in a low-density region of the training data, which is considered normal in this context.

Please check the below code on how to perform one-class SVM using scikit-learn Python.

Fig: Preview of the output that you will get on running this code from your IDE

Code

One class SVM model for text classification (scikit-learn)

Python | Lines of Code: 15 | License: Strong Copyleft (CC BY-SA 4.0)

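from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Train on a single newsgroup, which plays the role of the "normal" class.
train = fetch_20newsgroups(subset='train', categories=['alt.atheism'], shuffle=True, random_state=42).data
# Evaluate on two newsgroups. Note that this also uses subset='train', so the
# alt.atheism posts here are the same ones the model is fitted on.
test = fetch_20newsgroups(subset='train', categories=['alt.atheism', 'soc.religion.christian'], shuffle=True, random_state=42).data

# Fit the TF-IDF vocabulary on the training posts only, then reuse it for the test posts.
vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train)
test_vectors = vectorizer.transform(test)

# gamma='auto' sets the RBF kernel coefficient to 1 / n_features.
model = OneClassSVM(gamma='auto')
model.fit(train_vectors)

# predict returns +1 for posts judged similar to the training class and -1 otherwise.
test_predictions = model.predict(test_vectors)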

Instructions

Follow the steps carefully to get the output easily.

Install scikit-learn by using 'pip install -U scikit-learn'
Copy the snippet using the 'copy' button and paste it in your IDE
Make sure the import 'from sklearn.svm import OneClassSVM' is at the top of the file [refer line 3 in the preview]
Add a print statement at the end of your code: 'print(test_predictions)' [refer preview].
Run the file to generate the output.
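The printed output is an array of +1 and -1 labels, which is hard to read for many documents. As a rough sketch of how the result could be summarized, the snippet below counts the two labels and prints the decision score of each document. It uses a tiny hypothetical corpus in place of the newsgroup posts so that it runs without a download, and the variable names n_similar and n_different are introduced here for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Hypothetical toy corpus; the kit itself uses the 20 newsgroups posts instead.
train_docs = [
    "the svm learns a decision boundary",
    "support vectors define the boundary",
    "the kernel maps data to a feature space",
    "a decision function scores each sample",
]
test_docs = [
    "the decision boundary depends on the kernel",
    "bake the bread for forty minutes",
]

vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train_docs)
test_vectors = vectorizer.transform(test_docs)

model = OneClassSVM(gamma="auto").fit(train_vectors)

test_predictions = model.predict(test_vectors)        # +1 / -1 labels
test_scores = model.decision_function(test_vectors)   # signed score per document

n_similar = int((test_predictions == 1).sum())
n_different = int((test_predictions == -1).sum())
print(f"{n_similar} judged similar to the training class, {n_different} judged different")

for doc, label, score in zip(test_docs, test_predictions, test_scores):
    print(f"{int(label):+d}  {float(score):+.4f}  {doc}")
```

A positive score means the document lies inside the learned boundary, and a negative score means it lies outside. With so few training documents the labels themselves are not meaningful; the point is only the shape of the output.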

I hope you found this useful. I have added version information in the following sections.

I found this code snippet by searching for 'One class svm' in kandi. You can try any such use case!

Environment tested

I tested this solution in the following versions. Be mindful of changes when working with other versions.

The solution is created in PyCharm 2022.3.3 (Community Edition)
The solution is tested on Python 3.11.1.

Using this solution, we are able to understand how to perform one-class SVM using scikit-learn in Python with simple steps. This process also facilitates an easy-to-use, hassle-free way to create a hands-on working version of the code, which helps us learn how to perform one-class SVM using scikit-learn.

Dependent Library

scikit-learn by scikit-learn

Python 54584 Version: 1.2.2 License: Permissive (BSD-3-Clause)

scikit-learn: machine learning in Python

You can search for any dependent libraries like 'scikit-learn'.

FAQ:

1. What is the One-Class Support Vector Machines (SVM) concept?

One-class support vector machines (OCSVM) is a one-class classification technique like SVDD. But instead of obtaining a bounding hypersphere around the training data, the OCSVM algorithm finds the maximum-margin hyperplane that best separates the training data from the origin.
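With a linear kernel, this hyperplane can be inspected directly. The sketch below assumes synthetic data placed away from the origin and an illustrative nu=0.1. It rebuilds the decision function as w·x - rho from the fitted coef_ and offset_ attributes and checks it against scikit-learn's own scores.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)

# Synthetic cluster centred at (5, 5), so it can be separated from the origin.
X = rng.normal(loc=5.0, scale=0.5, size=(200, 2))

model = OneClassSVM(kernel="linear", nu=0.1).fit(X)

w = model.coef_.ravel()                   # normal vector of the hyperplane
rho = float(np.ravel(model.offset_)[0])   # offset of the hyperplane

# Decision function of the hyperplane: positive on the data side, negative on the origin side.
manual_scores = X @ w - rho
print("matches decision_function:", np.allclose(manual_scores, model.decision_function(X), atol=1e-6))

# The origin lies on the negative side of the hyperplane.
print("prediction at the origin:", model.predict(np.array([[0.0, 0.0]])))

# nu acts as an upper bound on the fraction of training points left outside.
print("fraction of training points flagged:", float((model.predict(X) == -1).mean()))
```

With the RBF kernel used elsewhere in this kit, the same separation happens in the kernel's feature space, so coef_ is not available there.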

2. What is supervised anomaly detection?

Supervised anomaly detection requires a labeled data set containing both normal and abnormal samples, which is used to construct a predictive model for classifying future data points. Commonly used algorithms include neural networks, support vector machines, and the K-Nearest Neighbors classifier.

3. Are there any limitations on using the scikit-learn for training one-class SVM models?

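The scikit-learn OneClassSVM implementation does not support online (incremental) training: it has no partial_fit method, so the model must be refit from scratch when new data arrives. Training on very large data sets is possible, but it is challenging because the training time of a kernel SVM grows faster than linearly with the number of samples.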

4. Is there a semi-supervised way of learning a Support Vector Machine model?

Yes. Semi-supervised support vector machines extend support vector machines so that training and classification use a small amount of labeled data together with a large amount of unlabeled data.
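scikit-learn does not ship a dedicated semi-supervised SVM, but a related effect can be approximated by wrapping an SVC in SelfTrainingClassifier. This is self-training, a swapped-in technique rather than a true semi-supervised SVM formulation. The sketch below assumes two synthetic blobs with only ten labeled samples; all names and values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Two well-separated synthetic classes.
X, y_true = make_blobs(
    n_samples=200, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)

# Keep labels for 5 samples per class; scikit-learn marks unlabeled samples with -1.
rng = np.random.default_rng(0)
y_partial = np.full_like(y_true, -1)
labeled_idx = np.concatenate(
    [rng.choice(np.where(y_true == c)[0], size=5, replace=False) for c in (0, 1)]
)
y_partial[labeled_idx] = y_true[labeled_idx]

# Self-training needs class probabilities, hence probability=True.
base = SVC(kernel="rbf", gamma="scale", probability=True, random_state=0)
self_training = SelfTrainingClassifier(base).fit(X, y_partial)

accuracy = float((self_training.predict(X) == y_true).mean())
print("labeled samples used:", int((y_partial != -1).sum()))
print("accuracy on all samples:", accuracy)
```

The wrapper repeatedly fits the SVC, adds confidently predicted unlabeled samples to the training set as pseudo-labels, and refits.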

5. How can we apply semi-supervised anomaly detection to SVM classification?

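Semi-supervised anomaly detection approaches aim to make use of whatever labeled samples are available alongside the unlabeled data. Most proposed methods are limited to including only labeled normal samples, which is how One-Class SVM is typically used: it is fitted on data known to be normal. Only a few methods also use labeled anomalies, and existing deep approaches are domain specific.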

Support

For any support on kandi solution kits, please use the chat
For further learning resources, visit the Open Weaver Community learning page.

Open Weaver – Develop Applications Faster with Open Source
