Streaming is the continuous transmission of data from a server to a client. Streaming data can be processed, stored, analyzed, and acted upon in real time using stream processing technology. Common sources of streaming data include IoT sensors and server logs, which can be monitored to find bugs and anomalies in real time.
Anomaly detection is the identification of suspicious or rare events that differ significantly from the standard behavioral pattern of the data. In this solution, we identify anomalies in real-time streaming data using machine learning techniques.
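As an illustration of the idea, the sketch below fits a scikit-learn Isolation Forest on a batch of historical readings and then scores incoming values one at a time. This is only a minimal sketch: the choice of Isolation Forest, the synthetic temperature-like data, and the helper names `fit_detector` and `is_anomaly` are assumptions made for the example, and the model shipped in the kit may differ.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def fit_detector(history, contamination=0.05, seed=0):
    """Fit an Isolation Forest on a 1-D batch of historical readings.

    `contamination` is the assumed share of outliers in the history;
    0.05 is an illustrative value, not one taken from the kit.
    """
    X = np.asarray(history, dtype=float).reshape(-1, 1)
    model = IsolationForest(contamination=contamination, random_state=seed)
    model.fit(X)
    return model


def is_anomaly(model, value):
    """Return True if a single incoming reading is flagged as an outlier."""
    # predict() returns -1 for outliers and 1 for inliers.
    label = model.predict(np.array([[float(value)]]))[0]
    return bool(label == -1)


# Synthetic "normal" history: readings centered on 20 with unit spread.
rng = np.random.default_rng(0)
history = rng.normal(loc=20.0, scale=1.0, size=500)
model = fit_detector(history)

# Score a small simulated stream, one reading at a time.
stream = [19.8, 20.4, 35.0, 20.1]
flags = [is_anomaly(model, reading) for reading in stream]
for reading, flagged in zip(stream, flags):
    print(reading, "anomaly" if flagged else "normal")
```

In a streaming setup, `is_anomaly` would be called on each record as the consumer receives it, while the model is refitted periodically on recent data.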
The streaming architecture and processing are handled by Kafka and ZooKeeper.
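In Kafka, a producer publishes records to a topic and a consumer reads them back. The sketch below shows how that could look with the kafka-python client. The broker address `localhost:9092`, the topic name `sensor-readings`, and the JSON message format are assumptions for illustration only and are not taken from the kit's kafkaproducer.py or kafkaconsumer.py files.

```python
import json

BOOTSTRAP_SERVERS = "localhost:9092"  # assumption: a local broker on the default port
TOPIC = "sensor-readings"             # hypothetical topic name


def encode(record):
    """Serialize a dict to UTF-8 JSON bytes for use as the message value."""
    return json.dumps(record).encode("utf-8")


def decode(payload):
    """Deserialize UTF-8 JSON bytes back into a dict."""
    return json.loads(payload.decode("utf-8"))


def produce(records):
    """Send each record to the topic, then flush and close the producer."""
    # Imported here so encode/decode can be used without kafka-python or a broker.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=encode,
    )
    for record in records:
        producer.send(TOPIC, value=record)
    producer.flush()
    producer.close()


def consume():
    """Yield decoded message values from the topic as they arrive."""
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_deserializer=decode,
        auto_offset_reset="earliest",  # start from the oldest available message
    )
    for message in consumer:
        yield message.value
```

With a broker running, calling `produce([{"sensor_id": 1, "value": 20.5}])` in one process and iterating over `consume()` in another mirrors the two-terminal setup used by the kit.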
A representative output from running the producer and consumer files is provided below. The left pane is a command prompt running the producer file (which sends the data stream), and the right pane is a command prompt running the consumer file (which receives the data stream).
Deployment Information
This section provides instructions for running a streaming anomaly detection application created using this kit. The entire solution is available as a package to download from the source code repository.
For Windows OS,
- Download, extract, and double-click the kit installer file to install the kit. Note: Make sure to extract the zip file before running the installer. The installation may take 10 to 20 minutes, depending on network bandwidth.
- When you're prompted during the installation of the kit, press Y to launch the app automatically.
- To run the app manually, press N when you're prompted and locate the folder 'streaming-anomaly-detection' in the "C://kandikits/streaming-anomaly-detection" location
- Navigate into the directory 'streaming-anomaly-detection'
- Download and set up Kafka by following the steps
- Once Kafka is set up and running, run the kafkaproducer.py and kafkaconsumer.py files in separate command prompts
For other operating systems,
- Download Python
- Download the repository
- Extract the zip file and navigate to the directory 'streaming-anomaly-detection'
- Run the following commands to install Python
tar -xf python*.tar.gz
cd python3.*
./configure
sudo make install
- Open the terminal in the extracted directory 'streaming-anomaly-detection'
- Create and activate a virtual environment with these commands:
python3.9 -m venv example
source example/bin/activate
- Install dependencies by executing the command
pip install -r requirements.txt
- Download and set up Kafka by following the steps.
- Once Kafka is set up and running, run the kafkaproducer.py and kafkaconsumer.py files in separate terminals
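Before starting the producer and consumer on either platform, it can help to confirm that the Kafka broker is actually accepting connections. The helper below is a minimal sketch that only tests whether a TCP port is open. The function name `broker_is_reachable` is hypothetical, and the host and port assume a local broker on Kafka's default port 9092, so adjust them if your setup differs.

```python
import socket


def broker_is_reachable(host="localhost", port=9092, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    This only checks that something is listening on the port; it does not
    verify that the listener is a healthy Kafka broker.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if broker_is_reachable():
        print("Broker port is open; start kafkaproducer.py and kafkaconsumer.py.")
    else:
        print("Nothing is listening on localhost:9092; start Kafka first.")
```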
Click on the button below to download the solution and follow the deployment instructions to begin set-up. This 1-click kit has all the required dependencies and resources you may need to build your Streaming Anomaly Detection App.
Libraries used in this solution
Streaming
Streaming libraries are essential for transmission of data in real-time.
kafka-python by dpkp
Python client for Apache Kafka
Python 5211 Version: 2.0.2 License: Permissive (Apache-2.0)
Machine Learning
Machine learning libraries and frameworks help build state-of-the-art solutions using machine learning.
scikit-learn by scikit-learn
scikit-learn: machine learning in Python
Python 54584 Version: 1.2.2 License: Permissive (BSD-3-Clause)