mean-shift | Mean-shift clustering in Python in 100 lines | Machine Learning library
kandi X-RAY | mean-shift Summary
mean-shift clustering algorithm in python using numpy only. Running the code: python mean-shift.py. Input: a 2-D array of points, e.g. [[-0.85 -1.04 ] [ 1.18 -1.12 ] [ 1.237 1.242] [ 1.401 -1.81 ] [ 0.999 -0.518] [-1.013 -1.112] [-1.259 -0.561] [-0.878 0.884] …]
Top functions reviewed by kandi - BETA
- Performs clustering
- Cluster points
- Helper function to shift a point
- Return the distance between two vectors
- Return a list of n colors
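The helpers listed above can be sketched in numpy alone. The function names and parameters below are illustrative, not necessarily those used in the repository; a flat (uniform) kernel is assumed.

```python
import numpy as np

def euclidean_distance(a, b):
    # Return the distance between two vectors.
    return np.sqrt(np.sum((a - b) ** 2))

def shift_point(point, points, radius):
    # Shift a point to the mean of all original points within `radius` of it.
    neighbors = points[np.linalg.norm(points - point, axis=1) <= radius]
    return neighbors.mean(axis=0)

def mean_shift(points, radius=1.0, max_iter=50, tol=1e-4):
    # Iteratively shift every point until the positions stop moving,
    # then group points whose final positions (nearly) coincide.
    shifted = points.copy()
    for _ in range(max_iter):
        moved = np.array([shift_point(p, points, radius) for p in shifted])
        converged = np.max(np.linalg.norm(moved - shifted, axis=1)) < tol
        shifted = moved
        if converged:
            break
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, p in enumerate(shifted):
        for j, c in enumerate(centers):
            if euclidean_distance(p, c) < radius / 2:
                labels[i] = j
                break
        else:
            centers.append(p)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# two well-separated blobs -> two clusters
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
labels, centers = mean_shift(data, radius=1.5)
print(len(centers))
```

The O(n²) neighbor search per iteration is what keeps such an implementation short; a KD-tree would speed it up at the cost of more code.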
mean-shift Key Features
mean-shift Examples and Code Snippets
Community Discussions
Trending Discussions on mean-shift
QUESTION
We've been using k-means for clustering our logs. A typical dataset has 10 million samples with 100k+ features.
To find the optimal k, we run multiple k-means jobs in parallel and pick the one with the best silhouette score. In 90% of the cases we end up with k between 2 and 100. Currently we are using scikit-learn KMeans. For such a dataset, clustering takes around 24h on an ec2 instance with 32 cores and 244 GB of RAM.
I am currently researching a faster solution.
What I have already tested:
Kmeans + Mean Shift Combination - a little better (for k=1024 --> ~13h) but still slow.
Kmcuda library - doesn't have support for sparse matrix representation. It would require ~3TB RAM to represent that dataset as a dense matrix in memory.
Tensorflow (tf.contrib.factorization.python.ops.KmeansClustering()) - only started investigating today, but either I am doing something wrong, or I do not know how to cook it. On my first test with 20k samples and 500 features, clustering on a single GPU is slower than on a single CPU thread.
Facebook FAISS - no support for sparse representation.
There is PySpark MlLib Kmeans next on my list. But would it make sense on 1 node?
Would training be faster for my use case on multiple GPUs, e.g. TensorFlow with 8 Tesla V100s?
Is there any magical library that I haven't heard of?
Or just simply scale vertically?
...ANSWER
Answered 2019-Oct-11 at 22:06
Choose the algorithm wisely. There are clever algorithms, and there are stupid algorithms for k-means. Lloyd's is stupid, but it is the only one you will find on GPUs so far. It wastes a lot of resources with unnecessary computations. Because GPU and "big data" people do not care about resource efficiency... Good algorithms include Elkan's, Hamerly's, Yinyang, Exponion, Annulus, etc. - these are much faster than Lloyd's.
Sklearn is one of the better tools here, because it at least includes Elkan's algorithm. But if I am not mistaken, it may be making a dense copy of your data repeatedly. Maybe in chunks so you don't notice it. When I compared k-means from sklearn with my own spherical k-means in Python, my implementation was many times faster. I can only explain this with me using sparse optimizations while the sklearn version performed dense operations. But maybe this has been improved since.
Implementation quality is important. There was an interesting paper about benchmarking k-means. Let me Google it:
Kriegel, H. P., Schubert, E., & Zimek, A. (2017). The (black) art of runtime evaluation: Are we comparing algorithms or implementations?. Knowledge and Information Systems, 52(2), 341-378.
They show how supposedly the same algorithm can have orders of magnitude runtime differences, depending on implementation differences. Spark does not fare very well there... It has too high overheads and too slow algorithms.
You don't need all the data.
K-means works with averages. The quality of the mean very slowly improves as you add more data. So there is little use in using all the data you have. Just use a large enough sample, and the results should be of almost the same quality. You can exploit this also for seeding. Run on a smaller set first, then add more data for refinement.
Because your data is sparse, there is a high chance that k-means is not the right tool anyway. Have you tested the quality of your results? How do you ensure attributes are appropriately scaled? How much is the result determined simply by where the vectors are 0, and not by the actual non-zero values? Do results actually improve from rerunning k-means so often? What if you do not rerun k-means ever again? What if you just run it on a sample, as discussed above? What if you just pick k random centers and do 0 iterations of k-means? What is your best silhouette? Chances are that you cannot measure the difference and are just wasting time and resources for nothing! So what do you do to ensure reliability of your results?
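The sample-then-refine idea from the answer can be sketched with a plain numpy Lloyd iteration. This is illustrative only (Lloyd's is exactly the "stupid" algorithm the answer warns about; in practice you would use Elkan's, which scikit-learn offers), and all names and sizes below are made up for the demo:

```python
import numpy as np

def lloyd_kmeans(X, centers, n_iter=10):
    # A few Lloyd iterations: assign each point to its nearest
    # center, then move each center to the mean of its points.
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(m, 0.2, (500, 5)) for m in (-3, 0, 3)])

# 1) run many iterations on a small sample to get good seeds cheaply
sample = X[rng.choice(len(X), size=150, replace=False)]
seeds = sample[rng.choice(len(sample), size=3, replace=False)].copy()
seeds, _ = lloyd_kmeans(sample, seeds, n_iter=10)

# 2) refine with only a couple of iterations on the full data
centers, labels = lloyd_kmeans(X, seeds, n_iter=2)
print(centers.shape)
```

The expensive full-data pass runs only a few iterations because the seeds from the sample are already close to the final means.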
QUESTION
I am trying to segment a colour image using the Mean-Shift algorithm using scikit-learn. There is something I would like to know about the MeanShift fit_predict() function. In the documentation for the MeanShift algorithm, it states that fit_predict() performs clustering on X and returns cluster labels.
What exactly are the cluster labels? Are they the labels for all the clusters the algorithm found, or is there a label for each data sample returned? Any insights are appreciated.
...ANSWER
Answered 2019-Jul-12 at 08:44
A label is returned for each training sample. fit_predict() is simply a combination of the fit() and predict() functions.
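A quick way to see this on toy data (the bandwidth value here is arbitrary, chosen only to separate the two point groups):

```python
import numpy as np
from sklearn.cluster import MeanShift

# ten samples: five at the origin, five at (10, 10)
X = np.vstack([np.full((5, 2), 0.0), np.full((5, 2), 10.0)])

# fit_predict returns one cluster label per sample, so the
# result has the same length as X
labels = MeanShift(bandwidth=2).fit_predict(X)
print(labels.shape)  # (10,)
```

Each entry of `labels` says which of the discovered clusters that sample belongs to; the number of distinct values is the number of clusters found.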
QUESTION
I am trying to segment a colour image using Mean-Shift clustering with sklearn. I have read the image into a numpy array; however, I want to extract each colour channel (R, G, B) so that I can use each as a variable for classification.
I have found the following code online, which extracts the RGB colour channels of an image which is represented as a numpy array.
...ANSWER
Answered 2019-Jul-12 at 01:16
A normal picture has 3 channels: Red, Green, and Blue.
When you read a picture with a tool (for example OpenCV), it returns a numpy array with shape (height x width x channels).
The order of the channels depends on the tool you used: if you read with OpenCV the order is Blue first, then Green, then Red (BGR), whereas matplotlib.pyplot.imread returns Red-Green-Blue (RGB).
That code was written like that because it reads the picture with OpenCV.
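For a (height x width x 3) array, the channels are just slices along the last axis, and reversing that axis converts between BGR and RGB. A sketch on a synthetic image (no image file or OpenCV needed):

```python
import numpy as np

# synthetic 2x2 image in RGB order: pure red everywhere
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[..., 0] = 255              # red channel

r = img[:, :, 0]               # extract each colour channel
g = img[:, :, 1]
b = img[:, :, 2]

bgr = img[:, :, ::-1]          # RGB -> BGR (or back): reverse the last axis
print(r.max(), g.max(), b.max())   # 255 0 0
print(bgr[0, 0])                   # [  0   0 255]
```

Each channel slice is a 2-D array you can flatten and feed to a clustering algorithm as a feature.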
QUESTION
I am trying to segment a colour image using the Mean-Shift algorithm using sklearn. I have the following code:
...ANSWER
Answered 2019-Jul-11 at 02:06
It's because the image you're loading does not have plain RGB values (if you look at the dimensions, the last one is 4, i.e. the image has an alpha channel).
You need to first convert it to RGB like this:
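A minimal sketch of the conversion, assuming the loaded array has shape (h, w, 4) with the alpha channel last (RGBA) — the array here is synthetic, standing in for the loaded image:

```python
import numpy as np

# stand-in for an image loaded as RGBA
rgba = np.random.default_rng(0).integers(0, 256, size=(4, 4, 4)).astype(np.uint8)
print(rgba.shape)        # (4, 4, 4) -> last dimension is 4 (R, G, B, alpha)

# keep only the first three channels to get RGB
rgb = rgba[:, :, :3]
print(rgb.shape)         # (4, 4, 3)
```

Note that matplotlib.pyplot.imread returns PNG data as floats in [0, 1]; the same slicing applies, but scale by 255 first if your pipeline expects 8-bit values.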
QUESTION
I'm using Mean-shift clustering (https://scikit-learn.org/stable/modules/clustering.html#mean-shift), in which the labels of clusters are obtained from this source: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html
However, it's not clear how the cluster labels (0, 1, ...) are generated. Apparently, label 0 is the cluster with the most elements. Is this a general rule?
How do other algorithms work? Is the order "random", or do the algorithms assign label 0 to the largest cluster they detect?
Thanks!
PS: it's easy to order the labels according to this rule; my question is more theoretical.
...ANSWER
Answered 2019-Jun-09 at 07:30
In many cases, the cluster order depends on the initialization. If you provide the initial values, then this order will be preserved.
If you do not provide such initial values, the order will usually be based on the data order. The first item is likely to belong to the first cluster, for example (barring noise in some algorithms, such as DBSCAN).
Now quantity (cluster size) has an interesting effect: assuming that your data is randomly ordered (and not, for example, ordered by some synthetic data generation process) then the first element is more likely to belong to the "largest" cluster, so this cluster is most likely to come first even with "random" order.
Now in sklearn's mean-shift (which in my opinion contains an error in the final assignment rule) the authors decided to sort by "intensity" apparently, but I don't remember any such rule in the original papers. https://github.com/scikit-learn/scikit-learn/blob/7813f7efb/sklearn/cluster/mean_shift_.py#L222
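As the question's PS notes, re-ordering the labels after the fact is easy. A numpy sketch that maps the largest cluster to label 0, the next to 1, and so on (the helper name is made up):

```python
import numpy as np

def relabel_by_size(labels):
    # Map the most frequent original label to 0, the next to 1, ...
    uniq, counts = np.unique(labels, return_counts=True)
    order = uniq[np.argsort(-counts)]          # labels sorted by descending size
    mapping = {old: new for new, old in enumerate(order)}
    return np.array([mapping[l] for l in labels])

labels = np.array([2, 2, 2, 0, 0, 1])
print(relabel_by_size(labels))   # [0 0 0 1 1 2]
```

This works with the output of any clustering algorithm, since it only looks at label frequencies.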
QUESTION
This is for a class and I would really appreciate your help! I made some changes based on a comment I received, but now I get another error. I need to modify an existing function that implements the mean-shift algorithm, but instead of initializing all the points as the first set of centroids, the function creates a grid of centroids, with the grid spacing based on the radius. I also need to delete the centroids that don't contain any data points. My issue is that I don't understand how to fix the error I get!
...ANSWER
Answered 2019-Mar-23 at 20:52
When you run that loop, for i in centroids, the i iterated over centroids isn't a number, it is a vector, which is why an error pops up. For example, the first i value might be equal to [0 1 2 0 1 2 0 1 2], so taking an index of that doesn't make sense: your code is effectively saying centroid = centroid[n1 n2 ... nk]. To fix it, you really need to change how your initialize-centroids function works. Also, meshgrid won't create an N-dimensional grid, so your meshgrid approach might work for 2 dimensions but not for N. I hope that helps.
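One way to build a grid of candidate centroids that works in any number of dimensions is itertools.product over per-axis ranges, instead of meshgrid. This is a sketch, not the poster's actual code; the function name and the pruning rule (drop centroids with no data point within the radius) are assumptions based on the question:

```python
import itertools
import numpy as np

def grid_centroids(data, radius):
    # Build a grid of candidate centroids spaced `radius` apart over the
    # bounding box of the data, for any number of dimensions.
    lo, hi = data.min(axis=0), data.max(axis=0)
    axes = [np.arange(l, h + radius, radius) for l, h in zip(lo, hi)]
    candidates = np.array(list(itertools.product(*axes)))
    # Delete centroids that have no data point within `radius`.
    keep = [c for c in candidates
            if np.any(np.linalg.norm(data - c, axis=1) <= radius)]
    return np.array(keep)

data = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]])
cents = grid_centroids(data, radius=1.0)
print(cents.shape)   # far fewer centroids than the full 6x6 grid
```

Because itertools.product iterates over one axis list per dimension, the same code handles 2-D and N-D data, which sidesteps the meshgrid limitation mentioned in the answer.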
QUESTION
I am looking to find peak regions in 2D data (if you will, grayscale images or 2D landscapes, created through a Hough transform). By peak region I mean a locally maximal peak, yet NOT a single point but a part of the surrounding contributing region that goes with it. I know, this is a vague definition, but maybe the word mountain or the images below will give you an intuition of what I mean.
The peaks marked in red (1-4) are what I want, the ones in pink (5-6) examples for the "grey zone", where it would be okay if those smaller peaks are not found but also okay if they are.
Images contain between 1-20 peaked regions, different in height. The 2D data for above surf plot is shown below with a possible result (orange corresponds to Peak 1, green corresponds to Peak 2 a/b, ...). Single images for tests can be found in the description links:
Image left: input image - - - - middle: (okaish) result - - - - right: result overlayed over image.
The result above was produced using simple thresholding (MATLAB code):
...ANSWER
Answered 2017-May-09 at 19:26
In such peak-finding problems, I mostly use morphological operations. Since Hough transform results are mostly noisy, I prefer blurring first, then applying tophat and extended maxima transforms. Then, for each local maximum, find the region around it with adaptive thresholding. Here is a sample code:
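The answer's pipeline was given in MATLAB; a rough Python equivalent of the same idea (blur, find local maxima, keep the connected region around each maximum) can be sketched with scipy.ndimage. The function name, filter sizes, and the relative threshold below are illustrative choices, not the original code:

```python
import numpy as np
from scipy import ndimage

def peak_regions(img, blur_sigma=1.0, size=3, rel_thresh=0.5):
    # Blur to suppress noise, detect local maxima, then keep the
    # connected thresholded region around each maximum.
    smooth = ndimage.gaussian_filter(img.astype(float), blur_sigma)
    local_max = (smooth == ndimage.maximum_filter(smooth, size=size)) & (smooth > 0)
    regions, _ = ndimage.label(smooth > rel_thresh * smooth.max())
    # Keep only regions that actually contain a local maximum.
    good = np.unique(regions[local_max & (regions > 0)])
    return np.isin(regions, good) & (regions > 0)

img = np.zeros((20, 20))
img[5, 5] = img[14, 14] = 10.0          # two isolated peaks
mask = peak_regions(img)
print(ndimage.label(mask)[1])           # number of peak regions found
```

Each True region in the returned mask is one "mountain": a local maximum together with its surrounding contributing area, which matches the question's notion of a peak region rather than a single point.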
QUESTION
I am using 2 sample scripts to check whether Python 3.6 can make use of the OpenCL functionality of OpenCV on Windows. I have tried to run a couple of the CAMSHIFT examples provided in samples and checked to see whether I have OpenCL.
I would love to know why Python shows that I do not have OpenCL while C++ with VS shows that I have OpenCL-enabled devices.
System Info:
OpenCV 3.2.0 built from source with OpenCL on and with the Python and numpy bindings added
Python 3.6 for windows 64bit
Visual Studio 2015 community edition
Numpy using
...ANSWER
Answered 2017-Mar-09 at 22:35
Try this:
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install mean-shift
You can use mean-shift like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.