k-means | A Python implementation of k-means clustering algorithm | Machine Learning library
kandi X-RAY | k-means Summary
This project is a Python implementation of the k-means clustering algorithm.
k-means Key Features
k-means Examples and Code Snippets
const kMeans = (data, k = 1) => {
  let centroids = data.slice(0, k);
  const classes = Array.from({ length: data.length }, () => -1);
  for (let changed = true; changed; ) {
    changed = false;
    // assignment step: label each point with its nearest centroid
    data.forEach((p, i) => {
      const dists = centroids.map(c => Math.hypot(...p.map((v, j) => v - c[j])));
      const m = dists.indexOf(Math.min(...dists));
      if (m !== classes[i]) { classes[i] = m; changed = true; }
    });
    // update step: move each centroid to the mean of its assigned points
    centroids = centroids.map((c, ci) => {
      const members = data.filter((_, i) => classes[i] === ci);
      return members.length
        ? c.map((_, j) => members.reduce((s, p) => s + p[j], 0) / members.length)
        : c;
    });
  }
  return classes;
};
import numpy as np

def plot_k_means(X, K, index_word_map, max_iter=20, beta=1.0, show_plots=True):
    # (index_word_map and show_plots drive the annotated plot in the full example)
    N, D = X.shape
    M = np.zeros((K, D))
    exponents = np.empty((N, K))
    # initialize each mean M[k] to a randomly chosen data point
    for k in range(K):
        M[k] = X[np.random.choice(N)]
    for _ in range(max_iter):
        # soft assignment: responsibility of cluster k for point n
        for k in range(K):
            for n in range(N):
                exponents[n, k] = np.exp(-beta * np.linalg.norm(M[k] - X[n]) ** 2)
        R = exponents / exponents.sum(axis=1, keepdims=True)
        # update each mean as the responsibility-weighted average of the data
        M = R.T.dot(X) / R.sum(axis=0, keepdims=True).T
    return M, R
import numpy as np

def plot_k_means(X, K, max_iter=20, beta=3.0, show_plots=False):
    N, D = X.shape
    exponents = np.empty((N, K))
    # initialize the means to K distinct randomly chosen data points
    initial_centers = np.random.choice(N, K, replace=False)
    M = X[initial_centers]
    for _ in range(max_iter):
        # soft assignment, then responsibility-weighted mean update
        for k in range(K):
            for n in range(N):
                exponents[n, k] = np.exp(-beta * np.linalg.norm(M[k] - X[n]) ** 2)
        R = exponents / exponents.sum(axis=1, keepdims=True)
        M = R.T.dot(X) / R.sum(axis=0, keepdims=True).T
    return M, R
import numpy as np
import matplotlib.pyplot as plt

def plot_k_means(X, K, max_iter=20, beta=1.0):
    N, D = X.shape
    M = np.zeros((K, D))
    R = np.ones((N, K)) / K
    # initialize each mean to a randomly chosen data point
    for k in range(K):
        M[k] = X[np.random.choice(N)]
    # one subplot per iteration, laid out on a 5-column grid
    grid_width = 5
    grid_height = max_iter // grid_width
    for i in range(max_iter):
        plt.subplot(grid_height, grid_width, i + 1)
        plt.scatter(X[:, 0], X[:, 1], c=R.argmax(axis=1), s=5)
        # soft k-means step: recompute responsibilities, then the means
        R = np.exp(-beta * ((X[:, None] - M[None]) ** 2).sum(axis=2))
        R /= R.sum(axis=1, keepdims=True)
        M = R.T.dot(X) / R.sum(axis=0, keepdims=True).T
    plt.show()
    return M, R
Community Discussions
Trending Discussions on k-means
QUESTION
I am trying to learn the k-means clustering algorithm in MATLAB without using the inbuilt kmeans function. Say I have data of size 1x100 and I want to group it into two clusters. How can I do this? I want to visualize the two centroids and the data together on a plot in MATLAB. Note: when I plot in MATLAB, I am able to see only the data, but not the data and the two centroids simultaneously.
Any help in this regard is highly appreciated.
...ANSWER
Answered 2021-Jun-08 at 09:03
A minimal k-means clustering algorithm in MATLAB could be:
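The MATLAB listing from this answer isn't preserved in this excerpt. As a stand-in, here is a minimal sketch of the same idea in Python/NumPy (a hypothetical reconstruction, not the original answer's code), clustering 100 one-dimensional points into two groups and plotting the data together with both centroids:
import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(100)  # 1x100 data, as in the question
centroids = data[np.random.choice(100, 2, replace=False)]

for _ in range(20):
    # assignment step: label each point with its nearest centroid
    labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
    # update step: move each centroid to the mean of its assigned points
    centroids = np.array([data[labels == k].mean() for k in range(2)])

plt.scatter(data, np.zeros_like(data), c=labels)
plt.scatter(centroids, np.zeros(2), c='red', marker='x', s=200)
plt.show()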
QUESTION
I'm trying to plot a silhouette plot for a K-Means model I've run; however, I get the error: ImportError: cannot import name 'safe_indexing' from 'sklearn.utils'.
I was initially getting the NotFoundError issue described in this post, but I got this ImportError once I fixed it. My code is below:
...ANSWER
Answered 2021-Jan-09 at 03:58
In Python 3.7 with yellowbrick v1.2 and scikit-learn v0.23.2 I don't get the error, but when I bump scikit-learn to v0.24.0 I get the same ImportError.
I checked the API documentation and, as in the link you referenced, sklearn v0.22.0 deprecated utils.safe_indexing to private (utils._safe_indexing). But I don't get the import error with yellowbrick v1.2 and v0.22.0 <= sklearn <= v0.23.2 installed. I think they just finally removed the public utils.safe_indexing in v0.24.0, so if you install a version of scikit-learn before v0.24.0 (e.g. pip install "scikit-learn<0.24"), the import should work.
Edit: Here is yellowbrick's GitHub issue if you want to track their progress on a workaround or update for this problem.
QUESTION
I have a 4-column dataframe which I extracted from the iris dataset. I use kmeans to plot 3 clusters from all possible combinations of 2 columns.
However, there seems to be something wrong with the output, especially since the cluster centers are not placed at the center of the clusters. I have provided examples of the output; only cluster_1 looks OK, but the other 3 look completely wrong.
How best can I fix my clustering? This is the sample code I am using:
...ANSWER
Answered 2021-May-31 at 23:51
You compute the clusters in four dimensions. Note this implies the centroids are four-dimensional points too. Then you plot two-dimensional projections of the clusters, so when you plot the centroids, you have to pick out the same two dimensions you just used for the scatterplot of the individual points.
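In code, that means indexing the fitted centers with the same pair of columns used for each scatterplot. A short scikit-learn sketch (the column indices i and j stand for whichever pair of features you are plotting):
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # 4 feature columns
km = KMeans(n_clusters=3, n_init=10).fit(X)  # clustering happens in 4-D

i, j = 0, 1  # the two columns being plotted
plt.scatter(X[:, i], X[:, j], c=km.labels_)
# project the 4-D centroids onto the same two columns as the points
plt.scatter(km.cluster_centers_[:, i], km.cluster_centers_[:, j],
            c='red', marker='x', s=200)
plt.show()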
QUESTION
I am trying to use the KMeans clustering from faiss on a human pose dataset of body joints. I have 16 body parts, so a dimension of 32. The joints are scaled to a range between 0 and 1. My dataset consists of ~900,000 instances. As mentioned in the faiss FAQ:
As a rule of thumb there is no consistent improvement of the k-means quantizer beyond 20 iterations and 1000 * k training points
Applying this to my problem, I randomly select 50,000 instances for training, as I want to check numbers of clusters k between 1 and 30.
Now to my "problem":
The inertia increases directly as the number of clusters increases (n_cluster on the x-axis):
I tried varying the number of iterations, the number of redos, verbose and spherical, but the results stay the same or get worse. I do not think that it is a problem of my implementation; I tested it on a small example with 2D data and very clear clusters and it worked.
Is the data just badly clustered, or is there another problem or mistake I have missed? Maybe the scaling of the values between 0 and 1? Should I try another approach?
...ANSWER
Answered 2021-May-20 at 16:46
I found my mistake. I had to increase the parameter max_points_per_centroid. As I have so many data points, it sampled a sub-batch for the fit. For a larger number of clusters this sub-batch is larger. See the FAQ of faiss:
max_points_per_centroid * k: there are too many points, making k-means unnecessarily slow. Then the training set is sampled
The larger sub-batch of course has a larger inertia, as there are more points in total.
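For reference, faiss.Kmeans forwards extra keyword arguments to its clustering parameters, so the fix looks roughly like this (the value chosen for max_points_per_centroid is an assumption sized so no subsampling occurs; the library default is much smaller):
import numpy as np
import faiss

X = np.random.rand(900_000, 32).astype('float32')  # pose vectors, as in the question

# raise max_points_per_centroid so the full training set is used
# instead of a per-centroid sub-batch
kmeans = faiss.Kmeans(d=32, k=30, niter=20, verbose=True,
                      max_points_per_centroid=1_000_000)
kmeans.train(X)
print(kmeans.obj[-1])  # final inertia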
QUESTION
Just a basic question concerning k-means clustering analysis on survival data, like this one:
I am doing k-means clustering to identify which gene influences survival the most. However, do I include the survival time in my k-means function, or should I leave it out? That is, should I put it into the kmeans() function, e.g. in R?
Kind regards,
Hashriama
...ANSWER
Answered 2021-May-19 at 12:31
I think that your approach is not the best one. Your goal is to select genes associated with censored/uncensored survival, so the use of supervised methods seems the most suitable. Using k-means will only cluster genes by similarity without regard to survival, and even if you wanted to add survival to your modeling it would not make sense, because you would be omitting censoring.
There are Cox regressions to which an L1 penalty is added, allowing variable selection without omitting censoring. This kind of approach seems more appropriate for your goal and fits your context better. To learn more, here is an article by Jiang Gui & Hongzhe Li that uses penalized Cox regression (see also the R package biospear if needed): https://academic.oup.com/bioinformatics/article/21/13/3001/196819
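The article and biospear are R-based; as a rough Python analogue (an assumption on my part, not something the answer prescribes), scikit-survival offers a lasso-penalized Cox model:
import numpy as np
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # toy gene-expression matrix
y = Surv.from_arrays(event=rng.integers(0, 2, 100).astype(bool),
                     time=rng.exponential(10.0, 100))

# l1_ratio=1.0 is a pure lasso penalty: most gene coefficients are
# shrunk to exactly zero, giving variable selection under censoring
model = CoxnetSurvivalAnalysis(l1_ratio=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_[:, -1])  # genes surviving the penalty
print(selected)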
QUESTION
Following this example of k-means clustering, I want to recreate the same, only I'm very keen for the final image to contain just the quantized colours (plus a white background). As it is, the colour bars get smooshed together to create a pixel line of blended colours.
Whilst they look very similar, the image (top half) is what I've got from CV2; it contains 38 colours in total. The lower image has only 10 colours and is what I'm after.
Let's look at a bit of that at 6x magnification:
I've tried:
...ANSWER
Answered 2021-May-18 at 16:27
I recommend showing the image using cv2.imshow instead of matplotlib. cv2.imshow shows the image "pixel to pixel" by default, while matplotlib.pyplot matches the image dimensions to the size of the axes.
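If you do want to stay with matplotlib, turning interpolation off gets you close to cv2.imshow's pixel-for-pixel behaviour. A small sketch (the file name is hypothetical; the cvtColor call is needed because OpenCV stores channels as BGR while matplotlib expects RGB):
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('quantized.png')  # hypothetical input image

# pixel-for-pixel display
cv2.imshow('quantized', img)
cv2.waitKey(0)

# keep matplotlib from blending neighbouring pixels when it rescales
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), interpolation='nearest')
plt.show()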
QUESTION
I've been trying to run RAPIDS on Google Colab Pro and have successfully installed the cuml and cudf packages; however, I am unable to run even the example scripts.
TL;DR: Any time I try to run the fit function for cuml on Google Colab I get the following error. I get this when using the demo examples, both for installation and then for cuml. This happens for a range of cuml examples (I first hit it trying to run UMAP).
...ANSWER
Answered 2021-May-06 at 17:13
Colab retains cupy==7.4.0 despite conda installing cupy==8.6.0 during the RAPIDS install (it is a custom install). I just had success pip installing cupy-cuda110==8.6.0 BEFORE installing RAPIDS, with !pip install cupy-cuda110==8.6.0:
I'll be updating the script soon so that you won't have to do it manually, but want to test a few more things out. Thanks again for letting us know!
EDIT: script updated.
QUESTION
Attribute-Based K-Means Algorithm
I have spatial point datasets with values. I want to identify unique sets of nearest (non-overlapping) points whose cumulative sum value is nearest to X (a defined amount). Here is an example:
...ANSWER
Answered 2021-May-06 at 16:15
Here is an approach to get attribute-weighted clusters for a predefined K:
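The answer's listing isn't preserved in this excerpt. One hedged way to make k-means respect a per-point attribute is scikit-learn's sample_weight (an assumption about the approach, not the original answer's code): heavier points pull centroids toward themselves, and you can then compare each cluster's cumulative value against the target X:
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
coords = rng.random((200, 2))   # spatial points
values = rng.random(200) * 10   # attribute value per point

# weight each point by its value so high-value points attract centroids
km = KMeans(n_clusters=5, n_init=10).fit(coords, sample_weight=values)

# cumulative sum of values per cluster, to compare against the target X
for k in range(5):
    print(k, values[km.labels_ == k].sum())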
QUESTION
I am currently working on the design of a steganography system that detects a certain area in an image using several techniques (k-means, Canny edge detection) with Python and the OpenCV library. I am facing a huge problem updating the image pixel values to contain my secret data in the least significant bit.
I started by finding the threshold after several calculations:
thresh = cv.adaptiveThreshold(imgray, 100, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY_INV, 11, 2)
And to test the area I did the following:
...ANSWER
Answered 2021-Apr-30 at 12:13
Your issue is that advanced indexing returns copies of arrays, so by doing image[thresh==0][i][j] you modify freshly created copies each time, which are then discarded.
Let's start with some fake data:
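The fake-data walkthrough is cut off in this excerpt; the point it demonstrates can be reproduced in a few lines of NumPy (a sketch, not the original listing):
import numpy as np

image = np.zeros((4, 4), dtype=np.uint8)
thresh = np.array([[0, 255], [255, 0]]).repeat(2, axis=0).repeat(2, axis=1)

# Boolean indexing returns a COPY, so this write is silently discarded:
image[thresh == 0][0] = 99
print(image.max())  # still 0

# Writing through the mask in a single indexing expression persists:
image[thresh == 0] = 99
print(image.max())  # now 99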
QUESTION
I have a 3-D array of shape (3, 60000, 10) which needs to be 2-D so I can visualize it when clustering.
I was planning on applying k-means clustering from scikit-learn to the 3-D array, and read that it only takes a 2-D shape. I just wanted some advice as to whether there is a right way to do it. I was planning on making it (60000, 30), but wanted clarification before I go ahead.
...ANSWER
Answered 2021-Apr-22 at 15:19
(60000, 30) is probably not a great idea. K-means clustering uses a distance metric to define clusters, normally Euclidean distance, but when you increase the number of variables in the second dimension you run into the curse of dimensionality, where the results of clustering stop making sense.
You can of course try (60000, 30) and see if it works, but if it doesn't, you'll need to reduce dimensionality, for example by doing a PCA and using the principal components for clustering.
EDIT: I'll try to explain what I mean by dimensionality and the issues it causes, since there appears to be some confusion.
A 2-D array of size (100, 2) is 2-dimensional data, i.e. 100 observations of 2 variables. The trend line between those points would be a 1-D object (a line) and you can plot it on a 2-D plane. Similarly, a (100, 3) array is 3-dimensional, with the trend line being a 2-D plane, and you can plot those points on a 3-D chart.
A (100, 100) array, then, is 100-dimensional. A trend would be a 99-dimensional hyperplane, which you cannot visualise even in principle. Now let's see what issues this causes. Let's define a simple function calculating Euclidean distance:
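The code is cut off here; the distance function and the concentration effect it goes on to demonstrate look roughly like this (a reconstruction, not the original listing):
import numpy as np

def euclidean(a, b):
    return np.sqrt(((a - b) ** 2).sum())

# In high dimensions, the nearest and farthest random points sit at almost
# the same distance, which is what breaks distance-based clustering.
rng = np.random.default_rng(0)
for d in (2, 30, 1000):
    pts = rng.random((500, d))
    dists = np.array([euclidean(pts[0], p) for p in pts[1:]])
    print(f"dim={d}: min/max distance ratio = {dists.min() / dists.max():.2f}")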
Community Discussions and Code Snippets include sources from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install k-means
You can use k-means like any standard Python library. You will need a development environment with a Python distribution that includes header files, a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.