dirichlet-process | Nonparametric Bayes, Infinite Mixture Models | Data Visualization library
kandi X-RAY | dirichlet-process Summary
Imagine you're a budding chef. A data-curious one, of course, so you start by taking a set of foods (pizza, salad, spaghetti, etc.) and asking 10 friends how much of each they ate in the past day. Your goal: to find natural groups of foodies, so that you can better cater to each cluster's tastes. For example, your frat-boy friends might love wings and beer, your anime friends might love soba and sushi, your hipster friends probably dig tofu, and so on. So how can you use the data you've gathered to discover different kinds of groups?

One way is to use a standard clustering algorithm like k-means or Gaussian mixture modeling (see this previous post for a brief introduction). The problem is that both assume a fixed number of clusters, which they must be told to find. There are a couple of methods for selecting the number of clusters to learn (e.g., the gap and prediction strength statistics), but the problem is more fundamental: most real-world data simply doesn't have a fixed number of clusters. That is, suppose we've asked 10 of our friends what they ate in the past day, and we want to find groups of eating preferences. There's really an infinite number of foodie types (carnivore, vegan, snacker, Italian, healthy, fast food, heavy eaters, light eaters, and so on), but with only 10 friends, we simply don't have enough data to detect them all. (Indeed, we're limited to 10 clusters!)

So whereas k-means starts with the incorrect assumption that our points come from a fixed, finite number of clusters, no matter how much data we feed it, what we'd really like is a method positing an infinite number of hidden clusters that naturally arise as we ask more friends about their food habits. (For example, with only 2 data points, we might not be able to tell the difference between vegans and vegetarians, but with 200 data points, we probably could.) Luckily for us, this is precisely the purview of nonparametric Bayes.*

*Nonparametric Bayes refers to a class of techniques that allow some parameters to change with the data. In our case, for example, instead of fixing the number of clusters to be discovered, we allow it to grow as more data comes in.
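To make the "clusters grow with the data" idea concrete, here is a small, self-contained simulation (an illustrative sketch, not part of this library) of the Chinese restaurant process, which is the clustering behavior a Dirichlet process induces: each new customer sits at an existing table with probability proportional to that table's size, or starts a new table with probability proportional to a concentration parameter alpha.

import random

def crp(n, alpha, seed=0):
    """Return table sizes for n customers under a Chinese restaurant
    process with concentration parameter alpha."""
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for i in range(n):
        # start a brand-new table (cluster) with probability alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            tables.append(1)
        else:
            # otherwise join an existing table with probability proportional to its size
            r = rng.random() * i
            acc = 0
            for k, size in enumerate(tables):
                acc += size
                if r < acc:
                    tables[k] += 1
                    break
    return tables

for n in (10, 100, 1000):
    print(n, 'customers ->', len(crp(n, alpha=1.0)), 'clusters')

With alpha = 1, the expected number of occupied tables grows roughly like log n: more data, more discoverable groups, which is exactly the behavior a fixed-k method like k-means cannot provide.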
Community Discussions
Trending Discussions on dirichlet-process
QUESTION
I just finished the Bayesian Analysis in Python book by Osvaldo Martin (a great book for understanding Bayesian concepts and some fancy NumPy indexing).
I really want to extend my understanding to Bayesian mixture models for unsupervised clustering of samples. All of my Google searches have led me to Austin Rochford's tutorial, which is really informative. I understand what is happening, but I am unclear on how this can be adapted to clustering (especially using multiple attributes for the cluster assignments, but that is a different topic).
I understand how to assign the priors for the Dirichlet distribution, but I can't figure out how to get the clusters in PyMC3. It looks like the majority of the mus converge to the centroids (i.e., the means of the distributions I sampled from), but they are still separate components. I thought about making a cutoff for the weights (w in the model), but that doesn't seem to work the way I imagined, since multiple components have slightly different mean parameters mus that are converging.

How can I extract the clusters (centroids) from this PyMC3 model? I gave it a maximum of 15 components that I want to converge to 3. The mus seem to be at the right location, but the weights are messed up because they are being distributed between the other clusters, so I can't use a weight threshold (unless I merge them, but I don't think that's the way it is normally done).
ANSWER
Answered 2017-Jan-31 at 04:15

Using a couple of new-ish additions to pymc3 will help make this clear. I think I updated the Dirichlet process example after they were added, but it seems to have been reverted to the old version during a documentation cleanup; I will fix that soon.
One of the difficulties is that the data you have generated is much more dispersed than the priors on the component means can accommodate; if you standardize your data, the samples should mix much more quickly.
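For instance (a generic sketch with made-up data, not the answer's exact code), standardizing a one-dimensional NumPy array:

import numpy as np

x = np.random.randn(1000) * 50 + 100   # dispersed raw data (illustrative)
x_std = (x - x.mean()) / x.std()       # zero mean, unit standard deviation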
The second is that pymc3 now supports mixture distributions where the indicator variable component has been marginalized out. These marginal mixture distributions will help accelerate mixing and allow you to use NUTS (initialized with ADVI).
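To see what "marginalized out" means numerically, here is a tiny NumPy/SciPy illustration with made-up weights and component parameters (not code from the answer): the mixture likelihood becomes a weighted sum over components, so no discrete cluster label appears in it.

import numpy as np
from scipy import stats

# p(x) = sum_k w_k * Normal(x | mu_k, sd_k): the indicator is summed away.
w = np.array([0.5, 0.3, 0.2])
mu = np.array([-3.0, 0.0, 3.0])
sd = np.array([1.0, 0.5, 1.0])
x = 0.7
print(np.log(np.sum(w * stats.norm.pdf(x, mu, sd))))

Because the likelihood is now a smooth function of continuous parameters only, gradient-based samplers such as NUTS can be applied.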
Finally, with these truncated versions of infinite models, it is often useful to increase the number of potential components when you run into computational problems. I have found that K = 30 works better for this model than K = 15.
The following code implements these changes and shows how the "active" component means can be extracted.
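The answer's original snippet is not preserved on this page. The following is a reconstruction in the spirit of Austin Rochford's truncated Dirichlet-process mixture example for PyMC3; the synthetic data, variable names, and the 0.01 weight threshold are illustrative choices rather than the answer's exact code.

import numpy as np
import pymc3 as pm
from theano import tensor as tt

# Synthetic data from three well-separated normals, then standardized
# (this addresses the dispersion issue mentioned above).
np.random.seed(0)
x = np.concatenate([np.random.normal(m, 1.0, 100) for m in (-30.0, 0.0, 30.0)])
x_std = (x - x.mean()) / x.std()

K = 30  # truncation level, deliberately larger than the expected number of clusters

def stick_breaking(beta):
    # w_k = beta_k * prod_{j<k} (1 - beta_j)
    remaining = tt.concatenate([[1.0], tt.extra_ops.cumprod(1.0 - beta)[:-1]])
    return beta * remaining

with pm.Model() as model:
    alpha = pm.Gamma('alpha', 1.0, 1.0)
    beta = pm.Beta('beta', 1.0, alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))
    mu = pm.Normal('mu', 0.0, 10.0, shape=K)
    tau = pm.Gamma('tau', 1.0, 1.0, shape=K)
    # Marginal mixture likelihood: the component indicator is summed out.
    obs = pm.NormalMixture('obs', w, mu, tau=tau, observed=x_std)
    trace = pm.sample(1000, tune=1000, init='advi')

# "Active" components: those whose posterior mean weight is non-negligible.
w_post = trace['w'].mean(axis=0)
active = w_post > 0.01  # illustrative threshold
print('active weights:', w_post[active])
print('active means  :', trace['mu'].mean(axis=0)[active])

With K = 30 and three true clusters, most of the posterior weight should concentrate on roughly three components, while the remaining components' weights shrink toward zero; that is how the "extra" capacity of the truncated model stays inert and why a weight threshold becomes workable.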
Community Discussions and Code Snippets contain sources from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported