isomap | Jeremy Stober | Data Manipulation library
kandi X-RAY | isomap Summary
Author: Jeremy Stober. Contact: stober@gmail.com. Version: 0.1. This is a Python implementation of Isomap built on top of my mds library. It supports both k and epsilon nearest-neighbor graph computations prior to determining isometric distances between data points. You can find more info about Isomap, as well as a Matlab implementation, here:
Top functions reviewed by kandi - BETA
- Compute the clustering of a sparse matrix.
- Run an Isomap simulation.
- Main entry point.
- Run the test suite.
- Find shortest paths in an adjacency matrix.
- List the connected components of an adjacency matrix.
- Return a slice of a matrix.
- Return a flattened index.
- Return a square grid of points.
- Return the norm of a vector.
isomap Key Features
isomap Examples and Code Snippets
Community Discussions
Trending Discussions on isomap
QUESTION
I am currently trying to understand how Isomap results will differ from PCA and MDS and whether they are better suited for my data. To that end I started working with the isomap function provided by vegan in R, using the BCI dataset and their basic example https://www.rdocumentation.org/packages/vegan/versions/2.4-2/topics/isomap (code below). Several publications use the residual variance as a good measure of fit (e.g. the original paper by Tenenbaum et al. from 2000, p. 2321: https://web.mit.edu/cocosci/Papers/sci_reprint.pdf). However, so far I have failed to extract this information from the object "ord" in the example. There is the element ord[["eig"]], which is probably connected to it, but so far I am confused. Help would be much appreciated!
...ANSWER
Answered 2020-Jun-16 at 14:59 So I did some further investigation on this topic.
Essentially there will be as many eigenvalues as there are variables in the dataset. The eigenvalues are distributed across the new components or dimensions according to their explanatory power; the first component or dimension will usually explain the most, i.e. have the largest eigenvalue. An eigenvalue of 1 explains only as much as a single variable, which is not very informative. Mathematically, eigenvalues are the sums of squared factor loadings.
For Isomap in the example above, this can be done as follows:
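The R snippet that followed is not included in this scrape. As an illustrative analog only (sklearn rather than vegan, and a synthetic S-curve standing in for the BCI data), a hedged sketch of the residual-variance computation Tenenbaum describes: one minus the squared correlation between geodesic distances and embedding distances.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# stand-in data; the asker's community matrix would replace this
X, _ = make_s_curve(n_samples=300, random_state=0)

iso = Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)

# residual variance = 1 - R^2 between the graph geodesic distances
# and the euclidean distances in the low-dimensional embedding
d_geo = iso.dist_matrix_[np.triu_indices(300, k=1)]
d_emb = pdist(Y)
residual_variance = 1 - np.corrcoef(d_geo, d_emb)[0, 1] ** 2
```

For a well-sampled 2-manifold like the S-curve, the residual variance of a 2-component embedding should be small.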
QUESTION
I need an example of how to use Isomap from sklearn for dimensionality reduction of a high-dimensional space defined in a numpy array.
...ANSWER
Answered 2019-Dec-12 at 02:46 Load the digits sample dataset from sklearn:
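The code that followed is missing from this scrape. A minimal sketch of what the answer likely showed (the parameter choices here are my own; n_neighbors=30 follows sklearn's manifold-learning example for digits):

```python
# reduce the 64-dimensional digit images to 2 components with Isomap
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, y = load_digits(return_X_y=True)   # X.shape == (1797, 64)
embedding = Isomap(n_neighbors=30, n_components=2)
X_2d = embedding.fit_transform(X)
print(X_2d.shape)   # (1797, 2)
```

X_2d can then be scattered in 2D, colored by the digit labels y.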
QUESTION
I am doing some exercises with the MNIST digits data but it fails when I try to visualize it. The exercise is from a book, BTW. So I import the dataset
...ANSWER
Answered 2019-Aug-09 at 13:26 Target values for the MNIST data are strings, not integers.
Just change this line:
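The referenced line is not shown in this scrape. A hedged sketch of the fix, assuming the targets came from a loader such as fetch_openml that yields string labels:

```python
import numpy as np

# mnist.target from fetch_openml is an array of strings such as '5';
# cast to integers before using the labels numerically or for coloring
y = np.array(['5', '0', '4', '1'])   # stand-in for mnist.target
y = y.astype(np.uint8)
print(y)   # [5 0 4 1]
```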
QUESTION
I am drawing four network graphs of the same data but using different properties to color the nodes. Currently, I am able to generate all 4 networks, and for each network I am able to double-click a node to see its connected nodes. My question is: how do I extend this functionality so that, when I click a node in any one of the graph networks, the other 3 also show me the clicked node and its connected nodes?
The previously asked question Mouseover event on two charts at the same time d3.js does a similar thing for pie charts, but I am not sure how to adapt it to my code.
...ANSWER
Answered 2018-Oct-18 at 18:07 Glad you came back with the data and the HTML. It makes it easier to debug and come up with a solution.
I just changed a couple of lines of code in the connectedNodes function.
I replaced node.style("opacity", function(o) {...}) with:
QUESTION
I have a square matrix D (currently represented as a numpy array of shape (572, 572)) plausibly corresponding to pairwise distances between points along the surface of a roughly cylindrical object. I.e., the value D[i,j] corresponds to the minimal length of any path along the surface of that hollow cylinder between points i and j. How do I construct a 3-dimensional (or n-dimensional) embedding of those 572 points into euclidean space which preserves those geodesic distances?
Algorithms like locally linear embedding and isomap are able to take that matrix of pairwise geodesic distances and output an embedding so that the pairwise euclidean distances are the same as the original geodesics. While this is not the same task in general, in the case where the output happens to approximate a hypercube in some dimension the desired transformation has actually happened (consider the swiss roll) since the embedding is itself a manifold, so euclidean distance corresponds to geodesic distance.
This is not the case for even slightly more complicated objects like cylinders. By treating geodesic distances as euclidean, antipodal points on the desired cylinder are mapped to locations much further from each other than desired, and the corresponding global optimization problem will often result in a branching structure with the ends of the branches corresponding to maximally distant antipodal points, amplifying small perturbations in the random sampling of the cylinder. In general, naive applications of these algorithms don't seem to solve the problem at hand.
Another somewhat fruitful (though expensive) approach has been a brute-force monte carlo technique. I generate random samples from tube-like objects with varying parameters until I find a set of parameters generating geodesic distance matrices similar to mine, up to a permutation (which is dealt with not too inefficiently by solving the linear system converting that distance matrix to mine and testing whether the result is near a permutation matrix). Then a near-optimal mapping from my 572 points onto that object preserving pairwise distances is found by taking the nearest permutation matrix to the aforementioned near-permutation matrix.
This is yielding plausible results, but it presupposes the shape of the data and is horrendously expensive. I've performed some of the more obvious optimizations like working with small random samples instead of the entire data set and using gradient-based techniques for parameter estimation, but a more general-purpose technique would be nice.
Caveats: This problem of course does not have a unique solution. Even assuming that manifolds can be unambiguously identified in 3-space from a finite uniform sampling, just squishing a cylinder yields a shape with the same geodesics and different euclidean distances (hence a different embedding). This does not bother me any more than LLE and Isomap yielding differing solutions, and I would be fine with any plausible answer.
With regards to uniquely identifying manifolds from a finite sample, for the sake of argument I would be fine just using the dist_matrix_ attribute from a fitted Isomap class from the scikit-learn package, without any special parameters, to find the geodesics. That performs an unnecessary MDS step, but it isn't terribly expensive, and it works out of the box. We would then like an embedding which minimizes the frobenius distance between the original geodesic distance matrix and the dist_matrix_ attribute.
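As a concrete illustration of the objective described above (my sketch, not the asker's code, with a swiss roll standing in for the cylinder): fit sklearn's Isomap, read its dist_matrix_ attribute, and measure the frobenius gap to the euclidean distances of the embedding.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=300, random_state=0)  # stand-in surface

iso = Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)

D_geo = iso.dist_matrix_                        # graph geodesic distances
D_emb = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
frobenius_gap = np.linalg.norm(D_geo - D_emb)   # the quantity to minimize
```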
ANSWER
Answered 2018-Jun-07 at 17:47 While I had initially ruled out locally linear embedding and other similar techniques, that seems to have been in haste. Since manifolds are in fact locally linear, a sufficiently well-sampled, sufficiently nice manifold has the property that its small geodesic distances are approximately the same as their corresponding euclidean distances.
With that in mind, any reconstruction which treats the nearest geodesic neighbors as the nearest euclidean neighbors and approximates euclidean distance via geodesic distance will approximately preserve global geodesic distance, up to an accumulated error term. This means that all the standard algorithms which only use local distances can provide an approximately correct embedding. These include, but are not limited to:
- Locally Linear Embedding
- Isomap
- Spectral Embedding
Some classical embedding algorithms will not work correctly in this application since they attempt to preserve all distances, and the large geodesics are probably poor representations of euclidean distance. For example, multidimensional scaling is a poor fit without modifications.
Note: The reason LLE seemed to yield poor results in my preliminary analysis is that one of my assumptions was violated -- the manifold being sufficiently well-sampled. I was applying it to simple shapes with known desired behavior, but I mistakenly used too few points to ensure a quick feedback loop in my analysis. Better-sampled manifolds behave exactly as they're supposed to.
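The local-distances idea above can be sketched directly (a minimal illustration, not the answerer's code; function and parameter names are my own): trust only each point's k smallest geodesic distances as approximately euclidean, rebuild the long distances by shortest paths over that graph, then recover coordinates with classical MDS.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def embed_from_geodesics(D, n_neighbors=10, n_components=3):
    """Isomap-style sketch from a precomputed geodesic distance matrix D."""
    n = D.shape[0]
    G = np.full_like(D, np.inf)              # inf marks "no edge"
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)                   # symmetrize the kNN graph
    SP = shortest_path(G, method="D", directed=False)
    # classical MDS on the rebuilt distance matrix
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (SP ** 2) @ H
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_components]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# demo: points on a line, where graph geodesics equal euclidean
# distances, so a 1-D embedding reproduces D almost exactly
t = np.linspace(0.0, 1.0, 30)
D = np.abs(t[:, None] - t[None, :])
Y = embed_from_geodesics(D, n_neighbors=4, n_components=1)
DY = np.abs(Y[:, 0][:, None] - Y[:, 0][None, :])
err = np.max(np.abs(DY - D))
```

On a genuinely curved surface the reconstruction is only approximate, with the error accumulating over long paths as the answer describes.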
QUESTION
I have been completing Microsoft's course DAT210X - Programming with Python for Data Science.
When creating SVC models for machine learning we are encouraged to split the dataset X into test and train sets, using train_test_split from scikit-learn, before performing preprocessing (e.g. scaling) and dimension reduction (e.g. PCA/Isomap). I include a code example below, part of a solution I wrote to a given problem using this way of doing things.
However, it appears to be much faster to preprocess and run PCA/Isomap on X before splitting X into test and train sets, and the accuracy score was higher.
My questions are:
1) Is there a reason why we can't slice out the label (y) and perform pre-processing and dimension reduction on all of X before splitting out to test and train?
2) There was a higher score with pre-processing and dimension reduction on all of X (minus y) than for splitting X and then performing pre-processing and dimension reduction. Why might this be?
...ANSWER
Answered 2017-Aug-11 at 16:52 Regarding:
1) Is there a reason why we can't slice out the label (y) and perform pre-processing and dimension reduction on all of X before splitting out to test and train?
The reason is that you should train your model on the training data, without using any information regarding the test data. If you apply PCA on the whole data (including the test data) before training the model, then you in fact use some information from the test data. Thus, you cannot really judge the behaviour of your model using the test data, because it is no longer unseen data.
Regarding:
2) There was a higher score with pre-processing and dimension reduction on all of X (minus y) than for splitting X and then performing pre-processing and dimension reduction. Why might this be?
This makes complete sense. You used some information from the test data to train the model, so it makes sense that the score on the test data is higher. However, this score no longer gives a reliable estimate of the model's behaviour on unseen data.
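The leakage-free workflow can be sketched with a sklearn Pipeline, which guarantees the scaler and PCA are fitted on the training fold only (a minimal illustration on the iris data, not the course's actual solution):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7)

model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC())
model.fit(X_train, y_train)           # preprocessing sees only the train fold
score = model.score(X_test, y_test)   # honest estimate on unseen data
```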
QUESTION
I have a list called 'samples', into which I load several images from 2 different folders, let's say Folder1 and Folder2. Then I convert this list to a DataFrame and plot it in a 2D scatter plot. I want the scatter plot to show all contents from Folder1 in red and all contents from Folder2 in blue. How can I accomplish this? My code is below:
...ANSWER
Answered 2017-Aug-20 at 18:47 It's pretty difficult to say without seeing the arrays you are working with. You are actually plotting the result of your do_ISO() function, which creates an array using sklearn.manifold.Isomap.transform().
Does this function preserve the ordering of the elements in your array?
If so, things could be fairly easy. As you are first filling in all the images from Folder1 and then those from Folder2, you could simply count the number of items in Folder1 and split your array in two based on that number (e.g. nbFilesFolder1). Then you make two calls to scatter:
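A hedged sketch of those two scatter calls (iso_2d and the count nbFilesFolder1 are stand-ins for the asker's actual do_ISO() output and folder size):

```python
import matplotlib
matplotlib.use("Agg")          # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# iso_2d stands in for the (n_samples, 2) output of do_ISO();
# the first nbFilesFolder1 rows came from Folder1, the rest from Folder2
rng = np.random.default_rng(0)
iso_2d = rng.random((30, 2))
nbFilesFolder1 = 18

fig, ax = plt.subplots()
ax.scatter(iso_2d[:nbFilesFolder1, 0], iso_2d[:nbFilesFolder1, 1],
           c="red", label="Folder1")
ax.scatter(iso_2d[nbFilesFolder1:, 0], iso_2d[nbFilesFolder1:, 1],
           c="blue", label="Folder2")
ax.legend()
```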
QUESTION
I created the following graph with the help of two functions written by Vincent Zoonekynd (you can find them here; my code is at the end of the post), in order to explain what the neighbourhood graph and the parameter "k" used by Isometric Feature Mapping are. "k" specifies how many points each point is directly connected to; their distance is just the euclidean distance to each other. The distance between any point and its (k + 1)-nearest point (or any point farther away) is called "geodesic", and is the smallest sum of the lengths of the edges needed to get there. This is sometimes much longer than the euclidean distance, as is the case for points A and B in my figure.
Now I want to add a black line showing the geodesic distance from point A to point B. I know about the command segments(), which will probably be the best way to add the line, and I know that one algorithm to find the shortest path (geodesic distance) is Dijkstra's algorithm, which is implemented in the package igraph. However, I'm neither able to have igraph interpret my graph nor to find out on my own which points (vertices) need to be passed (and their coordinates).
By the way, if k = 18, i.e. if every point is directly connected to the 18 nearest points, the geodesic distance between A and B will just be the euclidean distance.
...ANSWER
Answered 2017-Jan-12 at 22:58 The following code may help you. It uses your data to create an igraph object with weights that are, in your case, the euclidean distances between nodes.
Then you find the weighted shortest path, which is returned by sp$vpath[[1]]. In the following example, it is the shortest path between nodes number 5 and 66.
I edited the code with mattu's solution for plotting.
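The answer's R/igraph code is not included in this scrape. A hedged Python analog of the same idea, using scipy's shortest-path routine (function and variable names are my own): build a k-nearest-neighbour graph weighted by euclidean distances, then trace the weighted shortest path with the predecessor matrix.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def knn_geodesic_path(X, k, source, target):
    """Return the weighted shortest (geodesic) path and its length."""
    D = squareform(pdist(X))
    n = len(X)
    G = np.full((n, n), np.inf)              # inf marks "no edge"
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]     # k nearest, excluding self
        G[i, nbrs] = D[i, nbrs]
    G = np.minimum(G, G.T)                   # make the graph undirected
    dist, pred = shortest_path(G, method="D", directed=False,
                               return_predecessors=True)
    path = [target]                          # walk predecessors backwards
    while path[-1] != source:
        path.append(int(pred[source, path[-1]]))
    return path[::-1], float(dist[source, target])

# demo: five evenly spaced points on a line; the geodesic from the
# first to the last point passes through every intermediate point
pts = np.column_stack([np.arange(5.0), np.zeros(5)])
path, length = knn_geodesic_path(pts, k=1, source=0, target=4)
```

The vertices in `path` (with their coordinates in `pts`) are exactly what is needed to draw the geodesic as a sequence of segments.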
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install isomap
You can use isomap like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.