node2vec | repository provides a reference implementation | Natural Language Processing library
kandi X-RAY | node2vec Summary
This repository provides a reference implementation of node2vec as described in the paper: "node2vec: Scalable Feature Learning for Networks," Aditya Grover and Jure Leskovec, Knowledge Discovery and Data Mining (KDD), 2016. The node2vec algorithm learns continuous representations for nodes in any (un)directed, (un)weighted graph. Please check the project page for more details.
Community Discussions
Trending Discussions on node2vec
QUESTION
I'm trying to use nodevectors' Node2Vec class to get an embedding for my graph. I can't show the entire code, but basically this is what I'm doing:
ANSWER
Answered 2022-Feb-22 at 23:30

TL;DR: Your non-default epochs=3 can result in some nodes appearing only 3 times, but the inner Word2Vec model by default ignores tokens appearing fewer than 5 times. Upping to epochs=5 may be a quick fix, but read on for the reasons and tradeoffs of the various defaults.
If you're using the nodevectors package described here, it seems to be built on Gensim's Word2Vec, which uses a default min_count=5.
That means any tokens (in this case, nodes) that appear fewer than 5 times are ignored. Especially in the natural-language contexts where Word2Vec was pioneered, discarding such rare words entirely usually has multiple benefits:
- With only a few idiosyncratic usage examples, such rare words get peculiar vectors that are less likely to generalize to downstream uses (other texts).
- Compared to frequent words, each rare word gets very little training effort overall, and thus provides only a little pushback on shared model weights (based on its peculiar examples), so its vector is weaker and retains more arbitrary influence from random initialization and its relative positioning in the corpus. (More frequent words provide more varied, numerous examples from which to extract their unique meaning.)
- Because of the Zipfian distribution of word frequencies in natural language, there are a lot of such low-frequency words (often even typos), and altogether they take up a lot of the model's memory and training time. But they don't individually get very good vectors, or exert a generalizable beneficial influence on the shared model, so they wind up serving a lot like noise that weakens the vectors of more frequent words as well.
So typically in Word2Vec, discarding rare words gives up only low-value vectors while simultaneously speeding training, shrinking memory requirements, and improving the quality of the remaining vectors: a big win.
Although the distribution of node names in graph random walks may be very different from natural-language word frequencies, some of the same concerns still apply for nodes that appear rarely. On the other hand, if a node truly only appears at the end of a long chain of nodes, every walk to or from it will include exactly the same neighbors, and maybe extra appearances in more walks would add no new variety of information (at least within the inner Word2Vec window of analysis).
You may be able to confirm whether the default min_count is your issue by using the Node2Vec keep_walks parameter to store the generated walks, then checking: are the 'missing' nodes exactly those appearing fewer than min_count times in the walks?
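If you keep the walks, that check is straightforward; a minimal sketch, assuming the stored walks are reachable as an iterable of node-token sequences (exactly how you retrieve them from the fitted model may vary by nodevectors version):

```python
from collections import Counter

# Count how often each node token appears across all stored walks.
# `walks` is assumed to be an iterable of walks, each a sequence of
# node tokens, retrieved from a model fitted with keep_walks=True.
def node_appearances(walks):
    counts = Counter()
    for walk in walks:
        counts.update(str(node) for node in walk)
    return counts

counts = node_appearances(walks)
min_count = 5  # gensim Word2Vec's default
rare = {node: n for node, n in counts.items() if n < min_count}
print(f"{len(rare)} nodes appear fewer than {min_count} times:", rare)
```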
If so, a few options may be:
- Override min_count using the Node2Vec w2vparams option, with something like min_count=1. As noted above, this is always a bad idea in traditional natural-language Word2Vec, but maybe it's not so bad in a graph application, where one walk is enough for rare/outer-edge nodes, and then at least you have whatever strange/noisy vector results from that minimal training. (A sketch of this override appears after this list.)
- Try to influence the walks to ensure all nodes appear enough times. I suppose some values of the Node2Vec walklen, return_weight, and neighbor_weight parameters could improve coverage, but I don't think they could guarantee that all nodes appear in at least N (say, 5, to match the default min_count) different walks. It does look, though, like the Node2Vec epochs parameter controls how many times every node is used as a starting point, so epochs=5 would guarantee every node appears at least 5 times, as the start of 5 separate walks. (Notably, the Node2Vec default is epochs=20, which would never trigger a bad interaction with the internal Word2Vec min_count=5; but your non-default epochs=3 risks leaving some nodes with only 3 appearances.)
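A hedged sketch of the first option, assuming the nodevectors API as documented (constructor parameter names may differ in your installed version):

```python
import networkx as nx
from nodevectors import Node2Vec

G = nx.karate_club_graph()  # stand-in for your own graph

# Pass min_count=1 through to the inner gensim Word2Vec, so even
# nodes that appear only once in the walks still get vectors.
g2v = Node2Vec(
    n_components=32,
    walklen=10,
    epochs=20,                 # every node starts at least 20 walks
    keep_walks=True,           # retain walks for diagnostics like the above
    w2vparams={"window": 10, "min_count": 1},
)
g2v.fit(G)
print(g2v.predict(0)[:5])      # embedding for node 0
```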
QUESTION
I'm running node2vec in Neo4j, but when the algorithm runs again with the same parameters, the result changes. So I read the configuration and saw that there is a seed value. I tried setting the seed value to a specific number, but nothing changes.
ANSWER
Answered 2022-Feb-07 at 20:43

If you check the documentation, you can see that the randomSeed parameter is available. However, the results are still non-deterministic, as per the docs:
Seed value used to generate the random walks, which are used as the training set of the neural network. Note, that the generated embeddings are still nondeterministic.
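For reference, the seed goes in the procedure's configuration map; a hedged sketch using the Python driver, assuming a GDS 2.x procedure name (gds.node2vec.stream) and an already projected graph called 'myGraph', with the caveat above still applying:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
CALL gds.node2vec.stream('myGraph', {
  embeddingDimension: 128,
  randomSeed: 42
})
YIELD nodeId, embedding
RETURN nodeId, embedding
"""

# Even with randomSeed fixed, the docs warn the embeddings themselves
# remain nondeterministic (the seed only fixes the generated walks).
with driver.session() as session:
    for record in session.run(query):
        print(record["nodeId"], record["embedding"][:3])
```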
QUESTION
I have a network diagram that is sketched in Visio. I would like to use it as input for a networkx graph for the node2vec Python package. The documentation says that there is a function called to_networkx_graph() that takes, as its input, the following types of data:
"any NetworkX graph dict-of-dicts dict-of-lists container (e.g. set, list, tuple) of edges iterator (e.g. itertools.chain) that produces edges generator of edges Pandas DataFrame (row per edge) numpy matrix numpy ndarray scipy sparse matrix pygraphviz agraph"
But there is no mention of other formats like Visio, PDF, ODG, PowerPoint, etc. So, how do I proceed?
ANSWER
Answered 2022-Jan-29 at 16:56

I think you need to create some data in one of the formats referred to in the documentation, not just a network diagram. A Visio diagram will not do the job, and I know of no way to do a conversion.
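For example, one workable path is to transcribe the diagram's edges by hand into a table and load that, since a Pandas DataFrame with one row per edge is among the accepted inputs; a minimal sketch (node names are illustrative):

```python
import pandas as pd
import networkx as nx

# Manually transcribe each connection in the Visio diagram as a row.
edges = pd.DataFrame({
    "source": ["router1", "router1", "switch1"],
    "target": ["switch1", "switch2", "server1"],
})

G = nx.from_pandas_edgelist(edges, source="source", target="target")
print(G.nodes(), G.edges())
```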
QUESTION
I am trying to do node classification using Node2Vec and an SVM on a graph obtained from protein-protein interactions, to predict disease genes related to a specific disease. The point is that I have created a graph using networkx; my nodes have labels (names of proteins) and an attribute of 0/1 (whether this protein causes a disease or not). I applied node2vec to this graph and I have my model (I don't care about the values of p and q at this stage), but I don't know how to proceed and feed it to the SVM or, more importantly, how to reduce the dimensions of my vectorized graph before feeding it to the SVM. Plus, I don't know whether the attributes of my nodes are included in these vectors or not. Separately, however, I have a dictionary called lbls holding my nodes and their values.
Here is a small piece of code:
ANSWER
Answered 2021-Dec-16 at 08:38

node2vec is an unsupervised node-embedding method based on Word2Vec. Have a look at the SNAP documentation for a brief description of the model. It will not use any of your attributes/features to create the embedding. It uses a shallow encoding, which is learned directly using a random-walk-based objective function. For details, look at the paper or check the Stanford lecture about node embeddings, which covers Node2Vec in detail. Your embedding dimension is currently 512 (for the node embedding of node2vec), which you can probably reduce.
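A hedged sketch of the downstream step, assuming a node2vec implementation whose fitted model exposes gensim-style vectors via model.wv (as the popular pip package does), plus the lbls dict from the question; everything else here is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# model.wv maps each node key (as a string) to its embedding vector;
# lbls maps each node to its 0/1 disease attribute (from the question).
nodes = [n for n in lbls if str(n) in model.wv]
X = np.array([model.wv[str(n)] for n in nodes])
y = np.array([lbls[n] for n in nodes])

# Reduce the 512-dimensional embeddings before the SVM (64 is arbitrary).
X_red = PCA(n_components=64).fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_red, y, test_size=0.2, stratify=y, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```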
QUESTION
I'm trying to do link prediction with StellarGraph, following the documentation tutorial. When I reach this part:
ANSWER
Answered 2021-Oct-06 at 13:55

I finally found the solution. It was quite unclear (at least to me) from the documentation, but your nodes' labels must be strings, not integers. So a simple .astype(str) in my dataframe fixed it. I hope this will help others in the future!
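In other words, something along these lines before constructing the StellarGraph (dataframe and column names are illustrative):

```python
import pandas as pd

edges = pd.DataFrame({"source": [1, 2], "target": [2, 3]})

# Cast the integer node identifiers to strings before building the
# graph, since the node labels must be strings rather than integers.
edges = edges.astype({"source": str, "target": str})
```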
QUESTION
ANSWER
Answered 2021-Jun-30 at 16:59

If I understand correctly, Node2Vec is based on Word2Vec, and thus, like Word2Vec, requires a large amount of varied training data and shows useful results when learning dense high-dimensional vectors per entity.
A mere 7 'words' (country-nodes) with a mere 10 'sentences' of 2 words each (edge-pairs) thus isn't especially likely to do anything useful. (It wouldn't in Word2Vec.)
These countries literally are regions on a sphere. A sphere's surface can be mapped to a 2-D plane - hence, 'maps'. If you just want a 2-D vector for each country, which reflects their relative border/distance relationships, why not just lay your 2-D coordinates over an actual map large enough to show all the countries, and treat each country as its 'geographical center' point?
Or more formally: translate the x-longitude/y-latitude of each country's geographical center into whatever origin-point/scale you need.
If this simple, physically-grounded approach is inadequate, then being explicit about why it's inadequate might suggest next steps. Something that's an incremental transformation of those starting points to meet whatever extra constraints you want may be the best solution.
For example, if your not-yet-stated formal goal is that "every country-pair with an actual border should be closer than any country-pair without a border", then you could write code to check that, list any deviations, and try to 'nudge' the deviations to be more compliant with that constraint. (It might not be satisfiable; I'm not sure. And if you added other constraints, like "any country pair with just 1 country between them should be closer than any country pair with 2 countries between them", satisfying them all at once could become harder.)
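A sketch of checking that first constraint, given a dict of 2-D country coordinates and a set of bordering pairs (both illustrative):

```python
from itertools import combinations
from math import dist

coords = {"A": (0, 0), "B": (1, 0), "C": (3, 2), "D": (4, 4)}
borders = {("A", "B"), ("B", "C")}   # pairs that share a border

pairs = [tuple(sorted(p)) for p in combinations(coords, 2)]
bordered = [p for p in pairs if p in borders]
unbordered = [p for p in pairs if p not in borders]

# The stated constraint: every bordered pair should be closer
# than any unbordered pair.
max_bordered = max(dist(coords[a], coords[b]) for a, b in bordered)
min_unbordered = min(dist(coords[a], coords[b]) for a, b in unbordered)
print("constraint satisfied:", max_bordered < min_unbordered)
```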
Ultimately, next steps may depend on exactly why you want these per-country vectors.
Another thing worth checking out might be the algorithms behind 'force-directed graphs'. There, after specifying a graph's desired edges/edge-lengths and some other parameters, a physics-inspired simulation will arrive at some 2-D layout that tries to satisfy the inputs. See, for example, the implementations in the JS world.
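In Python, networkx's spring_layout is a readily available force-directed layout that yields a 2-D position per node; a minimal sketch on an illustrative border graph:

```python
import networkx as nx

# Nodes are countries; edges are shared borders (illustrative).
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])

# Force-directed (Fruchterman-Reingold) layout; seed fixes the result.
pos = nx.spring_layout(G, seed=42)
for country, (x, y) in pos.items():
    print(country, round(x, 3), round(y, 3))
```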
QUESTION
This tutorial has the following example: https://neo4j.com/developer/graph-data-science/applied-graph-embeddings/ where 'embeddingSize' is used to specify the vector length of the embedding.
ANSWER
Answered 2021-May-12 at 13:31

Graph embeddings were introduced in version 1.3, and the tutorial you found is for that version, which uses embeddingSize. The 2nd link you found is the recent documentation for node2vec, and it is meant for version >= 1.4. Look at the header of your 2nd link and you will see this below.
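The practical consequence is a parameter rename; a hedged sketch of the old versus new call shapes (procedure tiers and names varied across GDS releases, so check your installed version's docs):

```python
# Hedged: procedure tiers (alpha/beta) varied across GDS releases.
# GDS 1.3-era (as in the tutorial):
#   CALL gds.alpha.node2vec.stream('myGraph', {embeddingSize: 10})
# GDS >= 1.4 renames the option:
#   CALL gds.beta.node2vec.stream('myGraph', {embeddingDimension: 10})
```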
QUESTION
I am trying to run Node2Vec from the torch_geometric.nn library. For reference, I am following this example. While running the train() function I keep getting TypeError: tuple indices must be integers or slices, not tuple. I am using torch version 1.6.0 with CUDA 10.1 and the latest versions of torch-scatter, torch-sparse, torch-cluster, torch-spline-conv, and torch-geometric.
Here is the detailed error:
Thanks for any help.
ANSWER
Answered 2020-Nov-13 at 01:19

The error was due to torch.ops.torch_cluster.random_walk returning a tuple instead of an array/tensor. I fixed it by replacing the functions pos_sample and neg_sample in torch_geometric.nn.Node2Vec with these.
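The gist of such a fix is unpacking the tuple before indexing. A hedged sketch of the pattern only (not the exact replacement functions, which depend on your torch-geometric/torch-cluster versions), assuming the tuple's first element holds the node sequences:

```python
import torch

def walks_as_tensor(out):
    # Newer torch_cluster builds return (node_sequences, edge_sequences)
    # from random_walk; older builds return just the node tensor.
    return out[0] if isinstance(out, tuple) else out

# Inside a pos_sample/neg_sample-style function, wrap the raw call:
#   rw = walks_as_tensor(torch.ops.torch_cluster.random_walk(...))
# and then index rw as an ordinary 2-D tensor of node ids.
```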
QUESTION
I need help drawing a networkx directed graph. I have a directed graph which I create from a dataframe that looks like the following:
ANSWER
Answered 2020-Sep-13 at 10:19

You can use a seaborn palette to generate 12 different RGB color values and then create a column called color in your dataframe based on the weight values:
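A hedged sketch of that approach; the column names and the assumption of integer weights in the range 1-12 are guesses, since the original dataframe isn't shown:

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "source": ["a", "a", "b", "c"],
    "target": ["b", "c", "d", "d"],
    "weight": [1, 4, 7, 12],       # assumed integer weights in 1..12
})

# 12 distinct RGB colors, one per possible weight value.
palette = sns.color_palette("husl", 12)
df["color"] = df["weight"].apply(lambda w: palette[w - 1])

G = nx.from_pandas_edgelist(df, edge_attr=["weight", "color"],
                            create_using=nx.DiGraph)
edge_colors = [G.edges[e]["color"] for e in G.edges]
nx.draw(G, with_labels=True, edge_color=edge_colors)
plt.show()
```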
QUESTION
I'm testing feeding gensim's Word2Vec different sentences with the same overall vocabulary, to see if some sentences carry "better" information than others. My method to train Word2Vec looks like this:
ANSWER
Answered 2020-Aug-23 at 22:05

Each call to the Word2Vec() constructor creates an all-new model.
However, runs are not completely deterministic under normal conditions, for a variety of reasons, so results quality for downstream evaluations (like your unshown clustering) will jitter from run to run.
If the variance in repeated runs with the same data is very large, there are probably other problems, such as an oversized model prone to overfitting. (Stability from run to run can be one indicator that your process is sufficiently specified that the data and model choices are driving results, not the randomness used by the algorithm.)
If this explanation isn't satisfying, try adding more info to your question, such as the actual magnitude of your evaluation scores in repeated runs, both with and without the changes that you conjecture are affecting results. (I suspect the variations from the steps you think are having an effect will be no larger than the variations from re-runs or different seed values.)
(More generally, Word2Vec is hungry for as much varied training data as possible; only if texts are non-representative of the relevant domain are they likely to result in a worse model. So I generally wouldn't expect being choosier about which subset of sentences is best to be an important technique, unless some of the sentences are total junk/noise; but of course there's always a chance you'll find some effects in your particular data/goals.)
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities: No vulnerabilities reported.