node2vec | repository provides a reference implementation | Natural Language Processing library
kandi X-RAY | node2vec Summary
This repository provides a reference implementation of node2vec as described in the paper: "node2vec: Scalable Feature Learning for Networks," Aditya Grover and Jure Leskovec, Knowledge Discovery and Data Mining (KDD), 2016. The node2vec algorithm learns continuous representations for nodes in any (un)directed, (un)weighted graph. Please check the project page for more details.
Community Discussions
Trending Discussions on node2vec
QUESTION
I'm trying to use nodevectors' Node2Vec class to get an embedding for my graph. I can't show the entire code, but basically this is what I'm doing:
ANSWER
Answered 2022-Feb-22 at 23:30

TL;DR: Your non-default epochs=3 can result in some nodes appearing only 3 times, but the inner Word2Vec model by default ignores tokens appearing fewer than 5 times. Upping to epochs=5 may be a quick fix, but read on for the reasons and tradeoffs of the various defaults.
If you're using the nodevectors package described here, it seems to be built on Gensim's Word2Vec, which uses a default min_count=5.
That means any tokens (in this case, nodes) that appear fewer than 5 times are ignored. Especially in the natural-language contexts where Word2Vec was pioneered, discarding such rare words entirely usually has multiple benefits:
- With only a few idiosyncratic usage examples, such rare words get peculiar vectors that are less likely to generalize to downstream uses (other texts).
- Compared to frequent words, each rare word gets very little training effort overall, and thus provides only a little pushback on shared model weights (based on its peculiar examples), so its vector is weaker and retains more arbitrary influence from random initialization and its relative positioning in the corpus. (More frequent words provide more varied, numerous examples from which to extract their unique meaning.)
- Because of the Zipfian distribution of word frequencies in natural language, there are a lot of such low-frequency words (often even typos), and altogether they take up a lot of the model's memory and training time. But they don't individually get very good vectors, or exert a generalizable beneficial influence on the shared model, so they wind up serving a lot like noise that weakens the vectors of more frequent words as well.
So typically in Word2Vec, discarding rare words gives up only low-value vectors while simultaneously speeding training, shrinking memory requirements, and improving the quality of the remaining vectors: a big win.
Although the distribution of node names in graph random walks may be very different from natural-language word frequencies, some of the same concerns still apply for nodes that appear rarely. On the other hand, if a node truly only appears at the end of a long chain of nodes, every walk to or from it will include exactly the same neighbors, and maybe extra appearances in more walks would add no new variety of information (at least within the inner Word2Vec window of analysis).
You may be able to confirm whether the default min_count is your issue by using the Node2Vec keep_walks parameter to store the generated walks, then checking: are the 'missing' nodes exactly those appearing fewer than min_count times in the walks?
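If you keep the walks, that check is straightforward; a minimal sketch, assuming the stored walks are reachable as an iterable of node-token sequences (exactly how you retrieve them from the fitted model may vary by nodevectors version):

```python
from collections import Counter

# Count how often each node token appears across all stored walks.
# `walks` is assumed to be an iterable of walks, each a sequence of
# node tokens, retrieved from a model fitted with keep_walks=True.
def node_appearances(walks):
    counts = Counter()
    for walk in walks:
        counts.update(str(node) for node in walk)
    return counts

counts = node_appearances(walks)
min_count = 5  # gensim Word2Vec's default
rare = {node: n for node, n in counts.items() if n < min_count}
print(f"{len(rare)} nodes appear fewer than {min_count} times:", rare)
```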
If so, a few options may be:
- Override min_count using the Node2Vec w2vparams option, with something like min_count=1. As noted above, this is always a bad idea in traditional natural-language Word2Vec, but maybe it's not so bad in a graph application, where one walk is enough for rare/outer-edge nodes, and then at least you have whatever strange/noisy vector results from that minimal training. (A sketch of this override appears after this list.)
- Try to influence the walks to ensure all nodes appear enough times. I suppose some values of the Node2Vec walklen, return_weight, and neighbor_weight parameters could improve coverage, but I don't think they could guarantee that all nodes appear in at least N (say, 5, to match the default min_count) different walks. It does look, though, like the Node2Vec epochs parameter controls how many times every node is used as a starting point, so epochs=5 would guarantee every node appears at least 5 times, as the start of 5 separate walks. (Notably, the Node2Vec default is epochs=20, which would never trigger a bad interaction with the internal Word2Vec min_count=5; but your non-default epochs=3 risks leaving some nodes with only 3 appearances.)
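A hedged sketch of the first option, assuming the nodevectors API as documented (constructor parameter names may differ in your installed version):

```python
import networkx as nx
from nodevectors import Node2Vec

G = nx.karate_club_graph()  # stand-in for your own graph

# Pass min_count=1 through to the inner gensim Word2Vec, so even
# nodes that appear only once in the walks still get vectors.
g2v = Node2Vec(
    n_components=32,
    walklen=10,
    epochs=20,                 # every node starts at least 20 walks
    keep_walks=True,           # retain walks for diagnostics like the above
    w2vparams={"window": 10, "min_count": 1},
)
g2v.fit(G)
print(g2v.predict(0)[:5])      # embedding for node 0
```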
QUESTION
I'm running node2vec in Neo4j, but when the algorithm runs again with the same parameters, the result changes. So I read the configuration and saw that there is a seed value. I tried setting the seed value to a specific number, but nothing changes.
ANSWER
Answered 2022-Feb-07 at 20:43

If you check the documentation, you can see that the randomSeed parameter is available. However, the results are still non-deterministic, as per the docs:
Seed value used to generate the random walks, which are used as the training set of the neural network. Note, that the generated embeddings are still nondeterministic.
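For reference, the seed goes in the procedure's configuration map; a hedged sketch using the Python driver, assuming a GDS 2.x procedure name (gds.node2vec.stream) and an already projected graph called 'myGraph', with the caveat above still applying:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
CALL gds.node2vec.stream('myGraph', {
  embeddingDimension: 128,
  randomSeed: 42
})
YIELD nodeId, embedding
RETURN nodeId, embedding
"""

# Even with randomSeed fixed, the docs warn the embeddings themselves
# remain nondeterministic (the seed only fixes the generated walks).
with driver.session() as session:
    for record in session.run(query):
        print(record["nodeId"], record["embedding"][:3])
```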
QUESTION
I have a network diagram that is sketched in Visio. I would like to use it as input for a networkx graph for the node2vec Python package. The documentation says that there is a function called to_networkx_graph() that takes, as its input, the following types of data:
"any NetworkX graph dict-of-dicts dict-of-lists container (e.g. set, list, tuple) of edges iterator (e.g. itertools.chain) that produces edges generator of edges Pandas DataFrame (row per edge) numpy matrix numpy ndarray scipy sparse matrix pygraphviz agraph"
But there is no mention of other formats like Visio, PDF, ODG, PowerPoint, etc. So, how do I proceed?
ANSWER
Answered 2022-Jan-29 at 16:56

I think you need to create some data in one of the formats referred to in the documentation, not just a network diagram. A Visio diagram will not do the job, and I know of no way to do a conversion.
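For example, one workable path is to transcribe the diagram's edges by hand into a table and load that, since a Pandas DataFrame with one row per edge is among the accepted inputs; a minimal sketch (node names are illustrative):

```python
import pandas as pd
import networkx as nx

# Manually transcribe each connection in the Visio diagram as a row.
edges = pd.DataFrame({
    "source": ["router1", "router1", "switch1"],
    "target": ["switch1", "switch2", "server1"],
})

G = nx.from_pandas_edgelist(edges, source="source", target="target")
print(G.nodes(), G.edges())
```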
QUESTION
I am trying to do node classification using Node2Vec and an SVM on a graph obtained from protein-protein interactions, to predict disease genes related to a specific disease. The point is that I have created a graph using networkx; my nodes have labels (names of proteins) and an attribute of 0/1 (whether this protein causes a disease or not). I applied node2vec to this graph and I have my model (I don't care about the values of p and q at this stage), but I don't know how to proceed and feed it to the SVM or, more importantly, how to reduce the dimensions of my vectorized graph before feeding it to the SVM. Plus, I don't know whether the attributes of my nodes are included in these vectors or not. Separately, however, I have a dictionary called lbls holding my nodes and their values.
Here is a small piece of code:
ANSWER
Answered 2021-Dec-16 at 08:38

node2vec is an unsupervised node-embedding method based on Word2Vec. Have a look at the SNAP documentation for a brief description of the model. It will not use any of your attributes/features to create the embedding. It uses a shallow encoding, which is learned directly using a random-walk-based objective function. For details, look at the paper or check the Stanford lecture about node embeddings, which covers Node2Vec in detail. Your embedding dimension is currently 512 (for the node embedding of node2vec), which you can probably reduce.
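A hedged sketch of the downstream step, assuming a node2vec implementation whose fitted model exposes gensim-style vectors via model.wv (as the popular pip package does), plus the lbls dict from the question; everything else here is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# model.wv maps each node key (as a string) to its embedding vector;
# lbls maps each node to its 0/1 disease attribute (from the question).
nodes = [n for n in lbls if str(n) in model.wv]
X = np.array([model.wv[str(n)] for n in nodes])
y = np.array([lbls[n] for n in nodes])

# Reduce the 512-dimensional embeddings before the SVM (64 is arbitrary).
X_red = PCA(n_components=64).fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_red, y, test_size=0.2, stratify=y, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```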
QUESTION
I'm trying to do link prediction with StellarGraph, following the documentation tutorial. When I reach this part:
ANSWER
Answered 2021-Oct-06 at 13:55

I finally found the solution. It was quite unclear (at least to me) from the documentation, but your nodes' labels must be strings, not integers. So a simple .astype(str) in my dataframe fixed it. I hope this will help others in the future!
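In other words, something along these lines before constructing the StellarGraph (dataframe and column names are illustrative):

```python
import pandas as pd

edges = pd.DataFrame({"source": [1, 2], "target": [2, 3]})

# Cast the integer node identifiers to strings before building the
# graph, since the node labels must be strings rather than integers.
edges = edges.astype({"source": str, "target": str})
```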
QUESTION
ANSWER
Answered 2021-Jun-30 at 16:59

If I understand correctly, Node2Vec is based on Word2Vec, and thus, like Word2Vec, requires a large amount of varied training data and shows useful results when learning dense high-dimensional vectors per entity.
A mere 7 'words' (country-nodes) with a mere 10 'sentences' of 2 words each (edge-pairs) thus isn't especially likely to do anything useful. (It wouldn't in Word2Vec.)
These countries literally are regions on a sphere. A sphere's surface can be mapped to a 2-D plane - hence, 'maps'. If you just want a 2-D vector for each country, which reflects their relative border/distance relationships, why not just lay your 2-D coordinates over an actual map large enough to show all the countries, and treat each country as its 'geographical center' point?
Or more formally: translate the x-longitude/y-latitude of each country's geographical center into whatever origin-point/scale you need.
If this simple, physically-grounded approach is inadequate, then being explicit about why it's inadequate might suggest next steps. Something that's an incremental transformation of those starting points to meet whatever extra constraints you want may be the best solution.
For example, if your not-yet-stated formal goal is that "every country-pair with an actual border should be closer than any country-pair without a border", then you could write code to check that, list any deviations, and try to 'nudge' the deviations to be more compliant with that constraint. (It might not be satisfiable; I'm not sure. And if you added other constraints, like "any country pair with just 1 country between them should be closer than any country pair with 2 countries between them", satisfying them all at once could become harder.)
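A sketch of checking that first constraint, given a dict of 2-D country coordinates and a set of bordering pairs (both illustrative):

```python
from itertools import combinations
from math import dist

coords = {"A": (0, 0), "B": (1, 0), "C": (3, 2), "D": (4, 4)}
borders = {("A", "B"), ("B", "C")}   # pairs that share a border

pairs = [tuple(sorted(p)) for p in combinations(coords, 2)]
bordered = [p for p in pairs if p in borders]
unbordered = [p for p in pairs if p not in borders]

# The stated constraint: every bordered pair should be closer
# than any unbordered pair.
max_bordered = max(dist(coords[a], coords[b]) for a, b in bordered)
min_unbordered = min(dist(coords[a], coords[b]) for a, b in unbordered)
print("constraint satisfied:", max_bordered < min_unbordered)
```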
Ultimately, next steps may depend on exactly why you want these per-country vectors.
Another thing worth checking out might be the algorithms behind 'force-directed graphs'. There, after specifying a graph's desired edges/edge-lengths and some other parameters, a physics-inspired simulation will arrive at some 2-D layout that tries to satisfy the inputs. See, for example, the implementations in the JS world.
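In Python, networkx's spring_layout is a readily available force-directed layout that yields a 2-D position per node; a minimal sketch on an illustrative border graph:

```python
import networkx as nx

# Nodes are countries; edges are shared borders (illustrative).
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])

# Force-directed (Fruchterman-Reingold) layout; seed fixes the result.
pos = nx.spring_layout(G, seed=42)
for country, (x, y) in pos.items():
    print(country, round(x, 3), round(y, 3))
```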
QUESTION
This tutorial has the following example: https://neo4j.com/developer/graph-data-science/applied-graph-embeddings/ where 'embeddingSize' is used to specify the vector length of the embedding.
ANSWER
Answered 2021-May-12 at 13:31

Graph embeddings were introduced in version 1.3, and the tutorial you found is for that version, which uses embeddingSize. The 2nd link you found is the recent documentation for node2vec, and it is meant for version >= 1.4. Look at the header of your 2nd link and you will see this below.
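The practical consequence is a parameter rename; a hedged sketch of the old versus new call shapes (procedure tiers and names varied across GDS releases, so check your installed version's docs):

```python
# Hedged: procedure tiers (alpha/beta) varied across GDS releases.
# GDS 1.3-era (as in the tutorial):
#   CALL gds.alpha.node2vec.stream('myGraph', {embeddingSize: 10})
# GDS >= 1.4 renames the option:
#   CALL gds.beta.node2vec.stream('myGraph', {embeddingDimension: 10})
```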
QUESTION
I am trying to run Node2Vec from the torch_geometric.nn library. For reference, I am following this example. While running the train() function I keep getting TypeError: tuple indices must be integers or slices, not tuple. I am using torch version 1.6.0 with CUDA 10.1 and the latest versions of torch-scatter, torch-sparse, torch-cluster, torch-spline-conv, and torch-geometric.
Here is the detailed error:
Thanks for any help.
ANSWER
Answered 2020-Nov-13 at 01:19

The error was due to torch.ops.torch_cluster.random_walk returning a tuple instead of an array/tensor. I fixed it by replacing the functions pos_sample and neg_sample in torch_geometric.nn.Node2Vec with these.
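The gist of such a fix is unpacking the tuple before indexing. A hedged sketch of the pattern only (not the exact replacement functions, which depend on your torch-geometric/torch-cluster versions), assuming the tuple's first element holds the node sequences:

```python
import torch

def walks_as_tensor(out):
    # Newer torch_cluster builds return (node_sequences, edge_sequences)
    # from random_walk; older builds return just the node tensor.
    return out[0] if isinstance(out, tuple) else out

# Inside a pos_sample/neg_sample-style function, wrap the raw call:
#   rw = walks_as_tensor(torch.ops.torch_cluster.random_walk(...))
# and then index rw as an ordinary 2-D tensor of node ids.
```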
QUESTION
I need help drawing a networkx directed graph. I have a directed graph which I create from a dataframe that looks like the following:
ANSWER
Answered 2020-Sep-13 at 10:19

You can use a seaborn palette to generate 12 different RGB color values and then create a column called color in your dataframe based on the weight values:
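A hedged sketch of that approach; the column names and the assumption of integer weights in the range 1-12 are guesses, since the original dataframe isn't shown:

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "source": ["a", "a", "b", "c"],
    "target": ["b", "c", "d", "d"],
    "weight": [1, 4, 7, 12],       # assumed integer weights in 1..12
})

# 12 distinct RGB colors, one per possible weight value.
palette = sns.color_palette("husl", 12)
df["color"] = df["weight"].apply(lambda w: palette[w - 1])

G = nx.from_pandas_edgelist(df, edge_attr=["weight", "color"],
                            create_using=nx.DiGraph)
edge_colors = [G.edges[e]["color"] for e in G.edges]
nx.draw(G, with_labels=True, edge_color=edge_colors)
plt.show()
```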
QUESTION
I'm testing feeding gensim's Word2Vec different sentences with the same overall vocabulary, to see if some sentences carry "better" information than others. My method to train Word2Vec looks like this:
ANSWER
Answered 2020-Aug-23 at 22:05

Each call to the Word2Vec() constructor creates an all-new model.
However, runs are not completely deterministic under normal conditions, for a variety of reasons, so results quality for downstream evaluations (like your unshown clustering) will jitter from run to run.
If the variance in repeated runs with the same data is very large, there are probably other problems, such as an oversized model prone to overfitting. (Stability from run to run can be one indicator that your process is sufficiently specified that the data and model choices are driving results, not the randomness used by the algorithm.)
If this explanation isn't satisfying, try adding more info to your question, such as the actual magnitude of your evaluation scores in repeated runs, both with and without the changes that you conjecture are affecting results. (I suspect the variations from the steps you think are having an effect will be no larger than the variations from re-runs or different seed values.)
(More generally, Word2Vec is hungry for as much varied training data as possible; only if texts are non-representative of the relevant domain are they likely to result in a worse model. So I generally wouldn't expect being choosier about which subset of sentences is best to be an important technique, unless some of the sentences are total junk/noise; but of course there's always a chance you'll find some effects in your particular data/goals.)
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities: No vulnerabilities reported.