Community-Detection | solve community detection problem

by yl495 | Python | Version: Current | License: No License

kandi X-RAY | Community-Detection Summary

Community-Detection is a Python library typically used in Telecommunications, Media, Advertising, and Marketing applications. It has no reported bugs or vulnerabilities, and it has low support. However, no build file is available. You can download it from GitHub.

Aims to solve the community detection problem with a semi-supervised, data-driven algorithm: a graph convolutional network (GCN). It takes the network graph as input and predicts which group each node belongs to.
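To make the idea concrete, here is a minimal sketch of a single GCN layer using the commonly cited propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an assumption that the repository follows this rule; the function names `matmul`, `normalized_adjacency`, and `gcn_layer` are hypothetical and not taken from the library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Add self-loops, then scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    deg = [sum(row) for row in with_loops]
    return [[with_loops[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weights):
    """One layer: aggregate neighbor features, project them, apply ReLU."""
    aggregated = matmul(normalized_adjacency(adj), features)
    projected = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in projected]


# Hypothetical toy input: two nodes joined by one edge, one-hot features.
adjacency = [[0, 1], [1, 0]]
node_features = [[1.0, 0.0], [0.0, 1.0]]
layer_weights = [[1.0, -2.0], [1.0, 1.0]]
hidden = gcn_layer(adjacency, node_features, layer_weights)
```

In a semi-supervised setup, the output of the last such layer would typically be passed through a softmax, with the loss computed only on the nodes whose community label is known.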

Support

Community-Detection has a low active ecosystem.

It has 5 stars and 2 forks. There are no watchers for this library.

It had no major release in the last 6 months.

There is 1 open issue and none have been closed. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of Community-Detection is current.

Quality

Community-Detection has no bugs reported.

Security

Community-Detection has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

Community-Detection does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

Community-Detection releases are not available. You will need to build from source code and install.

Community-Detection has no build file. You will need to create the build yourself to build the component from source.

Top functions reviewed by kandi - BETA

kandi has reviewed Community-Detection and identified the functions below as its top functions. This is intended to give you a quick insight into the functionality Community-Detection implements and to help you decide whether it suits your requirements.

Computes the GCN layer.
Loads community data from a label file.
Preprocesses the data.
Trains on the training data.
Shuffles the data in the graph.

Community-Detection Key Features

No Key Features are available at this moment for Community-Detection.

Community-Detection Examples and Code Snippets

No Code Snippets are available at this moment for Community-Detection.

Community Discussions

Trending Discussions on Community-Detection

Community detection for large, directed graphs

R: identifying points in a graph (possible with dplyr?)

meaning of weights in community detection algorithms

QUESTION

Community detection for large, directed graphs

Asked 2021-Feb-19 at 15:27

In Clustering and Community Detection in Directed Networks: A Survey, Malliaros & Vazirgiannis (2013) describe many algorithms for clustering and community detection in directed graphs. I have a relatively large graph (400,000 nodes and 180,000,000 edges) and am looking for software that can detect communities in it. The network analysis program I've looked into, the igraph package for R, does not seem to have any algorithm capable of detecting clusters in big directed networks (igraph has cluster_fast_greedy() and cluster_louvain(), but they only work on undirected graphs). Is there any package, either in R or in Python, that can do this?

A similar question was posed in Community detection on a very large graph, the difference being that I need packages for Python or R.

...

ANSWER

Answered 2021-Feb-19 at 15:27

You can use the Leiden algorithm from the Python package leidenalg, which should be even faster than the Louvain algorithm that you mention. This package works on many different kinds of networks, including directed networks, multiplex networks, and networks with negative links. In addition, it supports a range of different quality functions. It should easily scale to networks with millions of nodes (provided the network fits in memory, of course), with runtimes usually of a couple of minutes at most.
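To illustrate what a quality function for a directed network computes, the sketch below evaluates directed modularity for a given partition in plain Python. This is not the leidenalg API; the function name `directed_modularity` and the toy graph are hypothetical, and the formula used is the usual directed form Q = (1/m) * sum over same-community pairs of [A_ij - k_i_out * k_j_in / m].

```python
from collections import Counter, defaultdict


def directed_modularity(edges, community):
    """Directed modularity of a partition.

    edges is a list of (source, target) pairs; community maps node -> label.
    """
    m = len(edges)
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)

    # Observed number of edges that stay inside a community.
    intra = sum(1 for u, v in edges if community[u] == community[v])

    # Expected count under the null model, summed per community.
    out_sum = defaultdict(int)
    in_sum = defaultdict(int)
    for node, label in community.items():
        out_sum[label] += out_degree[node]
        in_sum[label] += in_degree[node]
    expected = sum(out_sum[label] * in_sum[label] for label in out_sum) / m

    return (intra - expected) / m


# Hypothetical toy graph: two directed 3-cycles joined by the edge c -> d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("d", "e"), ("e", "f"), ("f", "d"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = directed_modularity(toy_edges, two_groups)
q_merged = directed_modularity(toy_edges, one_group)
```

Splitting the graph along the single connecting edge scores higher than placing every node in one community, which is the kind of comparison an optimizer makes repeatedly while moving nodes between communities.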

Disclaimer: I am the author of the package (and a number of related publications).

Source https://stackoverflow.com/questions/66279851

QUESTION

R: identifying points in a graph (possible with dplyr?)

Asked 2020-Nov-26 at 00:42

I found a previous Stack Overflow post that deals with a question similar to mine, but the answer there does not quite fit my case: Check which community a node belongs in louvain community detection

I created some data in R and then made a graph. After making the graph, I performed clustering on the graph. Now, suppose I have a list of people, I want to find out which cluster they belong to.

I understand that it is easy to manually inspect the data and find this out, however I think this would be very difficult to do if you had a big data set.

I have written the code below. Everything works until the last 2 lines, where I try to find out which clusters "John", "Peter" and "Tim" belong to:

...

ANSWER

Answered 2020-Nov-26 at 00:16

The membership of the vertices is held in $membership and the names of the vertices are in $names.
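The same lookup can be sketched in Python, assuming the clustering result exposes two parallel sequences the way R's $names and $membership do. The data and variable names below are hypothetical and are not taken from the original question.

```python
# Hypothetical clustering output: position i in both lists describes the same vertex.
names = ["John", "Peter", "Tim", "Kevin", "Mary"]
membership = [1, 1, 2, 2, 3]

# Build a name -> cluster lookup once, then query it for any list of people.
cluster_of = dict(zip(names, membership))

people = ["John", "Peter", "Tim"]
clusters_for_people = {person: cluster_of[person] for person in people}
```

Building the dictionary once keeps each lookup constant-time, which matters when the list of people or the graph is large.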

Source https://stackoverflow.com/questions/65014400

QUESTION

meaning of weights in community detection algorithms

Asked 2018-Mar-16 at 11:28

There is an excellent comparison of community detection algorithms available in igraph here. However, there's some ambiguity about the use of weights in the algorithms that can be applied with weighted edges.

Typically, edge weights will be oriented so that higher weights suggest keeping the nodes together (e.g., strength of friendship). This works nicely with modularity scores, which compare the average weighted density within communities to the density between them.

However, the Newman-Girvan community detection algorithm uses betweenness, which is based on distances. In this case, I would expect that the edge weights should reflect the distance between nodes so that calculating shortest paths sums the weights over the path. That is, the weight is a cost or distance score, where higher values should break into different communities.

Am I correct in expecting higher weights for greater distances when using Newman-Girvan and, if so, how then does this reconcile with using modularity to decide where to cut the number of communities?

...

ANSWER

Answered 2018-Mar-16 at 11:28

My answer is going to be based on the igraph package in R. The situation is indeed quite confusing and the questions are relevant since, as Newman (2004) says,

Since the publication of that work, the author has been asked a number of times whether an appropriate generalization of the algorithm exists for weighted networks.

In his paper he derives an appropriate generalization of the Newman-Girvan algorithm to weighted networks.

Weights

You are right about the interpretation of weights in the Newman-Girvan algorithm. edge_betweenness uses a formula analogous to that in (Brandes, 2001), where the length of a path is defined as the sum of the weights of its edges. (You may also check the source code but it's quite involved). In ?edge_betweenness and, in particular, ?cluster_edge_betweenness it says

Edge weights are used to calculate weighted edge betweenness. This means that edges are interpreted as distances, not as connection strengths.

The implications are as follows. Let b(e, w) be the edge betweenness of an edge e with weight w. Then it can be shown (I could elaborate if you wish) that

b(e, w) <= b(e, w*) if and only if w >= w*.

That is, the edge betweenness and the weight of e are inversely related. The main idea is that given, e.g., w* >> w, those shortest paths that were crossing e now are likely to be dominated by some other paths that do not include e. Hence, larger weight implies (weakly) lower betweenness, and lower betweenness makes it less likely that e will be recognized as an edge connecting two communities. Thus, this sounds strange if we see weights as distances. On the other hand, if e is within some community and we decrease its weight, then the number of shortest paths through that edge potentially increases and it becomes more likely to be seen as connecting two communities. I am not yet claiming anything about the corresponding modularity scores, though.
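The inverse relation can be checked on a small example. The sketch below is a from-scratch, Brandes-style computation of weighted edge betweenness in plain Python; it is not igraph's implementation, and the function names and the toy graph (two triangles joined by one edge) are hypothetical.

```python
import heapq
from collections import defaultdict


def edge_betweenness(lengths):
    """Weighted edge betweenness of an undirected graph.

    lengths maps (u, v) tuples to positive edge lengths. Path lengths are
    compared exactly, so use lengths whose sums are exact in floating point.
    """
    adj = defaultdict(list)
    for (u, v), w in lengths.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    score = {frozenset(edge): 0.0 for edge in lengths}

    for source in list(adj):
        # Dijkstra from source, counting shortest paths and recording predecessors.
        dist = {source: 0.0}
        n_paths = defaultdict(float)
        n_paths[source] = 1.0
        preds = defaultdict(list)
        order, done = [], set()
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            order.append(u)
            for v, w in adj[u]:
                nd = d + w
                if v not in dist or nd < dist[v]:
                    dist[v], n_paths[v], preds[v] = nd, n_paths[u], [u]
                    heapq.heappush(heap, (nd, v))
                elif nd == dist[v] and v not in done:
                    n_paths[v] += n_paths[u]
                    preds[v].append(u)

        # Walk back from the farthest vertices, sharing credit among predecessors.
        dependency = defaultdict(float)
        for v in reversed(order):
            for u in preds[v]:
                share = n_paths[u] / n_paths[v] * (1.0 + dependency[v])
                score[frozenset((u, v))] += share
                dependency[u] += share

    # Every unordered pair was counted once from each endpoint.
    return {edge: total / 2.0 for edge, total in score.items()}


def two_triangles(ac_length):
    """Triangles a-b-c and d-e-f joined by c-d; only the a-c length varies."""
    return {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): ac_length,
            ("c", "d"): 1.0,
            ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0}


ac = frozenset(("a", "c"))
betweenness_by_length = {length: edge_betweenness(two_triangles(length))[ac]
                         for length in (1.0, 2.0, 3.0)}
```

As the length of the edge a-c grows from 1 to 2 to 3, its betweenness drops from 4.0 to 2.0 to 0.0, because shortest paths increasingly detour through b.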

Now let's suppose that weights actually correspond to connection strengths. Then the stronger the connection is, the fewer shortest paths will go through that edge (because we still need to compute them), the lower its edge betweenness is, and the less likely it is to be removed. So that kind of makes sense.

What is not nice, or rather strange, is that now the length of a path is defined as the sum of its connection strengths. However, we can reinterpret the algorithm then. Suppose that the weights are >> 1 within communities and << 1 between them. Then we can interpret the length of a path as the privacy of this path (e.g., a path within a community would contain lots of close interactions, while the edge connecting two communities is somewhat public, open). Given such an interpretation, the algorithm would look for the least private / the most open paths and compute the corresponding betweenness. Then we would be removing such edges that belong to many most open paths.

So perhaps I made a mistake somewhere, but it looks like it would make more sense to see weights as connection strengths.

Newman (2004) does something related:

...we will consider specifically those networks in which the weights on edges take greater values for vertex pairs that have closer connections or are more similar in some way.

It would seem that this should make sense. However, to keep the more natural definition of the shortest path, he writes:

One can define paths on a weighted network by assuming the “length” of an edge to vary inversely with its weight, so that two vertices that are connected twice as strongly will be half as far apart.

That is, the shortest path lengths now are inversely related to the weights. Since not doing that seemed to give good results, now we have a problem:

To see this, notice that any two vertices that are particularly strongly connected to one another will have a particularly short distance along the edge between them. Geodesic paths will thus, all other things being equal, prefer to flow along such an edge than along another longer edge between two less well connected vertices, and hence closely connected pairs will tend to attract a lot of paths and acquire high betweenness. This means that, as a general rule, we are more likely to remove edges between well connected pairs than we are between poorly connected pairs, and this is the precise opposite of what we would like the algorithm to do.

This is the result that I described when we see weights as distances. As I mentioned in the beginning of the answer, to deal with this Newman (2004) proposes mapping weighted graphs to unweighted multigraphs and then proceeding very similarly to the standard case. I believe that this multigraph idea can be implemented by setting weighted = NULL while using a non-binary adjacency matrix (when defining a graph; see weighted in ?graph_from_adjacency_matrix).
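A minimal sketch of the weighted-to-multigraph mapping, assuming integer weights: each edge of weight w becomes w parallel unweighted edges. The helper name `expand_to_multigraph` and the toy weights are hypothetical.

```python
def expand_to_multigraph(weights):
    """Turn {(u, v): integer weight} into a list with one entry per parallel edge."""
    edges = []
    for (u, v), w in weights.items():
        if w != int(w) or w < 1:
            raise ValueError(f"weight of edge {(u, v)} must be a positive integer")
        edges.extend([(u, v)] * int(w))
    return edges


# Hypothetical toy graph: a strong tie of weight 3 and a weak tie of weight 1.
parallel_edges = expand_to_multigraph({("a", "b"): 3, ("b", "c"): 1})
```

An unweighted edge-removal procedure can then run on the expanded edge list, where a strong tie survives until all of its parallel copies have been removed.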

Modularity

First of all, one can use modularity with weighted graphs, as Newman (2004) does; that's not a problem. In general, it's not so obvious how using weights affects using modularity as the way to choose the number of communities. I'll perhaps add some examples with R. It seems like there should be an improvement over the unweighted case, as Newman (2004) finds, when the interpretation is in line with the way the algorithm works. Otherwise, I think the graph structure and the weights themselves can be quite important in determining how far from the truth we get.
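As a small illustration of modularity on a weighted graph, the sketch below computes Q as the sum over communities of W_c / m - (S_c / (2m))^2, where m is the total edge weight, W_c is the weight inside community c, and S_c is the total strength of its vertices. The function name and the toy graph are hypothetical, and weights are read as connection strengths.

```python
from collections import defaultdict


def weighted_modularity(weights, community):
    """Modularity of a partition of an undirected weighted graph.

    weights maps (u, v) -> strength, each edge listed once;
    community maps node -> label.
    """
    total = sum(weights.values())          # m: total edge weight
    strength = defaultdict(float)          # weighted degree of each vertex
    intra = 0.0                            # weight inside communities
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            intra += w

    community_strength = defaultdict(float)
    for node, s in strength.items():
        community_strength[community[node]] += s
    expected = sum(s * s for s in community_strength.values()) / (4.0 * total * total)
    return intra / total - expected


def two_triangles(inside):
    """Triangles a-b-c and d-e-f joined by a bridge c-d of weight 1."""
    return {("a", "b"): inside, ("b", "c"): inside, ("a", "c"): inside,
            ("c", "d"): 1.0,
            ("d", "e"): inside, ("e", "f"): inside, ("d", "f"): inside}


groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_unit = weighted_modularity(two_triangles(1.0), groups)
q_strong = weighted_modularity(two_triangles(5.0), groups)
```

With the same partition, raising the within-triangle strengths from 1 to 5 increases modularity, which matches the reading of weights as connection strengths.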

References

Newman, M.E., 2004. Analysis of weighted networks. Physical Review E, 70(5).

Brandes, U., 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2), pp.163-177.

Source https://stackoverflow.com/questions/49151996

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install Community-Detection

You can download it from GitHub.
You can use Community-Detection like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
