244. An efficient local method for community detection in large networks
APPLIED, COND-MAT — By Symeon Papadopoulos on February 9, 2009 at 9:05 pmThis is a guest blog post by Symeon Papadopoulos (MKLab, Informatics and Telematics Institute, Greece) who is going to determine whether effective community exist in the complex network of NEQNET readers
Dmitry.
I have been kindly given the opportunity by Dmitry to provide an overview of the problem of community detection in networks and discuss a little bit a recent paper of mine, appearing in arxiv:0902.0871.
In short, community detection involves the analysis of network structure with the purpose of identifying groups of nodes that are more densely connected with each other than they are with the rest of the network nodes. Thus, the problem considers the representation of a complex system as a network (graph) consisting of nodes (the entities of the system under study) and edges (relations between the entities of the system). In practice, researchers employ community detection methods in order to study the structure of large networks (which due to their size cannot be visualized in a meaningful way). Establishing the existence of community structure in a network and subsequently identifying the communities embedded in it may provide useful insights into the behavior of the system that the network models and into its inherent organization.
Community detection methods have been applied in a wide range of scientific problems, e.g. metabolic networks (Ravasz et al., 2002), social networks (Girvan & Newman, 2002), citation networks (Rosvall & Bergstrom, 2008), the World Wide Web (Flake et al., 2000), and many others. For instance, in the case of citation networks, i.e. networks representing journal articles and their relations (citations), useful conclusions were drawn regarding the centrality and significance of scientific disciplines, as well as their inter-relations. In the case of the Web, community detection is even more important (and challenging), since it can be used to identify and manage (index and track) web page topics, which is inherently difficult due to the unmoderated nature of the Web.
A wide variety of techniques have been developed lately to address the problem of community detection, which is computationally very demanding. Since there is no universally accepted definition of what a community is, each method tackles the problem through a different strategy. Also, note that while the majority of reported methods are designed for undirected unweighted networks, several methods have been proposed for directed (Flake et al., 2000), weighted (Ino et al., 2005) and even for signed networks (Yang et al., 2007). In order to restrict the scope of this discussion, let us consider the following two community definitions (Radicchi et al., 2004), of which the second (weak) is frequently used (or implied) by community detection methods:
Strong definition: The subgraph V is a community in a strong sense if
, where
and
are the number of connections of node i that link to nodes within the subgraph (community) and outside the subgraph respectively.
Weak definition: The subgraph V is a community in a weak sense if
.
Among the numerous methods that have been proposed for the identification of weak communities, the most popular ones are perhaps the ones developed by professor Mark Newman, e.g. the shortest-path centrality (Girvan & Newman, 2002) or the modularity maximizing algorithm by (Clauset et al., 2004). Other methods of interest are based on the spectral properties of the adjacency matrix of the network (), use electrical circuits and the Kirchhoff laws to model the network (Wu & Huberman, 2004), or map the network to a spin system (Reichardt & Bornholdt, 2004). For a more thorough overview of available methods, have a look at the survey article by Fortunato and Castellano (2007).
In the paper I would like to discuss, a method is proposed that addresses the problem of community detection from a local perspective. A local perspective is often necessary in community detection, since there are problems that the complete graph may not be known (e.g. the Web), may be too large to fit in memory (e.g. a mobile phone call network), or we want to use the community detection method in an interactive way (i.e. we need it to run extremely fast). Similar to recent methods presented in (Clauset, 2005), (Luo et al. 2006) and (Bagrow, 2007), the proposed method (which we name Bridge Bounding) starts the community identification process from a seed node and gradually attaches neighboring nodes, i.e. nodes that are adjacent to the community boundary. The agglomeration of a candidate node to the community takes place only if the edge linking the node to the community is not a bridge. In that way, we map the problem of community detection to the problem of quantifying the extent to which an edge acts as a bridge between different communities. In the paper, we define local bridging functions, ranging from the simple first-order local bridging, based on the edge clustering coefficient (Radicchi et al., 2004) to higher-order local bridging functions. To make the concept of the proposed method clear, consider the following network which was generated from a synthetic process so as to contain four communities. Edges that connect nodes belonging to different communities with each other (inter-community edges) are drawn in dashed line.

Then, we calculate the edge local bridging property for all network edges by means of the following equation:
, s and t are the start and target node of the edge, N(s), N(t) are the sets of nodes that are adjacent to s and t respectively and
,
denote their degrees. If we plot the distribution of this edge property across all the edges of the above synthetic network, we get the following plot:

The above distribution reveals that when community structure is conspicuous, it is possible to distinguish between intra- and inter-community edges simply by comparing their edge local bridging against a threshold. In real networks, however, this is not the case, which renders the use of higher-order local bridging functions necessary.
We applied Bridge Bounding on a network comprising tags (as nodes) and their co-occurrences (as edges). The tag and co-occurrence data resulted from the usage of LYCOS iQ, a collaborative Question and Answer application operated by LYCOS Europe. Bridge Bounding led to the discovery of topically coherent communities around particular subjects. Below, we present two sample communities, one created by use of the tag “computer” as seed and another resulting from providing the tag “history” as seed.


We also employed an abstract visualization scheme for the tag communities in order to focus only on their structural properties. The following figure depicts the structure of four sample tag communities (using “music”, “science”, “film” and “animals” as seed nodes).

By inspecting the structure of these communities, one can draw conclusions about the level of topic sophistication in the particular system (LYCOS iQ). For instance, it appears that LYCOS iQ users asked only superficial questions related to “music” (the respective community is relatively sparse and star-shaped), while on the other hand they seem to ask more sophisticated and wide-ranging questions about “animals” (denser and more complex community structure).
This was just a simple case study illustrating how the application of a local community detection method can facilitate the exploration of a large information space (the tag network under study contained several thousands of tags and hundreds of thousands of inter-connections). One could envisage the integration of community detection methods, in applications that support the navigation in complex knowledge domain, e.g. patent classification schemes and term networks, scientific article concept spaces, etc.

Save This Post as PDF
1 Comment
Trackback responses to this post