Community Detection: A Python Implementation of the Label Propagation Algorithm

Don’t you wonder sometimes, what might have happened if you tried?

Sometimes it is better not to overthink things: just give it a try and see what happens.

LPA: the Label Propagation Algorithm

Main advantages: the time complexity is close to linear, and the number of communities does not need to be known in advance.

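Main steps of the algorithm: first, every node is given a unique label. The nodes are then updated one after another over repeated iterations. For each node, the labels of its neighbors are counted and the node takes the label that occurs most often. If several labels are tied for the highest count, one of them is chosen at random. This is repeated until the labels converge.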
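
The update rule for a single node can be sketched as follows. This is only an illustration that assumes the neighbor labels are already available as a plain, non-empty list; `choose_label` is a hypothetical helper and is not part of the implementation further down.

```python
import random
from collections import Counter


def choose_label(neighbor_labels, rng=random):
    """Return the most frequent label in a non-empty list.

    Ties between equally frequent labels are broken uniformly at random.
    """
    counts = Counter(neighbor_labels)
    best = max(counts.values())
    candidates = [label for label, c in counts.items() if c == best]
    return rng.choice(candidates)


# Label 'a' appears three times among five neighbors, so it wins outright.
print(choose_label(['a', 'b', 'a', 'c', 'a']))  # a

# 'a' and 'b' are tied with two votes each, so either one may be returned.
print(choose_label(['a', 'b', 'a', 'b', 'c']))
```

The random tie-break is why two runs on the same graph may produce different communities.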

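There are two main strategies for updating node labels in label propagation: synchronous updating and asynchronous updating.
Synchronous updating: the update in iteration t depends only on the labels produced by iteration t-1.
Asynchronous updating: the update in iteration t depends both on the labels already updated in iteration t and on the labels from iteration t-1 that have not yet been updated in iteration t. Because the result of asynchronous updating depends on the order in which nodes are visited, that order is chosen at random.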
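
The difference between the two strategies can be sketched on a toy graph. The code below is a minimal illustration under stated assumptions: the graph is a plain adjacency dict, and `majority_label`, `synchronous_sweep` and `asynchronous_sweep` are hypothetical names that do not appear in the implementation further down.

```python
import random
from collections import Counter


def majority_label(node, adjacency, labels, rng):
    """Most frequent label among the neighbors of `node`, ties broken at random."""
    counts = Counter(labels[n] for n in adjacency[node])
    best = max(counts.values())
    return rng.choice([label for label, c in counts.items() if c == best])


def synchronous_sweep(adjacency, labels, rng):
    """Every node reads only the labels left by the previous iteration."""
    previous = dict(labels)  # frozen snapshot of iteration t-1
    return {node: majority_label(node, adjacency, previous, rng)
            for node in adjacency}


def asynchronous_sweep(adjacency, labels, rng):
    """Nodes are visited in random order and see updates made earlier in the sweep."""
    current = dict(labels)
    order = list(adjacency)
    rng.shuffle(order)
    for node in order:
        current[node] = majority_label(node, adjacency, current, rng)
    return current


# A single edge 0 - 1, with each node starting in its own community.
adjacency = {0: [1], 1: [0]}
labels = {0: 0, 1: 1}
rng = random.Random(0)

print(synchronous_sweep(adjacency, labels, rng))   # {0: 1, 1: 0}
print(asynchronous_sweep(adjacency, labels, rng))  # both nodes share one label
```

On this two-node graph the synchronous sweep merely swaps the two labels, and would keep swapping them on every further sweep, while the asynchronous sweep settles on a single label whichever node is visited first.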

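LPA is suited to detecting non-overlapping communities. For overlapping communities, researchers proposed COPRA (Community Overlap PRopagation Algorithm). In COPRA every node may belong to up to V communities at the same time, where V is a global parameter set by hand. The choice of V clearly has a direct effect on the quality of the result, and choosing it requires sufficient prior knowledge; in real community networks a good value of V is hard to determine.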
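
To give a feel for how V enters the update, here is a rough sketch of one COPRA-style update of a single node. It rests on assumptions that go beyond the description above: each node holds a set of labels with belonging coefficients that sum to 1, labels whose averaged coefficient falls below 1/V are dropped, and if none reaches that threshold a single strongest label is kept. The function name `copra_update` and the dict layout are hypothetical, and the published algorithm may differ in its details.

```python
import random


def copra_update(node, adjacency, belonging, v, rng=random):
    """One COPRA-style update of `node`, which must have at least one neighbor.

    `belonging` maps each node to {label: coefficient}, coefficients summing to 1.
    Returns the new {label: coefficient} dict for `node`.
    """
    neighbors = adjacency[node]
    # Average the neighbors' belonging coefficients, label by label.
    summed = {}
    for n in neighbors:
        for label, coeff in belonging[n].items():
            summed[label] = summed.get(label, 0.0) + coeff / len(neighbors)

    # Keep only labels whose coefficient reaches the threshold 1/V.
    kept = {label: c for label, c in summed.items() if c >= 1.0 / v}
    if not kept:
        # Nothing reaches the threshold: keep one strongest label, random among ties.
        best = max(summed.values())
        winner = rng.choice([label for label, c in summed.items() if c == best])
        kept = {winner: best}

    # Renormalize so that the coefficients sum to 1 again.
    total = sum(kept.values())
    return {label: c / total for label, c in kept.items()}


# Node 0 has two neighbors in community 'a' and two in community 'b'.
adjacency = {0: [1, 2, 3, 4]}
belonging = {1: {'a': 1.0}, 2: {'a': 1.0}, 3: {'b': 1.0}, 4: {'b': 1.0}}

print(copra_update(0, adjacency, belonging, v=2))  # {'a': 0.5, 'b': 0.5}
print(copra_update(0, adjacency, belonging, v=1))  # a single label with coefficient 1.0
```

With V = 2 the node sits in both communities; with V = 1 it is forced into one of them, which is the non-overlapping behavior of plain LPA.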

Python implementation

# -*- coding: UTF-8 -*-

"""
Created on 17-11-28

@summary: implementation of the classic label propagation algorithm (LPA)

@author: dreamhome
"""

import random
import networkx as nx
import matplotlib.pyplot as plt


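def read_graph_from_file(path):
    """
    Read a graph from an edge-list file.
    :param path: path to a file with one "source target" pair of integer node ids per line
    :return: networkx Graph in which every node carries a unique 'label'
    """
    # create the graph
    graph = nx.Graph()
    # list of edges read from the file
    edges_list = []
    # read the edges, skipping blank lines, comments and other non-numeric lines
    with open(path) as fp:
        for line in fp:
            edge = line.split()
            if len(edge) >= 2 and edge[0].isdigit() and edge[1].isdigit():
                edges_list.append((int(edge[0]), int(edge[1])))
    # add the edges to the graph
    graph.add_edges_from(edges_list)

    # give every node an initial label: its own id
    for node, data in graph.nodes(data=True):
        data['label'] = node

    return graph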


def lpa(graph):
    """
    Label propagation algorithm using asynchronous updates.
    :param graph: networkx Graph whose nodes carry a 'label' attribute
    :return: None (the labels are updated in place)
    """
    def estimate_stop_condition():
        """
        Stop condition: every node carries one of the most frequent labels
        among its neighbors. The cap on the number of iterations is checked
        in the main loop below.
        :return: True if the labels have converged, False otherwise
        """
        for node in graph.nodes():
            count = {}
            for neighbor in graph.neighbors(node):
                neighbor_label = graph.nodes[neighbor]['label']
                count[neighbor_label] = count.get(neighbor_label, 0) + 1
            if not count:
                # a node without neighbors simply keeps its own label
                continue

            # find the labels with the highest count
            max_count = max(count.values())
            labels = [k for k, v in count.items() if v == max_count]
            # not converged while some node's label is not among the most frequent ones
            if graph.nodes[node]['label'] not in labels:
                return False

        return True

    loop_count = 0

    # iterate the label propagation process
    while True:
        loop_count += 1
        print('iteration %d' % loop_count)

        # asynchronous update: visit the nodes in random order
        nodes = list(graph.nodes())
        random.shuffle(nodes)
        for node in nodes:
            count = {}
            for neighbor in graph.neighbors(node):
                neighbor_label = graph.nodes[neighbor]['label']
                count[neighbor_label] = count.get(neighbor_label, 0) + 1
            if not count:
                continue

            # find the labels with the highest count
            max_count = max(count.values())
            labels = [k for k, v in count.items() if v == max_count]
            # when several labels share the highest count, pick one at random
            graph.nodes[node]['label'] = random.choice(labels)

        if estimate_stop_condition() or loop_count >= 10:
            print('complete')
            return


if __name__ == "__main__":

    path = "/home/dreamhome/network-datasets/dolphins/out.dolphins"
    graph = read_graph_from_file(path)
    lpa(graph)

    # draw the graph, coloring each node by its final label
    node_color = [float(graph.nodes[v]['label']) for v in graph]
    nx.draw_networkx(graph, node_color=node_color)
    plt.show()
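
Besides plotting, the final labels can be turned into explicit communities by grouping nodes that share a label. The sketch below assumes the labels have first been collected from the graph's node attributes into a plain dict mapping node id to label; `communities_from_labels` and the sample labels are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def communities_from_labels(labels):
    """Group node ids by label; `labels` maps node id -> label.

    Returns a list of sets of node ids, largest community first.
    """
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical labels for six nodes after propagation has finished.
labels = {1: 3, 2: 3, 3: 3, 4: 3, 5: 6, 6: 6}
for community in communities_from_labels(labels):
    print(sorted(community))
```

The label values themselves carry no meaning beyond identifying a group, so only the grouping is reported.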
