Visualizing PageRank using networkx, numpy and matplotlib in python

March 07, 2020

python algorithm graph

Today I wanted to understand how the PageRank algorithm works by visualizing the different iterations on a gif. But to make the exercise more complicated (interesting ;-)), I also wanted to implement my own PR algorithm using matrix formulation.

PageRank with matrices

Implementation

In terms of implementation, I decided to rely on the networkx representation of graphs and their methods such as adjacency_matrix. The graph is created using for instance:

import networkx as nx

G = nx.read_edgelist("test_graph.edgelist")

Then, we just need to iterate for a maxium of max_iter, or until the desired mean error is reached. At each step, the PageRank is updated with:

pr = d * weight.dot(pr) + (1-d)/N

Where the weight matrix is a NxN matrix whose ij element is the weight between node i and j (1/deg(j)).

The full code is reproduced here:

import numpy as np
def page_rank(G, d=0.85, tol=1e-2, max_iter=100):
    """Return the PageRank of the nodes in the graph. 

    :param dict G: the graph
    :param float d: the damping factor
    :param flat tol: tolerance to determine algorithm convergence
    :param int max_iter: max number of iterations
    """
    nodes = G.nodes()
    matrix = nx.adjacency_matrix(G, nodelist=nodes)
    out_degree = matrix.sum(axis=0)
    weight = matrix / out_degree
    N = G.number_of_nodes()
    pr = np.ones(N).reshape(N, 1) * 1./N

    # need to repeat the initial step twice
    # for matplotlib animation
    yield nodes, pr, "init"
    yield nodes, pr, "init"

    for it in range(max_iter):
        old_pr = pr[:]
        pr = d * weight.dot(pr) + (1-d)/N
        yield nodes, pr, it
        err = np.absolute(pr - old_pr).sum()
        if err < tol:
            return pr
    #raise Exception(f'PageRank failed after max iteration = {max_iter} (err={err} > tol = {tol})')

The resulting values for PageRank using this implementation are compatible with the ones obtained with the networkx implementation:

# Custom page rank:
[[0.05427205 0.14652879 0.05427205 0.08972923 0.12543861 0.1398845
  0.11906867 0.0530322  0.0530322  0.08237085 0.08237085]]
# network page rank:
{'1': 0.05472174724228272, '3': 0.14500947724655655, '2': 0.05472174724228272, '4': 0.09022185969480352, '5': 0.1252959204066461, '6': 0.13952052031260131, '9': 0.1192351424187364, '7': 0.05316762893415654, '8': 0.05316762893415654, '10': 0.0824691637838888, '11': 0.0824691637838888}

This function can then be used within a matplotlib animation to visualize the rank evolution until convergence.

Visualization

Since the function was implemented as a generator, yielding the current ranks at each iteration, we can directly use it in a FunctionAnimation. We just need an update function, that will actually draw the graph with node colors depending on the rank (the darker the blue, the highest the rank):

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation


def update(r):
    res_nodes, res_values, it = r
    res_values = np.asarray(res_values).ravel()
    plt_nodes = nx.draw_networkx_nodes(
        G, pos,
        ax=ax,
        nodelist=res_nodes,
        node_color=res_values,
        alpha=1,
        node_size=700,
        cmap=plt.cm.Blues,
        vmin=0,
        vmax=0.2
    )
    ax.axis("off")
    ax.set_title(f"Iteration {it}")
    nx.draw_networkx_edges(G, pos)
    nx.draw_networkx_labels(G, pos, font_size=14)
    return [plt_nodes, ]



G = nx.read_edgelist("test_graph.edgelist")
pos = nx.kamada_kawai_layout(G)

f, ax = plt.subplots()
ani = FuncAnimation(
    f,
    update,
    frames=page_rank(G),
    interval=1000,
    blit=True
)
f.suptitle(f"  Page Rank")
ani.save("graph_pr.gif", writer='imagemagick')

The result of this code is the following animated gif, where one can clearly see the rank distribution, especially during the first steps:

Losses for full model

Estelle Scifo

(Graph) Data Scientist & Python developer