The objective of this programming assignment is to apply the PageRank algorithm on a very large graph representing a social network. This algorithm will allow you to identify the most influential person in this network.
The PageRank algorithm has been designed in 1998 by Larry Page, Sergey Brin, Rajeev Motwani and Terry Winograd. This algorithm is used to compute the level of importance of a web page; it is based on the idea that the more a page is referenced by important pages, the more this one is important. This is the algorithm that leads to the creation of Google and it is still at the core of its research engine.
The algorithm computes the PageRank (PR) of a node A in a directed graph using a recursive definition:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Here d is a damping factor that we will set to 0.85. Nodes T1, T2, , Tn are the nodes that connect to A (i.e. having a link going from Ti to A). The term C(Ti) is the number of links outgoing from Ti. The idea is that a node receives its PR from the node connected to it and it distributes its PR to the node it connects to.
Since the computation of the PR of a node requires the knowledge of the PR of its neighboring nodes, the full computation of all PRs has to be done in an iterative way. You first set the PR of all nodes to 1.0. You then sequentially update the PR of each node using the current PR value of its neighbors. You have to repeat this process several times before the PRs stabilize to their true value. The higher the PR of a node is, the more influence (or importance) this node has in the network.
You are given a file containing the description of the network to be analyzed. You also have the source code reading this file and extracting the information about the nodes and their adjacency lists.
You have to identify the 10 most influential nodes of this network (based on their PR value).