Create a program that will read in the text le containing graph specications and write to the output le the required results after performing operations on the graph. The program should be able to create the directed graph from the input le and output the results according to the In-Degree of Centrality.
Your program must be capable of utilizing a command line argument to specify the input graph le and output le with results.
sna inputFile outputFile
Your program must ensure the user has correctly provided the required command-line arguments and display a usage statement if the provided arguments are incorrect.
The assignment name for this assignment is: sna
As social networking is becoming more and more popular among people, there is a growing interest in the study of extracting information from these networks, the so-called social network analysis (SNA). The usage of social network analysis, however, is far beyond that of Facebook or Twitter. The principle of SNA can be applied to other things; such as, nding the source and ow of an infectious disease, and scheduling the optimal production-market distribution for a globalized company.
The basic structure of SNA is a graph. To evaluate the characteristics of a graph and the nodes inside, people gradually develop a set of metrics, some of them are:
In an undirected graph, the degree centrality of a node i (denoted by CD(i)) is the node degree (number of edges), denoted by deg(i).
CD(i) = deg(i)
We can interpret this as the how connected this node is in general sense. In a directed graph, the indegree of centrality comes in two types: the in-degree centrality and the out-degree centrality. The in-degree centrality is number of edges that are coming to the node. The out-degree centrality uses the number of edges that are coming out instead.
This made-up metric tells me who is popular among the root users friends. This is a list of unique users that are followed by the people the root user is following. Followability allows a social network to say, Here are some suggested accounts for you to follow.
You will be provided with real Twitter accounts with their friends and for each friend his/hers friends etc. By using this data build up a graph and calculate the In-Degree of Centrality for each user.
A sample of the input data is:
github john_stewart
github microsoft
microsoft oracle
The two columns are dening the relationship between two accounts. The user in the second column is following the user in the rst column. Thus, this is a directed graph.
For an empty le, or a le that fails formatting, you can quit with an error message that prints the usage statement or some other statementthe most important part is that you dont segfault.
You can also assume that the graph we give you will be connected; starting at any node, you will be able to visit all other nodes using a breadth-rst or depth-rst search, or a variant thereof.
The data is in the format of:
Name_1 Name_2
Name_x Name_y
For each line there will be two usernames seperated by three spaces. Each username is no longer than 15 characters and contain only alphanumeric characters(letters A-Z,a-z, numbers 0-9) with the exception of underscores.
Dene a user class representing each node in the graph. For this class there are some essential concepts that must be captured:
The created user class objects need to be stored properly. Decide a way to store this list of all users from input data sets; class denitions are up to you, but the fundamental structure will reect a directed graph with nodes and edges.
Calculating in-degree centrality is relatively easy. Once the in-degree centrality has been calculated for each user we want to nd the most connected user in the graph (root user). This should be the user with the highest in-degree of centrality. If there is a tie, break it with alphabetical order (case-insensitive). Once this user has been found we want to nd all of the users who are within a depth of 2 from the most connected user who are not already followed by this user, and sort them by their in-degree of centrality.
The program should output a list of all the users within a depth of 2 of the root user (the user with the greatest degree centraility), that are not already being followed by that user (root user). The list should be printed in descending order of the in-degree of centraility for each one of the accounts. The output written should be in the following format:
Looking for new accounts for Name (indegreeofcentrality) to follow
Name1 (degreeofcentrality)
Name2 (degreeofcentrality)
...
When there is a tie in in-degree of centrality the names of the users should be sorted alphabetically by name (case-insensitive).