Gnutella I arguably represents the simplest and most generic type of unstructured Peer-to-Peer (P2P) network. Networks of this type are characterised by full distribution of network resources and are highly fault tolerant in the presence of network churn. Their main disadvantage, however, lies in the search process, which is based on flooding. Gnutella's neighbourhood formation mechanism also significantly increases network traffic, as it relies on the simple but inefficient process of frequently exchanging Ping and Pong messages. Various methodologies have been tried in an attempt to improve Gnutella’s overall efficiency, but very few have examined this issue in the context of Web Services discovery.
As discussed earlier in the module, P2P networks can be used as a platform for deploying distributed Web Services discovery mechanisms, as an alternative to the standard centralised approach based on UDDI. The Gnutella I (version 0.4) protocol is one candidate approach for achieving this, although it has a number of potential limitations.
The aim of this project is threefold: firstly, to provide a critical overview of Gnutella I in the context of Web Services discovery; secondly, to measure the performance of Gnutella I by developing a set of simplified simulations; and thirdly, to propose methods for potentially improving the performance measured in the second step.
The project consists of three tasks that you must perform. Two of these are theoretical and one is practical. They are specified as follows:
Task 1 (25% of project effort)
Provide a brief overview of the Gnutella I protocol principles and functionality as described in the provided specification document. Identify and explain TWO advantages and TWO disadvantages of using Gnutella I for Web Services discovery.
Task 2 (50% of project effort)
Use the programming language of your choice to develop a simple Gnutella I simulator. Your simulator should incorporate only the Query and QueryHit messages (so you do not have to implement the Ping, Pong and Push Gnutella descriptors). Furthermore, you should assume that any node that has services matching an incoming query will NOT forward this query to its neighbours, even if the remaining TTL > 0.
On the module’s Blackboard site you will find: a) the text file “Network.txt”, which encodes the actual network you should simulate, and b) the text file “Resources.txt”, which shows the distribution of Web Services on the network’s nodes. If you study the resource file carefully, you will notice that there are 10 different types of Web Services, each represented by a number from 1 to 10. Services with IDs 1 to 6 inclusive are fairly evenly spread across the network. Services with IDs 7 and 8 are very popular, so the vast majority of network nodes have Web Services of these types. Services with IDs 9 and 10 are rare, and only a few nodes have services of these types.
Your simulator should use the data from these files to set up the project’s required Gnutella topology. You should then use your simulator to carry out the following experiment:
Generate and sequentially process 10,000 random queries, each with TTL = 3. A query can be represented as a pair (originator node ID, target Web Service name). A randomly generated query has BOTH of its fields generated randomly. For each query, your simulator should record the following information:
At the end of the experiment, your simulator should save the above values into a suitably formatted text file, which you can then import into Microsoft Excel in order to draw the following graphs/bar charts:
Use your knowledge of the Gnutella protocol specification and of the underlying project network you set up to explain and justify the above graphs. Finally, repeat the above experiment, graphs and analysis using TTL = 4. How and why have the results changed compared to those for TTL = 3?
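The simulator behaviour described in Task 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes that Network.txt lists one undirected link per line as `nodeA nodeB` and that Resources.txt lists one `nodeID serviceID` pair per line (the real file formats may differ), and the helper names `load_network`, `load_resources`, `flood_query` and `run_experiment`, together with the output columns, are hypothetical. Queries are flooded breadth-first, so each node first receives a Query along a shortest path; duplicate Queries are counted as transmitted but dropped, and a matching node returns a QueryHit without forwarding, as Task 2 requires.

```python
import io
import random
from collections import defaultdict, deque


def load_network(lines):
    """Build {node: set(neighbours)} from lines of the form "nodeA nodeB".

    ASSUMPTION: Network.txt stores one undirected link per line.
    """
    adj = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        a, b = int(parts[0]), int(parts[1])
        adj[a].add(b)
        adj[b].add(a)
    return adj


def load_resources(lines):
    """Map node ID -> set of Web Service IDs, from lines "nodeID serviceID".

    ASSUMPTION: Resources.txt stores one (node, service) pair per line.
    """
    services = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        services[int(parts[0])].add(int(parts[1]))
    return services


def flood_query(adj, services, origin, target, ttl):
    """Flood one Query; return (Query messages sent, nodes returning QueryHits).

    A node hosting `target` replies with a QueryHit and does NOT forward the
    Query (the project's rule). A node drops a Query it has already seen.
    """
    seen = {origin}
    hits = []
    messages = 0
    frontier = deque([(origin, None, ttl)])  # (node, sender, hops the Query may still travel)
    while frontier:
        node, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward
        for nb in adj[node]:
            if nb == sender:
                continue  # never echo the Query back to its sender
            messages += 1  # one Query transmission on link node -> nb
            if nb in seen:
                continue  # duplicate Query: dropped
            seen.add(nb)
            if target in services[nb]:
                hits.append(nb)  # replies with QueryHit, stops forwarding
            else:
                frontier.append((nb, node, remaining - 1))
    return messages, hits


def run_experiment(adj, services, n_queries, ttl, out, seed=0):
    """Process random (originator, service) queries and write TSV rows to `out`.

    HYPOTHETICAL columns: replace them with the per-query values the
    project asks you to record.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    out.write("query\torigin\tservice\tttl\tquery_msgs\thits\n")
    for q in range(1, n_queries + 1):
        origin = rng.choice(nodes)    # random originator node
        service = rng.randint(1, 10)  # random service ID, 1-10 inclusive
        msgs, hits = flood_query(adj, services, origin, service, ttl)
        out.write(f"{q}\t{origin}\t{service}\t{ttl}\t{msgs}\t{len(hits)}\n")


# Tiny hypothetical inputs standing in for Network.txt and Resources.txt.
adj = load_network(io.StringIO("1 2\n1 3\n2 4\n3 4\n4 5\n5 6\n"))
services = load_resources(io.StringIO("4 9\n5 9\n2 7\n6 10\n"))

print(flood_query(adj, services, 1, 10, ttl=3))  # (6, []): node 6 is 4 hops away
print(flood_query(adj, services, 1, 10, ttl=4))  # (7, [6])

buf = io.StringIO()
run_experiment(adj, services, n_queries=20, ttl=3, out=buf)
```

With the real files, the loaders could be called on opened file objects (for example `with open("Network.txt") as f: adj = load_network(f)`), and `run_experiment` could be run once with `ttl=3` and once with `ttl=4`, each writing to its own tab-separated file for import into Excel. The TTL handling assumes that a Query issued with TTL = t reaches nodes at most t hops away, since each Gnutella 0.4 servent decrements the TTL before forwarding. In the toy network the only host of service 10 lies four hops from node 1, so it is found only with TTL = 4, at the cost of one extra Query message.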
Task 3 (25% of project effort)
The Gnutella simulator you built and used for Task 2 is quite simplistic, as it does not implement the Ping/Pong message exchange that takes place in a real Gnutella network. As mentioned in the introduction, the Ping/Pong process introduces very significant data traffic on top of the query-related traffic.
Your task is to devise and describe a revised mechanism that achieves the objectives of the Ping/Pong message exchange but with less network traffic (fewer messages). You should describe your proposed method in some detail and provide logical arguments as to why it could perform better than the Ping/Pong exchange. However, you are NOT expected to carry out any experiments or simulations to prove your claims.
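To see where the Ping/Pong overhead mentioned in Task 3 comes from, the cost of a single Ping can be counted on a graph. This is a rough sketch, assuming Gnutella 0.4 style forwarding (each servent forwards a new Ping to all neighbours except the sender while TTL remains and drops duplicates) and exactly one Pong per responding servent, routed back along the reverse path; real servents may send additional Pongs, and `ping_pong_cost` is a hypothetical helper. A count like this could serve as a baseline when arguing that a revised mechanism needs fewer messages.

```python
from collections import deque


def ping_pong_cost(adj, origin, ttl):
    """Return (Ping transmissions, Pong transmissions) for one Ping from origin.

    ASSUMPTIONS: a servent forwards a new Ping to every neighbour except the
    sender while TTL remains, drops duplicates, and answers each new Ping with
    exactly one Pong routed back along the reverse path, so a Pong from a node
    h hops away costs h transmissions.
    """
    hops = {origin: 0}
    queue = deque([(origin, None, ttl)])  # (node, sender, hops the Ping may still travel)
    pings = 0
    while queue:
        node, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: no further forwarding
        for nb in adj[node]:
            if nb == sender:
                continue  # never send the Ping back to its sender
            pings += 1  # one Ping transmission on link node -> nb
            if nb in hops:
                continue  # duplicate Ping: dropped, no Pong
            hops[nb] = hops[node] + 1  # BFS order gives shortest hop count
            queue.append((nb, node, remaining - 1))
    pongs = sum(hops.values())  # origin contributes 0 (it sends no Pong)
    return pings, pongs


# Hypothetical 4-node ring: 1-2, 1-3, 2-4, 3-4
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
print(ping_pong_cost(adj, 1, ttl=3))  # (5, 4)
```

Even on this four-node ring, one Ping costs nine transmissions in total. Multiplying such a figure by the number of servents issuing Pings, and by how often they do so, gives a rough feel for the background load that a revised mechanism would aim to reduce.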