For this assignment you will write a proxy server that services client requests for web pages hosted on origin servers. In particular, you are being asked to write what is essentially a forward proxy server:
An ordinary forward proxy is an intermediate server that sits between the client and the origin server. In order to get content from the origin server, the client sends a request to the proxy naming the origin server as the target and the proxy then requests the content from the origin server and returns it to the client. The client must be specially configured to use the forward proxy to access other sites. [From Apache documentation]
Your proxy server operates as follows:
Requests are made to your proxy server as follows (for example, this document):
http://localhost:8080/www.people.westminstercollege.edu/faculty/ggagne/fall2019/cmpt352/handouts/hw3/index.html
Your proxy server must be designed as follows:
Flow: see image.
After writing the response back to the client, the proxy server closes the socket connection to the client.
As an example, for the URL
http://localhost:8080/www.people.westminstercollege.edu/faculty/ggagne/fall2019/cmpt352/handouts/hw3/index.html
your proxy server will first open a socket connection to the origin server on port 80 and then write the following HTTP GET request to the origin server:
GET /faculty/ggagne/fall2019/cmpt352/handouts/hw3/index.html HTTP/1.1
Host: www.people.westminstercollege.edu
Connection: close
(The line breaks are part of the HTTP protocol and must be sent: each line ends with \r\n, and a blank line follows the last header.)
Furthermore, your proxy server does not need to support persistent connections, so you can request a non-persistent connection with the Connection: close header.
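As a minimal sketch of the request shown above, the following Python function builds the bytes the proxy would write to the origin server. The function name build_origin_request and its parameters are illustrative assumptions, not part of the assignment.

```python
def build_origin_request(host: str, resource: str) -> bytes:
    """Return a non-persistent HTTP/1.1 GET request for `resource` on `host`.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    lines = [
        f"GET {resource} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    # Every line ends with CRLF, and one extra CRLF (a blank line)
    # marks the end of the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_origin_request(
    "www.people.westminstercollege.edu",
    "/faculty/ggagne/fall2019/cmpt352/handouts/hw3/index.html",
)
```

The returned bytes can be written as-is to a socket connected to port 80 of the origin host, for example one opened with socket.create_connection((host, 80)).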
This assignment is largely about parsing the requests the client browser sends to your proxy server, so that you can obtain the origin host and the resource on that host that you will have to request.
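The first step of that parsing can be sketched as follows. When the browser visits a URL such as http://localhost:8080/www.xkcd.com, the first line of the request it sends to the proxy has the form "GET /www.xkcd.com HTTP/1.1". The function below pulls the middle part (the request target) out of that line. The name request_target is hypothetical, and rejecting methods other than GET is an assumption made here for simplicity, not a stated requirement.

```python
def request_target(raw_request: bytes) -> str:
    """Return the request target from the first line of an HTTP request.

    Hypothetical helper. Raises ValueError if the first line is not of the
    form "<method> <target> <version>" or if the method is not GET.
    """
    # The request line is everything up to the first CRLF.
    first_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    # Unpacking fails with ValueError if there are not exactly three parts.
    method, target, _version = first_line.split(" ")
    if method != "GET":
        raise ValueError(f"unsupported method: {method}")
    return target


raw = b"GET /www.xkcd.com HTTP/1.1\r\nHost: localhost:8080\r\n\r\n"
target = request_target(raw)
```

Here target is the string "/www.xkcd.com", which still has to be split into the origin host and the resource on that host.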
Default Documents
There are some special circumstances you need to check when parsing. For example, if the following query is passed to your proxy server:
http://localhost:8080/www.amazon.com
where no resource appears after www.amazon.com, this means your request to the origin server is for the default document. In this instance, the HTTP command will be GET /, where the / refers to the default document.
When requesting the default document from an origin server, the request your proxy server must construct is:
GET / HTTP/1.1
Host: www.amazon.com
Connection: close
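One way to handle both the ordinary case and the default-document case is to split the request target at the first slash that follows the host name, and to fall back to / when nothing follows. The sketch below assumes the target arrives as a string beginning with a slash (for example "/www.amazon.com"); the name split_target is hypothetical.

```python
from typing import Tuple


def split_target(target: str) -> Tuple[str, str]:
    """Split '/host/path' into (host, '/path').

    Hypothetical helper. If no resource follows the host, the resource
    is '/', which requests the default document.
    """
    # Drop the leading slash, then cut at the first remaining slash.
    host, _sep, rest = target.lstrip("/").partition("/")
    if not host:
        raise ValueError("request target does not name an origin host")
    # rest is '' when no resource was given, so this yields '/'.
    return host, "/" + rest


default_case = split_target("/www.amazon.com")
normal_case = split_target("/www.apple.com/windowsrules.html")
```

Because the resource is always rebuilt as "/" plus whatever follows the host, a target with or without a trailing slash after the host name produces the same default-document request.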
Handling 404
Also be sure to test your proxy server so that it correctly handles 404 (resource not found) errors. (This should not require any special handling on your part, just make sure that it functions properly.)
The remaining work will simply involve reading what the origin server sends back to your proxy server, and writing that back to the client.
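The relaying step can be sketched as a simple copy loop. This is only an illustration in Python: the function name relay and the 4096-byte buffer size are assumptions. Because the request asked for Connection: close, the origin server closes its side when the response is complete, which shows up as an empty read.

```python
import socket


def relay(origin: socket.socket, client: socket.socket, bufsize: int = 4096) -> int:
    """Copy bytes from `origin` to `client` until the origin closes.

    Hypothetical helper. Returns the number of bytes relayed.
    """
    total = 0
    while True:
        chunk = origin.recv(bufsize)
        if not chunk:
            # An empty read means the origin closed its end of the connection.
            break
        # The data is passed through as raw bytes and never decoded,
        # so images and other binary content survive unchanged.
        client.sendall(chunk)
        total += len(chunk)
    return total
```

Since the loop forwards whatever the origin sends, the status line and headers reach the client untouched. After relay returns, the proxy can close both sockets.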
This assignment will involve reading and writing between two different socket connections:
It is suggested you use the following network APIs for reading from and writing to sockets:
Using the Google Chrome browser, I will test your proxy server against the following URLs:
http://localhost:8080/www.people.westminstercollege.edu/faculty/ggagne/fall2019/cmpt352/handouts/hw3/index.html
http://localhost:8080/www.amazon.com
http://localhost:8080/www.xkcd.com
http://localhost:8080/people.westminstercollege.edu/faculty/ggagne/fall2019/cmpt352/chapters/chapter2/photos.html
In addition, I will check the following to ensure your proxy server correctly handles 404 errors:
http://localhost:8080/www.apple.com/windowsrules.html
(Again, since you are just passing requests and responses between the client and the origin host, you shouldn't have to do anything special to handle a 404 error.)
More good news: your proxy server does not need to worry about invalid hosts (i.e., non-existent domain names), as this would require your server to generate an HTTP response message back to the client, and this is material that will be covered when you design your web server. You can assume that all origin hosts are valid host names.