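In this lab, you will learn how web proxy servers work and about one of their basic functions: caching.

Your task is to develop a small web proxy server that can cache web pages. It is a very simple proxy server that only understands simple GET requests, but it can handle all kinds of objects: not just HTML pages, but also images.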

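Normally, when a client makes a request, the request is sent directly to the web server. The web server then processes the request and sends a response message back to the client. To improve performance, we place a proxy server between the client and the web server. Now both the request message sent by the client and the response message returned by the web server pass through the proxy server. In other words, the client requests objects via the proxy server. The proxy server forwards the client's request to the web server. The web server then generates a response message and delivers it to the proxy server, which in turn sends it to the client.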
[Figure 1: Socket Programming Assignment 4: Multithreaded Web Proxy Server]
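The request and response path described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the assignment's solution: the names `read_until_closed` and `relay_once` are hypothetical, the whole request is assumed to fit in a single `recv` call, and the web server is assumed to close its connection once the response is complete (as HTTP/1.0 servers do).

```python
import socket


def read_until_closed(sock: socket.socket) -> bytes:
    """Collect bytes from sock until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: the peer has closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def relay_once(client: socket.socket, origin: socket.socket) -> bytes:
    """Forward one request from the client to the web server and the response back."""
    request = client.recv(4096)           # request message sent by the client
    origin.sendall(request)               # the proxy forwards it to the web server
    response = read_until_closed(origin)  # response message from the web server
    client.sendall(response)              # the proxy passes it on to the client
    return response
```

Here `client` is the socket accepted from the browser and `origin` is a socket already connected to the web server; a caching proxy would store `response` before returning it.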

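Code

Below you will find the skeleton code for the proxy server. You are to complete the skeleton code. The places where you need to fill in code are marked with #Fill in start and #Fill in end. Each place requires at least one line of code.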

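Running the Proxy Server

Run your proxy server program from the command line, then request a web page from your browser, directing the request to the proxy server's IP address and port number. For example: http://localhost:8888/www.google.com

To use the browser and the proxy server on separate computers, you need the IP address of the machine on which the proxy server is running. In that case, replace "localhost" with that IP address. You also need to replace "8888" with the port number you used in your proxy server program.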
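With a URL such as http://localhost:8888/www.google.com, the browser connects to the proxy and sends a request line like `GET /www.google.com HTTP/1.1`, so the proxy has to recover the target host and path from the request target. A minimal sketch of that step follows; the function name `split_request_line` is hypothetical, and the request line is assumed to be well formed (exactly three space-separated fields).

```python
def split_request_line(request_line: str):
    """Return (method, host, path) for e.g. 'GET /www.google.com HTTP/1.1'."""
    method, target, _version = request_line.split()
    # Drop the leading "/" added by the browser, then split host from path.
    host, slash, rest = target.lstrip("/").partition("/")
    path = (slash + rest) or "/"  # no path after the host means the root page
    return method, host, path
```

For example, `split_request_line("GET /www.google.com HTTP/1.1")` yields the host `www.google.com` and the path `/`.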

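Configuring Your Browser

You can also configure your web browser directly to use your proxy. How to do this depends on your browser. In Internet Explorer, you can set the proxy in Tools > Internet Options > Connections tab > LAN Settings. In Netscape (and derived browsers such as Mozilla), you can set the proxy in Tools > Options > Advanced > Network > Connection Settings. In both cases you need to give the address and port number of the proxy server. You should first make sure that the proxy server and the browser run on the same computer without any problems. With this approach, to fetch a web page through the proxy server you only need to provide the URL of the page. For example: http://www.google.com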
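When a browser is configured to use a proxy, it sends the request target in absolute form, for example `GET http://www.google.com/ HTTP/1.1`, rather than as a path. A proxy that supports this mode has to split that URL into host, port, and path. The sketch below shows one way to do so with the standard library; the helper name `split_absolute_target` is hypothetical, and defaulting to port 80 assumes plain HTTP.

```python
from urllib.parse import urlsplit


def split_absolute_target(target: str):
    """Return (host, port, path) for a target such as 'http://www.google.com/'."""
    parts = urlsplit(target)
    host = parts.hostname
    port = parts.port or 80   # assumption: plain HTTP when no port is given
    path = parts.path or "/"  # an empty path means the root page
    if parts.query:
        path += "?" + parts.query
    return host, port, path
```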

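What to Hand In

You need to submit the complete proxy server code and a screenshot taken on the client side that verifies that you did indeed fetch the web page through the proxy server.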

Skeleton Python Code for the Proxy Server

```python
from socket import *
import sys

if len(sys.argv) <= 1:
    print 'Usage : "python ProxyServer.py server_ip"\n[server_ip : It is the IP Address Of Proxy Server]'
    sys.exit(2)
# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
# Fill in start.
# Fill in end.
while 1:
    # Start receiving data from the client
    print 'Ready to serve...'
    tcpCliSock, addr = tcpSerSock.accept()
    print 'Received a connection from:', addr
    message = # Fill in start. # Fill in end.
    print message
    # Extract the filename from the given message
    print message.split()[1]
    filename = message.split()[1].partition("/")[2]
    print filename
    fileExist = "false"
    filetouse = "/" + filename
    print filetouse
    try:
        # Check whether the file exists in the cache
        f = open(filetouse[1:], "r")
        outputdata = f.readlines()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.0 200 OK\r\n")
        tcpCliSock.send("Content-Type:text/html\r\n")
        # Fill in start.
        # Fill in end.
        print 'Read from cache'
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = # Fill in start. # Fill in end.
            hostn = filename.replace("www.","",1)
            print hostn
            try:
                # Connect to the socket to port 80
                # Fill in start.
                # Fill in end.
                # Create a temporary file on this socket and ask port 80
                # for the file requested by the client
                fileobj = c.makefile('r', 0)
                fileobj.write("GET "+"http://" + filename + " HTTP/1.0\n\n")
                # Read the response into buffer
                # Fill in start.
                # Fill in end.
                # Create a new file in the cache for the requested file.
                # Also send the response in the buffer to client socket and the corresponding file in the cache
                tmpFile = open("./" + filename,"wb")
                # Fill in start.
                # Fill in end.
            except:
                print "Illegal request"
        else:
            # HTTP response message for file not found
            # Fill in start.
            # Fill in end.
    # Close the client and the server sockets
    tcpCliSock.close()
# Fill in start.
# Fill in end.
```
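Once the skeleton has read the web server's response into a buffer, the status code on the first line tells the proxy whether the requested object was actually returned. The following is a small sketch of that check, written for Python 3; the function name `status_code` is hypothetical and the buffer is assumed to hold the raw response bytes starting at the status line.

```python
def status_code(response: bytes) -> int:
    """Extract the numeric status code from a raw HTTP response."""
    status_line = response.split(b"\r\n", 1)[0]  # e.g. b"HTTP/1.1 404 Not Found"
    parts = status_line.split(None, 2)           # version, code, reason phrase
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError("malformed status line: %r" % status_line)
    return int(parts[1])
```

A proxy could, for instance, cache the response only when `status_code(buffer) == 200` and pass any other response to the client without storing it.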

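Optional Exercises

  1. Currently the proxy server does no error handling. This can cause problems, especially when the client requests an object that is not available: a "404 Not Found" response usually has no response body, yet the proxy assumes there is a body and tries to read it.
  2. The current proxy server supports only the HTTP GET method. Add support for POST by including the request body.
  3. Caching: a typical proxy server caches web pages whenever a client makes a particular request. The basic functionality of caching is as follows. When the proxy gets a request, it checks whether the requested object is already in the cache; if so, it returns the object from the cache without contacting the server. If the object is not cached, the proxy retrieves the object from the server, returns it to the client, and caches a copy for future requests. In practice, a proxy server must verify that cached responses are still valid and that they are the correct responses to the client's requests. You can read more about caching and how it is handled in HTTP in RFC 2068. Add the simple caching functionality described above. You do not need to implement any replacement or validation policies. What you do need to implement is writing requests and responses to disk (that is, the cache) and fetching them from disk on a cache hit. For this you need some internal data structure in the proxy that keeps track of which requests are in the cache and where they are on disk. You can keep this data structure in main memory, since there is no need for it to persist across shutdowns.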
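The bookkeeping asked for in exercise 3 can be sketched as an in-memory dictionary that maps each requested URL to the file holding its cached response. This is one possible design under stated assumptions, not a reference solution: the class name `DiskCache` and its methods are hypothetical, and hashing the URL with SHA-256 is simply a convenient way to obtain a file name that is safe for any URL.

```python
import hashlib
import os


class DiskCache:
    """Keep responses on disk and an in-memory index of where they are stored."""

    def __init__(self, directory: str):
        self.directory = directory
        self.index = {}  # maps URL -> path of the cached file on disk
        os.makedirs(directory, exist_ok=True)

    def _path_for(self, url: str) -> str:
        # Hash the URL so that any URL maps to a safe, fixed-length file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest)

    def get(self, url: str):
        """Return the cached bytes for url, or None on a cache miss."""
        path = self.index.get(url)
        if path is None:
            return None
        with open(path, "rb") as f:
            return f.read()

    def put(self, url: str, response: bytes) -> None:
        """Write the response to disk and record its location in the index."""
        path = self._path_for(url)
        with open(path, "wb") as f:
            f.write(response)
        self.index[url] = path
```

Because the index lives only in memory, the cache starts empty after every restart, which matches the exercise's note that the data structure does not need to persist.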

Answers

## Answer to Assignment 4

```python
# Converted to Python 3
from socket import *
import sys
import os

if len(sys.argv) <= 1:
    # The single argument is used as the listening port below
    print('Usage : "python ProxyServer.py server_port"\n[server_port : the port the proxy server listens on]')
    sys.exit(2)

# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerPort = int(sys.argv[1])
tcpSerSock.bind(("", tcpSerPort))
print(tcpSerPort)
tcpSerSock.listen(10)
while True:
    # Start receiving data from the client
    print('Ready to serve...')
    tcpCliSock, addr = tcpSerSock.accept()
    print('Received a connection from:', addr)
    message = tcpCliSock.recv(1024)
    message = message.decode()
    print("message:", message)
    if message == '':
        tcpCliSock.close()
        continue
    # Extract the filename from the given message
    print("message.split()[1]:", message.split()[1])
    filename = message.split()[1].partition("/")[2]
    print("filename:", filename)
    fileExist = "false"
    filetouse = "/" + filename
    print("filetouse:", filetouse)
    try:
        # Check whether the file exists in the cache
        f = open("WEB/" + filetouse[1:], "rb")
        outputdata = f.read()
        f.close()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.1 200 OK\r\n".encode())
        tcpCliSock.send("Content-Type:text/html\r\n\r\n".encode())
        tcpCliSock.sendall(outputdata)
        print('Read from cache')
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = filename.replace("www.", "", 1)
            print("hostn:", hostn)
            try:
                # Connect to the socket to port 80
                serverName = hostn.partition("/")[0]
                serverPort = 80
                print((serverName, serverPort))
                c.connect((serverName, serverPort))
                # Request "/" when the URL contains no path after the host
                askFile = ''.join(filename.partition('/')[1:]) or '/'
                print("askFile:", askFile)
                # Create a temporary file on this socket and ask port 80
                # for the file requested by the client
                fileobj = c.makefile('rwb', 0)
                fileobj.write(f"GET {askFile} HTTP/1.0\r\nHost: {serverName}\r\n\r\n".encode())
                # Read the response into buffer
                serverResponse = fileobj.read()
                # The status code is the second token of the status line
                if serverResponse.split()[1] == b'404':
                    print('404')
                    tcpCliSock.send("HTTP/1.1 404 Not Found\r\n\r\n".encode())
                    tcpCliSock.close()
                    c.close()
                    continue
                # Create a new file in the cache for the requested file.
                # Also send the response in the buffer to client socket and the corresponding file in the cache
                filename = "WEB/" + filename
                os.makedirs(os.path.dirname(filename), exist_ok=True)
                tmpFile = open(filename, "wb")
                print(serverResponse)
                # Keep only the body; split once so a body containing a blank line stays intact
                serverResponse = serverResponse.split(b'\r\n\r\n', 1)[1]
                print(serverResponse)
                tmpFile.write(serverResponse)
                tmpFile.close()
                tcpCliSock.send("HTTP/1.1 200 OK\r\n".encode())
                tcpCliSock.send("Content-Type:text/html\r\n\r\n".encode())
                tcpCliSock.sendall(serverResponse)
            except Exception:
                print("Illegal request")
            c.close()
        else:
            # HTTP response message for file not found
            print("NET ERROR")
    # Close the client and the server sockets
    tcpCliSock.close()
tcpSerSock.close()
```

<a name="uX7Hu"></a>
## Answer to Optional Exercise 1
```python
# Converted to Python 3
from socket import *
import sys
import os

if len(sys.argv) <= 1:
    # The single argument is used as the listening port below
    print('Usage : "python ProxyServer.py server_port"\n[server_port : the port the proxy server listens on]')
    sys.exit(2)

# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerPort = int(sys.argv[1])
tcpSerSock.bind(("", tcpSerPort))
print(tcpSerPort)
tcpSerSock.listen(10)
while True:
    # Start receiving data from the client
    print('Ready to serve...')
    tcpCliSock, addr = tcpSerSock.accept()
    print('Received a connection from:', addr)
    message = tcpCliSock.recv(1024)
    message = message.decode()
    print("message:", message)
    if message == '':
        tcpCliSock.close()
        continue
    # Extract the filename from the given message
    print("message.split()[1]:", message.split()[1])
    filename = message.split()[1].partition("/")[2]
    print("filename:", filename)
    fileExist = "false"
    filetouse = "/" + filename
    print("filetouse:", filetouse)
    try:
        # Check whether the file exists in the cache
        f = open("WEB/" + filetouse[1:], "rb")
        outputdata = f.read()
        f.close()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.1 200 OK\r\n".encode())
        tcpCliSock.send("Content-Type:text/html\r\n\r\n".encode())
        tcpCliSock.sendall(outputdata)
        print('Read from cache')
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = filename.replace("www.", "", 1)
            print("hostn:", hostn)
            try:
                # Connect to the socket to port 80
                serverName = hostn.partition("/")[0]
                serverPort = 80
                print((serverName, serverPort))
                c.connect((serverName, serverPort))
                # Request "/" when the URL contains no path after the host
                askFile = ''.join(filename.partition('/')[1:]) or '/'
                print("askFile:", askFile)
                # Create a temporary file on this socket and ask port 80
                # for the file requested by the client
                fileobj = c.makefile('rwb', 0)
                fileobj.write(f"GET {askFile} HTTP/1.0\r\nHost: {serverName}\r\n\r\n".encode())
                # Read the response into buffer
                serverResponse = fileobj.read()
                # The status code is the second token of the status line
                if serverResponse.split()[1] == b'404':
                    print('404')
                    tcpCliSock.send("HTTP/1.1 404 Not Found\r\n\r\n".encode())
                    tcpCliSock.close()
                    c.close()
                    continue
                # Create a new file in the cache for the requested file.
                # Also send the response in the buffer to client socket and the corresponding file in the cache
                filename = "WEB/" + filename
                os.makedirs(os.path.dirname(filename), exist_ok=True)
                tmpFile = open(filename, "wb")
                print(serverResponse)
                # Keep only the body; split once so a body containing a blank line stays intact
                serverResponse = serverResponse.split(b'\r\n\r\n', 1)[1]
                print(serverResponse)
                tmpFile.write(serverResponse)
                tmpFile.close()
                tcpCliSock.send("HTTP/1.1 200 OK\r\n".encode())
                tcpCliSock.send("Content-Type:text/html\r\n\r\n".encode())
                tcpCliSock.sendall(serverResponse)
            except Exception:
                print("Illegal request")
            c.close()
        else:
            # HTTP response message for file not found
            print("NET ERROR")
    # Close the client and the server sockets
    tcpCliSock.close()
tcpSerSock.close()
```

## Answer to Optional Exercise 2

```python
# Site-not-found test
# Converted to Python 3
from socket import *
import sys
import os

if len(sys.argv) <= 1:
    # The single argument is used as the listening port below
    print('Usage : "python ProxyServer.py server_port"\n[server_port : the port the proxy server listens on]')
    sys.exit(2)

# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerPort = int(sys.argv[1])
tcpSerSock.bind(("", tcpSerPort))
print(tcpSerPort)
tcpSerSock.listen(10)
while True:
    # Start receiving data from the client
    print('Ready to serve...')
    tcpCliSock, addr = tcpSerSock.accept()
    print('Received a connection from:', addr)
    message = tcpCliSock.recv(1024)
    message = message.decode()
    print("message:", message)
    if message == '':
        tcpCliSock.close()
        continue
    # Extract the filename from the given message
    print("message.split()[1]:", message.split()[1])
    filename = message.split()[1].partition("/")[2]
    print("filename:", filename)
    fileExist = "false"
    filetouse = "/" + filename
    print("filetouse:", filetouse)
    try:
        # Check whether the file exists in the cache
        f = open("WEB/" + filetouse[1:], "rb")
        outputdata = f.read()
        f.close()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.1 200 OK\r\n".encode())
        tcpCliSock.send("Content-Type:text/html\r\n\r\n".encode())
        tcpCliSock.sendall(outputdata)
        print('Read from cache')
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = filename.replace("www.", "", 1)
            print("hostn:", hostn)
            try:
                # Connect to the socket to port 80
                serverName = hostn.partition("/")[0]
                serverPort = 80
                print((serverName, serverPort))
                c.connect((serverName, serverPort))
                # Request "/" when the URL contains no path after the host
                askFile = ''.join(filename.partition('/')[1:]) or '/'
                print("askFile:", askFile)
                # Create a temporary file on this socket and ask port 80
                # for the file requested by the client
                fileobj = c.makefile('rwb', 0)
                if message.split()[0] == 'GET':
                    fileobj.write(f"GET {askFile} HTTP/1.0\r\nHost: {serverName}\r\n\r\n".encode())
                else:  # POST
                    # Forward the request body and tell the server how long it is
                    body = message.split("\r\n\r\n", 1)[1].encode()
                    fileobj.write(
                        f"POST {askFile} HTTP/1.0\r\nHost: {serverName}\r\nContent-Length: {len(body)}\r\n\r\n".encode())
                    fileobj.write(body)
                # Read the response into buffer
                serverResponse = fileobj.read()
                # The status code is the second token of the status line
                if serverResponse.split()[1] == b'404':
                    print('404')
                    tcpCliSock.send("HTTP/1.1 404 Not Found\r\n\r\n".encode())
                    tcpCliSock.close()
                    c.close()
                    continue
                # Create a new file in the cache for the requested file.
                # Also send the response in the buffer to client socket and the corresponding file in the cache
                filename = "WEB/" + filename
                os.makedirs(os.path.dirname(filename), exist_ok=True)
                tmpFile = open(filename, "wb")
                print(serverResponse)
                # Keep only the body; split once so a body containing a blank line stays intact
                serverResponse = serverResponse.split(b'\r\n\r\n', 1)[1]
                print(serverResponse)
                tmpFile.write(serverResponse)
                tmpFile.close()
                tcpCliSock.send("HTTP/1.1 200 OK\r\n".encode())
                tcpCliSock.send("Content-Type:text/html\r\n\r\n".encode())
                tcpCliSock.sendall(serverResponse)
            except Exception:
                print("Illegal request")
            c.close()
        else:
            # HTTP response message for file not found
            print("NET ERROR")
    # Close the client and the server sockets
    tcpCliSock.close()
tcpSerSock.close()
```