关于HTTP

Introduction

简介

The World Wide Web is a major distributed system, with millions of users. A site may become a Web host by running an HTTP server. While Web clients are typically users with a browser, there are many other "user agents" such as web spiders, web application clients and so on.

万维网是一个庞大的、拥有数以百万计用户的分布式系统。一个站点只要运行HTTP服务器,就可以成为Web主机。Web客户端通常是使用浏览器的用户,但也有许多其他的“用户代理”,如网络蜘蛛、Web应用程序客户端等。

The Web is built on top of HTTP (the HyperText Transfer Protocol), which is layered on top of TCP. HTTP has been through three publicly available versions, but the latest - version 1.1 - is now the most commonly used.

Web使用的HTTP(超文本传输协议)是基于TCP协议的。HTTP有三个公开可用的版本,目前最常用的是最新的版本1.1。

In this chapter we give an overview of HTTP, followed by the Go APIs to manage HTTP connections.

本章首先对HTTP进行概述,然后介绍如何通过Go API管理HTTP连接。

Overview of HTTP

HTTP概述

URLs and resources

URL和资源

URLs specify the location of a resource. A resource is often a static file, such as an HTML document, an image, or a sound file. But increasingly, it may be a dynamically generated object, perhaps based on information stored in a database.

URL指定资源的位置。资源通常是HTML文档、图片、声音文件这样的静态文件,但越来越多的资源是动态生成的对象,比如根据数据库信息生成。

When a user agent requests a resource, what is returned is not the resource itself, but some representation of that resource. For example, if the resource is a static file, then what is sent to the user agent is a copy of the file.

用户代理请求资源时,返回的并不是资源本身,而是该资源的某种表示。例如,如果资源是静态文件,那么发送给用户代理的就是该文件的一个副本。

Multiple URLs may point to the same resource, and an HTTP server will return appropriate representations of the resource for each URL. For example, a company might make product information available both internally and externally using different URLs for the same product. The internal representation of the product might include information such as internal contact officers for the product, while the external representation might include the location of stores selling the product.

不同的URL可以指向同一个资源,HTTP服务器会针对每个URL返回该资源的适当表示。例如,针对同一个产品,某公司可以使用不同的URL分别向公司内部和外部提供产品信息:内部的表示可能包含该产品的内部联系人等信息,而外部的表示则可能包含销售该产品的门店地址等。

This view of resources means that the HTTP protocol can be fairly simple and straightforward, while an HTTP server can be arbitrarily complex. HTTP has to deliver requests from user agents to servers and return a byte stream, while a server might have to do any amount of processing of the request.

这其实就意味着,HTTP协议本身非常简单直接,但HTTP服务器却可能非常复杂。HTTP将用户请求发送到服务器,并返回字节流,而服务器针对该请求可能需要做很多很多处理。

HTTP characteristics

HTTP的特点

HTTP is a stateless, connectionless, reliable protocol. In the simplest form, each request from a user agent is handled reliably and then the connection is broken. Each request involves a separate TCP connection, so if many resources are required (such as images embedded in an HTML page) then many TCP connections have to be set up and torn down in a short space of time.

HTTP协议是无状态、无连接且可靠的。最简单的形式是:用户代理发起的每个请求被可靠地处理,然后断开连接。每个请求都使用一个单独的TCP连接,所以如果需要请求很多资源(如HTML页面中嵌入的图像),就必须在很短的时间内建立并拆除许多TCP连接。

There are many optimisations in HTTP which add complexity to this simple structure, in order to create a more efficient and reliable protocol.

为了构建更高效、更可靠的协议,HTTP在这种简单结构之上加入了许多优化,也因此增加了复杂性。

Versions

版本

There are 3 versions of HTTP

HTTP有三个版本

  • Version 0.9 - totally obsolete
  • Version 1.0 - almost obsolete
  • Version 1.1 - current
  • Version 0.9 - 完全废弃
  • Version 1.0 - 基本废弃
  • Version 1.1 - 当前版本

Each version must understand requests and responses of earlier versions.

每个版本都必须能理解早期版本的请求和响应。

HTTP 0.9

Request format

请求格式

  Request = Simple-Request

  Simple-Request = "GET" SP Request-URI CRLF

Response format

响应格式

A response is of the form

响应的形式如下:

  Response = Simple-Response

  Simple-Response = [Entity-Body]

HTTP 1.0

This version added much more information to the requests and responses. Rather than "grow" the 0.9 format, it was just left alongside the new version.

该版本在请求和响应中增加了很多信息。它并没有在0.9格式的基础上“扩展”,而是让0.9格式与新版本并存。

Request format

请求格式

The format of requests from client to server is

从客户端到服务器端的请求格式:

  Request = Simple-Request | Full-Request

  Simple-Request = "GET" SP Request-URI CRLF

  Full-Request = Request-Line
                 *(General-Header
                 | Request-Header
                 | Entity-Header)
                 CRLF
                 [Entity-Body]

A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response.

简单请求(Simple-Request)就是HTTP/0.9的请求,必须以简单响应(Simple-Response)回复。

A Request-Line has format

请求行(Request-Line)的格式如下:

  Request-Line = Method SP Request-URI SP HTTP-Version CRLF

where

其中

  1. Method = "GET" | "HEAD" | POST |
  2. extension-method

e.g.

如:

  GET http://jan.newmarch.name/index.html HTTP/1.0

Response format

响应格式

A response is of the form

响应的形式如下:

  Response = Simple-Response | Full-Response

  Simple-Response = [Entity-Body]

  Full-Response = Status-Line
                  *(General-Header
                  | Response-Header
                  | Entity-Header)
                  CRLF
                  [Entity-Body]

The Status-Line gives information about the fate of the request:

状态行(Status-Line)给出请求的处理结果:

  Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

e.g.

  HTTP/1.0 200 OK

The codes are

状态码:

  1. Status-Code = "200" ; OK
  2. | "201" ; Created
  3. | "202" ; Accepted
  4. | "204" ; No Content
  5. | "301" ; Moved permanently
  6. | "302" ; Moved temporarily
  7. | "304" ; Not modified
  8. | "400" ; Bad request
  9. | "401" ; Unauthorised
  10. | "403" ; Forbidden
  11. | "404" ; Not found
  12. | "500" ; Internal server error
  13. | "501" ; Not implemented
  14. | "502" ; Bad gateway
  15. | "503" | Service unavailable
  16. | extension-code

The Entity-Header contains useful information about the Entity-Body to follow

实体头(Entity-Header)包含了有关实体(Entity-Body)的有用信息

  Entity-Header = Allow
                | Content-Encoding
                | Content-Length
                | Content-Type
                | Expires
                | Last-Modified
                | extension-header

For example

例如:

  HTTP/1.1 200 OK
  Date: Fri, 29 Aug 2003 00:59:56 GMT
  Server: Apache/2.0.40 (Unix)
  Accept-Ranges: bytes
  Content-Length: 1595
  Connection: close
  Content-Type: text/html; charset=ISO-8859-1
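
To make these formats concrete, the following sketch (not one of the original examples; www.example.com is only a placeholder host) writes an HTTP/1.0 Full-Request by hand over a TCP connection and prints the raw Status-Line, headers and Entity-Body that come back.

  /* RawGet
   * Sketch: send a hand-written HTTP/1.0 request over TCP and print the
   * raw response. The host www.example.com is only a placeholder.
   */
  package main

  import (
      "fmt"
      "io/ioutil"
      "net"
      "os"
  )

  func main() {
      conn, err := net.Dial("tcp", "www.example.com:80")
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
      defer conn.Close()

      // Request-Line, one header, then the blank line that ends the headers
      fmt.Fprint(conn, "GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n")

      // the reply is a Status-Line, headers, a blank line and the Entity-Body
      response, err := ioutil.ReadAll(conn)
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
      fmt.Print(string(response))
  }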

HTTP 1.1

HTTP 1.1 fixes many problems with HTTP 1.0, but is more complex because of it. This version extends and refines the options available in HTTP 1.0, e.g.

HTTP 1.1 修复了 HTTP 1.0 中的很多问题,但也因此更加复杂。这一版本是通过扩展和完善 HTTP 1.0 中的可选项来实现的,例如:

  • there are more commands such as TRACE and CONNECT
  • you should use absolute URLs, particularly for connecting by proxies, e.g.
    GET http://www.w3.org/index.html HTTP/1.1
  • there are more attributes such as If-Modified-Since, also for use by proxies
  • 增加了命令,如 TRACE 和 CONNECT
  • 应当使用绝对URL,尤其是在通过代理服务器连接时。如:
    GET http://www.w3.org/index.html HTTP/1.1
  • 增加了更多属性,例如 If-Modified-Since,同样可供代理服务器使用。

The changes include

这些变动包括:

  • hostname identification (allows virtual hosts)
  • content negotiation (multiple languages)
  • persistent connections (reduces TCP overheads - this is very messy)
  • chunked transfers
  • byte ranges (request parts of documents)
  • proxy support
  • 主机名识别(支持虚拟主机)
  • 内容协商(多语言)
  • 持久连接(降低TCP开销,不过协议中这部分相当繁琐)
  • 分块传送
  • 字节范围(请求文件部分内容)
  • 代理支持

The 0.9 protocol took one page. The 1.0 protocol was described in about 20 pages. 1.1 takes 120 pages.

0.9版本的协议只有一页,1.0版本用了大约20页来说明,而1.1则用了120页。

Simple user-agents

简单用户代理(Simple user-agents)

User agents such as browsers make requests and get responses. The response type is

用户代理(User agent)(例如浏览器)发起请求并接收响应。响应的类型定义如下:

  type Response struct {
      Status     string // e.g. "200 OK"
      StatusCode int    // e.g. 200
      Proto      string // e.g. "HTTP/1.0"
      ProtoMajor int    // e.g. 1
      ProtoMinor int    // e.g. 0
      // Header maps header keys to values
      Header Header
      // Body represents the response body
      Body io.ReadCloser
      // ContentLength records the length of the associated content;
      // the value -1 indicates that the length is unknown
      ContentLength int64
      // TransferEncoding lists the transfer encodings from outermost to innermost
      TransferEncoding []string
      // Close records whether the header directed that the connection be
      // closed after reading Body
      Close bool
      // Trailer maps trailer keys to values
      Trailer Header
      // Request is the request that was sent to obtain this Response
      Request *Request
  }

We shall examine this data structure through examples. The simplest request from a user agent is "HEAD", which asks for information about a resource and its HTTP server. The function

我们将通过实例来考察这个数据结构。用户代理能发起的最简单的请求是"HEAD",它用来获取某个资源及其HTTP服务器的相关信息。函数

  func Head(url string) (resp *Response, err error)

can be used to make this query.

可用来发起此请求。

The status of the response is in the response field Status, while the field Header is a map of the header fields in the HTTP response. A program to make this request and display the results is

响应状态对应response中的Status属性,而Header属性对应HTTP响应的header域。下面的程序用来发起请求和显示结果:

  /* Head
   */
  package main

  import (
      "fmt"
      "net/http"
      "os"
  )

  func main() {
      if len(os.Args) != 2 {
          fmt.Println("Usage: ", os.Args[0], "url")
          os.Exit(1)
      }
      url := os.Args[1]

      response, err := http.Head(url)
      if err != nil {
          fmt.Println(err.Error())
          os.Exit(2)
      }

      fmt.Println(response.Status)
      for k, v := range response.Header {
          fmt.Println(k+":", v)
      }
      os.Exit(0)
  }

When run against a resource as in Head http://www.golang.com/ it prints something like

以某个资源为参数运行该程序,例如 Head http://www.golang.com/,输出结果类似:

  200 OK
  Content-Type: text/html; charset=utf-8
  Date: Tue, 14 Sep 2010 05:34:29 GMT
  Cache-Control: public, max-age=3600
  Expires: Tue, 14 Sep 2010 06:34:29 GMT
  Server: Google Frontend

Usually, we want to retrieve a resource rather than just get information about it. The "GET" request will do this, and it can be done using

通常我们希望获取资源的内容,而不只是它的相关信息。"GET"请求就是用来做这个的,使用如下函数即可:

  func Get(url string) (resp *Response, err error)

The content of the response is in the response field Body which is of type io.ReadCloser. We can print the content to the screen with the following program

响应的内容在response的Body字段中,它是一个io.ReadCloser。我们可以用下面的程序把响应内容打印到屏幕上:

  /* Get
   */
  package main

  import (
      "fmt"
      "net/http"
      "net/http/httputil"
      "os"
      "strings"
  )

  func main() {
      if len(os.Args) != 2 {
          fmt.Println("Usage: ", os.Args[0], "url")
          os.Exit(1)
      }
      url := os.Args[1]

      response, err := http.Get(url)
      if err != nil {
          fmt.Println(err.Error())
          os.Exit(2)
      }
      if response.Status != "200 OK" {
          fmt.Println(response.Status)
          os.Exit(2)
      }

      b, _ := httputil.DumpResponse(response, false)
      fmt.Print(string(b))

      contentTypes := response.Header["Content-Type"]
      if !acceptableCharset(contentTypes) {
          fmt.Println("Cannot handle", contentTypes)
          os.Exit(4)
      }

      var buf [512]byte
      reader := response.Body
      for {
          n, err := reader.Read(buf[0:])
          if err != nil {
              os.Exit(0)
          }
          fmt.Print(string(buf[0:n]))
      }
  }

  func acceptableCharset(contentTypes []string) bool {
      // each type is like [text/html; charset=UTF-8]
      // we want UTF-8 only
      for _, cType := range contentTypes {
          if strings.Index(cType, "UTF-8") != -1 {
              return true
          }
      }
      return false
  }

Note that there are important character set issues of the type discussed in the previous chapter. The server will deliver the content using some character set encoding, and possibly some transfer encoding. Usually this is a matter of negotiation between user agent and server, but the simple Get command that we are using does not include the user agent component of the negotiation. So the server can send whatever character encoding it wishes.

注意,这里存在前一章讨论过的那类重要的字符集问题。服务器会用某种字符集编码来发送内容,可能还会使用某种传输编码。这通常是用户代理和服务器之间协商的结果,但我们使用的简单Get命令并不包含用户代理这一方的协商。因此,服务器可以发送它想用的任何字符编码。

At the time of first writing, I was in China. When I tried this program on www.google.com, Google's server tried to be helpful by guessing my location and sending me the text in the Chinese character set Big5! How to tell the server what character encoding is okay for me is discussed later.

初次写作本章时,我正在中国。当我用这个程序访问www.google.com时,Google的服务器试图“贴心”地猜测我的位置,用中文字符集Big5给我发送了文本!如何告诉服务器哪种字符编码适合我,后面会讨论。

Configuring HTTP requests

设置HTTP请求

Go also supplies a lower-level interface for user agents to communicate with HTTP servers. As you might expect, not only does it give you more control over the client requests, but it also requires you to spend more effort in building the requests. However, the increase is only small.

Go还提供一个较低级别的用户代理接口用来与HTTP服务器进行通信。你可能已经想到,这样可以更灵活地控制客户端请求,当然创建请求也会更费力气。不过这只需要多费一点点力气。

The data type used to build requests is the type Request. This is a complex type, and is given in the Go documentation as

用来创建请求的数据类型是Request。这是个复杂的类型,Go语言文档中给出的定义如下:

  type Request struct {
      Method string   // GET, POST, PUT, etc.
      URL    *url.URL // Parsed URL.
      Proto      string // "HTTP/1.0"
      ProtoMajor int    // 1
      ProtoMinor int    // 0
      // A header maps request lines to their values.
      // If the header says
      //
      //    accept-encoding: gzip, deflate
      //    Accept-Language: en-us
      //    Connection: keep-alive
      //
      // then
      //
      //    Header = map[string][]string{
      //        "Accept-Encoding": {"gzip, deflate"},
      //        "Accept-Language": {"en-us"},
      //        "Connection": {"keep-alive"},
      //    }
      //
      // HTTP defines that header names are case-insensitive.
      // The request parser implements this by canonicalizing the
      // name, making the first character and any characters
      // following a hyphen uppercase and the rest lowercase.
      Header Header
      // The message body.
      Body io.ReadCloser
      // ContentLength records the length of the associated content.
      // The value -1 indicates that the length is unknown.
      // Values >= 0 indicate that the given number of bytes may be read from Body.
      ContentLength int64
      // TransferEncoding lists the transfer encodings from outermost to innermost.
      // An empty list denotes the "identity" encoding.
      TransferEncoding []string
      // Whether to close the connection after replying to this request.
      Close bool
      // The host on which the URL is sought.
      // Per RFC 2616, this is either the value of the Host: header
      // or the host name given in the URL itself.
      Host string
      // Form contains the parsed form data.
      // This field is only available after ParseForm is called.
      Form url.Values
      // MultipartForm is the parsed multipart form, including file uploads.
      // This field is only available after ParseMultipartForm is called.
      MultipartForm *multipart.Form
      // Trailer maps trailer keys to values. Like for Header, if the
      // request has multiple trailer lines with the same key, they will be
      // concatenated, delimited by commas.
      Trailer Header
      // RemoteAddr is the network address that sent the request.
      RemoteAddr string
  }

There is a lot of information that can be stored in a request. You do not need to fill in all fields, only those of interest. The simplest way to create a request with default values is, for example,

请求中可以存放大量的信息,但你不需要填写所有的内容,只填必要的即可。最简单的使用默认值创建请求的方法如下:

  request, err := http.NewRequest("GET", url.String(), nil)

Once a request has been created, you can modify fields. For example, to specify that you only wish to receive UTF-8, add an "Accept-Charset" field to a request by

请求创建后,可以修改其内容字段(field)。比如,需指定只接受UTF-8,可添加一个"Accept-Charset"字段:

  request.Header.Add("Accept-Charset", "UTF-8;q=1, ISO-8859-1;q=0")

(Note that the default charset ISO-8859-1 always gets a quality value of one unless it is mentioned explicitly in the list.)

(注意,除非在列表中明确给出,否则默认字符集ISO-8859-1的质量值(q值)总是1。)
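
As a quick way to see what such a request looks like on the wire, the following sketch (not in the original text; http://www.example.com/ is only a placeholder URL) builds a request, adds the Accept-Charset header and dumps the outgoing request with httputil.DumpRequestOut.

  /* DumpAcceptCharset
   * Sketch: show the outgoing request, including the Accept-Charset header.
   * The URL http://www.example.com/ is only a placeholder.
   */
  package main

  import (
      "fmt"
      "net/http"
      "net/http/httputil"
      "os"
  )

  func main() {
      request, err := http.NewRequest("GET", "http://www.example.com/", nil)
      if err != nil {
          fmt.Println(err.Error())
          os.Exit(1)
      }
      // only accept UTF-8, as described above
      request.Header.Add("Accept-Charset", "UTF-8;q=1, ISO-8859-1;q=0")

      // DumpRequestOut shows the request as the client transport would send it
      dump, err := httputil.DumpRequestOut(request, false)
      if err != nil {
          fmt.Println(err.Error())
          os.Exit(1)
      }
      fmt.Print(string(dump))
  }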

A client setting a charset request is simple by the above. But there is some confusion about what happens with the server's return value of a charset. The returned resource should have a Content-Type which will specify the media type of the content such as text/html. If appropriate the media type should state the charset, such as text/html; charset=UTF-8. If there is no charset specification, then according to the HTTP specification it should be treated as the default ISO8859-1 charset. But the HTML 4 specification states that since many servers don't conform to this, then you can't make any assumptions.

如上所述,客户端设置字符集请求很简单。但对于服务器返回的字符集,发生的事情就比较复杂。返回的资源理应包含Content-Type,用来指明内容的媒介类型,如:text/html。有些媒介类型应当声明字符集,如text/html; charset=UTF-8。如果没有指明字符集,按照HTTP规范就应当作为默认的ISO8859-1字符集处理。但是很多服务器并不符合此约定,因此HTML 4规定此时不能做任何假设。

If there is a charset specified in the server's Content-Type, then assume it is correct. If there is none specified, then since 50% of pages are in UTF-8 and 20% are in ASCII it is safe to assume UTF-8. Only 30% of pages may be wrong :-(.

如果服务器的Content-Type指定了字符集,那么就认为它是正确的。如果未指定字符集,由于50%的页面是UTF-8的,20%的页面是ASCII的,因此假设字符集是UTF-8的会比较安全,但仍然有30%的页面可能会出错:-(。

The Client object

客户端对象

To send a request to a server and get a reply, the convenience object Client is the easiest way. This object can manage multiple requests and will look after issues such as whether the server keeps the TCP connection alive, and so on.

向服务器发送请求并取得响应,最简单的方法是使用便捷的Client对象。此对象可以管理多个请求,并会处理诸如服务器是否保持TCP连接存活等问题。

This is illustrated in the following program

下面的程序给出了示例:

  /* ClientGet
   */
  package main

  import (
      "fmt"
      "net/http"
      "net/url"
      "os"
      "strings"
  )

  func main() {
      if len(os.Args) != 2 {
          fmt.Println("Usage: ", os.Args[0], "http://host:port/page")
          os.Exit(1)
      }
      url, err := url.Parse(os.Args[1])
      checkError(err)

      client := &http.Client{}

      request, err := http.NewRequest("GET", url.String(), nil)
      checkError(err)
      // only accept UTF-8
      request.Header.Add("Accept-Charset", "UTF-8;q=1, ISO-8859-1;q=0")

      response, err := client.Do(request)
      checkError(err)
      if response.Status != "200 OK" {
          fmt.Println(response.Status)
          os.Exit(2)
      }

      chSet := getCharset(response)
      fmt.Printf("got charset %s\n", chSet)
      if chSet != "UTF-8" {
          fmt.Println("Cannot handle", chSet)
          os.Exit(4)
      }

      var buf [512]byte
      reader := response.Body
      fmt.Println("got body")
      for {
          n, err := reader.Read(buf[0:])
          if err != nil {
              os.Exit(0)
          }
          fmt.Print(string(buf[0:n]))
      }
  }

  func getCharset(response *http.Response) string {
      contentType := response.Header.Get("Content-Type")
      if contentType == "" {
          // guess
          return "UTF-8"
      }
      idx := strings.Index(contentType, "charset=")
      if idx == -1 {
          // guess
          return "UTF-8"
      }
      return strings.Trim(contentType[idx+len("charset="):], " ")
  }

  func checkError(err error) {
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
  }

Proxy handling

代理处理

Simple proxy

简单代理

HTTP 1.1 laid out how HTTP should work through a proxy. A "GET" request should be made to the proxy, but the URL requested should be the full URL of the destination. In addition, the HTTP header should contain a "Host" field, set to the host of the destination URL. As long as the proxy is configured to pass such requests through, then that is all that needs to be done.

HTTP 1.1规定了HTTP应当如何通过代理工作:向代理服务器发送"GET"请求,但请求的URL必须是目标的完整URL。此外,HTTP头中应当包含"Host"字段,其值为目标URL的主机。只要代理服务器配置为允许这样的请求通过,这样做就足够了。

Go considers this to be part of the HTTP transport layer. To manage this it has a type Transport. This contains a field which can be set to a function that returns a URL for the proxy. If we have the proxy's URL as a string, the appropriate transport object is created and then given to a client object by

Go把这看成HTTP传输层的一部分,并提供了Transport类型来管理它。Transport中有一个字段,可以设置为一个返回代理URL的函数。如果我们有代理服务器URL的字符串,创建相应的Transport对象并把它交给Client对象的代码如下:

  proxyURL, err := url.Parse(proxyString)
  transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
  client := &http.Client{Transport: transport}

The client can then continue as before.

之后客户端可以像之前一样继续使用。

The following program illustrates this:

下面是程序范例:

  /* ProxyGet
   */
  package main

  import (
      "fmt"
      "io"
      "net/http"
      "net/http/httputil"
      "net/url"
      "os"
  )

  func main() {
      if len(os.Args) != 3 {
          fmt.Println("Usage: ", os.Args[0], "http://proxy-host:port http://host:port/page")
          os.Exit(1)
      }
      proxyString := os.Args[1]
      proxyURL, err := url.Parse(proxyString)
      checkError(err)
      rawURL := os.Args[2]
      url, err := url.Parse(rawURL)
      checkError(err)

      transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
      client := &http.Client{Transport: transport}

      request, err := http.NewRequest("GET", url.String(), nil)
      checkError(err)
      dump, _ := httputil.DumpRequest(request, false)
      fmt.Println(string(dump))

      response, err := client.Do(request)
      checkError(err)
      fmt.Println("Read ok")

      if response.Status != "200 OK" {
          fmt.Println(response.Status)
          os.Exit(2)
      }
      fmt.Println("Response ok")

      var buf [512]byte
      reader := response.Body
      for {
          n, err := reader.Read(buf[0:])
          if err != nil {
              os.Exit(0)
          }
          fmt.Print(string(buf[0:n]))
      }
  }

  func checkError(err error) {
      if err != nil {
          if err == io.EOF {
              return
          }
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
  }

If you have a proxy at, say, XYZ.com on port 8080, test this by

假设有一个代理服务器XYZ.com,端口8080,测试命令就是

  go run ProxyGet.go http://XYZ.com:8080/ http://www.google.com

If you don't have a suitable proxy to test this, then download and install the Squid proxy to your own computer.

如果没有合适的代理服务器可供测试,也可以在自己的计算机上下载安装Squid proxy。

The above program used a known proxy passed as an argument to the program. There are many ways in which proxies can be made known to applications. Most browsers have a configuration menu in which you can enter proxy information: such information is not available to a Go application. Some applications may get proxy information from an autoproxy.pac file somewhere in your network: Go does not (yet) know how to parse these JavaScript files and so cannot use them. Linux systems using Gnome have a configuration system called gconf in which proxy information can be stored: Go cannot access this. But it can find proxy information if it is set in operating system environment variables such as HTTP_PROXY or http_proxy using the function

上面的程序把一个已知的代理服务器地址作为参数传入。应用程序获知代理服务器的方式有很多种。大多数浏览器提供配置菜单,可以在其中输入代理信息,但Go应用程序拿不到这些信息。有些应用程序可以从网络中某处的autoproxy.pac文件获取代理信息,但Go(目前)还不会解析这种JavaScript文件,因此也用不了它们。使用Gnome的Linux系统有一个叫gconf的配置系统,其中可以保存代理信息,Go同样访问不了。但是,如果代理信息设置在操作系统环境变量(如HTTP_PROXY或http_proxy)中,Go可以用下面的函数找到它:

  func ProxyFromEnvironment(req *Request) (*url.URL, error)

If your programs are running in such an environment you can use this function instead of having to explicitly know the proxy parameters.

假如你的程序运行在这样的环境中,就可以使用这个函数,而不必显式地知道代理参数。
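
For example, the Transport in the earlier proxy program could be built with this function instead of an explicit proxy URL; a sketch (the rest of the client code is unchanged):

  // sketch: let the Transport pick the proxy up from the environment
  transport := &http.Transport{Proxy: http.ProxyFromEnvironment}
  client := &http.Client{Transport: transport}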

Authenticating proxy

身份验证代理

Some proxies will require authentication, by a user name and password, in order to pass requests on. A common scheme is "basic authentication" in which the user name and password are concatenated into a string "user:password" and then BASE64 encoded. This is then given to the proxy in the HTTP request header "Proxy-Authorization", with a flag saying that it is basic authentication.

有些代理服务器要求通过用户名和密码进行身份验证才能传递请求。一般的方法是“基本身份验证”:将用户名和密码串联成一个字符串“user:password”,然后进行Base64编码,然后添加到HTTP请求头的“Proxy-Authorization”中,再发送到代理服务器

The following program illustrates this, adding the Proxy-Authorization header to the previous proxy program:

在前一个程序的基础上增加Proxy-Authorization头,示例如下:

  /* ProxyAuthGet
   */
  package main

  import (
      "encoding/base64"
      "fmt"
      "io"
      "net/http"
      "net/http/httputil"
      "net/url"
      "os"
  )

  const auth = "jannewmarch:mypassword"

  func main() {
      if len(os.Args) != 3 {
          fmt.Println("Usage: ", os.Args[0], "http://proxy-host:port http://host:port/page")
          os.Exit(1)
      }
      proxy := os.Args[1]
      proxyURL, err := url.Parse(proxy)
      checkError(err)
      rawURL := os.Args[2]
      url, err := url.Parse(rawURL)
      checkError(err)

      // encode the auth
      basic := "Basic " + base64.StdEncoding.EncodeToString([]byte(auth))

      transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
      client := &http.Client{Transport: transport}

      request, err := http.NewRequest("GET", url.String(), nil)
      checkError(err)
      request.Header.Add("Proxy-Authorization", basic)
      dump, _ := httputil.DumpRequest(request, false)
      fmt.Println(string(dump))

      // send the request
      response, err := client.Do(request)
      checkError(err)
      fmt.Println("Read ok")

      if response.Status != "200 OK" {
          fmt.Println(response.Status)
          os.Exit(2)
      }
      fmt.Println("Response ok")

      var buf [512]byte
      reader := response.Body
      for {
          n, err := reader.Read(buf[0:])
          if err != nil {
              os.Exit(0)
          }
          fmt.Print(string(buf[0:n]))
      }
  }

  func checkError(err error) {
      if err != nil {
          if err == io.EOF {
              return
          }
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
  }

HTTPS connections by clients

客户端发起HTTPS连接

For secure, encrypted connections, HTTP uses TLS which is described in the chapter on security. The protocol of HTTP+TLS is called HTTPS and uses https:// urls instead of http:// urls.

为了得到安全、加密的连接,HTTP使用TLS(在安全性一章中介绍)。HTTP+TLS的组合称为HTTPS,使用https://形式的URL,而不是http://形式的URL。

Servers are required to return valid X.509 certificates before a client will accept data from them. If the certificate is valid, then Go handles everything under the hood and the clients given previously run okay with https URLs.

客户端在接受服务器发来的数据之前,要求服务器返回有效的X.509证书。如果证书有效,Go会在内部处理好一切,前面给出的那些客户端程序换用https URL也能正常运行。
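
For instance, the simple Get client needs nothing more than an https URL; a minimal sketch (https://www.example.com/ is only a placeholder):

  /* HttpsGet
   * Sketch: fetching over HTTPS works like HTTP when the certificate verifies.
   * The URL https://www.example.com/ is only a placeholder.
   */
  package main

  import (
      "fmt"
      "io"
      "net/http"
      "os"
  )

  func main() {
      response, err := http.Get("https://www.example.com/")
      if err != nil {
          // certificate problems also show up here as an error
          fmt.Println(err.Error())
          os.Exit(1)
      }
      defer response.Body.Close()
      fmt.Println(response.Status)
      io.Copy(os.Stdout, response.Body)
  }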

Many sites have invalid certificates. They may have expired, they may be self-signed instead of by a recognised Certificate Authority or they may just have errors (such as having an incorrect server name). Browsers such as Firefox put a big warning notice with a "Get me out of here!" button, but you can carry on at your risk - which many people do.

许多网站的证书是无效的:可能已经过期,可能是自行签名而不是由公认的证书颁发机构签发,也可能干脆有错误(比如服务器名称不对)。Firefox之类的浏览器会显示一个很大的警告页面,上面放着“立即离开!”按钮,但你也可以自担风险继续访问,很多人都会这么做。

Go presently bails out when it encounters certificate errors. There is cautious support for carrying on but I haven't got it working yet. So there is no current example for "carrying on in the face of adversity :-)". Maybe later.

Go目前在遇到证书错误时会直接报错退出。对于“明知证书有问题仍继续”的做法有谨慎的支持,但我还没有让它跑起来,所以目前没有“迎着风险继续前进 :-)”的示例。以后再说吧。
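
The cautious support mentioned above lives in the TLS client configuration: a Transport can be given a crypto/tls Config that skips certificate verification. The sketch below shows the general shape; it throws away the protection HTTPS is meant to give, so it is only for experiments against servers you trust, and the URL is a placeholder.

  // Sketch only: a client that does not verify the server's certificate.
  // Requires the crypto/tls and net/http imports; insecure by design.
  transport := &http.Transport{
      TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
  }
  client := &http.Client{Transport: transport}
  response, err := client.Get("https://self-signed.example.com/")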

Servers

服务器

The other side to building a client is a Web server handling HTTP requests. The simplest - and earliest - servers just returned copies of files. However, any URL can now trigger an arbitrary computation in current servers.

这边创建客户端,另一边Web服务器则需要处理HTTP请求。最早最简单的服务器只是返回文件的副本。然而,目前的服务器上,随便一个URL都可能触发任何计算。

File server

文件服务器

We start with a basic file server. Go supplies a multiplexer, that is, an object that will read and interpret requests. It hands out requests to handlers, each of which runs in its own goroutine. Thus much of the work of reading HTTP requests, decoding them and branching to suitable functions is done for us.

我们从一个基本的文件服务器开始。Go提供了一个multiplexer,即一个读取并解释请求的对象。它把请求分发给各自运行在自己goroutine中的handler。这样,读取HTTP请求、解码并分发到合适函数等大部分工作都替我们做好了。

For a file server, Go also gives a FileServer object which knows how to deliver files from the local file system. It takes a "root" directory which is the top of a file tree in the local system, and a pattern to match URLs against. The simplest pattern is "/" which is the top of any URL. This will match all URLs.

对于文件服务器,Go也提供了一个FileServer对象,它知道如何发布本地文件系统中的文件。它需要一个“root”目录,该目录是在本地系统中文件树的顶端;还有一个针对URL的匹配模式。最简单的模式是“/”,这是所有URL的顶部,可以匹配所有的URL。

An HTTP server delivering files from the local file system is almost embarrassingly trivial given these objects. It is

HTTP服务器从本地文件系统中发布文件太简单了,让人都有点不好意思举例。如下:

  /* File Server
   */
  package main

  import (
      "fmt"
      "net/http"
      "os"
  )

  func main() {
      // deliver files from the directory /var/www
      // fileServer := http.FileServer(http.Dir("/var/www"))
      fileServer := http.FileServer(http.Dir("/home/httpd/html/"))

      // register the handler and deliver requests to it
      err := http.ListenAndServe(":8000", fileServer)
      checkError(err)
      // That's it!
  }

  func checkError(err error) {
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
  }

This server even delivers "404 not found" messages for requests for file resources that don't exist!

甚至当请求到一个不存在的文件资源时,这个服务器还提供了“404未找到”的信息!

Handler functions

处理函数(Handler function)

In this last program, the handler was given as the second argument to ListenAndServe. Any number of handlers can be registered first by calls to Handle or HandleFunc, with signatures

上一个程序中,handler作为第二个参数传给了ListenAndServe。也可以先通过调用Handle或HandleFunc注册任意多个handler,它们的签名是:

  func Handle(pattern string, handler Handler)
  func HandleFunc(pattern string, handler func(ResponseWriter, *Request))

The second argument to ListenAndServe can then be nil, and requests are dispatched by the default multiplexer to the registered handlers. Each handler should have a different URL pattern. For example, the file handler might have URL pattern "/" while a function handler might have URL pattern "/cgi-bin". A more specific pattern takes precedence over a more general pattern.

这时ListenAndServe的第二个参数可以是nil,请求会由默认的multiplexer分发给已注册的handler。每个handler应当使用不同的URL匹配模式。例如,文件handler的匹配模式可能是"/",而某个函数handler的匹配模式可能是"/cgi-bin"。更具体的模式优先于更一般的模式。
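
The Handle form takes any value implementing the http.Handler interface, that is, any type with a ServeHTTP(ResponseWriter, *Request) method. A sketch (the handler type and the "/hello" pattern are made up for illustration):

  /* HelloHandler
   * Sketch: registering a custom http.Handler with Handle.
   * The type and the patterns are only illustrative.
   */
  package main

  import (
      "fmt"
      "net/http"
  )

  type helloHandler struct{}

  func (h helloHandler) ServeHTTP(writer http.ResponseWriter, req *http.Request) {
      fmt.Fprintln(writer, "hello from", req.URL.Path)
  }

  func main() {
      // the more specific pattern "/hello" takes precedence over "/"
      http.Handle("/hello", helloHandler{})
      http.Handle("/", http.NotFoundHandler())
      err := http.ListenAndServe(":8000", nil)
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
      }
  }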

Common CGI programs are test-cgi (written in the shell) or printenv (written in Perl) which print the values of the environment variables. A handler can be written to work in a similar manner.

常见的CGI程序有test-cgi(shell程序)或printenv(Perl程序)用来打印环境变量的值。可以让handler用类似的方式工作。

  /* Print Env
   */
  package main

  import (
      "fmt"
      "net/http"
      "os"
  )

  func main() {
      // file handler for most files
      fileServer := http.FileServer(http.Dir("/var/www"))
      http.Handle("/", fileServer)

      // function handler for /cgi-bin/printenv
      http.HandleFunc("/cgi-bin/printenv", printEnv)

      // deliver requests to the handlers
      err := http.ListenAndServe(":8000", nil)
      checkError(err)
      // That's it!
  }

  func printEnv(writer http.ResponseWriter, req *http.Request) {
      env := os.Environ()
      writer.Write([]byte("<h1>Environment</h1>\n<pre>"))
      for _, v := range env {
          writer.Write([]byte(v + "\n"))
      }
      writer.Write([]byte("</pre>"))
  }

  func checkError(err error) {
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
  }

Note: for simplicity this program does not deliver well-formed HTML. It is missing html, head and body tags.

注:为简单起见,本程序并不输出格式完整的HTML,缺少html、head和body标签。

Using the cgi-bin directory in this program is a bit cheeky: it doesn't call an external program like CGI scripts do. It just calls a Go function. Go does have the ability to call external programs using the os/exec package, but does not yet have support for dynamically linkable modules like Apache's mod_perl.

这个程序使用cgi-bin目录有点“耍赖”:它并没有像CGI脚本那样调用外部程序,而只是调用了一个Go函数。Go确实可以通过os/exec包调用外部程序,但还不支持像Apache的mod_perl那样的动态加载模块。
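
If a handler really does need to run an external program in the CGI style, the os/exec package can be used inside the handler; a sketch (the program path /usr/bin/printenv and the URL pattern are only examples):

  /* ExecEnv
   * Sketch: a handler that runs an external program and returns its output.
   * The path /usr/bin/printenv and the pattern are placeholders.
   */
  package main

  import (
      "fmt"
      "net/http"
      "os/exec"
  )

  func runPrintenv(writer http.ResponseWriter, req *http.Request) {
      out, err := exec.Command("/usr/bin/printenv").Output()
      if err != nil {
          http.Error(writer, err.Error(), http.StatusInternalServerError)
          return
      }
      writer.Write([]byte("<pre>"))
      writer.Write(out)
      writer.Write([]byte("</pre>"))
  }

  func main() {
      http.HandleFunc("/cgi-bin/printenv", runPrintenv)
      err := http.ListenAndServe(":8000", nil)
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
      }
  }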

Bypassing the default multiplexer

绕过默认的multiplexer

HTTP requests received by a Go server are usually handled by a multiplexer that examines the path in the HTTP request and calls the appropriate file handler, etc. You can define your own handlers. They can be registered with the default multiplexer by calling http.HandleFunc, which takes a pattern and a function; functions such as ListenAndServe are then given a nil handler. This was done in the last example.

Go服务器接收到的HTTP请求通常由一个multiplexer处理:它检查HTTP请求中的路径,然后调用合适的文件handler等。你也可以定义自己的handler:调用http.HandleFunc(参数是一个匹配模式和一个函数),把它们注册到默认的multiplexer上;这时ListenAndServe等函数的handler参数传nil即可。上一个例子就是这样做的。

If you want to take over the multiplexer role yourself, then you can give a non-nil handler function to ListenAndServe. This function will then be totally responsible for managing the requests and responses.

如果你想自己扮演multiplexer的角色,可以给ListenAndServe传入一个非nil的handler。这个handler将全权负责管理请求和响应。

The following example is trivial, but illustrates the use of this: the multiplexer function simply returns a "204 No content" for all requests:

下面的例子非常简单,但它说明了如何使multiplexer对所有请求都只返回一个“204 No content”:

  /* ServerHandler
   */
  package main

  import (
      "net/http"
  )

  func main() {
      myHandler := http.HandlerFunc(func(rw http.ResponseWriter, request *http.Request) {
          // Just return no content - arbitrary headers can be set, arbitrary body
          rw.WriteHeader(http.StatusNoContent)
      })

      http.ListenAndServe(":8080", myHandler)
  }

Arbitrarily complex behaviour can be built, of course.

当然,也可以在此基础上构建任意复杂的行为。
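
For example, a single handler can do its own dispatching on the request path, which is essentially the multiplexer's job; a sketch (the paths used here are only illustrative):

  /* OwnMux
   * Sketch: one handler function switching on the request path.
   * The paths used here are only illustrative.
   */
  package main

  import (
      "fmt"
      "net/http"
  )

  func main() {
      myHandler := http.HandlerFunc(func(rw http.ResponseWriter, request *http.Request) {
          switch request.URL.Path {
          case "/":
              fmt.Fprintln(rw, "home page")
          case "/about":
              fmt.Fprintln(rw, "an about page could be written here")
          default:
              http.NotFound(rw, request)
          }
      })
      http.ListenAndServe(":8080", myHandler)
  }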