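Background
On iOS we hit a network request failure in production. The cause turned out to be a customer userId containing the character "|". The userId is sent as a request parameter, and the iOS client did not URL-encode it, so the request failed.
When I tested and investigated the Android client, it did not have this problem: the networking stack, Retrofit + OkHttp, encodes the parameter automatically.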
Symptom
The userId passed in is: "哈哈|ABC"
The userId after encoding is: %E5%93%88%E5%93%88%7CABC
https://a.b.c/ef/v1/werqwe/csdfwef/xcvweg?userId=%E5%93%88%E5%93%88%7CABC&origin=android-SDK
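The encoded value above can be reproduced by hand, which makes it easier to see what the framework must be doing. The following is a minimal sketch, not OkHttp's implementation: the class name PercentEncodeSketch is made up for illustration, and it escapes every byte outside the RFC 3986 unreserved set, which is not necessarily the same set of characters OkHttp escapes in a query.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncodeSketch {
    // RFC 3986 unreserved characters; this sketch escapes everything else.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Percent-encodes the UTF-8 bytes of input, one %XX triplet per escaped byte. */
    static String percentEncode(String input) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80 && UNRESERVED.indexOf((char) v) >= 0) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u54c8\u54c8|ABC" is the userId from the example, written with Unicode escapes.
        System.out.println(percentEncode("\u54c8\u54c8|ABC")); // %E5%93%88%E5%93%88%7CABC
    }
}
```

Each of the two Chinese characters is three bytes in UTF-8 (E5 93 88), and "|" is the single byte 7C, which accounts for the seven escaped bytes in the URL.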
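How it works
First, URL encoding is also known as percent-encoding. For the encoding rules, see the article "percent-encode 百分号编码".
The request URL above clearly shows that the parameter we passed in was URL-encoded somewhere inside the framework.
The rest of this post tracks down where in the networking stack that encoding happens.
Starting from Retrofit
Sample Retrofit request code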
@GET("users/{user}/repos")
suspend fun listReposKt(
@Path("user") user: String,
@Query("uid") uid: String
): List<GithubUserReposVO>
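We only look at GET requests here. The example uses three annotations: @GET, @Path, and @Query. @GET marks this as a GET request, @Path fills in a segment of the request URL path, and @Query sets a query parameter.
What we observed in testing is that the parameter gets URL-encoded, so let's look at the @Query annotation: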
@Documented
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface Query {
/** The query parameter name. */
String value();
/**
* Specifies whether the parameter {@linkplain #value() name} and value are already URL encoded.
*/
boolean encoded() default false;
}
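The encoded member defaults to false, meaning the value is assumed not to be URL-encoded yet, so the framework will URL-encode it later.
If it is set to true, the developer is declaring that the value has already been URL-encoded, and the framework will not percent-encode it a second time.
In fact, the annotations that mark URL or form parameters all have an encoded() member, including @Query, @Field, @QueryMap, and @FieldMap.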
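To make the default concrete, here is a small sketch of how a runtime-retained parameter annotation with an encoded() member can be read through reflection. Everything in it is hypothetical: DemoQuery is a stand-in with the same two members as Retrofit's @Query, and DemoApi and readEncodedFlags are invented for the illustration. It is not Retrofit's code.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

final class EncodedFlagSketch {
    /** Hypothetical stand-in for retrofit2.http.Query, with the same two members. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface DemoQuery {
        String value();
        boolean encoded() default false;
    }

    /** Hypothetical service interface; only its annotations are inspected. */
    interface DemoApi {
        void listRepos(@DemoQuery("uid") String uid,
                       @DemoQuery(value = "token", encoded = true) String token);
    }

    /** Maps each query parameter name to the encoded flag found on its annotation. */
    static Map<String, Boolean> readEncodedFlags(Method method) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (Annotation[] parameterAnnotations : method.getParameterAnnotations()) {
            for (Annotation annotation : parameterAnnotations) {
                if (annotation instanceof DemoQuery) {
                    DemoQuery query = (DemoQuery) annotation;
                    flags.put(query.value(), query.encoded());
                }
            }
        }
        return flags;
    }

    /** Reads the flags from the hypothetical DemoApi.listRepos method. */
    static Map<String, Boolean> demoFlags() {
        try {
            Method method =
                DemoApi.class.getDeclaredMethod("listRepos", String.class, String.class);
            return readEncodedFlags(method);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoFlags()); // {uid=false, token=true}
    }
}
```

The uid parameter never mentions encoded, so the annotation's default of false is what the reader of the annotation sees.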
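The GET request
My first guess was that Retrofit handles parameter annotations differently depending on the request type, that is, it only processes @Query once it has seen @GET. Following that guess, let's look at how @GET is handled. (This guess later turned out to be wrong.)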
//RequestFactory.java
private void parseMethodAnnotation(Annotation annotation) {
if (annotation instanceof DELETE) {
parseHttpMethodAndPath("DELETE", ((DELETE) annotation).value(), false);
} else if (annotation instanceof GET) {
parseHttpMethodAndPath("GET", ((GET) annotation).value(), false);
} else if (annotation instanceof HEAD) {
parseHttpMethodAndPath("HEAD", ((HEAD) annotation).value(), false);
} else if (annotation instanceof PATCH) {
parseHttpMethodAndPath("PATCH", ((PATCH) annotation).value(), true);
} else if (annotation instanceof POST) {
parseHttpMethodAndPath("POST", ((POST) annotation).value(), true);
} else if (annotation instanceof PUT) {
parseHttpMethodAndPath("PUT", ((PUT) annotation).value(), true);
} else if (annotation instanceof OPTIONS) {
parseHttpMethodAndPath("OPTIONS", ((OPTIONS) annotation).value(), false);
} else if (annotation instanceof HTTP) {
HTTP http = (HTTP) annotation;
parseHttpMethodAndPath(http.method(), http.path(), http.hasBody());
} else if (annotation instanceof retrofit2.http.Headers) {
String[] headersToParse = ((retrofit2.http.Headers) annotation).value();
if (headersToParse.length == 0) {
throw methodError(method, "@Headers annotation is empty.");
}
headers = parseHeaders(headersToParse);
} else if (annotation instanceof Multipart) {
if (isFormEncoded) {
throw methodError(method, "Only one encoding annotation is allowed.");
}
isMultipart = true;
} else if (annotation instanceof FormUrlEncoded) {
if (isMultipart) {
throw methodError(method, "Only one encoding annotation is allowed.");
}
isFormEncoded = true;
}
}
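Look at the GET branch, where the annotation's value is passed in. The value of @GET is usually a relative path: with Retrofit you supply a baseUrl when building the instance, and each request then appends its own relative path.
Continuing: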
//RequestFactory.java
private void parseHttpMethodAndPath(String httpMethod, String value, boolean hasBody) {
if (this.httpMethod != null) {
throw methodError(method, "Only one HTTP method is allowed. Found: %s and %s.",
this.httpMethod, httpMethod);
}
this.httpMethod = httpMethod;
this.hasBody = hasBody;
if (value.isEmpty()) {
return;
}
// Get the relative URL path and existing query string, if present.
int question = value.indexOf('?');
if (question != -1 && question < value.length() - 1) {
// Ensure the query string does not have any named parameters.
String queryParams = value.substring(question + 1);
Matcher queryParamMatcher = PARAM_URL_REGEX.matcher(queryParams);
if (queryParamMatcher.find()) {
throw methodError(method, "URL query string \"%s\" must not have replace block. "
+ "For dynamic query parameters use @Query.", queryParams);
}
}
this.relativeUrl = value;
this.relativeUrlParamNames = parsePathParameters(value);
}
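This stores the relative path in relativeUrl. It also checks that any query string after '?' in the relative path contains no {name} replace block, and parsePathParameters collects the {name} placeholder names from the path into relativeUrlParamNames.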
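The two checks in parseHttpMethodAndPath can be sketched with a single regular expression. This is an illustration under assumptions, not Retrofit's source: the class PathTemplateSketch, its method names, and the exact placeholder pattern (a name that starts with a letter) are all chosen for the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathTemplateSketch {
    // Assumed placeholder syntax: {name}, where name starts with a letter.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\{([a-zA-Z][a-zA-Z0-9_-]*)\\}");

    /** Collects placeholder names in order of first appearance. */
    static Set<String> placeholderNames(String relativeUrl) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = PLACEHOLDER.matcher(relativeUrl);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    /** True when the part after '?' contains a {name} block, the case that is rejected. */
    static boolean queryHasPlaceholder(String relativeUrl) {
        int question = relativeUrl.indexOf('?');
        if (question == -1 || question >= relativeUrl.length() - 1) {
            return false; // no query string, or nothing after the '?'
        }
        return PLACEHOLDER.matcher(relativeUrl.substring(question + 1)).find();
    }

    public static void main(String[] args) {
        System.out.println(placeholderNames("users/{user}/repos"));  // [user]
        System.out.println(queryHasPlaceholder("users?id={id}"));    // true
        System.out.println(queryHasPlaceholder("users?sort=desc"));  // false
    }
}
```

A fixed query string such as "users?sort=desc" is accepted, while a dynamic one has to be expressed with @Query, which is exactly what the error message in the listing says.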
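There is no trace of @Query in parseHttpMethodAndPath, so the guess above (that Retrofit only processes @Query once it has seen @GET) is wrong.
When you read source code with a specific question in mind, it is hard to analyze the whole architecture. You mostly follow guesses and jump through the code straight to the part you care about. That way you never fully digest the source from a global, architectural, or design perspective, but it locates the mechanism behind a problem quickly, saves time, and leaves a deeper impression of that specific problem.
So let's jump to where the @Query annotation is used instead: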
//RequestFactory.java
@Nullable
private ParameterHandler<?> parseParameterAnnotation(
int p, Type type, Annotation[] annotations, Annotation annotation) {
if (annotation instanceof Url) {
//...
}else if (annotation instanceof Path){
//...
}else if (annotation instanceof Query){
validateResolvableType(p, type);
Query query = (Query) annotation;
String name = query.value();
boolean encoded = query.encoded();
Class<?> rawParameterType = Utils.getRawType(type);
gotQuery = true;
if (Iterable.class.isAssignableFrom(rawParameterType)) {
if (!(type instanceof ParameterizedType)) {
throw parameterError(method, p, rawParameterType.getSimpleName()
+ " must include generic type (e.g., "
+ rawParameterType.getSimpleName()
+ "<String>)");
}
ParameterizedType parameterizedType = (ParameterizedType) type;
Type iterableType = Utils.getParameterUpperBound(0, parameterizedType);
Converter<?, String> converter =
retrofit.stringConverter(iterableType, annotations);
return new ParameterHandler.Query<>(name, converter, encoded).iterable();
} else if (rawParameterType.isArray()) {
Class<?> arrayComponentType = boxIfPrimitive(rawParameterType.getComponentType());
Converter<?, String> converter =
retrofit.stringConverter(arrayComponentType, annotations);
return new ParameterHandler.Query<>(name, converter, encoded).array();
} else {
Converter<?, String> converter =
retrofit.stringConverter(type, annotations);
return new ParameterHandler.Query<>(name, converter, encoded);
}
}else if (annotation instanceof QueryName){
//...
}else if(...){
//...
}
//...
}
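The @Query handling lives in parseParameterAnnotation. It is a long function, more than 400 lines, so we only look at the @Query branch.
It reads the flag with query.encoded() and then passes it to the constructor of ParameterHandler.Query.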
// ParameterHandler.java
static final class Query<T> extends ParameterHandler<T> {
private final String name;
private final Converter<T, String> valueConverter;
private final boolean encoded;
Query(String name, Converter<T, String> valueConverter, boolean encoded) {
this.name = checkNotNull(name, "name == null");
this.valueConverter = valueConverter;
this.encoded = encoded;
}
@Override void apply(RequestBuilder builder, @Nullable T value) throws IOException {
if (value == null) return; // Skip null values.
String queryValue = valueConverter.convert(value);
if (queryValue == null) return; // Skip converted but null values
builder.addQueryParam(name, queryValue, encoded);
}
}
The encoded field is used in the apply function, where it is passed to builder.addQueryParam. builder is a RequestBuilder, which is what builds the OkHttp Request object.
Here is its addQueryParam function:
// RequestBuilder.java
void addQueryParam(String name, @Nullable String value, boolean encoded) {
if (relativeUrl != null) {
// Do a one-time combination of the built relative URL and the base URL.
urlBuilder = baseUrl.newBuilder(relativeUrl);
if (urlBuilder == null) {
throw new IllegalArgumentException(
"Malformed URL. Base: " + baseUrl + ", Relative: " + relativeUrl);
}
relativeUrl = null;
}
if (encoded) {
//noinspection ConstantConditions Checked to be non-null by above 'if' block.
urlBuilder.addEncodedQueryParameter(name, value);
} else {
//noinspection ConstantConditions Checked to be non-null by above 'if' block.
urlBuilder.addQueryParameter(name, value);
}
}
Depending on encoded, this calls either addEncodedQueryParameter or addQueryParameter on urlBuilder.
urlBuilder is an HttpUrl.Builder, the class that builds the URL.
HttpUrl comes from OkHttp, so now we need to look at OkHttp.
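The effect of the flag can be modeled without either library. The sketch below is a deliberately simplified stand-in for the branch in addQueryParam: the class QueryParamFlagSketch and its escape helper are hypothetical, the helper escapes everything except ASCII letters and digits (stricter than OkHttp), and with encoded = true it copies the text verbatim, whereas OkHttp still runs a re-encode pass over it.

```java
import java.nio.charset.StandardCharsets;

final class QueryParamFlagSketch {
    private final StringBuilder query = new StringBuilder();

    /** Simplified model: escape unless the caller says the text is already encoded. */
    QueryParamFlagSketch addQueryParam(String name, String value, boolean encoded) {
        if (query.length() > 0) query.append('&');
        query.append(encoded ? name : escape(name))
             .append('=')
             .append(encoded ? value : escape(value));
        return this;
    }

    String build() {
        return query.toString();
    }

    // Escapes every UTF-8 byte that is not an ASCII letter or digit.
    private static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean alphanumeric = (v >= '0' && v <= '9')
                || (v >= 'A' && v <= 'Z')
                || (v >= 'a' && v <= 'z');
            if (alphanumeric) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw value, flag left at false: escaped once.
        System.out.println(new QueryParamFlagSketch().addQueryParam("userId", "a|b", false).build());
        // Pre-encoded value, flag set to true: passed through unchanged.
        System.out.println(new QueryParamFlagSketch().addQueryParam("userId", "a%7Cb", true).build());
        // Pre-encoded value, flag left at false: the '%' itself is escaped again.
        System.out.println(new QueryParamFlagSketch().addQueryParam("userId", "a%7Cb", false).build());
    }
}
```

The third call is the pitfall worth remembering: encoding a value yourself and leaving encoded at its default produces a double-encoded parameter (a%257Cb in this model).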
On to OkHttp
Here are the definitions of those two methods:
//HttpUrl.Builder
/** Encodes the query parameter using UTF-8 and adds it to this URL's query string. */
fun addQueryParameter(name: String, value: String?) = apply {
if (encodedQueryNamesAndValues == null) encodedQueryNamesAndValues = mutableListOf()
encodedQueryNamesAndValues!!.add(name.canonicalize(
encodeSet = QUERY_COMPONENT_ENCODE_SET,
plusIsSpace = true
))
encodedQueryNamesAndValues!!.add(value?.canonicalize(
encodeSet = QUERY_COMPONENT_ENCODE_SET,
plusIsSpace = true
))
}
/** Adds the pre-encoded query parameter to this URL's query string. */
fun addEncodedQueryParameter(encodedName: String, encodedValue: String?) = apply {
if (encodedQueryNamesAndValues == null) encodedQueryNamesAndValues = mutableListOf()
encodedQueryNamesAndValues!!.add(encodedName.canonicalize(
encodeSet = QUERY_COMPONENT_REENCODE_SET,
alreadyEncoded = true,
plusIsSpace = true
))
encodedQueryNamesAndValues!!.add(encodedValue?.canonicalize(
encodeSet = QUERY_COMPONENT_REENCODE_SET,
alreadyEncoded = true,
plusIsSpace = true
))
}
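The logic is simple: it appends the name processed by canonicalize to encodedQueryNamesAndValues, then the value processed by canonicalize.
"Canonical" implies normalization. The name and value are normalized here, so is this where the URL encoding happens? Let's keep reading: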
/**
* Returns a substring of `input` on the range `[pos..limit)` with the following
* transformations:
*
* * Tabs, newlines, form feeds and carriage returns are skipped.
*
* * In queries, ' ' is encoded to '+' and '+' is encoded to "%2B".
*
* * Characters in `encodeSet` are percent-encoded.
*
* * Control characters and non-ASCII characters are percent-encoded.
*
* * All other characters are copied without transformation.
*
* @param alreadyEncoded true to leave '%' as-is; false to convert it to '%25'.
* @param strict true to encode '%' if it is not the prefix of a valid percent encoding.
* @param plusIsSpace true to encode '+' as "%2B" if it is not already encoded.
* @param unicodeAllowed true to leave non-ASCII codepoint unencoded.
* @param charset which charset to use, null equals UTF-8.
*/
internal fun String.canonicalize(
pos: Int = 0,
limit: Int = length,
encodeSet: String,
alreadyEncoded: Boolean = false,
strict: Boolean = false,
plusIsSpace: Boolean = false,
unicodeAllowed: Boolean = false,
charset: Charset? = null
): String {
var codePoint: Int
var i = pos
while (i < limit) {
codePoint = codePointAt(i)
if (codePoint < 0x20 ||
codePoint == 0x7f ||
codePoint >= 0x80 && !unicodeAllowed ||
codePoint.toChar() in encodeSet ||
codePoint == '%'.toInt() &&
(!alreadyEncoded || strict && !isPercentEncoded(i, limit)) ||
codePoint == '+'.toInt() && plusIsSpace) {
// Slow path: the character at i requires encoding!
val out = Buffer()
out.writeUtf8(this, pos, i)
out.writeCanonicalized(
input = this,
pos = i,
limit = limit,
encodeSet = encodeSet,
alreadyEncoded = alreadyEncoded,
strict = strict,
plusIsSpace = plusIsSpace,
unicodeAllowed = unicodeAllowed,
charset = charset
)
return out.readUtf8()
}
i += Character.charCount(codePoint)
}
// Fast path: no characters in [pos..limit) required encoding.
return substring(pos, limit)
}
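The guess was right: this function is what does the URL encoding.
We will not go through the algorithm in detail. Note that the function has an alreadyEncoded: Boolean parameter, which says whether the input has already been encoded.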
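Even without reading the full algorithm, the shape of canonicalize is easy to sketch: scan for the first character that needs encoding, return the input untouched if there is none (the fast path), otherwise copy the clean prefix and encode the rest (the slow path). The version below is a reduced model written for this post, not OkHttp's code. CanonicalizeSketch and encodeFrom are invented names, it has no strict or unicodeAllowed modes, it always uses UTF-8, it does not skip tabs or newlines, and the handling of '+' follows my reading of the KDoc above.

```java
import java.nio.charset.StandardCharsets;

final class CanonicalizeSketch {
    /** Reduced model of canonicalize(): fast path returns input, slow path encodes the tail. */
    static String canonicalize(String input, String encodeSet,
                               boolean alreadyEncoded, boolean plusIsSpace) {
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (needsEncoding(codePoint, encodeSet, alreadyEncoded, plusIsSpace)) {
                // Slow path: keep the clean prefix, then encode from position i onward.
                StringBuilder out = new StringBuilder(input.substring(0, i));
                encodeFrom(out, input, i, encodeSet, alreadyEncoded, plusIsSpace);
                return out.toString();
            }
            i += Character.charCount(codePoint);
        }
        return input; // Fast path: nothing needed encoding.
    }

    private static boolean needsEncoding(int codePoint, String encodeSet,
                                         boolean alreadyEncoded, boolean plusIsSpace) {
        return codePoint < 0x20
            || codePoint >= 0x7f
            || encodeSet.indexOf(codePoint) != -1
            || (codePoint == '%' && !alreadyEncoded)
            || (codePoint == '+' && plusIsSpace);
    }

    private static void encodeFrom(StringBuilder out, String input, int pos, String encodeSet,
                                   boolean alreadyEncoded, boolean plusIsSpace) {
        for (int i = pos; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (codePoint == '+' && plusIsSpace) {
                // Assumption: '+' becomes %2B unless the input is already encoded.
                out.append(alreadyEncoded ? "+" : "%2B");
            } else if (needsEncoding(codePoint, encodeSet, alreadyEncoded, plusIsSpace)) {
                byte[] bytes = new String(Character.toChars(codePoint))
                    .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            } else {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
    }
}
```

With an encode set containing "|", calling canonicalize("a%7Cb", "|", false, false) in this model escapes the '%' and gives a%257Cb, while the same call with alreadyEncoded = true returns the input unchanged, which is the whole point of the flag.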
We saw that all URL-encoded parameters are stored in encodedQueryNamesAndValues. How is that list used?
HttpUrl.Builder exists to build an HttpUrl: it joins all the query parameters together and hands the result to HttpUrl.
// HttpUrl.Builder
fun build(): HttpUrl {
@Suppress("UNCHECKED_CAST") // percentDecode returns either List<String?> or List<String>.
return HttpUrl(
scheme = scheme ?: throw IllegalStateException("scheme == null"),
username = encodedUsername.percentDecode(),
password = encodedPassword.percentDecode(),
host = host ?: throw IllegalStateException("host == null"),
port = effectivePort(),
pathSegments = encodedPathSegments.percentDecode() as List<String>,
queryNamesAndValues = encodedQueryNamesAndValues?.percentDecode(plusIsSpace = true),
fragment = encodedFragment?.percentDecode(),
url = toString()
)
}
Look at the toString() call passed as the url argument:
override fun toString(): String {
return buildString {
if (scheme != null) {
append(scheme)
append("://")
} else {
append("//")
}
if (encodedUsername.isNotEmpty() || encodedPassword.isNotEmpty()) {
append(encodedUsername)
if (encodedPassword.isNotEmpty()) {
append(':')
append(encodedPassword)
}
append('@')
}
if (host != null) {
if (':' in host!!) {
// Host is an IPv6 address.
append('[')
append(host)
append(']')
} else {
append(host)
}
}
if (port != -1 || scheme != null) {
val effectivePort = effectivePort()
if (scheme == null || effectivePort != defaultPort(scheme!!)) {
append(':')
append(effectivePort)
}
}
encodedPathSegments.toPathString(this)
if (encodedQueryNamesAndValues != null) {
append('?')
encodedQueryNamesAndValues!!.toQueryString(this)
}
if (encodedFragment != null) {
append('#')
append(encodedFragment)
}
}
}
Look at the encodedQueryNamesAndValues block: it appends "?" and then the parameters:
// HttpUrl.companion object
/** Returns a string for this list of query names and values. */
internal fun List<String?>.toQueryString(out: StringBuilder) {
for (i in 0 until size step 2) {
val name = this[i]
val value = this[i + 1]
if (i > 0) out.append('&')
out.append(name)
if (value != null) {
out.append('=')
out.append(value)
}
}
}
It steps through the list two entries at a time, appending each name and its value in turn.
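The same step-of-two walk over a flat name/value list looks like this in Java. It is a direct port of the logic shown above for illustration only; QueryStringSketch is a made-up class name, and the input is assumed to already hold encoded text.

```java
import java.util.Arrays;
import java.util.List;

final class QueryStringSketch {
    /** Joins a flat [name, value, name, value, ...] list; a null value omits the '='. */
    static String toQueryString(List<String> namesAndValues) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < namesAndValues.size(); i += 2) {
            String name = namesAndValues.get(i);
            String value = namesAndValues.get(i + 1);
            if (i > 0) out.append('&');
            out.append(name);
            if (value != null) {
                out.append('=').append(value);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> pairs = Arrays.asList(
            "userId", "%E5%93%88%E5%93%88%7CABC",
            "origin", "android-SDK");
        // userId=%E5%93%88%E5%93%88%7CABC&origin=android-SDK
        System.out.println(toQueryString(pairs));
    }
}
```

The output matches the query string of the request URL from the beginning of the post, which confirms that no further encoding happens at this stage: the list is only joined.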
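So who uses the HttpUrl that gets built?
It is used directly both inside and outside OkHttp.
Inside, the connection pool at the lower layer and the interceptors at the upper layer use it. Outside, third-party libraries such as Retrofit use it, and application code can use it directly as well.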
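Summary
Retrofit only forwards the encoded flag from @Query; the percent-encoding itself is done by OkHttp's HttpUrl.Builder. OkHttp is seriously impressive.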