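Introduction
"Common Java performance conclusions" are findings dependable enough to be treated as near-ironclad rules. For example: initialize an ArrayList with an explicit capacity whenever you can, which will very likely improve performance.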
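As a minimal sketch of the ArrayList example, the two methods below build the same list with and without an initial capacity. The class and method names (CapacityDemo, presized, defaultSized) are made up for illustration, and no timing is claimed here; measure with a benchmark harness before relying on the difference.

```java
import java.util.ArrayList;
import java.util.List;

final class CapacityDemo {
    // Pre-sizes the backing array, so adding n elements never triggers a resize.
    static List<Integer> presized(int n) {
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Starts with the default capacity; the backing array is reallocated
    // and copied several times as the list grows.
    static List<Integer> defaultSized(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        // Both variants produce the same contents; only the allocation pattern differs.
        System.out.println(presized(1_000).equals(defaultSized(1_000)));
    }
}
```

The capacity only helps when the final size is known, or at least well estimated, in advance.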
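Conclusions
JDK
- String decoding:
  - new String(bytes, UTF_8) is about 1.5x as fast as UTF_8.decode(ByteBuffer.wrap(bytes)).
  - To decode off-heap (direct) memory, copying it into a byte[] first and then decoding may well be faster!
String decoding performance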
Benchmark                  Mode   Cnt        Score         Error  Units
StringDecoder.ByteBuffer1  thrpt    5  4524175.075 ± 1148229.680  ops/s
StringDecoder.ByteBuffer2  thrpt    5  4601780.821 ±  962116.783  ops/s
StringDecoder.bytes        thrpt    5  6790554.789 ±  443526.754  ops/s
StringDecoder.direct1      thrpt    5  1671874.068 ±   92429.365  ops/s
StringDecoder.direct2      thrpt    5  5537554.892 ±  601163.717  ops/s
StringDecoder.direct3      thrpt    5  6476126.313 ±  802677.789  ops/s
StringDecoder.direct4      thrpt    5  6495239.548 ±  428756.998  ops/s
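The decoding paths being compared can be sketched roughly as follows. This is an illustration, not the benchmark source: DecodeDemo and its method names are hypothetical, and it is not known which of direct1 to direct4 corresponds to which variant.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decode straight from a byte[].
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode through a heap ByteBuffer that wraps the same array.
    static String fromHeapBuffer(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Decode a direct buffer with the charset decoder.
    // duplicate() keeps the caller's position and limit untouched.
    static String fromDirectDecoder(ByteBuffer direct) {
        return StandardCharsets.UTF_8.decode(direct.duplicate()).toString();
    }

    // Copy the direct buffer into a byte[] first, then decode the array.
    static String fromDirectViaCopy(ByteBuffer direct) {
        byte[] copy = new byte[direct.remaining()];
        direct.duplicate().get(copy);
        return new String(copy, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo, \u4e16\u754c".getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
        direct.put(bytes);
        direct.flip();
        System.out.println(fromBytes(bytes).equals(fromDirectViaCopy(direct)));
    }
}
```

All four methods return the same string; they differ only in how the bytes reach the decoder.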
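String operations
- String.split with a single-character delimiter is the fastest option whenever its functionality is sufficient. I tried an implementation built on ThreadLocal<List<String>> + indexOf + substring, and even that was not as fast.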
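For reference, a hand-rolled splitter of the kind described above might look like the sketch below. SplitDemo and its method names are hypothetical, and the ThreadLocal list reuse is left out for brevity, so this is not the exact code that was measured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitDemo {
    // Built-in split with a single-character, non-regex delimiter.
    static List<String> viaSplit(String s) {
        return Arrays.asList(s.split(","));
    }

    // Manual split using indexOf + substring.
    static List<String> viaIndexOf(String s, char sep) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(sep, start)) >= 0) {
            out.add(s.substring(start, idx));
            start = idx + 1;
        }
        out.add(s.substring(start)); // the tail after the last separator
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaSplit("a,b,c").equals(viaIndexOf("a,b,c", ',')));
    }
}
```

One behavioral difference to keep in mind: String.split drops trailing empty strings by default, while the manual loop keeps them, so the two only agree on inputs without trailing separators.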
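Protobuf
- Decoding speed: bytes = HeapByteBuffer > DirectByteBuffer > InputStream
- Performance differs little between protobuf versions. If you are writing an SDK, prefer protobuf 3.5.1 for maximum compatibility.
RPC
- gRPC calling itself over localhost reaches roughly 80,000 QPS on my i7 9700K (8 cores, 4.6GHz), at which point the CPU is already saturated.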
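Compression
- LZ4 is the better choice overall; GZIP is the universal one, with built-in support in almost every language.
- Things to consider when compressing:
  - Whether compression ratio matters: for traffic over the public internet, for example, it is worth spending more time to get a smaller result.
  - CPU cost: see the ops/s figures below.
  - Memory: the requirement is usually just a small buffer, which is acceptable in all cases.
- Conclusions about lz4:
  - lz4 block mode is extremely fast, but it comes with conditions: the data must be supplied as a complete block (not as a stream), and the length of the original data must be stored in an extra field.
  - lz4_stream_frame (BLOCKSIZE must be set to 64KB by hand, otherwise performance is very poor) performs badly on small payloads (<64KB). The reason is that the implementation has to allocate a 64KB buffer; if your data is smaller than 64KB, the cost outweighs the benefit.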
Benchmark                                   Mode   Cnt       Score        Error  Units
CompressionBenchmark.gzip                   thrpt    3   31605.737 ±  16282.314  ops/s
CompressionBenchmark.lz4_block              thrpt    3  430100.555 ± 159861.047  ops/s
CompressionBenchmark.lz4_stream_block       thrpt    3  103440.602 ±  62070.222  ops/s
CompressionBenchmark.lz4_stream_frame_4MB   thrpt    3    1883.749 ±   1104.060  ops/s
CompressionBenchmark.lz4_stream_frame_64KB  thrpt    3   90140.759 ±   2924.605  ops/s
CompressionBenchmark.snappy_block           thrpt    3  321211.591 ±  77484.317  ops/s
CompressionBenchmark.snappy_stream          thrpt    3  311741.572 ±  32818.686  ops/s
CompressionBenchmark.snappy_stream_frame    thrpt    3  190838.057 ±   6024.064  ops/s
CompressionBenchmark.zstd_block             thrpt    3  357862.879 ±  72551.014  ops/s
Benchmark                                     Mode   Cnt        Score        Error  Units
DecompressionBenchmark.gzip                   thrpt    3   115716.503 ±  72360.977  ops/s
DecompressionBenchmark.lz4_block              thrpt    3  1539561.292 ± 354261.379  ops/s
DecompressionBenchmark.lz4_stream_block       thrpt    3   358900.444 ± 115727.850  ops/s
DecompressionBenchmark.lz4_stream_frame_4MB   thrpt    3     2622.806 ±   3139.658  ops/s
DecompressionBenchmark.lz4_stream_frame_64KB  thrpt    3   130697.056 ±  83034.419  ops/s
DecompressionBenchmark.snappy_block           thrpt    3  1013814.061 ± 373383.806  ops/s
DecompressionBenchmark.snappy_stream          thrpt    3   446348.960 ± 223164.582  ops/s
DecompressionBenchmark.snappy_stream_frame    thrpt    3   256472.028 ±  75661.818  ops/s
DecompressionBenchmark.zstd_block             thrpt    3   239372.309 ±  24950.126  ops/s
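lz4 has two modes, block and frame. Frame mode supports streaming; block mode works only on whole blocks, and the length of the original bytes must be stored alongside the data for decoding to work.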
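To make the "store the original length" requirement concrete, here is a minimal sketch of a length-prefixed block format. It deliberately uses the JDK's Deflater/Inflater as a stand-in compressor so that it runs without extra dependencies; it is not lz4, and the class name LengthPrefixedBlock and the 4-byte big-endian header are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class LengthPrefixedBlock {
    // Compresses the whole input as one block and prepends the original
    // length as a 4-byte big-endian header.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(ByteBuffer.allocate(4).putInt(input.length).array(), 0, 4);
            byte[] chunk = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Reads the length header, allocates the output exactly once, and inflates into it.
    static byte[] decompress(byte[] block) {
        int originalLength = ByteBuffer.wrap(block).getInt();
        byte[] out = new byte[originalLength];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(block, 4, block.length - 4);
            int written = 0;
            while (written < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, written, originalLength - written);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                written += n;
            }
            if (written != originalLength) {
                throw new IllegalArgumentException("length header does not match payload");
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt block", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = "block mode needs the original length".getBytes();
        System.out.println(new String(decompress(compress(data))));
    }
}
```

Knowing the length up front lets the decoder allocate the output buffer once, which is part of why a block format can be fast; the price is that the caller has to own this small framing step.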
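lz4_stream has a BLOCKSIZE parameter, and performance differs between 64KB and 4MB; the default of 4MB does very poorly in these benchmark cases.
lz4 block mode is really, really fast.
lz4_stream_block is a serialization format specific to the Java SDK and does not work across languages.
For streaming scenarios, snappy_stream (the block mode of the stream API) seems to be the better choice.
When the payload is smaller than 27870 bytes (about 27KB), snappy has the advantage over lz4.
At 83610 bytes, decompression speed is about the same.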
times=1 corresponds to size=2787 bytes, about 2.7KB.
Benchmark                                   (times)  Mode   Cnt       Score        Error  Units
CompressionBenchmark.gzip                         1  thrpt    3   30476.637 ±  22469.601  ops/s
CompressionBenchmark.gzip                        10  thrpt    3    8793.686 ±   2574.396  ops/s
CompressionBenchmark.gzip                       100  thrpt    3     897.523 ±    105.704  ops/s
CompressionBenchmark.gzip                      1000  thrpt    3      87.183 ±     13.157  ops/s
CompressionBenchmark.gzip                     10000  thrpt    3       8.752 ±      1.635  ops/s
CompressionBenchmark.gzip                    100000  thrpt    3       0.867 ±      0.398  ops/s
CompressionBenchmark.lz4_block                    1  thrpt    3  426097.838 ±  75233.734  ops/s
CompressionBenchmark.lz4_block                   10  thrpt    3  199866.462 ±  76569.692  ops/s
CompressionBenchmark.lz4_block                  100  thrpt    3   31483.194 ±   6027.132  ops/s
CompressionBenchmark.lz4_block                 1000  thrpt    3    2851.337 ±   2213.362  ops/s
CompressionBenchmark.lz4_block                10000  thrpt    3     269.400 ±    136.618  ops/s
CompressionBenchmark.lz4_block               100000  thrpt    3      27.331 ±      9.160  ops/s
CompressionBenchmark.lz4_stream_block             1  thrpt    3   94879.601 ±  80635.302  ops/s
CompressionBenchmark.lz4_stream_block            10  thrpt    3   61914.665 ±  20936.356  ops/s
CompressionBenchmark.lz4_stream_block           100  thrpt    3   11512.387 ±   5328.147  ops/s
CompressionBenchmark.lz4_stream_block          1000  thrpt    3    1179.528 ±    329.489  ops/s
CompressionBenchmark.lz4_stream_block         10000  thrpt    3     113.369 ±     70.591  ops/s
CompressionBenchmark.lz4_stream_block        100000  thrpt    3      11.314 ±      3.399  ops/s
CompressionBenchmark.lz4_stream_frame_4MB         1  thrpt    3    1747.848 ±   1474.668  ops/s
CompressionBenchmark.lz4_stream_frame_4MB        10  thrpt    3    1800.767 ±    590.348  ops/s
CompressionBenchmark.lz4_stream_frame_4MB       100  thrpt    3    1638.532 ±    503.025  ops/s
CompressionBenchmark.lz4_stream_frame_4MB      1000  thrpt    3     898.183 ±    655.389  ops/s
CompressionBenchmark.lz4_stream_frame_4MB     10000  thrpt    3     132.863 ±    147.222  ops/s
CompressionBenchmark.lz4_stream_frame_4MB    100000  thrpt    3      14.465 ±     14.015  ops/s
CompressionBenchmark.lz4_stream_frame_64KB        1  thrpt    3   86014.358 ±  61382.580  ops/s  2.7KB
CompressionBenchmark.lz4_stream_frame_64KB       10  thrpt    3   71392.963 ±  47352.966  ops/s  27KB
CompressionBenchmark.lz4_stream_frame_64KB      100  thrpt    3   16335.514 ±   3484.971  ops/s  270KB
CompressionBenchmark.lz4_stream_frame_64KB     1000  thrpt    3    1712.004 ±   1311.118  ops/s  2.7MB
CompressionBenchmark.lz4_stream_frame_64KB    10000  thrpt    3     158.377 ±     15.809  ops/s  27MB
CompressionBenchmark.lz4_stream_frame_64KB   100000  thrpt    3      16.091 ±      8.657  ops/s  270MB
CompressionBenchmark.snappy_block                 1  thrpt    3  312192.847 ± 104727.578  ops/s
CompressionBenchmark.snappy_block                10  thrpt    3  129820.683 ±  12515.426  ops/s
CompressionBenchmark.snappy_block               100  thrpt    3   15520.652 ±  11128.116  ops/s
CompressionBenchmark.snappy_block              1000  thrpt    3    1336.686 ±    458.776  ops/s
CompressionBenchmark.snappy_block             10000  thrpt    3     124.586 ±     72.734  ops/s
CompressionBenchmark.snappy_block            100000  thrpt    3       6.860 ±     71.986  ops/s
CompressionBenchmark.snappy_stream                1  thrpt    3  303814.059 ±  59518.568  ops/s
CompressionBenchmark.snappy_stream               10  thrpt    3  122983.113 ±  56427.231  ops/s
CompressionBenchmark.snappy_stream              100  thrpt    3    9621.494 ±   4551.787  ops/s
CompressionBenchmark.snappy_stream             1000  thrpt    3     878.781 ±    582.417  ops/s
CompressionBenchmark.snappy_stream            10000  thrpt    3      87.643 ±     57.411  ops/s
CompressionBenchmark.snappy_stream           100000  thrpt    3       8.496 ±      2.341  ops/s
CompressionBenchmark.snappy_stream_frame          1  thrpt    3  186345.650 ±  66499.988  ops/s
CompressionBenchmark.snappy_stream_frame         10  thrpt    3   42522.667 ±   9324.246  ops/s
CompressionBenchmark.snappy_stream_frame        100  thrpt    3    4389.037 ±   1769.510  ops/s
CompressionBenchmark.snappy_stream_frame       1000  thrpt    3     422.743 ±    148.596  ops/s
CompressionBenchmark.snappy_stream_frame      10000  thrpt    3      42.032 ±     11.131  ops/s
CompressionBenchmark.snappy_stream_frame     100000  thrpt    3       4.144 ±      2.044  ops/s
CompressionBenchmark.zstd_block                   1  thrpt    3  354103.877 ±  87927.077  ops/s
CompressionBenchmark.zstd_block                  10  thrpt    3  170996.340 ±  16523.852  ops/s
CompressionBenchmark.zstd_block                 100  thrpt    3   22181.822 ±   2737.957  ops/s
CompressionBenchmark.zstd_block                1000  thrpt    3    2087.321 ±    722.517  ops/s
CompressionBenchmark.zstd_block               10000  thrpt    3     187.554 ±     51.168  ops/s
CompressionBenchmark.zstd_block              100000  thrpt    3      19.262 ±     14.627  ops/s
Benchmark                                     (times)  Mode   Cnt        Score        Error  Units
DecompressionBenchmark.gzip                         1  thrpt    3   112008.224 ±  88684.509  ops/s
DecompressionBenchmark.gzip                        10  thrpt    3    41766.300 ±  15613.426  ops/s
DecompressionBenchmark.gzip                       100  thrpt    3     5538.823 ±    936.172  ops/s
DecompressionBenchmark.gzip                      1000  thrpt    3      561.149 ±     67.428  ops/s
DecompressionBenchmark.gzip                     10000  thrpt    3       51.486 ±     11.546  ops/s
DecompressionBenchmark.gzip                    100000  thrpt    3        4.984 ±      4.984  ops/s
DecompressionBenchmark.lz4_block                    1  thrpt    3  1480106.952 ± 476628.190  ops/s
DecompressionBenchmark.lz4_block                   10  thrpt    3   394431.222 ± 179005.990  ops/s
DecompressionBenchmark.lz4_block                  100  thrpt    3    45654.908 ±   5388.289  ops/s
DecompressionBenchmark.lz4_block                 1000  thrpt    3     4557.351 ±   3349.894  ops/s
DecompressionBenchmark.lz4_block                10000  thrpt    3      277.591 ±     90.608  ops/s
DecompressionBenchmark.lz4_block               100000  thrpt    3       26.618 ±     33.890  ops/s
DecompressionBenchmark.lz4_stream_block             1  thrpt    3   346575.397 ± 165038.139  ops/s
DecompressionBenchmark.lz4_stream_block            10  thrpt    3    75249.815 ±   5688.006  ops/s
DecompressionBenchmark.lz4_stream_block           100  thrpt    3     8456.427 ±   4475.604  ops/s
DecompressionBenchmark.lz4_stream_block          1000  thrpt    3      837.352 ±    444.216  ops/s
DecompressionBenchmark.lz4_stream_block         10000  thrpt    3       71.520 ±     67.908  ops/s
DecompressionBenchmark.lz4_stream_block        100000  thrpt    3        5.425 ±      6.077  ops/s
DecompressionBenchmark.lz4_stream_frame_4MB         1  thrpt    3     2218.643 ±   2103.065  ops/s
DecompressionBenchmark.lz4_stream_frame_4MB        10  thrpt    3     2326.512 ±   2057.435  ops/s
DecompressionBenchmark.lz4_stream_frame_4MB       100  thrpt    3     2048.755 ±   1637.387  ops/s
DecompressionBenchmark.lz4_stream_frame_4MB      1000  thrpt    3      770.138 ±    278.671  ops/s
DecompressionBenchmark.lz4_stream_frame_4MB     10000  thrpt    3       77.706 ±     88.160  ops/s
DecompressionBenchmark.lz4_stream_frame_4MB    100000  thrpt    3        6.428 ±      4.784  ops/s
DecompressionBenchmark.lz4_stream_frame_64KB        1  thrpt    3   117252.082 ± 121513.917  ops/s
DecompressionBenchmark.lz4_stream_frame_64KB       10  thrpt    3    68265.478 ±  49118.321  ops/s
DecompressionBenchmark.lz4_stream_frame_64KB      100  thrpt    3    11107.015 ±    574.870  ops/s
DecompressionBenchmark.lz4_stream_frame_64KB     1000  thrpt    3     1085.403 ±    286.883  ops/s
DecompressionBenchmark.lz4_stream_frame_64KB    10000  thrpt    3       92.337 ±     61.446  ops/s
DecompressionBenchmark.lz4_stream_frame_64KB   100000  thrpt    3        6.600 ±      9.318  ops/s
DecompressionBenchmark.snappy_block                 1  thrpt    3   988539.622 ± 295176.078  ops/s
DecompressionBenchmark.snappy_block                10  thrpt    3   258813.389 ± 152623.366  ops/s
DecompressionBenchmark.snappy_block               100  thrpt    3    28527.520 ±   2777.272  ops/s
DecompressionBenchmark.snappy_block              1000  thrpt    3     2756.711 ±   1055.446  ops/s
DecompressionBenchmark.snappy_block             10000  thrpt    3      239.907 ±    235.602  ops/s
DecompressionBenchmark.snappy_block            100000  thrpt    3       23.121 ±      7.881  ops/s
DecompressionBenchmark.snappy_stream                1  thrpt    3   418944.611 ±  46233.898  ops/s
DecompressionBenchmark.snappy_stream               10  thrpt    3    88229.179 ±  47283.949  ops/s
DecompressionBenchmark.snappy_stream              100  thrpt    3     9918.215 ±   2091.578  ops/s
DecompressionBenchmark.snappy_stream             1000  thrpt    3      947.466 ±    462.093  ops/s
DecompressionBenchmark.snappy_stream            10000  thrpt    3       81.014 ±     37.452  ops/s
DecompressionBenchmark.snappy_stream           100000  thrpt    3        6.812 ±      8.711  ops/s
DecompressionBenchmark.snappy_stream_frame          1  thrpt    3   388836.283 ±  87093.766  ops/s
DecompressionBenchmark.snappy_stream_frame         10  thrpt    3    89044.752 ±  35577.978  ops/s
DecompressionBenchmark.snappy_stream_frame        100  thrpt    3     9186.633 ±   1799.486  ops/s
DecompressionBenchmark.snappy_stream_frame       1000  thrpt    3      895.671 ±    436.136  ops/s
DecompressionBenchmark.snappy_stream_frame      10000  thrpt    3       78.928 ±     66.317  ops/s
DecompressionBenchmark.snappy_stream_frame     100000  thrpt    3        5.357 ±     10.173  ops/s
DecompressionBenchmark.zstd_block                   1  thrpt    3   241563.721 ±  74977.129  ops/s
DecompressionBenchmark.zstd_block                  10  thrpt    3   141162.058 ± 111682.627  ops/s
DecompressionBenchmark.zstd_block                 100  thrpt    3    37405.392 ±   5103.685  ops/s
DecompressionBenchmark.zstd_block                1000  thrpt    3     4521.461 ±   2025.867  ops/s
DecompressionBenchmark.zstd_block               10000  thrpt    3      289.500 ±    301.008  ops/s
DecompressionBenchmark.zstd_block              100000  thrpt    3       27.835 ±     25.529  ops/s
Compressed size in bytes for each value of times:
times                         1     10    100    1000    10000    100000
gzip_bytes                  784    992   2518   17264   164368   1635193
lz4_stream_frame_64_bytes  1092   1194   6512   57194   568176   5675050
lz4_stream_frame_bytes     1092   1194   2177   12014   116948   1166097
lz4_stream_block_bytes     1119   1221   6607   57935   575428   5747361
snappy_stream_bytes        1098   2265  21474  210589  2100914  21006788
snappy_stream_frame_bytes  1096   2263  17750  170897  1705790  17052403
lz4_block_bytes            1077   1179   2162   11999   110364   1094011
snappy_block_bytes         1078   2245  17688  170418  1701098  17005615
zstd_block_bytes            833    843    869    1099     3391     26359
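Size figures like these can be collected with a few lines of code. The sketch below covers only GZIP, since it ships with the JDK, and it assumes that the times parameter means "the base payload repeated that many times"; that reading is a guess from the sizes above. GzipSizeDemo, repeat and gzipSize are hypothetical names, and the sample payload is not the one used in the benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipSizeDemo {
    // Builds a payload by concatenating `base` `times` times (assumed meaning of "times").
    static byte[] repeat(byte[] base, int times) {
        byte[] out = new byte[base.length * times];
        for (int i = 0; i < times; i++) {
            System.arraycopy(base, 0, out, i * base.length, base.length);
        }
        return out;
    }

    // Returns the number of bytes GZIP produces for the given input.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size(); // read after close(), so the GZIP trailer is included
    }

    public static void main(String[] args) {
        byte[] base = "{\"id\":12345,\"name\":\"example\",\"tags\":[\"a\",\"b\",\"c\"]}"
                .getBytes(StandardCharsets.UTF_8);
        for (int times : new int[] {1, 10, 100}) {
            byte[] payload = repeat(base, times);
            System.out.println("times=" + times + " raw=" + payload.length
                    + " gzip_bytes=" + gzipSize(payload));
        }
    }
}
```

Highly repetitive input like this compresses unusually well, so treat ratios measured this way as optimistic compared with real traffic.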
