Custom Benchmarks: Tracks

Definition

A track describes one or more benchmarking scenarios. Its structure is specified in detail in a separate document.

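Creating a track from an existing cluster

If you already have an Elasticsearch cluster that contains data, you can create a Rally track from it with Rally's create-track subcommand. For example, to create a track named acme from the two indices products and companies of an already deployed cluster, run: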

esrally create-track --track=acme --target-hosts=127.0.0.1:9200 --indices="products,companies" --output-path=~/tracks

To connect to a cluster that has TLS and basic authentication enabled, you need to specify --client-options, for example:

esrally create-track --track=acme --target-hosts=abcdef123.us-central-1.gcp.cloud.es.io:9243 --client-options="timeout:60,use_ssl:true,verify_certs:true,basic_auth_user:'elastic',basic_auth_password:'secret-password'" --indices="products,companies" --output-path=~/tracks

Adjust basic_auth_user and basic_auth_password to your own credentials. The track generator creates a folder named after the track in the specified output directory:

> find tracks/acme
tracks/acme
tracks/acme/companies-documents.json
tracks/acme/companies-documents.json.bz2
tracks/acme/companies-documents-1k.json
tracks/acme/companies-documents-1k.json.bz2
tracks/acme/companies.json
tracks/acme/products-documents.json
tracks/acme/products-documents.json.bz2
tracks/acme/products-documents-1k.json
tracks/acme/products-documents-1k.json.bz2
tracks/acme/products.json
tracks/acme/track.json

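The files are organized as follows:

track.json contains the actual Rally track.
companies.json and products.json contain the mappings and settings of the extracted indices.
*-documents.json(.bz2) contain all documents of the extracted indices. The files with the -1k suffix contain a smaller version of the document corpus to support test mode.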
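The -1k files let test mode run against only a small slice of each corpus. As a rough illustration, the sketch below derives such a pair of files from a full corpus file. It assumes, without having checked Rally's generator code, that the small corpus is simply the first 1,000 lines of the full one; the helper name make_1k_corpus is hypothetical.

```python
import bz2
import itertools
from pathlib import Path


def make_1k_corpus(source: Path, limit: int = 1000) -> Path:
    """Write the first `limit` lines of `source` to a '-1k' sibling file.

    Hypothetical helper: assumes the small corpus is just a prefix of the
    full corpus. Produces both <name>-1k.json and <name>-1k.json.bz2.
    """
    # e.g. products-documents.json -> products-documents-1k.json
    target = source.with_name(f"{source.stem}-1k{source.suffix}")
    with source.open("rb") as src:
        # islice stops after `limit` lines, so the full file is never read into memory
        head = b"".join(itertools.islice(src, limit))
    target.write_bytes(head)
    # bz2-compressed copy next to the plain file
    Path(str(target) + ".bz2").write_bytes(bz2.compress(head))
    return target
```

For example, make_1k_corpus(Path("tracks/acme/products-documents.json")) would write products-documents-1k.json and products-documents-1k.json.bz2 next to the input file.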

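Creating a track from a data set

Below we walk through building a track named tutorial step by step. We store everything in the directory ~/rally-tracks/tutorial, but you can choose any other location.

First, get the data. Geonames provides geographical data under a Creative Commons license. Download allCountries.zip (about 300 MB), unzip it and inspect allCountries.txt. The file is tab-separated, but to bulk-index data with Elasticsearch we need the data in JSON format. Convert the data with the following script: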

import json

# (field name, type, include in output) for each tab-separated column of allCountries.txt
cols = (("geonameid", "int", True),
        ("name", "string", True),
        ("asciiname", "string", False),
        ("alternatenames", "string", False),
        ("latitude", "double", True),
        ("longitude", "double", True),
        ("feature_class", "string", False),
        ("feature_code", "string", False),
        ("country_code", "string", True),
        ("cc2", "string", False),
        ("admin1_code", "string", False),
        ("admin2_code", "string", False),
        ("admin3_code", "string", False),
        ("admin4_code", "string", False),
        ("population", "long", True),
        ("elevation", "int", False),
        ("dem", "string", False),
        ("timezone", "string", False))


def main():
    with open("allCountries.txt", "rt", encoding="UTF-8") as f:
        for line in f:
            # strip only the line ending so that empty trailing columns are kept
            tup = line.rstrip("\r\n").split("\t")
            record = {}
            for (name, col_type, include), value in zip(cols, tup):
                # skip empty values and columns that are not part of the mapping
                if value == "" or not include:
                    continue
                if col_type in ("int", "long"):
                    record[name] = int(value)
                elif col_type == "double":
                    record[name] = float(value)
                else:  # "string"
                    record[name] = value
            # one JSON document per line
            print(json.dumps(record, ensure_ascii=False))


if __name__ == "__main__":
    main()

Save the script as toJSON.py in the tutorial directory (~/rally-tracks/tutorial) and run it with Python: python3 toJSON.py > documents.json

Then store the following mapping file as index.json in the tutorial directory:

{
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "docs": {
      "dynamic": "strict",
      "properties": {
        "geonameid": {
          "type": "long"
        },
        "name": {
          "type": "text"
        },
        "latitude": {
          "type": "double"
        },
        "longitude": {
          "type": "double"
        },
        "country_code": {
          "type": "text"
        },
        "population": {
          "type": "long"
        }
      }
    }
  }
}
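Because the mapping sets "dynamic": "strict", Elasticsearch rejects any document that contains a field not declared under properties, so it can be worth checking documents.json against index.json before running a benchmark. The following is a minimal sketch with hypothetical helper names; it only compares top-level field names (not value types) and accepts both the typed layout shown above (mappings, then docs, then properties) and a typeless one (mappings, then properties).

```python
import json
from pathlib import Path
from typing import Iterable


def allowed_fields(index_body: dict) -> set:
    """Return the top-level field names declared in an index body's mapping."""
    mappings = index_body["mappings"]
    if "properties" in mappings:  # typeless mapping (Elasticsearch 7.0.0+)
        return set(mappings["properties"])
    # typed mapping: exactly one type (here "docs") wraps the properties
    (type_mapping,) = mappings.values()
    return set(type_mapping["properties"])


def find_unknown_fields(lines: Iterable[str], fields: set) -> list:
    """Return (line_number, sorted_unknown_fields) for documents a strict mapping rejects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():  # ignore blank lines
            continue
        extra = set(json.loads(line)) - fields
        if extra:
            problems.append((number, sorted(extra)))
    return problems


def check_corpus(index_path: Path, docs_path: Path) -> list:
    """Check every document in docs_path against the mapping in index_path."""
    fields = allowed_fields(json.loads(index_path.read_text(encoding="utf-8")))
    with docs_path.open(encoding="utf-8") as docs:
        return find_unknown_fields(docs, fields)
```

check_corpus(Path("index.json"), Path("documents.json")) returns an empty list when every document only uses mapped fields. With the toJSON.py output this should be the case, because the columns marked True in cols are exactly the fields in the mapping.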

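Note: This tutorial assumes that you want to benchmark an Elasticsearch version earlier than 7.0.0. If you want to benchmark Elasticsearch 7.0.0 or later, you need to remove the mapping type (docs) above.

For details on the syntax, see the Elasticsearch documentation on mappings and on the create index API.
Finally, store the track as track.json in the tutorial directory: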

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "geonames",
      "body": "index.json",
      "types": [ "docs" ]
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "delete-index"
      }
    },
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "operation-type": "force-merge"
      }
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "clients": 8,
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}

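The document count can be obtained with wc -l documents.json, and the uncompressed size in bytes with ls -l documents.json (ll on systems where that alias exists).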
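If wc and ls are not at hand (for example on Windows), the same two values can be computed in Python. A minimal sketch; corpus_stats is a hypothetical name. Like wc -l it counts newline characters, which equals the number of documents because the corpus holds one JSON document per line, and the byte count is the file size that ls -l reports.

```python
import os
from pathlib import Path


def corpus_stats(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Return the values for "document-count" and "uncompressed-bytes".

    Counts newline characters (like `wc -l`), reading binary chunks of
    `chunk_size` bytes so a large corpus is never loaded into memory at once.
    """
    lines = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            lines += chunk.count(b"\n")
    return {"document-count": lines, "uncompressed-bytes": os.path.getsize(path)}
```

print(corpus_stats(Path("documents.json"))) prints a dict whose two entries correspond to the fields of the same name in the corpora section of track.json.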

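Note: This tutorial assumes that you want to benchmark an Elasticsearch version earlier than 7.0.0. If you want to benchmark Elasticsearch 7.0.0 or later, you need to remove the types property above.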
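For Elasticsearch 7.0.0 and later, the two edits described in the notes (removing the docs mapping type from index.json and the types property from track.json) can also be scripted. A minimal sketch with hypothetical function names; it assumes both files are plain JSON as in this tutorial, i.e. they do not use the Jinja2 templating that Rally also allows in track files.

```python
import json
from pathlib import Path


def drop_mapping_type(index_body: dict) -> dict:
    """Turn {"mappings": {"docs": {...}}} into {"mappings": {...}} (typeless)."""
    mappings = index_body.get("mappings", {})
    # only unwrap if the mapping is still typed, i.e. a single type name on top
    if "properties" not in mappings and len(mappings) == 1:
        (type_mapping,) = mappings.values()
        return {**index_body, "mappings": type_mapping}
    return index_body


def drop_track_types(track: dict) -> dict:
    """Remove the "types" property from every index entry of a track."""
    if "indices" not in track:
        return track
    indices = [{k: v for k, v in idx.items() if k != "types"} for idx in track["indices"]]
    return {**track, "indices": indices}


def convert_for_es7(track_dir: Path) -> None:
    """Rewrite index.json and track.json in place for Elasticsearch 7.0.0+."""
    for name, convert in (("index.json", drop_mapping_type), ("track.json", drop_track_types)):
        path = track_dir / name
        data = json.loads(path.read_text(encoding="utf-8"))
        path.write_text(json.dumps(convert(data), indent=2), encoding="utf-8")
```

convert_for_es7(Path("~/rally-tracks/tutorial").expanduser()) rewrites both files in place. json.dumps reformats them, so keep a copy if you care about the original layout.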

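Note: You can store any supporting scripts along with your track. However, you need to place them in a directory whose name starts with "_", e.g. "_support". Rally loads track plugins (see below) from any directory but ignores directories starting with "_".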

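Note: Rally provides a JSON schema for tracks, which you can use to check how to define your track. You should also look at the tracks that ship with Rally for inspiration.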

When you now run esrally list tracks --track-path=~/rally-tracks/tutorial, the new track appears:

dm@io:~ $ esrally list tracks --track-path=~/rally-tracks/tutorial
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/
Available tracks:
Name        Description                   Documents    Compressed Size  Uncompressed Size
----------  ----------------------------- -----------  ---------------  -----------------
tutorial    Tutorial benchmark for Rally      11658903  N/A              1.4 GB

You can also show details about the track with esrally info --track-path=~/rally-tracks/tutorial:

dm@io:~ $ esrally info --track-path=~/rally-tracks/tutorial
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/
Showing details for track [tutorial]:
* Description: Tutorial benchmark for Rally
* Documents: 11,658,903
* Compressed Size: N/A
* Uncompressed Size: 1.4 GB
Schedule:
----------
1. delete-index
2. create-index
3. cluster-health
4. bulk (8 clients)
5. force-merge
6. query-match-all (8 clients)
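The Schedule section printed by esrally info follows the schedule array of track.json: one entry per task, labelled with the operation's name or, if it has none, its operation-type, plus the number of clients. The sketch below approximates that listing. It is a reconstruction from the output above rather than Rally's code, and it assumes that operations are defined inline as in this tutorial and that the client count is only shown when it is greater than 1; both function names are hypothetical.

```python
import json
from pathlib import Path


def describe_schedule(track: dict) -> list:
    """Approximate the 'Schedule' list that `esrally info` prints for a track."""
    lines = []
    for number, task in enumerate(track["schedule"], start=1):
        operation = task["operation"]  # assumed to be an inline operation object
        # prefer the explicit name; fall back to the operation type
        label = operation.get("name", operation["operation-type"])
        clients = task.get("clients", 1)  # assumption: no "clients" means one client
        if clients > 1:
            label += f" ({clients} clients)"
        lines.append(f"{number}. {label}")
    return lines


def describe_track_file(path: Path) -> str:
    """Load a track.json file and return its schedule as printable text."""
    track = json.loads(path.read_text(encoding="utf-8"))
    return "\n".join(describe_schedule(track))
```

print(describe_track_file(Path("track.json"))) run in the tutorial directory should list the same six tasks as the esrally info output above.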

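Congratulations, you have created your first track! You can now run it with esrally --distribution-version=6.0.0 --track-path=~/rally-tracks/tutorial, which makes Rally download Elasticsearch 6.0.0 and benchmark it with this track.

Adding more test data

The amount of data provided here is limited. The following script quickly generates more data: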

import json
import random

from tqdm import tqdm  # third-party progress bar: pip install tqdm

# number of documents to generate
MAX_NUM = 10000000 * 3


def create_data():
    # tqdm draws its progress bar on stderr, so stdout only contains documents
    for _ in tqdm(range(MAX_NUM)):
        data = {
            "geonameid": random.randint(1, 100),
            "latitude": random.uniform(10, 20),
            # 5 distinct random lowercase letters
            "name": "".join(random.sample("zyxwvutsrqponmlkjihgfedcba", 5)),
            "longitude": random.uniform(20, 30),
            "population": random.randint(1, 10000),
        }
        # one JSON document per line
        print(json.dumps(data, ensure_ascii=False))


if __name__ == "__main__":
    create_data()

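Save it as createJson.py and run python3 createJson.py >> documents.json to append the generated documents to the existing corpus. The amount of data is controlled by MAX_NUM: every 10,000,000 documents add about 1.2 GB. Afterwards, update document-count and uncompressed-bytes in track.json to match the new file (for example with wc -l and ls -l as described above).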
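Appending documents changes the line count and size of documents.json, so the document-count and uncompressed-bytes values in track.json no longer match the file. A minimal sketch that recomputes both values and writes them back; refresh_corpus_stats is a hypothetical name, and the sketch assumes, as in this tutorial, that every source-file is an uncompressed, one-document-per-line JSON file stored next to track.json.

```python
import json
import os
from pathlib import Path


def refresh_corpus_stats(track_dir: Path) -> dict:
    """Recompute document-count/uncompressed-bytes for every source file in track.json."""
    track_path = track_dir / "track.json"
    track = json.loads(track_path.read_text(encoding="utf-8"))
    for corpus in track.get("corpora", []):
        for doc in corpus["documents"]:
            source = track_dir / doc["source-file"]  # assumed uncompressed JSON lines
            count = 0
            with source.open("rb") as f:
                # one document per line, so newline count == document count
                while chunk := f.read(1 << 20):
                    count += chunk.count(b"\n")
            doc["document-count"] = count
            doc["uncompressed-bytes"] = os.path.getsize(source)
    track_path.write_text(json.dumps(track, indent=2), encoding="utf-8")
    return track
```

Run it as refresh_corpus_stats(Path("~/rally-tracks/tutorial").expanduser()) after createJson.py has finished. Like any json.dumps round trip, it rewrites track.json with its own indentation.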