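Custom benchmarks: Tracks
Definition
A track defines one or more benchmark scenarios. Its detailed structure is described in a separate document.
Creating a track from an existing cluster
If you already have an Elasticsearch cluster that holds data, you can use Rally's create-track subcommand to create a Rally track from it. For example, to create a track from the products and companies indices of an already deployed Elasticsearch cluster, run: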
esrally create-track --track=acme --target-hosts=127.0.0.1:9200 --indices="products,companies" --output-path=~/tracks
To connect to a cluster that has TLS and basic authentication enabled, also pass --client-options:
esrally create-track --track=acme --target-hosts=abcdef123.us-central-1.gcp.cloud.es.io:9243 --client-options="timeout:60,use_ssl:true,verify_certs:true,basic_auth_user:'elastic',basic_auth_password:'secret-password'" --indices="products,companies" --output-path=~/tracks
Change basic_auth_user and basic_auth_password to match your cluster. The track generator creates a folder named after the track in the specified output directory:
> find tracks/acme
tracks/acme
tracks/acme/companies-documents.json
tracks/acme/companies-documents.json.bz2
tracks/acme/companies-documents-1k.json
tracks/acme/companies-documents-1k.json.bz2
tracks/acme/companies.json
tracks/acme/products-documents.json
tracks/acme/products-documents.json.bz2
tracks/acme/products-documents-1k.json
tracks/acme/products-documents-1k.json.bz2
tracks/acme/products.json
tracks/acme/track.json
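The files are organized as follows:
track.json contains the actual Rally track.
companies.json and products.json contain the mappings and settings of the extracted indices.
*-documents.json(.bz2) contain all documents of the extracted indices. The files with the -1k suffix contain a smaller version of the document corpus to support test mode.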
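To peek into a generated corpus without decompressing it by hand, a small helper can read the first few documents. This is a minimal sketch that assumes the corpus files hold one JSON document per line; the function name head_documents and the demo file are hypothetical and only serve as illustration.

```python
import bz2
import json
import os
import tempfile


def head_documents(path, n=3):
    """Return the first n documents of a JSON-lines corpus (.json or .json.bz2)."""
    # bz2.open and open share the same text-mode interface.
    opener = bz2.open if path.endswith(".bz2") else open
    docs = []
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            if len(docs) >= n:
                break
            docs.append(json.loads(line))
    return docs


# Demo on a throwaway file; replace the path with e.g.
# tracks/acme/products-documents-1k.json.bz2 on a real track.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo-documents.json.bz2")
    with bz2.open(demo_path, "wt", encoding="utf-8") as f:
        f.write('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
    print(head_documents(demo_path, n=2))
```

Reading the compressed file directly avoids unpacking a large corpus just to check what the extracted documents look like.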
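Creating a track from a dataset
We will walk through building a track named tutorial step by step. We store everything in the ~/rally-tracks/tutorial directory, but you can choose any other location.
First, get some data. Geonames provides geographical data under a Creative Commons license. Download allCountries.zip (about 300 MB), extract it, and inspect allCountries.txt.
The file is tab-delimited, but to bulk-index data with Elasticsearch we need it in JSON format. Use the following script to convert the data: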
import json

# (column name, type, include in output) for each tab-separated column
cols = (("geonameid", "int", True),
        ("name", "string", True),
        ("asciiname", "string", False),
        ("alternatenames", "string", False),
        ("latitude", "double", True),
        ("longitude", "double", True),
        ("feature_class", "string", False),
        ("feature_code", "string", False),
        ("country_code", "string", True),
        ("cc2", "string", False),
        ("admin1_code", "string", False),
        ("admin2_code", "string", False),
        ("admin3_code", "string", False),
        ("admin4_code", "string", False),
        ("population", "long", True),
        ("elevation", "int", False),
        ("dem", "string", False),
        ("timezone", "string", False))


def main():
    with open("allCountries.txt", "rt", encoding="UTF-8") as f:
        for line in f:
            tup = line.strip().split("\t")
            record = {}
            for i in range(len(cols)):
                name, type, include = cols[i]
                # Skip empty values and columns that are not part of the mapping
                if tup[i] != "" and include:
                    if type in ("int", "long"):
                        record[name] = int(tup[i])
                    elif type == "double":
                        record[name] = float(tup[i])
                    elif type == "string":
                        record[name] = tup[i]
            # One JSON document per line
            print(json.dumps(record, ensure_ascii=False))


if __name__ == "__main__":
    main()
Save the script as toJSON.py in the tutorial directory (~/rally-tracks/tutorial) and run it with python3 toJSON.py > documents.json.
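To make the conversion concrete, the sketch below applies the same column rules to a single tab-separated line. The helper name to_record, the CASTS table, and the sample line are assumptions made for illustration; the sample values are made up and only follow the Geonames column order.

```python
import json

# Same column layout as toJSON.py: (name, type, include)
COLS = (
    ("geonameid", "int", True), ("name", "string", True),
    ("asciiname", "string", False), ("alternatenames", "string", False),
    ("latitude", "double", True), ("longitude", "double", True),
    ("feature_class", "string", False), ("feature_code", "string", False),
    ("country_code", "string", True), ("cc2", "string", False),
    ("admin1_code", "string", False), ("admin2_code", "string", False),
    ("admin3_code", "string", False), ("admin4_code", "string", False),
    ("population", "long", True), ("elevation", "int", False),
    ("dem", "string", False), ("timezone", "string", False),
)

# Hypothetical lookup table replacing the if/elif chain of the script.
CASTS = {"int": int, "long": int, "double": float, "string": str}


def to_record(line):
    """Convert one tab-separated line into a dict of the included fields."""
    fields = line.rstrip("\n").split("\t")
    record = {}
    # zip stops at the shorter sequence, so short lines do not raise IndexError.
    for (name, kind, include), value in zip(COLS, fields):
        if include and value != "":
            record[name] = CASTS[kind](value)
    return record


# Made-up sample line; the trailing date column is not part of COLS and is ignored.
sample = "\t".join([
    "1", "Example Town", "Example Town", "", "12.5", "34.25", "P", "PPL",
    "XX", "", "", "", "", "", "1000", "", "150", "Europe/Berlin", "2020-01-01",
])
print(json.dumps(to_record(sample), ensure_ascii=False))
```

Only the six columns flagged with True end up in the document, which matches the fields declared in the index mapping.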
Then save the following mapping file as index.json in the tutorial directory:
{
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "docs": {
      "dynamic": "strict",
      "properties": {
        "geonameid": {
          "type": "long"
        },
        "name": {
          "type": "text"
        },
        "latitude": {
          "type": "double"
        },
        "longitude": {
          "type": "double"
        },
        "country_code": {
          "type": "text"
        },
        "population": {
          "type": "long"
        }
      }
    }
  }
}
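Note: This tutorial assumes that you want to benchmark a version of Elasticsearch prior to 7.0.0. To benchmark Elasticsearch 7.0.0 or later, remove the mapping type ("docs") from the mapping above.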
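For Elasticsearch 7.0.0 or later, removing the mapping type means lifting the content of "docs" one level up so that it sits directly under "mappings". The following is a minimal sketch of that transformation; the function name remove_mapping_type is hypothetical, and the shortened mapping is only an example.

```python
import copy
import json


def remove_mapping_type(index_body, type_name="docs"):
    """Return a copy of the index body without the mapping type level."""
    body = copy.deepcopy(index_body)  # leave the caller's dict untouched
    mappings = body.get("mappings", {})
    if type_name in mappings:
        # Promote the typed mapping to be the mapping itself.
        body["mappings"] = mappings[type_name]
    return body


# Shortened version of index.json from this tutorial.
typed = {
    "settings": {"index.number_of_replicas": 0},
    "mappings": {
        "docs": {
            "dynamic": "strict",
            "properties": {"name": {"type": "text"}},
        }
    },
}
print(json.dumps(remove_mapping_type(typed), indent=2))
```

The result keeps "dynamic" and "properties" but places them directly under "mappings".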
For details on the syntax, see the Elasticsearch documentation on mappings and the create index API.
Finally, save the track as track.json in the tutorial directory:
{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "geonames",
      "body": "index.json",
      "types": [ "docs" ]
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "delete-index"
      }
    },
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "operation-type": "force-merge"
      }
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "clients": 8,
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}
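The document count can be determined with wc -l documents.json, and the uncompressed size in bytes with ls -l documents.json (ll is a common shell alias for it).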
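The two numbers can also be computed in one step from Python. This is a minimal sketch assuming one JSON document per line; corpus_stats is a hypothetical helper name, and the demo file stands in for the real documents.json.

```python
import os
import tempfile


def corpus_stats(path):
    """Return (document_count, uncompressed_bytes) for a JSON-lines file."""
    count = 0
    # Binary mode avoids decoding the file just to count lines.
    with open(path, "rb") as f:
        for _ in f:
            count += 1
    return count, os.path.getsize(path)


# Demo on a throwaway file; point the function at documents.json for real use.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "documents.json")
    with open(demo_path, "wb") as f:
        f.write(b'{"name": "a"}\n{"name": "b"}\n')
    document_count, uncompressed_bytes = corpus_stats(demo_path)
    print(document_count, uncompressed_bytes)
```

The first value goes into document-count and the second into uncompressed-bytes in track.json. Unlike wc -l, this loop also counts a final line that lacks a trailing newline.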
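Note: This tutorial assumes that you want to benchmark a version of Elasticsearch prior to 7.0.0. To benchmark Elasticsearch 7.0.0 or later, remove the types property above.
Note: You can store any supporting scripts along with your track. However, you need to place them in a directory whose name starts with "_", for example "_support". Rally loads track plugins (see below) from any directory but ignores directories starting with "_".
Note: There is a JSON schema for tracks that you can use to check how to define your track. You should also look at the tracks provided by Rally for inspiration.
When you run esrally list tracks --track-path=~/rally-tracks/tutorial, the new track shows up: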
dm@io:~ $ esrally list tracks --track-path=~/rally-tracks/tutorial

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

Available tracks:

Name        Description                      Documents  Compressed Size  Uncompressed Size
----------  -----------------------------  -----------  ---------------  -----------------
tutorial    Tutorial benchmark for Rally      11658903  N/A              1.4 GB
You can also show details about the track with:
esrally info --track-path=~/rally-tracks/tutorial
dm@io:~ $ esrally info --track-path=~/rally-tracks/tutorial

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

Showing details for track [tutorial]:

* Description: Tutorial benchmark for Rally
* Documents: 11,658,903
* Compressed Size: N/A
* Uncompressed Size: 1.4 GB

Schedule:
----------

1. delete-index
2. create-index
3. cluster-health
4. bulk (8 clients)
5. force-merge
6. query-match-all (8 clients)
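The schedule listing above maps directly onto the schedule array in track.json. The sketch below rebuilds similar lines from a track dictionary to show that correspondence. It is an approximation for illustration, not Rally's implementation: the function name schedule_summary and the fallback from name to operation-type are assumptions inferred from the output shown.

```python
def schedule_summary(track):
    """Build numbered schedule lines similar to the listing of esrally info."""
    lines = []
    for number, task in enumerate(track["schedule"], start=1):
        operation = task["operation"]
        # Assumption: an explicit name wins, otherwise the operation type is shown.
        label = operation.get("name", operation["operation-type"])
        clients = task.get("clients", 1)
        suffix = f" ({clients} clients)" if clients > 1 else ""
        lines.append(f"{number}. {label}{suffix}")
    return lines


# Shortened version of the tutorial track.
track = {
    "schedule": [
        {"operation": {"operation-type": "delete-index"}},
        {"operation": {"operation-type": "bulk", "bulk-size": 5000}, "clients": 8},
        {
            "operation": {"name": "query-match-all", "operation-type": "search"},
            "clients": 8,
        },
    ]
}
print("\n".join(schedule_summary(track)))
```

Comparing such a summary with the real esrally info output is a quick way to spot a task that was left out of track.json.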
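Congratulations, you have created your first track! You can now benchmark an Elasticsearch cluster with esrally --distribution-version=6.0.0 --track-path=~/rally-tracks/tutorial.
Adding more test data
The amount of data provided here is limited. The following script quickly generates more: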
import json
import random

# tqdm is a third-party progress bar (pip install tqdm)
from tqdm import tqdm

# Number of documents to generate
MAX_NUM = 10000000 * 3


def create_data():
    num = MAX_NUM
    for i in tqdm(range(num)):
        geonameid = random.randint(1, 100)
        latitude = random.uniform(10, 20)
        # Random five-letter name made of distinct lowercase letters
        name = random.sample('zyxwvutsrqponmlkjihgfedcba', 5)
        name = ''.join(name)
        longitude = random.uniform(20, 30)
        population = random.randint(1, 10000)
        data = {"geonameid": geonameid, "latitude": latitude, "name": name,
                "longitude": longitude, "population": population}
        # One JSON document per line
        print(json.dumps(data, ensure_ascii=False))


if __name__ == "__main__":
    create_data()
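Save it as createJson.py and run python3 createJson.py >> documents.json. Set MAX_NUM to control how much data is generated; every 10,000,000 documents add about 1.2 GB. Because the corpus grows, update document-count and uncompressed-bytes in track.json afterwards so that they match the new file.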
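Because index.json sets "dynamic" to "strict", Elasticsearch rejects any document that contains a field the mapping does not declare. Before appending generated data, it can help to check a sample document against the mapped field names. This is a minimal sketch that only looks at top-level fields; unknown_fields is a hypothetical helper name.

```python
def unknown_fields(doc, properties):
    """Return the sorted top-level fields of doc that the mapping does not declare."""
    return sorted(set(doc) - set(properties))


# Field names taken from the "properties" section of index.json.
properties = {
    "geonameid": {"type": "long"},
    "name": {"type": "text"},
    "latitude": {"type": "double"},
    "longitude": {"type": "double"},
    "country_code": {"type": "text"},
    "population": {"type": "long"},
}

generated = {
    "geonameid": 7,
    "latitude": 12.3,
    "name": "abcde",
    "longitude": 25.0,
    "population": 42,
}
print(unknown_fields(generated, properties))                        # []
print(unknown_fields({"name": "x", "elevation": 10}, properties))   # ['elevation']
```

Missing fields such as country_code are fine under a strict mapping; only extra fields cause a rejection.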
