Hyperparameter tuning with Ray Tune

Original: https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html

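Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often, something as simple as choosing a different learning rate or changing the size of a network layer has a dramatic impact on model performance.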

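Fortunately, there are tools that help you find the best combination of parameters. Ray Tune is an industry-standard tool for distributed hyperparameter tuning. It includes the latest hyperparameter search algorithms, integrates with TensorBoard and other analysis libraries, and natively supports distributed training through Ray's distributed machine learning engine.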

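In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend the tutorial from the PyTorch documentation that trains a CIFAR10 image classifier.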

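As you will see, we only need to add a few small modifications. In particular, we need to

  1. wrap data loading and training in functions,
  2. make some network parameters configurable,
  3. add checkpointing (optional),
  4. and define the search space for the model tuning.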

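To run this tutorial, make sure the following packages are installed:

  • ray[tune]: distributed hyperparameter tuning library
  • torchvision: for the data transformers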

Setup / Imports

Let's start with the imports:

    from functools import partial
    import numpy as np
    import os
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.utils.data import random_split
    import torchvision
    import torchvision.transforms as transforms
    from ray import tune
    from ray.tune import CLIReporter
    from ray.tune.schedulers import ASHAScheduler

Most of the imports are needed for building the PyTorch model. Only the last three imports are for Ray Tune.

Data loaders

We wrap the data loaders in their own function and pass a global data directory. This way, different trials can share the same data directory.

    def load_data(data_dir="./data"):
        # Convert images to tensors and normalize each channel to [-1, 1].
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])

        trainset = torchvision.datasets.CIFAR10(
            root=data_dir, train=True, download=True, transform=transform)

        testset = torchvision.datasets.CIFAR10(
            root=data_dir, train=False, download=True, transform=transform)

        return trainset, testset

Configurable neural network

We can only tune parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers:

    class Net(nn.Module):
        def __init__(self, l1=120, l2=84):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            # l1 and l2 are the tunable sizes of the fully connected layers.
            self.fc1 = nn.Linear(16 * 5 * 5, l1)
            self.fc2 = nn.Linear(l1, l2)
            self.fc3 = nn.Linear(l2, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x
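
The 16 * 5 * 5 input size of fc1 follows from the shapes produced by the convolution and pooling layers. As a rough sanity check, assuming 32x32 CIFAR10 images, no padding, and stride 1 for the convolutions (the nn.Conv2d defaults), the arithmetic can be traced in plain Python. The helper names conv_out and flattened_features are hypothetical and not part of the tutorial:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(image_size=32):
    """Trace a square input through conv1 -> pool -> conv2 -> pool."""
    size = conv_out(image_size, kernel=5)      # conv1: 32 -> 28
    size = conv_out(size, kernel=2, stride=2)  # pool:  28 -> 14
    size = conv_out(size, kernel=5)            # conv2: 14 -> 10
    size = conv_out(size, kernel=2, stride=2)  # pool:  10 -> 5
    channels = 16                              # output channels of conv2
    return channels * size * size


print(flattened_features())  # 400, i.e. 16 * 5 * 5
```

Because l1 and l2 only affect the fully connected layers, this flattened size stays fixed no matter which configuration is sampled.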

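The train function

Now it gets interesting, because we introduce some changes to the example from the PyTorch documentation.

We wrap the training script in a function train_cifar(config, checkpoint_dir=None, data_dir=None). As you can guess, the config parameter receives the hyperparameters we would like to train with. The checkpoint_dir parameter is used to restore checkpoints. The data_dir parameter specifies the directory where we load and store the data, so multiple runs can share the same data source.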

    net = Net(config["l1"], config["l2"])

    # In the full function, the optimizer is created before this block.
    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

The learning rate of the optimizer is made configurable, too:

    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

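We also split the training data into a training and a validation subset. We thus train on 80% of the data and calculate the validation loss on the remaining 20%. The batch size with which we iterate through the training and validation sets is configurable as well.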
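
The split sizes can be checked without loading any data. A small sketch, assuming the 50,000-image CIFAR10 training set; split_sizes is a hypothetical helper that mirrors the arithmetic later passed to random_split:

```python
def split_sizes(n_samples, train_fraction=0.8):
    """Return (train, validation) sizes that always sum to n_samples."""
    n_train = int(n_samples * train_fraction)
    return n_train, n_samples - n_train


print(split_sizes(50000))  # (40000, 10000)
```

Computing the validation size as the remainder, rather than as a second rounded product, keeps the two sizes consistent with the dataset length for any fraction.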

Adding (multi) GPU support with DataParallel

Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in nn.DataParallel to support data parallel training on multiple GPUs:

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

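By using a device variable, we make sure that training also works when no GPU is available. PyTorch requires us to send our data to the GPU memory explicitly, like this:

    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports fractional GPUs, so we can share GPUs among trials as long as the model still fits in GPU memory. We will come back to that later.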

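Communicating with Ray Tune

The most interesting part is the communication with Ray Tune:

    with tune.checkpoint_dir(epoch) as checkpoint_dir:
        path = os.path.join(checkpoint_dir, "checkpoint")
        torch.save((net.state_dict(), optimizer.state_dict()), path)

    tune.report(loss=(val_loss / val_steps), accuracy=correct / total)

Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration leads to the best results. These metrics can also be used to stop badly performing trials early, so that no resources are wasted on them.

Saving the checkpoint is optional. It is, however, necessary if we want to use advanced schedulers such as Population Based Training. Also, by saving the checkpoint we can later load the trained models and validate them on a test set.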
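
The checkpoint itself is just a tuple of two state dictionaries written to a file named checkpoint inside the directory that Ray Tune provides. The sketch below imitates that round trip with the standard library only: pickle stands in for torch.save and torch.load, a temporary directory stands in for the directory yielded by tune.checkpoint_dir, and the dictionaries are placeholder values. The helpers save_checkpoint and load_checkpoint are hypothetical, so treat this as an illustration of the pattern rather than of Ray Tune's API:

```python
import os
import pickle
import tempfile


def save_checkpoint(checkpoint_dir, model_state, optimizer_state):
    """Write both states as one tuple, as the training function does."""
    path = os.path.join(checkpoint_dir, "checkpoint")
    with open(path, "wb") as f:
        pickle.dump((model_state, optimizer_state), f)
    return path


def load_checkpoint(checkpoint_dir):
    """Read the (model_state, optimizer_state) tuple back."""
    with open(os.path.join(checkpoint_dir, "checkpoint"), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder states; in the tutorial they come from state_dict().
    save_checkpoint(tmp_dir, {"fc1.bias": [0.1, 0.2]}, {"lr": 0.01})
    model_state, optimizer_state = load_checkpoint(tmp_dir)

print(model_state, optimizer_state)
```

Storing the optimizer state next to the model state matters for resuming: momentum buffers would otherwise start from zero when a trial is restored.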

Full training function

The full code example looks like this:

    def train_cifar(config, checkpoint_dir=None, data_dir=None):
        net = Net(config["l1"], config["l2"])

        device = "cpu"
        if torch.cuda.is_available():
            device = "cuda:0"
            if torch.cuda.device_count() > 1:
                net = nn.DataParallel(net)
        net.to(device)

        criterion = nn.CrossEntropyLoss()
        optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

        # Restore model and optimizer state when resuming from a checkpoint.
        if checkpoint_dir:
            model_state, optimizer_state = torch.load(
                os.path.join(checkpoint_dir, "checkpoint"))
            net.load_state_dict(model_state)
            optimizer.load_state_dict(optimizer_state)

        trainset, testset = load_data(data_dir)

        # 80% of the training data for training, the rest for validation.
        test_abs = int(len(trainset) * 0.8)
        train_subset, val_subset = random_split(
            trainset, [test_abs, len(trainset) - test_abs])

        trainloader = torch.utils.data.DataLoader(
            train_subset,
            batch_size=int(config["batch_size"]),
            shuffle=True,
            num_workers=8)
        valloader = torch.utils.data.DataLoader(
            val_subset,
            batch_size=int(config["batch_size"]),
            shuffle=True,
            num_workers=8)

        for epoch in range(10):  # loop over the dataset multiple times
            running_loss = 0.0
            epoch_steps = 0
            for i, data in enumerate(trainloader, 0):
                # get the inputs; data is a list of [inputs, labels]
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward + backward + optimize
                outputs = net(inputs)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()

                # print statistics
                running_loss += loss.item()
                epoch_steps += 1
                if i % 2000 == 1999:  # print every 2000 mini-batches
                    # Note: epoch_steps is not reset here, so the printed value
                    # is the loss accumulated since the last report divided by
                    # the number of steps taken so far in this epoch.
                    print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
                                                    running_loss / epoch_steps))
                    running_loss = 0.0

            # Validation loss
            val_loss = 0.0
            val_steps = 0
            total = 0
            correct = 0
            for i, data in enumerate(valloader, 0):
                with torch.no_grad():
                    inputs, labels = data
                    inputs, labels = inputs.to(device), labels.to(device)

                    outputs = net(inputs)
                    _, predicted = torch.max(outputs.data, 1)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()

                    loss = criterion(outputs, labels)
                    val_loss += loss.cpu().numpy()
                    val_steps += 1

            # Save a checkpoint for this epoch, then report metrics to Ray Tune.
            with tune.checkpoint_dir(epoch) as checkpoint_dir:
                path = os.path.join(checkpoint_dir, "checkpoint")
                torch.save((net.state_dict(), optimizer.state_dict()), path)

            tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
        print("Finished Training")

As you can see, most of the code is adapted directly from the original example.

Test set accuracy

Commonly, the performance of a machine learning model is tested on a hold-out test set with data that has not been used for training the model. We also wrap this in a function:

    def test_accuracy(net, device="cpu"):
        trainset, testset = load_data()

        testloader = torch.utils.data.DataLoader(
            testset, batch_size=4, shuffle=False, num_workers=2)

        correct = 0
        total = 0
        with torch.no_grad():
            for data in testloader:
                images, labels = data
                images, labels = images.to(device), labels.to(device)
                outputs = net(images)
                # The predicted class is the index of the highest score.
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        return correct / total

The function also expects a device parameter, so we can do the test set validation on a GPU.
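
The accuracy computation picks, for every sample, the class with the largest output score and compares it with the label. A dependency-free sketch of the same logic follows, using plain lists in place of tensors. The scores are made-up illustration values, and accuracy is a hypothetical helper:

```python
def accuracy(outputs, labels):
    """Fraction of rows whose highest-scoring index equals the label."""
    correct = 0
    for scores, label in zip(outputs, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


outputs = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
labels = [1, 0, 2, 1]
print(accuracy(outputs, labels))  # 0.75
```

The last row is predicted as class 0 but labelled 1, so three of four predictions match.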

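Configuring the search space

Lastly, we need to define Ray Tune's search space. Here is an example:

    config = {
        "l1": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
        "l2": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16])
    }

The tune.sample_from() function makes it possible to define your own sampling methods for obtaining hyperparameters. In this example, the l1 and l2 parameters should be powers of 2 between 4 and 256, so one of 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) is sampled log-uniformly between 0.0001 and 0.1, that is, uniformly on a logarithmic scale. Lastly, the batch size is a choice between 2, 4, 8, and 16.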
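
To make the sampling behaviour concrete, here is a sketch that imitates the three kinds of search space with the standard library. It is not Ray Tune's implementation; sample_power_of_two, sample_loguniform, and sample_config are hypothetical names. Note that np.random.randint(2, 9) excludes the upper bound, so the exponent ranges from 2 to 8:

```python
import math
import random


def sample_power_of_two(rng, low_exp=2, high_exp=9):
    """Mimic 2 ** np.random.randint(2, 9): exponent drawn from [2, 9)."""
    return 2 ** rng.randrange(low_exp, high_exp)


def sample_loguniform(rng, low=1e-4, high=1e-1):
    """Draw uniformly in log space, then map back with exp."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Assemble one hyperparameter combination, one draw per key."""
    return {
        "l1": sample_power_of_two(rng),
        "l2": sample_power_of_two(rng),
        "lr": sample_loguniform(rng),
        "batch_size": rng.choice([2, 4, 8, 16]),
    }


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for _ in range(3):
    print(sample_config(rng))
```

Sampling the learning rate in log space gives each decade (for example 0.0001 to 0.001 and 0.01 to 0.1) the same probability, which a plain uniform draw over the interval would not.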

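At each trial, Ray Tune will now randomly sample a combination of parameters from these search spaces. It will then train a number of models in parallel and find the best performing one among them. We also use the ASHAScheduler, which terminates badly performing trials early.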
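
ASHA decides at fixed iteration counts, often called rungs, whether a trial may continue. As a sketch of where those decision points fall, assume the rungs sit at grace_period * reduction_factor ** k for as long as that value does not exceed max_t. The function asha_rungs is a hypothetical helper, not a Ray Tune function:

```python
def asha_rungs(max_t, grace_period=1, reduction_factor=2):
    """Iteration counts at which trials are compared against each other."""
    if grace_period <= 0 or reduction_factor <= 1:
        raise ValueError("need grace_period > 0 and reduction_factor > 1")
    rungs = []
    t = grace_period
    while t <= max_t:
        rungs.append(t)
        t *= reduction_factor
    return rungs


print(asha_rungs(max_t=10))  # [1, 2, 4, 8]
```

With the settings used in this tutorial's main function (max_t=10, grace_period=1, reduction_factor=2) this gives 1, 2, 4, and 8, which matches the Iter 1.000, 2.000, 4.000, and 8.000 entries in the Bracket line of the output further below.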

We wrap the train_cifar function with functools.partial to set the constant data_dir parameter. We can also tell Ray Tune what resources should be available for each trial:

    gpus_per_trial = 2
    # ...
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter,
        checkpoint_at_end=True)

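You can specify the number of CPUs, which are then available, for example, to increase the num_workers of the PyTorch DataLoader instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that have not been requested for them, so you do not have to worry about two trials using the same set of resources.

Here we can also specify fractional GPUs, so something like gpus_per_trial=0.5 is completely valid. The trials will then share GPUs among each other. You just have to make sure that the models still fit in GPU memory.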
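
How many trials can run at the same time follows from the per-trial request and the resources of the machine. A back-of-the-envelope sketch, assuming trials are packed purely by CPU and GPU counts on a single node; max_concurrent_trials is a hypothetical helper, and the real scheduler takes more into account:

```python
import math


def max_concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Upper bound on simultaneously running trials for one node."""
    by_cpu = total_cpus // cpus_per_trial
    if gpus_per_trial == 0:
        return by_cpu  # GPUs are not a constraint for CPU-only trials
    by_gpu = math.floor(total_gpus / gpus_per_trial)
    return min(by_cpu, by_gpu)


# A node with 32 CPUs and 2 GPUs, as reported in the output shown below.
print(max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=0))    # 16
print(max_concurrent_trials(32, 2, cpus_per_trial=8, gpus_per_trial=2))    # 1
print(max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=0.5))  # 4
```

The last line shows the effect of fractional GPUs: with half a GPU per trial, two GPUs can host four trials at once, provided each model fits in its share of GPU memory.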

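After training the models, we find the best performing one and load the trained network from the checkpoint file. We then obtain the test set accuracy and report everything by printing.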
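
Selecting the best trial by its last reported loss amounts to taking a minimum over the trials' most recent results. A dependency-free sketch of that selection follows. The dictionaries reuse a few loss and accuracy values from the status tables in the output below purely as sample data, and best_by_last_loss is a hypothetical helper rather than part of Ray Tune:

```python
def best_by_last_loss(trials):
    """Return the trial whose most recent result has the lowest loss."""
    return min(trials, key=lambda trial: trial["last_result"]["loss"])


trials = [
    {"name": "DEFAULT_d3304_00001",
     "last_result": {"loss": 2.32186, "accuracy": 0.1017}},
    {"name": "DEFAULT_d3304_00003",
     "last_result": {"loss": 1.47202, "accuracy": 0.4494}},
    {"name": "DEFAULT_d3304_00007",
     "last_result": {"loss": 2.30388, "accuracy": 0.1011}},
]

best = best_by_last_loss(trials)
print(best["name"], best["last_result"]["loss"])  # DEFAULT_d3304_00003 1.47202
```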

The full main function looks like this:

    def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
        data_dir = os.path.abspath("./data")
        load_data(data_dir)  # download the dataset once before trials start
        config = {
            "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
            "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
            "lr": tune.loguniform(1e-4, 1e-1),
            "batch_size": tune.choice([2, 4, 8, 16])
        }
        scheduler = ASHAScheduler(
            metric="loss",
            mode="min",
            max_t=max_num_epochs,
            grace_period=1,
            reduction_factor=2)
        reporter = CLIReporter(
            # parameter_columns=["l1", "l2", "lr", "batch_size"],
            metric_columns=["loss", "accuracy", "training_iteration"])
        result = tune.run(
            partial(train_cifar, data_dir=data_dir),
            resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
            config=config,
            num_samples=num_samples,
            scheduler=scheduler,
            progress_reporter=reporter)

        # Pick the trial with the lowest last reported validation loss.
        best_trial = result.get_best_trial("loss", "min", "last")
        print("Best trial config: {}".format(best_trial.config))
        print("Best trial final validation loss: {}".format(
            best_trial.last_result["loss"]))
        print("Best trial final validation accuracy: {}".format(
            best_trial.last_result["accuracy"]))

        best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
        device = "cpu"
        if torch.cuda.is_available():
            device = "cuda:0"
            if gpus_per_trial > 1:
                best_trained_model = nn.DataParallel(best_trained_model)
        best_trained_model.to(device)

        # Load the weights saved by the best trial's last checkpoint.
        best_checkpoint_dir = best_trial.checkpoint.value
        model_state, optimizer_state = torch.load(os.path.join(
            best_checkpoint_dir, "checkpoint"))
        best_trained_model.load_state_dict(model_state)

        test_acc = test_accuracy(best_trained_model, device)
        print("Best trial test set accuracy: {}".format(test_acc))


    if __name__ == "__main__":
        # You can change the number of GPUs per trial here:
        main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)

Out:

  1. Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz
  2. Extracting /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data
  3. Files already downloaded and verified
  4. == Status ==
  5. Memory usage on this node: 4.0/240.1 GiB
  6. Using AsyncHyperBand: num_stopped=0
  7. Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
  8. Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  9. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  10. Number of trials: 1/10 (1 RUNNING)
  11. +---------------------+----------+-------+--------------+------+------+-------------+
  12. | Trial name | status | loc | batch_size | l1 | l2 | lr |
  13. |---------------------+----------+-------+--------------+------+------+-------------|
  14. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 |
  15. +---------------------+----------+-------+--------------+------+------+-------------+
  16. (pid=1588) Files already downloaded and verified
  17. (pid=1568) Files already downloaded and verified
  18. (pid=1504) Files already downloaded and verified
  19. (pid=1575) Files already downloaded and verified
  20. (pid=1494) Files already downloaded and verified
  21. (pid=1572) Files already downloaded and verified
  22. (pid=1567) Files already downloaded and verified
  23. (pid=1585) Files already downloaded and verified
  24. (pid=1565) Files already downloaded and verified
  25. (pid=1505) Files already downloaded and verified
  26. (pid=1588) Files already downloaded and verified
  27. (pid=1568) Files already downloaded and verified
  28. (pid=1504) Files already downloaded and verified
  29. (pid=1575) Files already downloaded and verified
  30. (pid=1494) Files already downloaded and verified
  31. (pid=1572) Files already downloaded and verified
  32. (pid=1567) Files already downloaded and verified
  33. (pid=1565) Files already downloaded and verified
  34. (pid=1585) Files already downloaded and verified
  35. (pid=1505) Files already downloaded and verified
  36. (pid=1585) [1, 2000] loss: 2.307
  37. (pid=1568) [1, 2000] loss: 2.226
  38. (pid=1565) [1, 2000] loss: 2.141
  39. (pid=1505) [1, 2000] loss: 2.339
  40. (pid=1504) [1, 2000] loss: 2.042
  41. (pid=1572) [1, 2000] loss: 2.288
  42. (pid=1567) [1, 2000] loss: 2.047
  43. (pid=1575) [1, 2000] loss: 2.316
  44. (pid=1494) [1, 2000] loss: 2.322
  45. (pid=1588) [1, 2000] loss: 2.289
  46. (pid=1585) [1, 4000] loss: 1.154
  47. (pid=1505) [1, 4000] loss: 1.170
  48. (pid=1565) [1, 4000] loss: 0.939
  49. (pid=1568) [1, 4000] loss: 1.102
  50. (pid=1504) [1, 4000] loss: 0.916
  51. (pid=1572) [1, 4000] loss: 1.156
  52. Result for DEFAULT_d3304_00003:
  53. accuracy: 0.226
  54. date: 2021-01-05_20-23-37
  55. done: false
  56. experiment_id: d4b00469893d498ea65a729df202882a
  57. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  58. hostname: 1a844a452371
  59. iterations_since_restore: 1
  60. loss: 2.083958268547058
  61. node_ip: 172.17.0.2
  62. pid: 1588
  63. should_checkpoint: true
  64. time_since_restore: 27.169169902801514
  65. time_this_iter_s: 27.169169902801514
  66. time_total_s: 27.169169902801514
  67. timestamp: 1609878217
  68. timesteps_since_restore: 0
  69. training_iteration: 1
  70. trial_id: d3304_00003
  71. == Status ==
  72. Memory usage on this node: 9.2/240.1 GiB
  73. Using AsyncHyperBand: num_stopped=0
  74. Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -2.083958268547058
  75. Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  76. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  77. Number of trials: 10/10 (10 RUNNING)
  78. +---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  79. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  80. |---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  81. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 | | | |
  82. | DEFAULT_d3304_00001 | RUNNING | | 8 | 16 | 32 | 0.077467 | | | |
  83. | DEFAULT_d3304_00002 | RUNNING | | 4 | 8 | 128 | 0.00436986 | | | |
  84. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 2.08396 | 0.226 | 1 |
  85. | DEFAULT_d3304_00004 | RUNNING | | 4 | 16 | 32 | 0.016474 | | | |
  86. | DEFAULT_d3304_00005 | RUNNING | | 4 | 128 | 64 | 0.00757252 | | | |
  87. | DEFAULT_d3304_00006 | RUNNING | | 2 | 64 | 256 | 0.00177236 | | | |
  88. | DEFAULT_d3304_00007 | RUNNING | | 8 | 8 | 8 | 0.000155891 | | | |
  89. | DEFAULT_d3304_00008 | RUNNING | | 2 | 16 | 64 | 0.0310199 | | | |
  90. | DEFAULT_d3304_00009 | RUNNING | | 4 | 4 | 32 | 0.0175239 | | | |
  91. +---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  92. (pid=1567) [1, 4000] loss: 0.943
  93. (pid=1494) [1, 4000] loss: 1.155
  94. (pid=1575) [1, 4000] loss: 1.162
  95. (pid=1585) [1, 6000] loss: 0.768
  96. (pid=1505) [1, 6000] loss: 0.780
  97. (pid=1565) [1, 6000] loss: 0.582
  98. (pid=1504) [1, 6000] loss: 0.587
  99. (pid=1568) [1, 6000] loss: 0.770
  100. (pid=1572) [1, 6000] loss: 0.771
  101. (pid=1567) [1, 6000] loss: 0.615
  102. Result for DEFAULT_d3304_00007:
  103. accuracy: 0.1011
  104. date: 2021-01-05_20-23-51
  105. done: true
  106. experiment_id: 947614a8c2a74533be128b929f363bd1
  107. experiment_tag: 7_batch_size=8,l1=8,l2=8,lr=0.00015589
  108. hostname: 1a844a452371
  109. iterations_since_restore: 1
  110. loss: 2.3038805620193483
  111. node_ip: 172.17.0.2
  112. pid: 1494
  113. should_checkpoint: true
  114. time_since_restore: 41.69914960861206
  115. time_this_iter_s: 41.69914960861206
  116. time_total_s: 41.69914960861206
  117. timestamp: 1609878231
  118. timesteps_since_restore: 0
  119. training_iteration: 1
  120. trial_id: d3304_00007
  121. == Status ==
  122. Memory usage on this node: 9.1/240.1 GiB
  123. Using AsyncHyperBand: num_stopped=1
  124. Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -2.193919415283203
  125. Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  126. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  127. Number of trials: 10/10 (10 RUNNING)
  128. +---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  129. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  130. |---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  131. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 | | | |
  132. | DEFAULT_d3304_00001 | RUNNING | | 8 | 16 | 32 | 0.077467 | | | |
  133. | DEFAULT_d3304_00002 | RUNNING | | 4 | 8 | 128 | 0.00436986 | | | |
  134. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 2.08396 | 0.226 | 1 |
  135. | DEFAULT_d3304_00004 | RUNNING | | 4 | 16 | 32 | 0.016474 | | | |
  136. | DEFAULT_d3304_00005 | RUNNING | | 4 | 128 | 64 | 0.00757252 | | | |
  137. | DEFAULT_d3304_00006 | RUNNING | | 2 | 64 | 256 | 0.00177236 | | | |
  138. | DEFAULT_d3304_00007 | RUNNING | 172.17.0.2:1494 | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  139. | DEFAULT_d3304_00008 | RUNNING | | 2 | 16 | 64 | 0.0310199 | | | |
  140. | DEFAULT_d3304_00009 | RUNNING | | 4 | 4 | 32 | 0.0175239 | | | |
  141. +---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  142. Result for DEFAULT_d3304_00001:
  143. accuracy: 0.1017
  144. date: 2021-01-05_20-23-51
  145. done: true
  146. experiment_id: 26ac228b4b454584869f8490742cf253
  147. experiment_tag: 1_batch_size=8,l1=16,l2=32,lr=0.077467
  148. hostname: 1a844a452371
  149. iterations_since_restore: 1
  150. loss: 2.321864831352234
  151. node_ip: 172.17.0.2
  152. pid: 1575
  153. should_checkpoint: true
  154. time_since_restore: 42.09821367263794
  155. time_this_iter_s: 42.09821367263794
  156. time_total_s: 42.09821367263794
  157. timestamp: 1609878231
  158. timesteps_since_restore: 0
  159. training_iteration: 1
  160. trial_id: d3304_00001
  161. (pid=1588) [2, 2000] loss: 1.916
  162. (pid=1585) [1, 8000] loss: 0.576
  163. (pid=1505) [1, 8000] loss: 0.584
  164. (pid=1565) [1, 8000] loss: 0.422
  165. (pid=1504) [1, 8000] loss: 0.433
  166. (pid=1572) [1, 8000] loss: 0.578
  167. (pid=1568) [1, 8000] loss: 0.580
  168. Result for DEFAULT_d3304_00003:
  169. accuracy: 0.3762
  170. date: 2021-01-05_20-24-00
  171. done: false
  172. experiment_id: d4b00469893d498ea65a729df202882a
  173. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  174. hostname: 1a844a452371
  175. iterations_since_restore: 2
  176. loss: 1.7041921138763427
  177. node_ip: 172.17.0.2
  178. pid: 1588
  179. should_checkpoint: true
  180. time_since_restore: 50.74612545967102
  181. time_this_iter_s: 23.576955556869507
  182. time_total_s: 50.74612545967102
  183. timestamp: 1609878240
  184. timesteps_since_restore: 0
  185. training_iteration: 2
  186. trial_id: d3304_00003
  187. == Status ==
  188. Memory usage on this node: 8.0/240.1 GiB
  189. Using AsyncHyperBand: num_stopped=2
  190. Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
  191. Resources requested: 16/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  192. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  193. Number of trials: 10/10 (8 RUNNING, 2 TERMINATED)
  194. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  195. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  196. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  197. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 | | | |
  198. | DEFAULT_d3304_00002 | RUNNING | | 4 | 8 | 128 | 0.00436986 | | | |
  199. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.70419 | 0.3762 | 2 |
  200. | DEFAULT_d3304_00004 | RUNNING | | 4 | 16 | 32 | 0.016474 | | | |
  201. | DEFAULT_d3304_00005 | RUNNING | | 4 | 128 | 64 | 0.00757252 | | | |
  202. | DEFAULT_d3304_00006 | RUNNING | | 2 | 64 | 256 | 0.00177236 | | | |
  203. | DEFAULT_d3304_00008 | RUNNING | | 2 | 16 | 64 | 0.0310199 | | | |
  204. | DEFAULT_d3304_00009 | RUNNING | | 4 | 4 | 32 | 0.0175239 | | | |
  205. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  206. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  207. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  208. (pid=1567) [1, 8000] loss: 0.458
  209. (pid=1585) [1, 10000] loss: 0.461
  210. (pid=1505) [1, 10000] loss: 0.467
  211. (pid=1565) [1, 10000] loss: 0.329
  212. (pid=1504) [1, 10000] loss: 0.344
  213. (pid=1572) [1, 10000] loss: 0.463
  214. (pid=1568) [1, 10000] loss: 0.464
  215. (pid=1567) [1, 10000] loss: 0.360
  216. (pid=1588) [3, 2000] loss: 1.663
  217. Result for DEFAULT_d3304_00002:
  218. accuracy: 0.3791
  219. date: 2021-01-05_20-24-18
  220. done: false
  221. experiment_id: eaf4d25c9a0e46219afb226ed323095b
  222. experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699
  223. hostname: 1a844a452371
  224. iterations_since_restore: 1
  225. loss: 1.6690538251161575
  226. node_ip: 172.17.0.2
  227. pid: 1504
  228. should_checkpoint: true
  229. time_since_restore: 68.1856791973114
  230. time_this_iter_s: 68.1856791973114
  231. time_total_s: 68.1856791973114
  232. timestamp: 1609878258
  233. timesteps_since_restore: 0
  234. training_iteration: 1
  235. trial_id: d3304_00002
  236. == Status ==
  237. Memory usage on this node: 8.0/240.1 GiB
  238. Using AsyncHyperBand: num_stopped=2
  239. Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.193919415283203
  240. Resources requested: 16/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  241. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  242. Number of trials: 10/10 (8 RUNNING, 2 TERMINATED)
  243. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  244. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  245. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  246. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 | | | |
  247. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.66905 | 0.3791 | 1 |
  248. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.70419 | 0.3762 | 2 |
  249. | DEFAULT_d3304_00004 | RUNNING | | 4 | 16 | 32 | 0.016474 | | | |
  250. | DEFAULT_d3304_00005 | RUNNING | | 4 | 128 | 64 | 0.00757252 | | | |
  251. | DEFAULT_d3304_00006 | RUNNING | | 2 | 64 | 256 | 0.00177236 | | | |
  252. | DEFAULT_d3304_00008 | RUNNING | | 2 | 16 | 64 | 0.0310199 | | | |
  253. | DEFAULT_d3304_00009 | RUNNING | | 4 | 4 | 32 | 0.0175239 | | | |
  254. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  255. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  256. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  257. (pid=1585) [1, 12000] loss: 0.384
  258. (pid=1505) [1, 12000] loss: 0.390
  259. Result for DEFAULT_d3304_00009:
  260. accuracy: 0.101
  261. date: 2021-01-05_20-24-19
  262. done: true
  263. experiment_id: 471eb6134c2a45509b005af46861c602
  264. experiment_tag: 9_batch_size=4,l1=4,l2=32,lr=0.017524
  265. hostname: 1a844a452371
  266. iterations_since_restore: 1
  267. loss: 2.310983589553833
  268. node_ip: 172.17.0.2
  269. pid: 1572
  270. should_checkpoint: true
  271. time_since_restore: 69.29919123649597
  272. time_this_iter_s: 69.29919123649597
  273. time_total_s: 69.29919123649597
  274. timestamp: 1609878259
  275. timesteps_since_restore: 0
  276. training_iteration: 1
  277. trial_id: d3304_00009
  278. Result for DEFAULT_d3304_00004:
  279. accuracy: 0.102
  280. date: 2021-01-05_20-24-19
  281. done: true
  282. experiment_id: bd1f438c1fdd4a9ba98074d1cfd573fe
  283. experiment_tag: 4_batch_size=4,l1=16,l2=32,lr=0.016474
  284. hostname: 1a844a452371
  285. iterations_since_restore: 1
  286. loss: 2.313420217037201
  287. node_ip: 172.17.0.2
  288. pid: 1568
  289. should_checkpoint: true
  290. time_since_restore: 69.48366618156433
  291. time_this_iter_s: 69.48366618156433
  292. time_total_s: 69.48366618156433
  293. timestamp: 1609878259
  294. timesteps_since_restore: 0
  295. training_iteration: 1
  296. trial_id: d3304_00004
  297. (pid=1565) [1, 12000] loss: 0.267
  298. Result for DEFAULT_d3304_00005:
  299. accuracy: 0.3301
  300. date: 2021-01-05_20-24-22
  301. done: false
  302. experiment_id: 738b3d315db548a7956646b2c07f1b0c
  303. experiment_tag: 5_batch_size=4,l1=128,l2=64,lr=0.0075725
  304. hostname: 1a844a452371
  305. iterations_since_restore: 1
  306. loss: 1.8058318739891053
  307. node_ip: 172.17.0.2
  308. pid: 1567
  309. should_checkpoint: true
  310. time_since_restore: 72.0806794166565
  311. time_this_iter_s: 72.0806794166565
  312. time_total_s: 72.0806794166565
  313. timestamp: 1609878262
  314. timesteps_since_restore: 0
  315. training_iteration: 1
  316. trial_id: d3304_00005
  317. Result for DEFAULT_d3304_00003:
  318. accuracy: 0.4242
  319. date: 2021-01-05_20-24-23
  320. done: false
  321. experiment_id: d4b00469893d498ea65a729df202882a
  322. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  323. hostname: 1a844a452371
  324. iterations_since_restore: 3
  325. loss: 1.5498835063934326
  326. node_ip: 172.17.0.2
  327. pid: 1588
  328. should_checkpoint: true
  329. time_since_restore: 73.29849410057068
  330. time_this_iter_s: 22.552368640899658
  331. time_total_s: 73.29849410057068
  332. timestamp: 1609878263
  333. timesteps_since_restore: 0
  334. training_iteration: 3
  335. trial_id: d3304_00003
  336. == Status ==
  337. Memory usage on this node: 6.9/240.1 GiB
  338. Using AsyncHyperBand: num_stopped=4
  339. Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
  340. Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  341. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  342. Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
  343. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  344. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  345. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  346. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 | | | |
  347. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.66905 | 0.3791 | 1 |
  348. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.54988 | 0.4242 | 3 |
  349. | DEFAULT_d3304_00005 | RUNNING | 172.17.0.2:1567 | 4 | 128 | 64 | 0.00757252 | 1.80583 | 0.3301 | 1 |
  350. | DEFAULT_d3304_00006 | RUNNING | | 2 | 64 | 256 | 0.00177236 | | | |
  351. | DEFAULT_d3304_00008 | RUNNING | | 2 | 16 | 64 | 0.0310199 | | | |
  352. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  353. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  354. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  355. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  356. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  357. (pid=1585) [1, 14000] loss: 0.329
  358. (pid=1504) [2, 2000] loss: 1.708
  359. (pid=1565) [1, 14000] loss: 0.225
  360. (pid=1505) [1, 14000] loss: 0.334
  361. (pid=1567) [2, 2000] loss: 1.803
  362. (pid=1585) [1, 16000] loss: 0.288
  363. (pid=1588) [4, 2000] loss: 1.541
  364. (pid=1504) [2, 4000] loss: 0.840
  365. (pid=1565) [1, 16000] loss: 0.198
  366. (pid=1505) [1, 16000] loss: 0.292
  367. (pid=1567) [2, 4000] loss: 0.912
  368. Result for DEFAULT_d3304_00003:
  369. accuracy: 0.4494
  370. date: 2021-01-05_20-24-44
  371. done: false
  372. experiment_id: d4b00469893d498ea65a729df202882a
  373. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  374. hostname: 1a844a452371
  375. iterations_since_restore: 4
  376. loss: 1.4720179980278014
  377. node_ip: 172.17.0.2
  378. pid: 1588
  379. should_checkpoint: true
  380. time_since_restore: 94.81268787384033
  381. time_this_iter_s: 21.514193773269653
  382. time_total_s: 94.81268787384033
  383. timestamp: 1609878284
  384. timesteps_since_restore: 0
  385. training_iteration: 4
  386. trial_id: d3304_00003
  387. == Status ==
  388. Memory usage on this node: 6.9/240.1 GiB
  389. Using AsyncHyperBand: num_stopped=4
  390. Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
  391. Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  392. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  393. Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
  394. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  395. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  396. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  397. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 | | | |
  398. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.66905 | 0.3791 | 1 |
  399. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.47202 | 0.4494 | 4 |
  400. | DEFAULT_d3304_00005 | RUNNING | 172.17.0.2:1567 | 4 | 128 | 64 | 0.00757252 | 1.80583 | 0.3301 | 1 |
  401. | DEFAULT_d3304_00006 | RUNNING | | 2 | 64 | 256 | 0.00177236 | | | |
  402. | DEFAULT_d3304_00008 | RUNNING | | 2 | 16 | 64 | 0.0310199 | | | |
  403. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  404. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  405. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  406. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  407. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  408. (pid=1585) [1, 18000] loss: 0.256
  409. (pid=1565) [1, 18000] loss: 0.173
  410. (pid=1504) [2, 6000] loss: 0.572
  411. (pid=1505) [1, 18000] loss: 0.259
  412. (pid=1567) [2, 6000] loss: 0.611
  413. (pid=1585) [1, 20000] loss: 0.230
  414. (pid=1565) [1, 20000] loss: 0.156
  415. (pid=1505) [1, 20000] loss: 0.234
  416. (pid=1504) [2, 8000] loss: 0.417
  417. (pid=1588) [5, 2000] loss: 1.452
  418. (pid=1567) [2, 8000] loss: 0.461
  419. Result for DEFAULT_d3304_00003:
  420. accuracy: 0.4839
  421. date: 2021-01-05_20-25-06
  422. done: false
  423. experiment_id: d4b00469893d498ea65a729df202882a
  424. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  425. hostname: 1a844a452371
  426. iterations_since_restore: 5
  427. loss: 1.4083827662467956
  428. node_ip: 172.17.0.2
  429. pid: 1588
  430. should_checkpoint: true
  431. time_since_restore: 116.5817449092865
  432. time_this_iter_s: 21.769057035446167
  433. time_total_s: 116.5817449092865
  434. timestamp: 1609878306
  435. timesteps_since_restore: 0
  436. training_iteration: 5
  437. trial_id: d3304_00003
  438. == Status ==
  439. Memory usage on this node: 6.9/240.1 GiB
  440. Using AsyncHyperBand: num_stopped=4
  441. Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
  442. Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  443. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  444. Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
  445. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  446. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  447. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  448. | DEFAULT_d3304_00000 | RUNNING | | 2 | 4 | 16 | 0.000111924 | | | |
  449. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.66905 | 0.3791 | 1 |
  450. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.40838 | 0.4839 | 5 |
  451. | DEFAULT_d3304_00005 | RUNNING | 172.17.0.2:1567 | 4 | 128 | 64 | 0.00757252 | 1.80583 | 0.3301 | 1 |
  452. | DEFAULT_d3304_00006 | RUNNING | | 2 | 64 | 256 | 0.00177236 | | | |
  453. | DEFAULT_d3304_00008 | RUNNING | | 2 | 16 | 64 | 0.0310199 | | | |
  454. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  455. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  456. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  457. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  458. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  459. (pid=1504) [2, 10000] loss: 0.339
  460. Result for DEFAULT_d3304_00000:
  461. accuracy: 0.1104
  462. date: 2021-01-05_20-25-10
  463. done: false
  464. experiment_id: 454624d453954d46b33a1eb496e7ec53
  465. experiment_tag: 0_batch_size=2,l1=4,l2=16,lr=0.00011192
  466. hostname: 1a844a452371
  467. iterations_since_restore: 1
  468. loss: 2.2988875378131866
  469. node_ip: 172.17.0.2
  470. pid: 1585
  471. should_checkpoint: true
  472. time_since_restore: 120.59520411491394
  473. time_this_iter_s: 120.59520411491394
  474. time_total_s: 120.59520411491394
  475. timestamp: 1609878310
  476. timesteps_since_restore: 0
  477. training_iteration: 1
  478. trial_id: d3304_00000
  479. Result for DEFAULT_d3304_00008:
  480. accuracy: 0.0983
  481. date: 2021-01-05_20-25-11
  482. done: true
  483. experiment_id: 381603b190bc47a9b794321f7692695f
  484. experiment_tag: 8_batch_size=2,l1=16,l2=64,lr=0.03102
  485. hostname: 1a844a452371
  486. iterations_since_restore: 1
  487. loss: 2.336980807876587
  488. node_ip: 172.17.0.2
  489. pid: 1505
  490. should_checkpoint: true
  491. time_since_restore: 121.36707901954651
  492. time_this_iter_s: 121.36707901954651
  493. time_total_s: 121.36707901954651
  494. timestamp: 1609878311
  495. timesteps_since_restore: 0
  496. training_iteration: 1
  497. trial_id: d3304_00008
  498. Result for DEFAULT_d3304_00006:
  499. accuracy: 0.4586
  500. date: 2021-01-05_20-25-11
  501. done: false
  502. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  503. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  504. hostname: 1a844a452371
  505. iterations_since_restore: 1
  506. loss: 1.5124113649010658
  507. node_ip: 172.17.0.2
  508. pid: 1565
  509. should_checkpoint: true
  510. time_since_restore: 121.536208152771
  511. time_this_iter_s: 121.536208152771
  512. time_total_s: 121.536208152771
  513. timestamp: 1609878311
  514. timesteps_since_restore: 0
  515. training_iteration: 1
  516. trial_id: d3304_00006
  517. == Status ==
  518. Memory usage on this node: 6.6/240.1 GiB
  519. Using AsyncHyperBand: num_stopped=5
  520. Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  521. Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  522. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  523. Number of trials: 10/10 (5 RUNNING, 5 TERMINATED)
  524. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  525. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  526. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  527. | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 |
  528. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.66905 | 0.3791 | 1 |
  529. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.40838 | 0.4839 | 5 |
  530. | DEFAULT_d3304_00005 | RUNNING | 172.17.0.2:1567 | 4 | 128 | 64 | 0.00757252 | 1.80583 | 0.3301 | 1 |
  531. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 |
  532. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  533. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  534. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  535. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  536. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  537. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  538. Result for DEFAULT_d3304_00002:
  539. accuracy: 0.4078
  540. date: 2021-01-05_20-25-16
  541. done: false
  542. experiment_id: eaf4d25c9a0e46219afb226ed323095b
  543. experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699
  544. hostname: 1a844a452371
  545. iterations_since_restore: 2
  546. loss: 1.6191314194440842
  547. node_ip: 172.17.0.2
  548. pid: 1504
  549. should_checkpoint: true
  550. time_since_restore: 126.61185264587402
  551. time_this_iter_s: 58.42617344856262
  552. time_total_s: 126.61185264587402
  553. timestamp: 1609878316
  554. timesteps_since_restore: 0
  555. training_iteration: 2
  556. trial_id: d3304_00002
  557. (pid=1567) [2, 10000] loss: 0.371
  558. (pid=1585) [2, 2000] loss: 2.298
  559. (pid=1565) [2, 2000] loss: 1.466
  560. (pid=1588) [6, 2000] loss: 1.383
  561. Result for DEFAULT_d3304_00005:
  562. accuracy: 0.3647
  563. date: 2021-01-05_20-25-24
  564. done: true
  565. experiment_id: 738b3d315db548a7956646b2c07f1b0c
  566. experiment_tag: 5_batch_size=4,l1=128,l2=64,lr=0.0075725
  567. hostname: 1a844a452371
  568. iterations_since_restore: 2
  569. loss: 1.7739140236496926
  570. node_ip: 172.17.0.2
  571. pid: 1567
  572. should_checkpoint: true
  573. time_since_restore: 134.1462869644165
  574. time_this_iter_s: 62.06560754776001
  575. time_total_s: 134.1462869644165
  576. timestamp: 1609878324
  577. timesteps_since_restore: 0
  578. training_iteration: 2
  579. trial_id: d3304_00005
  580. == Status ==
  581. Memory usage on this node: 6.3/240.1 GiB
  582. Using AsyncHyperBand: num_stopped=6
  583. Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  584. Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  585. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  586. Number of trials: 10/10 (5 RUNNING, 5 TERMINATED)
  587. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  588. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  589. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  590. | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 |
  591. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.61913 | 0.4078 | 2 |
  592. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.40838 | 0.4839 | 5 |
  593. | DEFAULT_d3304_00005 | RUNNING | 172.17.0.2:1567 | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  594. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 |
  595. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  596. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  597. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  598. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  599. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  600. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  601. (pid=1504) [3, 2000] loss: 1.656
  602. Result for DEFAULT_d3304_00003:
  603. accuracy: 0.5061
  604. date: 2021-01-05_20-25-27
  605. done: false
  606. experiment_id: d4b00469893d498ea65a729df202882a
  607. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  608. hostname: 1a844a452371
  609. iterations_since_restore: 6
  610. loss: 1.3623717227935792
  611. node_ip: 172.17.0.2
  612. pid: 1588
  613. should_checkpoint: true
  614. time_since_restore: 137.95851016044617
  615. time_this_iter_s: 21.376765251159668
  616. time_total_s: 137.95851016044617
  617. timestamp: 1609878327
  618. timesteps_since_restore: 0
  619. training_iteration: 6
  620. trial_id: d3304_00003
  621. (pid=1585) [2, 4000] loss: 1.147
  622. (pid=1565) [2, 4000] loss: 0.749
  623. (pid=1504) [3, 4000] loss: 0.838
  624. (pid=1585) [2, 6000] loss: 0.760
  625. (pid=1565) [2, 6000] loss: 0.498
  626. (pid=1588) [7, 2000] loss: 1.326
  627. (pid=1504) [3, 6000] loss: 0.560
  628. (pid=1585) [2, 8000] loss: 0.561
  629. Result for DEFAULT_d3304_00003:
  630. accuracy: 0.5209
  631. date: 2021-01-05_20-25-48
  632. done: false
  633. experiment_id: d4b00469893d498ea65a729df202882a
  634. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  635. hostname: 1a844a452371
  636. iterations_since_restore: 7
  637. loss: 1.316757419013977
  638. node_ip: 172.17.0.2
  639. pid: 1588
  640. should_checkpoint: true
  641. time_since_restore: 158.4953932762146
  642. time_this_iter_s: 20.536883115768433
  643. time_total_s: 158.4953932762146
  644. timestamp: 1609878348
  645. timesteps_since_restore: 0
  646. training_iteration: 7
  647. trial_id: d3304_00003
  648. == Status ==
  649. Memory usage on this node: 5.8/240.1 GiB
  650. Using AsyncHyperBand: num_stopped=6
  651. Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  652. Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  653. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  654. Number of trials: 10/10 (4 RUNNING, 6 TERMINATED)
  655. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  656. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  657. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  658. | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 |
  659. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.61913 | 0.4078 | 2 |
  660. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.31676 | 0.5209 | 7 |
  661. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 |
  662. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  663. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  664. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  665. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  666. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  667. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  668. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  669. (pid=1565) [2, 8000] loss: 0.372
  670. (pid=1504) [3, 8000] loss: 0.416
  671. (pid=1585) [2, 10000] loss: 0.434
  672. (pid=1565) [2, 10000] loss: 0.292
  673. (pid=1588) [8, 2000] loss: 1.278
  674. (pid=1504) [3, 10000] loss: 0.333
  675. (pid=1585) [2, 12000] loss: 0.347
  676. (pid=1565) [2, 12000] loss: 0.245
  677. Result for DEFAULT_d3304_00003:
  678. accuracy: 0.5406
  679. date: 2021-01-05_20-26-08
  680. done: false
  681. experiment_id: d4b00469893d498ea65a729df202882a
  682. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  683. hostname: 1a844a452371
  684. iterations_since_restore: 8
  685. loss: 1.267511115884781
  686. node_ip: 172.17.0.2
  687. pid: 1588
  688. should_checkpoint: true
  689. time_since_restore: 179.13841199874878
  690. time_this_iter_s: 20.64301872253418
  691. time_total_s: 179.13841199874878
  692. timestamp: 1609878368
  693. timesteps_since_restore: 0
  694. training_iteration: 8
  695. trial_id: d3304_00003
  696. == Status ==
  697. Memory usage on this node: 5.8/240.1 GiB
  698. Using AsyncHyperBand: num_stopped=6
  699. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  700. Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  701. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  702. Number of trials: 10/10 (4 RUNNING, 6 TERMINATED)
  703. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  704. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  705. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  706. | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 |
  707. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.61913 | 0.4078 | 2 |
  708. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.26751 | 0.5406 | 8 |
  709. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 |
  710. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  711. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  712. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  713. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  714. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  715. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  716. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  717. Result for DEFAULT_d3304_00002:
  718. accuracy: 0.3997
  719. date: 2021-01-05_20-26-11
  720. done: false
  721. experiment_id: eaf4d25c9a0e46219afb226ed323095b
  722. experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699
  723. hostname: 1a844a452371
  724. iterations_since_restore: 3
  725. loss: 1.7084122330278158
  726. node_ip: 172.17.0.2
  727. pid: 1504
  728. should_checkpoint: true
  729. time_since_restore: 182.02509140968323
  730. time_this_iter_s: 55.413238763809204
  731. time_total_s: 182.02509140968323
  732. timestamp: 1609878371
  733. timesteps_since_restore: 0
  734. training_iteration: 3
  735. trial_id: d3304_00002
  736. (pid=1585) [2, 14000] loss: 0.290
  737. (pid=1565) [2, 14000] loss: 0.213
  738. (pid=1504) [4, 2000] loss: 1.653
  739. (pid=1588) [9, 2000] loss: 1.245
  740. (pid=1585) [2, 16000] loss: 0.244
  741. (pid=1565) [2, 16000] loss: 0.186
  742. Result for DEFAULT_d3304_00003:
  743. accuracy: 0.5409
  744. date: 2021-01-05_20-26-29
  745. done: false
  746. experiment_id: d4b00469893d498ea65a729df202882a
  747. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  748. hostname: 1a844a452371
  749. iterations_since_restore: 9
  750. loss: 1.2721123942375183
  751. node_ip: 172.17.0.2
  752. pid: 1588
  753. should_checkpoint: true
  754. time_since_restore: 199.56540870666504
  755. time_this_iter_s: 20.42699670791626
  756. time_total_s: 199.56540870666504
  757. timestamp: 1609878389
  758. timesteps_since_restore: 0
  759. training_iteration: 9
  760. trial_id: d3304_00003
  761. == Status ==
  762. Memory usage on this node: 5.8/240.1 GiB
  763. Using AsyncHyperBand: num_stopped=6
  764. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  765. Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  766. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  767. Number of trials: 10/10 (4 RUNNING, 6 TERMINATED)
  768. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  769. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  770. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  771. | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 |
  772. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.70841 | 0.3997 | 3 |
  773. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.27211 | 0.5409 | 9 |
  774. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 |
  775. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  776. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  777. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  778. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  779. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  780. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  781. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  782. (pid=1504) [4, 4000] loss: 0.842
  783. (pid=1585) [2, 18000] loss: 0.214
  784. (pid=1565) [2, 18000] loss: 0.159
  785. (pid=1504) [4, 6000] loss: 0.561
  786. (pid=1585) [2, 20000] loss: 0.191
  787. (pid=1588) [10, 2000] loss: 1.210
  788. (pid=1565) [2, 20000] loss: 0.143
  789. Result for DEFAULT_d3304_00003:
  790. accuracy: 0.5619
  791. date: 2021-01-05_20-26-50
  792. done: true
  793. experiment_id: d4b00469893d498ea65a729df202882a
  794. experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  795. hostname: 1a844a452371
  796. iterations_since_restore: 10
  797. loss: 1.2222298237800597
  798. node_ip: 172.17.0.2
  799. pid: 1588
  800. should_checkpoint: true
  801. time_since_restore: 220.31984639167786
  802. time_this_iter_s: 20.754437685012817
  803. time_total_s: 220.31984639167786
  804. timestamp: 1609878410
  805. timesteps_since_restore: 0
  806. training_iteration: 10
  807. trial_id: d3304_00003
  808. == Status ==
  809. Memory usage on this node: 5.8/240.1 GiB
  810. Using AsyncHyperBand: num_stopped=7
  811. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  812. Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  813. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  814. Number of trials: 10/10 (4 RUNNING, 6 TERMINATED)
  815. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  816. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  817. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  818. | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 |
  819. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.70841 | 0.3997 | 3 |
  820. | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  821. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 |
  822. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  823. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  824. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  825. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  826. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  827. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  828. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  829. (pid=1504) [4, 8000] loss: 0.422
  830. Result for DEFAULT_d3304_00000:
  831. accuracy: 0.2724
  832. date: 2021-01-05_20-26-55
  833. done: true
  834. experiment_id: 454624d453954d46b33a1eb496e7ec53
  835. experiment_tag: 0_batch_size=2,l1=4,l2=16,lr=0.00011192
  836. hostname: 1a844a452371
  837. iterations_since_restore: 2
  838. loss: 1.8605026947617531
  839. node_ip: 172.17.0.2
  840. pid: 1585
  841. should_checkpoint: true
  842. time_since_restore: 225.84529209136963
  843. time_this_iter_s: 105.25008797645569
  844. time_total_s: 225.84529209136963
  845. timestamp: 1609878415
  846. timesteps_since_restore: 0
  847. training_iteration: 2
  848. trial_id: d3304_00000
  849. == Status ==
  850. Memory usage on this node: 5.3/240.1 GiB
  851. Using AsyncHyperBand: num_stopped=8
  852. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7390530687630177 | Iter 1.000: -2.301384049916267
  853. Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  854. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  855. Number of trials: 10/10 (3 RUNNING, 7 TERMINATED)
  856. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  857. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  858. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  859. | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  860. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.70841 | 0.3997 | 3 |
  861. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 |
  862. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  863. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  864. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  865. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  866. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  867. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  868. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  869. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  870. Result for DEFAULT_d3304_00006:
  871. accuracy: 0.5007
  872. date: 2021-01-05_20-26-57
  873. done: false
  874. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  875. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  876. hostname: 1a844a452371
  877. iterations_since_restore: 2
  878. loss: 1.3979384284215048
  879. node_ip: 172.17.0.2
  880. pid: 1565
  881. should_checkpoint: true
  882. time_since_restore: 227.80454421043396
  883. time_this_iter_s: 106.26833605766296
  884. time_total_s: 227.80454421043396
  885. timestamp: 1609878417
  886. timesteps_since_restore: 0
  887. training_iteration: 2
  888. trial_id: d3304_00006
  889. (pid=1504) [4, 10000] loss: 0.335
  890. Result for DEFAULT_d3304_00002:
  891. accuracy: 0.3849
  892. date: 2021-01-05_20-27-06
  893. done: true
  894. experiment_id: eaf4d25c9a0e46219afb226ed323095b
  895. experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699
  896. hostname: 1a844a452371
  897. iterations_since_restore: 4
  898. loss: 1.720731588792801
  899. node_ip: 172.17.0.2
  900. pid: 1504
  901. should_checkpoint: true
  902. time_since_restore: 236.71593952178955
  903. time_this_iter_s: 54.69084811210632
  904. time_total_s: 236.71593952178955
  905. timestamp: 1609878426
  906. timesteps_since_restore: 0
  907. training_iteration: 4
  908. trial_id: d3304_00002
  909. == Status ==
  910. Memory usage on this node: 4.7/240.1 GiB
  911. Using AsyncHyperBand: num_stopped=9
  912. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.5963747934103012 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  913. Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  914. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  915. Number of trials: 10/10 (2 RUNNING, 8 TERMINATED)
  916. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  917. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  918. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  919. | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  920. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.39794 | 0.5007 | 2 |
  921. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  922. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  923. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  924. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  925. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  926. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  927. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  928. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  929. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  930. (pid=1565) [3, 2000] loss: 1.373
  931. (pid=1565) [3, 4000] loss: 0.696
  932. (pid=1565) [3, 6000] loss: 0.466
  933. (pid=1565) [3, 8000] loss: 0.357
  934. (pid=1565) [3, 10000] loss: 0.283
  935. (pid=1565) [3, 12000] loss: 0.241
  936. (pid=1565) [3, 14000] loss: 0.203
  937. (pid=1565) [3, 16000] loss: 0.178
  938. (pid=1565) [3, 18000] loss: 0.160
  939. (pid=1565) [3, 20000] loss: 0.142
  940. Result for DEFAULT_d3304_00006:
  941. accuracy: 0.5095
  942. date: 2021-01-05_20-28-36
  943. done: false
  944. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  945. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  946. hostname: 1a844a452371
  947. iterations_since_restore: 3
  948. loss: 1.4272501501079649
  949. node_ip: 172.17.0.2
  950. pid: 1565
  951. should_checkpoint: true
  952. time_since_restore: 326.1525847911835
  953. time_this_iter_s: 98.34804058074951
  954. time_total_s: 326.1525847911835
  955. timestamp: 1609878516
  956. timesteps_since_restore: 0
  957. training_iteration: 3
  958. trial_id: d3304_00006
  959. == Status ==
  960. Memory usage on this node: 4.2/240.1 GiB
  961. Using AsyncHyperBand: num_stopped=9
  962. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.5963747934103012 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  963. Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  964. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  965. Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
  966. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  967. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  968. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  969. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.42725 | 0.5095 | 3 |
  970. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  971. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  972. | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  973. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  974. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  975. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  976. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  977. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  978. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  979. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  980. (pid=1565) [4, 2000] loss: 1.320
  981. (pid=1565) [4, 4000] loss: 0.701
  982. (pid=1565) [4, 6000] loss: 0.454
  983. (pid=1565) [4, 8000] loss: 0.345
  984. (pid=1565) [4, 10000] loss: 0.276
  985. (pid=1565) [4, 12000] loss: 0.234
  986. (pid=1565) [4, 14000] loss: 0.199
  987. (pid=1565) [4, 16000] loss: 0.170
  988. (pid=1565) [4, 18000] loss: 0.151
  989. (pid=1565) [4, 20000] loss: 0.144
  990. Result for DEFAULT_d3304_00006:
  991. accuracy: 0.4749
  992. date: 2021-01-05_20-30-15
  993. done: false
  994. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  995. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  996. hostname: 1a844a452371
  997. iterations_since_restore: 4
  998. loss: 1.4950430885698218
  999. node_ip: 172.17.0.2
  1000. pid: 1565
  1001. should_checkpoint: true
  1002. time_since_restore: 425.3827154636383
  1003. time_this_iter_s: 99.23013067245483
  1004. time_total_s: 425.3827154636383
  1005. timestamp: 1609878615
  1006. timesteps_since_restore: 0
  1007. training_iteration: 4
  1008. trial_id: d3304_00006
  1009. == Status ==
  1010. Memory usage on this node: 4.1/240.1 GiB
  1011. Using AsyncHyperBand: num_stopped=9
  1012. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  1013. Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  1014. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  1015. Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
  1016. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1017. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  1018. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  1019. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.49504 | 0.4749 | 4 |
  1020. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  1021. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  1022. | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  1023. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  1024. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  1025. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  1026. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  1027. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  1028. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  1029. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1030. (pid=1565) [5, 2000] loss: 1.314
  1031. (pid=1565) [5, 4000] loss: 0.663
  1032. (pid=1565) [5, 6000] loss: 0.453
  1033. (pid=1565) [5, 8000] loss: 0.341
  1034. (pid=1565) [5, 10000] loss: 0.278
  1035. (pid=1565) [5, 12000] loss: 0.235
  1036. (pid=1565) [5, 14000] loss: 0.197
  1037. (pid=1565) [5, 16000] loss: 0.173
  1038. (pid=1565) [5, 18000] loss: 0.155
  1039. (pid=1565) [5, 20000] loss: 0.137
  1040. Result for DEFAULT_d3304_00006:
  1041. accuracy: 0.531
  1042. date: 2021-01-05_20-31-56
  1043. done: false
  1044. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  1045. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  1046. hostname: 1a844a452371
  1047. iterations_since_restore: 5
  1048. loss: 1.373500657767952
  1049. node_ip: 172.17.0.2
  1050. pid: 1565
  1051. should_checkpoint: true
  1052. time_since_restore: 526.6667892932892
  1053. time_this_iter_s: 101.28407382965088
  1054. time_total_s: 526.6667892932892
  1055. timestamp: 1609878716
  1056. timesteps_since_restore: 0
  1057. training_iteration: 5
  1058. trial_id: d3304_00006
  1059. == Status ==
  1060. Memory usage on this node: 4.1/240.1 GiB
  1061. Using AsyncHyperBand: num_stopped=9
  1062. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  1063. Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  1064. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  1065. Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
  1066. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1067. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  1068. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  1069. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.3735 | 0.531 | 5 |
  1070. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  1071. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  1072. | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  1073. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  1074. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  1075. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  1076. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  1077. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  1078. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  1079. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1080. (pid=1565) [6, 2000] loss: 1.325
  1081. (pid=1565) [6, 4000] loss: 0.668
  1082. (pid=1565) [6, 6000] loss: 0.457
  1083. (pid=1565) [6, 8000] loss: 0.338
  1084. (pid=1565) [6, 10000] loss: 0.283
  1085. (pid=1565) [6, 12000] loss: 0.232
  1086. (pid=1565) [6, 14000] loss: 0.198
  1087. (pid=1565) [6, 16000] loss: 0.175
  1088. (pid=1565) [6, 18000] loss: 0.149
  1089. (pid=1565) [6, 20000] loss: 0.140
  1090. Result for DEFAULT_d3304_00006:
  1091. accuracy: 0.4852
  1092. date: 2021-01-05_20-33-55
  1093. done: false
  1094. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  1095. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  1096. hostname: 1a844a452371
  1097. iterations_since_restore: 6
  1098. loss: 1.5015573524537555
  1099. node_ip: 172.17.0.2
  1100. pid: 1565
  1101. should_checkpoint: true
  1102. time_since_restore: 645.3050956726074
  1103. time_this_iter_s: 118.63830637931824
  1104. time_total_s: 645.3050956726074
  1105. timestamp: 1609878835
  1106. timesteps_since_restore: 0
  1107. training_iteration: 6
  1108. trial_id: d3304_00006
  1109. == Status ==
  1110. Memory usage on this node: 4.1/240.1 GiB
  1111. Using AsyncHyperBand: num_stopped=9
  1112. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  1113. Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  1114. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  1115. Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
  1116. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1117. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  1118. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  1119. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.50156 | 0.4852 | 6 |
  1120. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  1121. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  1122. | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  1123. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  1124. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  1125. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  1126. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  1127. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  1128. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  1129. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1130. (pid=1565) [7, 2000] loss: 1.295
  1131. (pid=1565) [7, 4000] loss: 0.662
  1132. (pid=1565) [7, 6000] loss: 0.452
  1133. (pid=1565) [7, 8000] loss: 0.339
  1134. (pid=1565) [7, 10000] loss: 0.270
  1135. (pid=1565) [7, 12000] loss: 0.235
  1136. (pid=1565) [7, 14000] loss: 0.193
  1137. (pid=1565) [7, 16000] loss: 0.169
  1138. (pid=1565) [7, 18000] loss: 0.154
  1139. (pid=1565) [7, 20000] loss: 0.137
  1140. Result for DEFAULT_d3304_00006:
  1141. accuracy: 0.4696
  1142. date: 2021-01-05_20-35-52
  1143. done: false
  1144. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  1145. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  1146. hostname: 1a844a452371
  1147. iterations_since_restore: 7
  1148. loss: 1.5851255111492393
  1149. node_ip: 172.17.0.2
  1150. pid: 1565
  1151. should_checkpoint: true
  1152. time_since_restore: 762.1866834163666
  1153. time_this_iter_s: 116.88158774375916
  1154. time_total_s: 762.1866834163666
  1155. timestamp: 1609878952
  1156. timesteps_since_restore: 0
  1157. training_iteration: 7
  1158. trial_id: d3304_00006
  1159. == Status ==
  1160. Memory usage on this node: 4.1/240.1 GiB
  1161. Using AsyncHyperBand: num_stopped=9
  1162. Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  1163. Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  1164. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  1165. Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
  1166. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1167. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  1168. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  1169. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.58513 | 0.4696 | 7 |
  1170. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  1171. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  1172. | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  1173. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  1174. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  1175. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  1176. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  1177. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  1178. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  1179. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1180. (pid=1565) [8, 2000] loss: 1.341
  1181. (pid=1565) [8, 4000] loss: 0.667
  1182. (pid=1565) [8, 6000] loss: 0.445
  1183. (pid=1565) [8, 8000] loss: 0.336
  1184. (pid=1565) [8, 10000] loss: 0.271
  1185. (pid=1565) [8, 12000] loss: 0.228
  1186. (pid=1565) [8, 14000] loss: 0.196
  1187. (pid=1565) [8, 16000] loss: 0.175
  1188. (pid=1565) [8, 18000] loss: 0.155
  1189. (pid=1565) [8, 20000] loss: 0.135
  1190. Result for DEFAULT_d3304_00006:
  1191. accuracy: 0.467
  1192. date: 2021-01-05_20-37-32
  1193. done: true
  1194. experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  1195. experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  1196. hostname: 1a844a452371
  1197. iterations_since_restore: 8
  1198. loss: 1.6539037554110967
  1199. node_ip: 172.17.0.2
  1200. pid: 1565
  1201. should_checkpoint: true
  1202. time_since_restore: 862.3724186420441
  1203. time_this_iter_s: 100.18573522567749
  1204. time_total_s: 862.3724186420441
  1205. timestamp: 1609879052
  1206. timesteps_since_restore: 0
  1207. training_iteration: 8
  1208. trial_id: d3304_00006
  1209. == Status ==
  1210. Memory usage on this node: 4.1/240.1 GiB
  1211. Using AsyncHyperBand: num_stopped=10
  1212. Bracket: Iter 8.000: -1.4607074356479388 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  1213. Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  1214. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  1215. Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
  1216. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1217. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  1218. |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
  1219. | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.6539 | 0.467 | 8 |
  1220. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  1221. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  1222. | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  1223. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  1224. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  1225. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  1226. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  1227. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  1228. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  1229. +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
  1230. == Status ==
  1231. Memory usage on this node: 4.0/240.1 GiB
  1232. Using AsyncHyperBand: num_stopped=10
  1233. Bracket: Iter 8.000: -1.4607074356479388 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
  1234. Resources requested: 0/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
  1235. Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
  1236. Number of trials: 10/10 (10 TERMINATED)
  1237. +---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+
  1238. | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
  1239. |---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------|
  1240. | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
  1241. | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
  1242. | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
  1243. | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
  1244. | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
  1245. | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
  1246. | DEFAULT_d3304_00006 | TERMINATED | | 2 | 64 | 256 | 0.00177236 | 1.6539 | 0.467 | 8 |
  1247. | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
  1248. | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
  1249. | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
  1250. +---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+
  1251. Best trial config: {'l1': 32, 'l2': 4, 'lr': 0.0012023396319256663, 'batch_size': 16}
  1252. Best trial final validation loss: 1.2222298237800597
  1253. Best trial final validation accuracy: 0.5619
  1254. Files already downloaded and verified
  1255. Files already downloaded and verified
  1256. Best trial test set accuracy: 0.5537

If you run the code, the output should look similar to the log above.

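Most trials were stopped early to avoid wasting resources. The best-performing trial reached a validation accuracy of about 56%, which is confirmed on the test set (about 55%).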
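The two observations above can be checked directly against the final status table. The following is a minimal sketch in plain Python, not part of the tutorial script: it copies the loss, accuracy, and iteration columns from the last table in the log, and it assumes that 10 is the maximum number of training iterations (the value reached by the longest-running trial). The names `TrialResult`, `final_results`, and `MAX_ITERATIONS` are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# Hypothetical container for one row of the final status table.
TrialResult = namedtuple("TrialResult", ["name", "loss", "accuracy", "iterations"])

# Values copied from the final status table in the log above.
final_results = [
    TrialResult("DEFAULT_d3304_00000", 1.8605, 0.2724, 2),
    TrialResult("DEFAULT_d3304_00001", 2.32186, 0.1017, 1),
    TrialResult("DEFAULT_d3304_00002", 1.72073, 0.3849, 4),
    TrialResult("DEFAULT_d3304_00003", 1.22223, 0.5619, 10),
    TrialResult("DEFAULT_d3304_00004", 2.31342, 0.102, 1),
    TrialResult("DEFAULT_d3304_00005", 1.77391, 0.3647, 2),
    TrialResult("DEFAULT_d3304_00006", 1.6539, 0.467, 8),
    TrialResult("DEFAULT_d3304_00007", 2.30388, 0.1011, 1),
    TrialResult("DEFAULT_d3304_00008", 2.33698, 0.0983, 1),
    TrialResult("DEFAULT_d3304_00009", 2.31098, 0.101, 1),
]

# Assumption: the longest trial in the table ran the full budget of 10 iterations.
MAX_ITERATIONS = 10

# The best trial is the one with the lowest final validation loss.
best = min(final_results, key=lambda r: r.loss)

# Trials that ended before the full budget were stopped by the scheduler.
stopped_early = [r for r in final_results if r.iterations < MAX_ITERATIONS]

print(f"Best trial: {best.name} (loss={best.loss}, accuracy={best.accuracy})")
print(f"Stopped early: {len(stopped_early)} of {len(final_results)} trials")
```

Selecting by minimum loss matches the "Best trial" lines printed at the end of the log, which report the configuration of trial `DEFAULT_d3304_00003`.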

That's it! You can now tune the parameters of your PyTorch models.

Total running time of the script: (14 minutes 43.400 seconds)
