- What is a “model”
- Basic training parameters
- LoRA type
- LoRA network weights
- DIM from weights
- Train batch size
- Epoch
- Save every N epochs
- Caption Extension
- Mixed precision
- Save precision
- Number of CPU threads per core
- Seeds
- Cache latents
- Cache latents to disk
- Learning rate
- LR Scheduler
- LR warmup
- Optimizer
- Optimizer extra arguments
- Text Encoder learning rate
- Unet learning rate
- Network Rank (Dimension)
- Network alpha
- Max resolution
- Stop text encoder training
- Enable buckets
- Advanced Configuration
- Weights, Blocks, Conv
- Weights: Down LR weights/Mid LR weights/Up LR weights
- Weights: Blocks LR zero threshold
- Blocks: Block dims, Block alphas
- Conv: Conv dims, Conv alphas
- No token padding
- Gradient accumulation steps
- Weighted captions
- Prior loss weight
- LR number of cycles
- LR power
- Additional parameters
- Save every N steps
- Save last N steps
- Keep n tokens
- Clip skip
- Max Token Length
- Full fp16 training (experimental)
- Gradient checkpointing
- Shuffle caption
- Persistent data loaders
- Memory efficient attention
- Use xformers
- Color augmentation
- Flip augmentation
- Min SNR gamma
- Don’t upscale bucket resolution
- Bucket resolution steps
- Random crop instead of center crop
- Noise offset type
- Dropout caption every n epochs
- Rate of caption dropout
- VAE batch size
- Save training state
- Resume from saved training state
- Max train epoch
- Max num workers for DataLoader
- WANDB API Key
- WANDB Logging
- Sample images config
https://github.com/bmaltais/kohya_ss/blob/master/docs/LoRA/options.md
To understand the meaning of each setting in kohya_ss, it is necessary to know how LoRA performs additional training. We will also explain what the “model”, the target of this additional training, is.

What is a “model”

Stable Diffusion loads and uses modules called “models”. A model is, so to speak, a “brain”, and its true identity is the weight information of a neural network. A neural network is made up of many “neurons”, and clusters of neurons form many “layers”. Neurons in one layer are connected to neurons in another layer by links, and the strength of each connection is a “weight”. It is these “weights” that hold a huge amount of picture information.
LoRA adds a small neural net

LoRA is a kind of “additional training”, and the straightforward way to do additional training is to update the neural network itself. The additional training method called “DreamBooth” uses this approach. With that method, if you want to publish the result of additional training, you need to distribute the whole updated model. Models are typically 2 to 5 GB in size, which makes them difficult to distribute. In contrast, LoRA leaves the model alone and creates a new “small neural net” for each position you want to train; the additional training is done on these small neural nets. When you want to distribute a LoRA, you only need to distribute the small neural nets, so the data size is small.
Structure of a small neural net

LoRA’s small neural net consists of three layers. The number of neurons in the “input layer” on the left and the “output layer” on the right is the same as in the input and output layers of the target neural network. The number of neurons in the middle layer is called the “rank” (or the number of dimensions), and this number can be freely chosen when training.
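As a rough illustration of this structure, here is a minimal sketch (plain Python/NumPy, not kohya_ss code) of how a rank-sized middle layer sits between input and output layers of a frozen base weight; the layer sizes and variable names are illustrative assumptions.

```python
import numpy as np

n_in, n_out, rank = 768, 768, 8      # illustrative layer sizes and rank

# The base model's weight matrix stays frozen.
W_base = np.random.randn(n_out, n_in)

# LoRA adds two small matrices: "down" projects to the rank-sized middle
# layer, "up" projects back out. Only these small matrices are trained.
lora_down = np.random.randn(rank, n_in) * 0.01
lora_up = np.zeros((n_out, rank))        # commonly initialized to zero

def forward(x, scale=1.0):
    # Output = frozen base layer + scaled low-rank correction.
    return W_base @ x + scale * (lora_up @ (lora_down @ x))

# The extra parameters are tiny compared to the base weight:
print(W_base.size, lora_down.size + lora_up.size)   # 589824 vs 12288
```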
LoRA Learning Target 1: U-Net

U-Net is divided into “Down” (left half), “Mid” (bottom) and “Up” (right half) sections, and consists of 25 blocks in total: 12 Down blocks, 1 Mid block, and 12 Up blocks. The neural nets added here are simply called “UNet” in kohya_ss.

LoRA Learning Target 2: Text Encoder

U-Net is not the only place where LoRA adds neural nets. The block called “Cross Attention” in the figure above receives text information from a module called the “Text Encoder”. This text encoder has the role of converting the prompt, which is text data, into a string of numbers (a vector). There is only one text encoder, shared by all Attention blocks in U-Net. The text encoder is originally treated as a “finished product” within Stable Diffusion and is not subject to model training, but it is also a target of LoRA’s additional training. The LoRA-updated text encoder is used in all Attention blocks, so any neural nets added here have a large impact on the final image. The neural nets added here are called “Text Encoder” in kohya_ss.
Basic training parameters

LoRA type

Specifies the type of LoRA training. The LoRA explained above is the “Standard” type. “DyLoRA” learns multiple ranks below the specified rank at the same time, which is convenient when you want to find the optimum rank. LoHa is a highly efficient LoRA variant, and LoCon extends training to U-Net’s Res blocks. There is no problem starting with the Standard type. If training goes poorly, try another type.

LoRA network weights

If you want to use an already trained LoRA file as the starting point for additional training, specify the LoRA file here. The LoRA specified here is read at the start of training, and training starts from its state. The trained LoRA is saved as a separate file, so the LoRA file specified here will not be overwritten.
DIM from weights

This option is only available when doing additional training with LoRA network weights. As shown in the figure above, LoRA adds a small neural net, and the number of neurons in the middle layer (the rank) can normally be set freely with Network Rank (described later). However, turning this option on sets the rank of the created LoRA to the same rank as the LoRA specified in LoRA network weights; when it is on, the Network Rank setting is ignored. For example, when the LoRA used for additional training has a rank of 32, the rank of the LoRA to be created will also be set to 32. Default is off.

Train batch size

Specify a batch size. A batch is the number of images read at once: a batch size of 2 trains two images at a time. When multiple different pictures are trained at the same time, the tuning accuracy for each individual picture drops, but because training captures the characteristics of multiple pictures comprehensively, the final result may actually be better. (If you tune too much to a specific picture, you get a LoRA that does not generalize.) Because multiple pictures are trained at once, a higher batch size means a shorter training time. However, the tuning accuracy decreases and the number of weight updates decreases, so the training may be insufficient in some cases. (There are also reports that when increasing the batch size, it is better to also increase the learning rate, described later; for example, with a batch size of 2, the learning rate should be doubled.) A higher batch size also consumes more memory, so decide according to the VRAM size of your PC. With 6 GB of VRAM, a batch size of 2 is barely possible. Default is 1. Since all the images read at the same time in each batch must be the same size, if the training images have different sizes, the number of images processed simultaneously may be less than the batch size specified here.
Epoch

One epoch is “one full round of training”. For example, suppose you want to train on 50 images, reading each of them 10 times. In that case, 1 epoch is 50x10 = 500 training steps. With 2 epochs, this is repeated twice, so it is 500x2 = 1000 steps. After training for the specified number of epochs, a LoRA file is created and saved to the specified location. For LoRA, 2-3 epochs of training are often sufficient.
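The step counts used throughout this page all follow the same arithmetic; here is a tiny sketch (illustrative Python with the example numbers from the text):

```python
images, repeats, epochs, batch_size = 50, 10, 2, 1   # example values from the text

steps_per_epoch = images * repeats // batch_size
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 500 1000
```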
Save every N epochs

You can save the intermediate progress as a LoRA file every N epochs specified here. For example, if you specify 10 for “Epoch” and 2 for “Save every N epochs”, a LoRA file will be saved in the specified folder every 2 epochs (at the end of epochs 2, 4, 6 and 8). If you don’t need intermediate LoRA files, set this to the same value as “Epoch”.

Caption Extension

If you have prepared a caption file for each image, specify the extension of the caption files here. If this is blank, the extension defaults to “.caption”. If the extension of your caption files is “.txt”, specify “.txt” here. If you don’t have caption files, you can ignore this.

Mixed precision

Specifies the type of mixed precision used for the weight data during training. The weight data is originally in 32-bit units (when “no” is selected), but mixing in 16-bit data during training leads to considerable memory savings and a speedup. fp16 is a data format with half the precision, and bf16 is a data format devised to handle the same numerical range as 32-bit data. You can get a LoRA with sufficiently high accuracy with fp16.

Save precision

Specifies the type of weight data saved in the LoRA file. float is 32-bit, fp16 and bf16 are 16-bit units; the latter two produce smaller files. The default is fp16.

Number of CPU threads per core

The number of threads per CPU core during training. Basically, the higher the number, the higher the efficiency, but the setting needs to be adjusted according to your hardware. Default is 2.
Seeds

During training there are a number of random processes, such as “in what order to read the images” and “how much noise to put on the training images” (details omitted). The seed is like an ID that determines the random processing procedure: if the same seed is specified, the same random procedure is used each time, making it easier to reproduce training results. However, there are random processes that do not use this seed (such as randomly cropping images), so specifying the same seed does not always give identical results. Default is blank; if not specified, a seed is set automatically when training is executed. If you want to reproduce results as much as possible, there is no downside to setting a number (such as 1234).

Cache latents

Training images are read into VRAM and “compressed” to a state called a latent before entering U-Net, and they are trained in this state in VRAM. Normally, images are “compressed” each time they are loaded, but checking Cache latents keeps the “compressed” images in main memory. Keeping them in main memory saves VRAM and speeds things up, but the images can no longer be processed before “compression”, so augmentation (described later) other than flip_aug cannot be used. Random crop (described later), which crops the image at a random position each time, cannot be used either. Default is on.

Cache latents to disk

Similar to the Cache latents option, but checking this saves the compressed image data to disk as temporary files. These temporary files can be reused even after restarting kohya_ss, so if you want to run LoRA training on the same data many times, turning on this option improves efficiency. However, if you turn this on, you will not be able to use augmentation and random crop other than flip_aug. Default is off.
Learning rate

Specify the learning rate. “Learning” means changing the strength (weight) of the connections in the neural network so that a picture that looks exactly like the given picture can be produced, and the connections are changed every time a picture is given. If you tune too much to a given picture, you will not be able to draw anything else. To avoid this, the weights are changed only slightly each time, incorporating just a little bit more of the given picture. The “learning rate” determines how much this “just a little” is. The default value is 0.0001.

LR Scheduler

You can change the learning rate in the middle of training. A scheduler is a setting for how the learning rate changes:

- adafactor: select this to use the Adafactor optimizer (described later); the learning rate is adjusted automatically according to the situation to save VRAM.
- constant: the learning rate does not change from beginning to end.
- constant_with_warmup: start with a learning rate of 0, gradually increase it toward the Learning rate setting during warm-up, and use the Learning rate setting during the main training.
- cosine: gradually decrease the learning rate toward 0 following a wave (cosine curve).
- cosine_with_restarts: repeat the cosine curve many times (see also the description of LR number of cycles).
- linear: start at the Learning rate setting and decrease linearly toward 0.
- polynomial: same behavior as linear, but a bit more complicated to decrease (see also the LR power description).

Set this to constant if you want the learning rate fixed at the Learning rate setting. Default is cosine.
LR warmup

If you selected constant_with_warmup as the scheduler, set here how long to warm up. The number specified here is a percentage of the total number of steps. For example, if you train 50 images 10 times each with a batch size of 1 for 2 epochs, the total number of steps is 50x10x2 = 1000. If you set LR warmup to 10, the first 10% of those 1000 steps, i.e. 100 steps, will be the warmup. You can ignore this if your scheduler is not constant_with_warmup. Default is 10.
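To make the warm-up behavior concrete, here is a small sketch of a constant_with_warmup schedule using the example numbers above (illustrative Python, not kohya_ss code):

```python
base_lr = 1e-4
total_steps = 1000
warmup_steps = int(total_steps * 10 / 100)   # "LR warmup = 10" -> 100 steps

def lr_at(step):
    # Ramp linearly from 0 to base_lr during warmup, then stay constant.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

print(lr_at(50), lr_at(100), lr_at(999))   # 5e-05 0.0001 0.0001
```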
Optimizer

The optimizer determines how the neural net weights are updated during training. Various methods have been proposed for smart training, but the most commonly used in LoRA training are “AdamW” (32-bit) and “AdamW8bit”. AdamW8bit uses less VRAM and has sufficient accuracy, so use it if in doubt. “Adafactor”, which incorporates Adam’s method while adjusting the learning rate appropriately according to the progress of training, is also often used (the Learning rate setting is ignored when using Adafactor). “DAdapt” is an optimizer that adjusts the learning rate automatically, and “Lion” is a relatively new optimizer that has not been fully verified yet. There are reports that “SGDNesterov” has good training accuracy but is slow. The default is AdamW8bit, which is basically fine as is.

Optimizer extra arguments

If you want finer control over a given optimizer, write the extra arguments here. You can usually leave this field blank.

Text Encoder learning rate

Sets the learning rate for the text encoder. As I wrote earlier, additional training on the text encoder affects the entire U-Net. Therefore, it is usually set lower than the learning rate for the U-Net blocks (Unet learning rate). The default value is 0.00005 (5e-5). If you specify a number here, it takes precedence over the Learning rate value.

Unet learning rate

Sets the learning rate for U-Net. This is the learning rate used when performing additional training on each Attention block (and other blocks, depending on the settings) in U-Net. The default value is 0.0001. If you specify a number here, it takes precedence over the Learning rate value.
Network Rank (Dimension)

Specifies the number of neurons in the middle layer of the “additional small neural net” described earlier in the article (see the figure above for details). The larger the number of neurons, the more information can be stored, but the likelihood of learning unnecessary information beyond the training target increases, and the LoRA file size also grows. It is often set to at most about 128, but there are reports that 32 is sufficient. When making a LoRA on a trial basis, it may be better to start from around 2 to 8. Default is 8.
Network alpha

This was introduced as a convenience measure to prevent weights from being rounded to 0 when saving a LoRA. Due to the structure of LoRA, the weight values of its neural net tend to be small, and if they become too small they may become indistinguishable from zero (that is, the same as having learned nothing). Therefore, a technique was proposed in which the actual (stored) weight values are kept large, but the weights are always weakened at a constant rate during training so that they act smaller. Network alpha determines this “weight weakening rate”. The smaller the Network alpha value, the larger the stored LoRA neural net weights. How much the weights are weakened when used (the usage strength) is calculated as “Network alpha / Network Rank” (roughly a value between 0 and 1) and is closely tied to the Network Rank. If the accuracy of the LoRA after training is not good enough, the weight data may be too small and collapsing to 0; in that case, try lowering the Network alpha value (= increasing the saved weight values). The default is 1 (that is, the stored weight values are maximized). If Network alpha and Network Rank have the same value, the effect is turned off. Network alpha must not exceed Network Rank. It is possible to specify a higher number, but it will most likely result in an unintended LoRA. Also, when setting Network alpha, consider the effect on the learning rate. For example, with an alpha of 16 and a rank of 32, the strength of the weights used is 16/32 = 0.5, meaning the effective learning rate is only half of the Learning rate setting. If alpha and rank are the same number, the strength used is 1 and there is no effect on the learning rate.
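As a quick illustration of the alpha/rank scaling described above (illustrative Python; the function name is an assumption, not a kohya_ss internal):

```python
def lora_scale(network_alpha, network_rank):
    # The applied strength of the stored LoRA weights.
    return network_alpha / network_rank

print(lora_scale(16, 32))   # 0.5   -> effective learning rate is halved
print(lora_scale(32, 32))   # 1.0   -> no effect on the learning rate
print(lora_scale(1, 8))     # 0.125 -> default alpha=1 with rank 8
```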
Max resolution

Specify the maximum resolution of the training images in “width, height” order. If training images exceed the resolution specified here, they will be scaled down to it. The default is “512,512”. Many models use images of this size, so it is safe to use images of this size when training a LoRA.

Stop text encoder training

You can stop training the text encoder partway through. As I wrote above, updating the text encoder has a big impact on the whole model, so it easily falls into overfitting (tuning so much to the training images that other images can no longer be drawn); stopping text encoder training at a moderate point is one way to prevent this. The number specified here is a percentage of the total training steps; once training reaches this percentage, the text encoder stops training. For example, if the total number of steps is 1000 and you specify 80 here, the text encoder finishes training when progress reaches 80%, i.e. 1000x0.8 = 800 steps. Training of the U-Net continues for the remaining 200 steps. If this is 0, text encoder training does not stop until the end.
Enable buckets

A “bucket” is, as the name suggests, a container. The training images used for LoRA do not have to be the same size, but images of different sizes cannot be trained at the same time. Therefore, images need to be sorted into “buckets” according to their size before training: similar-sized images go into the same bucket, differently sized images into different buckets. Default is on. If your training images are all the same size, you can turn this option off, but leaving it on has no effect. If you turn off Enable buckets when the training image sizes are not unified, the images will be enlarged or reduced to the same size. Enlargement and reduction preserve the aspect ratio of the image. If the aspect ratio is not the same as the base size, the vertical or horizontal size of the scaled image may exceed the base size. For example, if the base size is 512x512 (aspect ratio 1) and the image size is 1536x1024 (aspect ratio 1.5), the image will be scaled down to 768x512 (keeping aspect ratio 1.5).
Advanced Configuration

The options after this point are in the Advanced Configuration section.

Weights, Blocks, Conv

These are the “learning weight” and “rank” settings for each block in U-Net. Selecting each tab brings up the corresponding configuration screen. These settings are for advanced users; if you have no preference, you can leave all fields blank.

Weights: Down LR weights/Mid LR weights/Up LR weights

As you can see from the U-Net structure diagram, U-Net consists of 12 IN blocks, 1 MID block, and 12 OUT blocks, 25 blocks in total. If you want a different learning-rate weight for each block, you can set them here individually. The weight here is the “strength of learning” expressed as a number from 0 to 1: 0 means “do not learn at all”, and 1 means “learn at the learning rate set in Learning rate”, so you can vary the learning intensity per block. A weight of 0.5 means half the learning rate. “Down LR weights” specify the weights for each of the 12 IN blocks. “Mid LR weights” specifies the weight of the MID block. “Up LR weights” specify the weights for each of the 12 OUT blocks.
Weights: Blocks LR zero threshold

As explained above, LoRA adds small neural nets, but it doesn’t make sense to add nets whose weights are too small (i.e. that have barely learned anything). Therefore, you can set “do not add neural nets to blocks whose weights are too small”. A neural net will not be added to blocks whose weight does not exceed the value set here. For example, if you specify 0.1 here, no neural net will be added to blocks with weights less than or equal to 0.1 (note that the exclusion includes the specified value itself!). The default is blank, which means 0 (do nothing).

Blocks: Block dims, Block alphas

Here you can set a different rank (dim) and alpha value for each of the 25 blocks: IN0-IN11, MID, and OUT0-OUT11. See Network Rank and Network alpha for what rank and alpha mean. Blocks with a higher rank are expected to hold more information. You must always specify 25 numbers for this parameter, but since LoRA targets the Attention blocks, the settings for IN0, IN3, IN6, IN9, IN10, IN11, OUT0, OUT1 and OUT2, which have no Attention block (the 1st, 4th, 7th, 10th, 11th, 12th, 14th, 15th and 16th numbers), are ignored during training. This is a setting for advanced users; if you don’t care, you can leave it blank. If not specified here, the “Network Rank (Dimension)” and “Network Alpha” values are applied to all blocks.
Conv: Conv dims, Conv alphas

The Attention blocks that LoRA trains contain a neural net called “Conv”, which is also updated by the additional training (see the diagram of the attention layer structure at the top of the article). This is a process called “convolution”, and the “filter” used there is a 1x1 square. Read this article about convolutions for details. On the other hand, some of the blocks other than Attention (the Res and Down blocks) and some of the Attention blocks in OUT do convolution with a 3x3 filter. Originally those are not LoRA’s training target, but by specifying this parameter, the 3x3 convolutions of the Res blocks can also become a training target. Since there are more training targets, more precise LoRA training may be possible. The setting method is the same as “Blocks: Block dims, Block alphas”. A 3x3 conv exists in all 25 blocks. This is a setting for advanced users; if you don’t care, you can leave it blank.

No token padding

Captions attached to training images are processed in units of 75 tokens (a “token” can basically be regarded as a “word”). If the caption is shorter than 75 tokens, it is padded out to 75 tokens; this is called “padding”. Here you can specify that tokens should not be padded. Default is off, and you can basically leave it off.
Gradient accumulation steps

Changing the weights (that is, “learning”) is usually done once per batch read, but it is also possible to accumulate several batches and train on them at once. This option specifies how many batches to accumulate per update. It has a similar effect (not the “same” effect!) as increasing the batch size. For example, if the batch size is 4, four images are read per batch; in other words, one update happens for every four images read. If you set Gradient accumulation steps to 2, an update happens only once every 2 batches, resulting in one update per 8 images read. This works similarly (but not identically!) to a batch size of 8. If you increase this value, the number of updates decreases, so processing is faster, but more memory is consumed. Default is 1.
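A minimal sketch of the accumulation idea, using toy numbers in place of real gradients (illustrative Python; not kohya_ss code):

```python
batches = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]  # toy "batches" of size 4
accumulation_steps = 2          # "Gradient accumulation steps"
weight = 0.0
accumulated = 0.0

def compute_gradient(batch):
    # Toy stand-in for a real gradient computation.
    return sum(batch) / len(batch)

for i, batch in enumerate(batches):
    accumulated += compute_gradient(batch)
    # Update only every `accumulation_steps` batches, so one update
    # effectively covers 4 x 2 = 8 images in this example.
    if (i + 1) % accumulation_steps == 0:
        weight -= 0.01 * accumulated / accumulation_steps
        accumulated = 0.0

print(weight)   # two updates happened instead of four
```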
Weighted captions

Currently, the most popular Stable Diffusion environment is the “Stable Diffusion WebUI”, which has its own prompt notation. For example, if you want to strongly emphasize “black” in the prompt “black cat”, you wrap the word in parentheses like “(black:1.2) cat” and put “:number” after it; the word is then emphasized by that factor. This option allows the same notation to be used in the training image captions as well. If you want to write complex captions, it’s worth a try. Default is off.

Prior loss weight

The prior loss weight determines how much importance is given to the “regularization images” (see the description of the regularization folder above for details) during training. If this value is low, the regularization images are treated as less important, and a LoRA is generated that is more strongly characterized by the training images. This setting has no meaning if you are not using regularization images. This is a value between 0 and 1, and the default is 1 (which also respects the regularization images).
LR number of cycles

If you select “Cosine with restart” or “Polynomial” for the scheduler, this option specifies how many cycles the scheduler runs during training. If this number is 2 or greater, the scheduler runs multiple cycles within a single training run. In both Cosine with restart and Polynomial, the learning rate gradually decreases to 0 as training progresses, but with a cycle count of 2 or more, the learning rate is reset and restarted each time it reaches 0. The figure below (source) is an example of how the learning rate changes for Cosine with restart (purple) and Polynomial (light green): the purple example has the number of cycles set to 4, the light green example has a cycle count of 1. Since the specified number of cycles is executed within the fixed number of training steps, the more cycles there are, the more the learning rate fluctuates. Default is blank; leaving it blank equals 1.

Example of learning rate movement: Cosine with restart, “LR number of cycles = 4” (purple); Polynomial, “LR power = 2” (light green)

LR power

This is an option when the scheduler is set to Polynomial. The higher this number, the steeper the initial drop in learning rate (the slope of the light green line in the figure above becomes steeper). When power is 1, it has the same shape as the linear scheduler. If the number is too large, the learning rate sticks close to 0 and training becomes insufficient, so be careful. Default is blank; leaving it blank equals 1 (that is, the same as the linear scheduler).
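To visualize the two shapes described above, here is an illustrative sketch of simplified versions of these schedules (plain Python; a simplification, not the scheduler code kohya_ss actually uses):

```python
import math

base_lr, total_steps = 1e-4, 1000

def cosine_with_restarts(step, cycles=4):
    # Progress within the current cycle, 0 -> 1, repeated `cycles` times.
    t = (step * cycles / total_steps) % 1.0
    return base_lr * 0.5 * (1 + math.cos(math.pi * t))

def polynomial(step, power=2):
    # power=1 reproduces the linear schedule; higher powers drop faster at first.
    return base_lr * (1 - step / total_steps) ** power

for s in (0, 250, 500, 750, 999):
    print(s, round(cosine_with_restarts(s), 6), round(polynomial(s), 6))
```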
Additional parameters

If you want to tweak training parameters that are not displayed in the kohya_ss GUI, enter them here as command-line arguments. You can usually leave this field blank.

Save every N steps

A LoRA file is created and saved each time the number of steps specified here is completed. For example, when the total number of training steps is 1000, specifying 200 here saves LoRA files at the end of steps 200, 400, 600 and 800. See also “Save every N epochs” for saving intermediate LoRA files. Default is 0 (do not save intermediate LoRA files).

Save last N steps

This option applies when Save every N steps is used to save LoRA files during training. If you want to keep only recent LoRA files and discard older ones, you can set here “for how many recent steps to keep LoRA files”. For example, suppose the total number of training steps is 600 and Save every N steps is set to save every 100 steps. LoRA files would then be saved at steps 100, 200, 300, 400 and 500, but if Save last N steps is set to 300, only the LoRA files from the last 300 steps are kept. In other words, at step 500, LoRA files older than step 200 (= 500 - 300), such as the one from step 100, are deleted. Default is 0.
Keep n tokens

If your training images have captions, you can randomly shuffle the comma-separated words in them (see the Shuffle caption option for details). However, if there are words you want to keep at the beginning, this option lets you specify “keep the first N words fixed at the beginning”. The number of leading words specified here always stays fixed at the front. Default is 0. This option does nothing if the Shuffle caption option is off.

- A “word” here is a piece of text separated by commas. No matter how many words the delimited text contains, it counts as “one word”. In the case of “black cat, eating, sitting”, “black cat” is one word.
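A small sketch of how a caption could be shuffled while the first tokens stay fixed (illustrative Python; not the actual kohya_ss implementation):

```python
import random

def shuffle_caption(caption, keep_n_tokens=1, seed=1234):
    words = [w.strip() for w in caption.split(",")]
    kept, rest = words[:keep_n_tokens], words[keep_n_tokens:]
    random.Random(seed).shuffle(rest)          # only the tail is shuffled
    return ", ".join(kept + rest)

print(shuffle_caption("black cat, eating, sitting, on a chair", keep_n_tokens=1))
# "black cat" stays first; the remaining comma-separated words change order
```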
Clip skip

The text encoder uses a mechanism called “CLIP”, which is made up of 12 similar layers. Texts (tokens) are originally converted to numeric sequences (vectors) through these 12 layers, and the vectors coming out of the last layer are sent to the U-Net Attention blocks. However, the model independently developed by the “Novel AI” service, commonly known as the “Novel AI model”, adopted a unique specification that uses the vectors output by the second-to-last layer instead of the last layer. The same is true for models derived from Novel AI models. Therefore, you need to specify which CLIP layer, counted from the end, the base model used for training takes its vectors from. “Clip skip” specifies this “Xth-from-the-end” layer number. Setting it to 2 sends the second-to-last layer’s output vectors to the Attention blocks; if 1, the last layer’s output vectors are used. If the base model is a Novel AI model (or a mix of them), 2 should be fine. In other cases, 1 is fine.

Max Token Length

Specifies the maximum token length of the captions. The “tokens” here are not the number of words but the number of tokens; note that commas also count as one token. It’s unlikely that you’ll use more than 75 tokens in a caption, but if your captions are longer, specify a higher number here.

Full fp16 training (experimental)

When the “Mixed precision” option described above is turned on (fp16 or bf16), a mixture of 32-bit and 16-bit data is used during training, but when this option is turned on, all weight data is 16-bit (fp16 format). Although it saves memory, the precision of some data is halved, so training accuracy may also drop. Default is off; leave it off unless you really need to save memory.

Gradient checkpointing

Normally, during training, the weights of a large number of neural nets are modified and updated all at once each time an image is loaded. By doing this “gradually” rather than “all at once”, you can save memory, at the cost of some recomputation. This option specifies that the weight calculations be done incrementally. Turning it on or off has no effect on the LoRA training results. Default is off.
Shuffle caption

If the training images have captions, most captions are written as comma-separated words, such as “black cat, eating, sitting”. The Shuffle caption option randomly changes the order of these comma-separated words each time. Words in a caption are generally weighted more heavily the closer they are to the beginning. Therefore, if the word order is fixed, words near the end may not be learned well, and words near the front may form unintended associations with the training images. This bias can hopefully be corrected by reordering the words each time an image is loaded. This option has no meaning if captions are written as sentences instead of comma-separated words. Default is off.

- A “word” here is a piece of text separated by commas. No matter how many words the delimited text contains, it counts as “one word”. In the case of “black cat, eating, sitting”, “black cat” is one word.

Persistent data loaders

The data required for training is discarded and reloaded after each epoch. This option keeps it instead of discarding it. Turning this option on speeds up the start of each new epoch, but uses more memory to hold the data. Default is off.

Memory efficient attention

If this is checked, VRAM usage is suppressed during attention block processing. It is slower than the next option, “xformers”. Turn it on if you don’t have enough VRAM. Default is off.

Use xformers

Using a Python library called “xformers” reduces VRAM usage during attention block processing at the cost of some speed. Turn it on if you don’t have enough VRAM. Default is on.
Color augmentation

“Augmentation” means artificially inflating the training data: by slightly processing the training images each time they are loaded, the effective variety of training images is increased. When Color augmentation is turned on, the hue of each image is changed slightly at random every time. A LoRA trained this way is expected to have a slight range in color tone. Not available if the Cache latents option is on. Default is off.

Flip augmentation

If this option is turned on, images are randomly flipped horizontally. It can learn both left and right views, which is useful when you want to learn symmetrical people and objects. Default is off.

Min SNR gamma

In LoRA training, learning is performed by adding noise of various strengths to the training images (details omitted), but depending on the strength of the added noise, the updates can be unstable, moving closer to or farther from the training target; Min SNR gamma was introduced to compensate for this. In particular, when learning from images with little added noise, updates can deviate greatly from the target, so this option tries to suppress such jumps. I won’t go into details because it’s confusing, but you can set this value from 0 to 20, and the default is 0. According to the paper that proposed this method, the optimal value is 5. I don’t know how effective it is, but if you are unsatisfied with the training results, try different values.
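For the curious, the weighting proposed in the Min-SNR paper is simple to state; below is an illustrative sketch of the per-timestep loss weight it implies (assuming the usual epsilon-prediction formulation; not kohya_ss code):

```python
def min_snr_weight(snr, gamma=5.0):
    # Clamp the signal-to-noise ratio: low-noise (high-SNR) timesteps,
    # which tend to cause large jumps, get their loss weight reduced.
    return min(snr, gamma) / snr

for snr in (0.5, 1.0, 5.0, 20.0, 100.0):
    print(snr, min_snr_weight(snr))
# weight stays 1.0 up to SNR = gamma, then shrinks for cleaner (high-SNR) samples
```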
Don’t upscale bucket resolution

The bucket size defaults to 256-1024 pixels (or to the maximum resolution, if specified with the Max resolution option, which takes precedence). Images that fall outside this size range, either vertically or horizontally, are scaled (preserving the aspect ratio) to fit within the range. However, when this option is turned on, the bucket size range setting is ignored and buckets are prepared automatically according to the sizes of the training images, so all training images are loaded unscaled. Even then, parts of an image may still be cropped to fit the Bucket resolution steps (described later). Default is on.

Bucket resolution steps

If using buckets, specify the resolution interval between buckets here. For example, if you specify 64 here, each training image is sorted into a bucket in 64-pixel increments according to its size, separately for height and width. If the image size does not exactly fit a bucket size, the protruding part is cut off. For example, if the maximum resolution is 512 pixels and the bucket step is 64 pixels, the buckets are 512, 448, 384, and so on; a 500-pixel image is put into the 448-pixel bucket, and the extra 52 pixels are clipped. Default is 64 pixels.

- If this number is too small, the buckets are divided too finely; in the worst case it becomes “one bucket per image”. Note that images for each batch are always loaded from the same bucket, so having too few images in a bucket unintentionally reduces the effective batch size.
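The bucket/crop arithmetic from the example above can be written out directly (illustrative Python, assuming a simple floor-to-step rule rather than the exact kohya_ss logic):

```python
def bucket_side(length, step=64, max_reso=512):
    # Snap a side length down to the nearest bucket boundary, capped at max_reso.
    return min(max_reso, (length // step) * step)

side = 500
bucket = bucket_side(side)
print(bucket, side - bucket)   # 448 52  -> 52 pixels are cropped away
```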
Random crop instead of center crop

As mentioned above, images that do not exactly fit a bucket are sorted into buckets and then partly cropped to align the size, and normally the crop is taken so that the center of the image is kept. When this option is on, which part of the picture is cut off is chosen randomly. Turn this option on if you want to extend the learning range beyond the center of the image. This option cannot be used when the Cache latents option is on.

Noise offset type

This option specifies which method to use when adding extra noise to the training images. During training, noise is always added to the image (details omitted here), and it is preferable for this noise to be hard to predict, so this option adds further noise to make it even harder to predict. Default is Original. Multires adds noise in a slightly more complicated way.

Noise offset

This is an option when “Original” is selected for Noise offset type. If you enter a value greater than 0 here, extra noise is added. Values range from 0 to 1: 0 adds no extra noise at all, and 1 adds strong noise. It has been reported that adding about 0.1 of noise makes a LoRA’s colors more vivid (brighter brights and darker darks). Default is 0.

Adaptive noise scale

Used in combination with the Noise offset option. Specifying a number here further adjusts the amount of extra noise specified by Noise offset, amplifying or attenuating it. The amount of amplification (or attenuation) is adjusted automatically depending on how noisy the image currently is. Values range from -1 to 1: positive values increase the amount of added noise, negative values decrease it. Default is 0.
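A minimal sketch of what a fixed noise offset does to the sampled training noise (illustrative NumPy; a simplification of the technique, not the project’s exact code):

```python
import numpy as np

def noised_latents(latents, noise_offset=0.1):
    noise = np.random.randn(*latents.shape)
    if noise_offset > 0:
        # Add a per-image, per-channel constant shift on top of the usual
        # Gaussian noise, which makes very bright/dark images easier to learn.
        b, c = latents.shape[0], latents.shape[1]
        noise += noise_offset * np.random.randn(b, c, 1, 1)
    return latents + noise

latents = np.zeros((2, 4, 8, 8))          # toy batch of 2 "compressed" images
print(noised_latents(latents).shape)       # (2, 4, 8, 8)
```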
Multires noise iterations

This is an option when “Multires” is selected for Noise offset type. If you enter a value greater than 0 here, extra noise is added. Multires creates noise at various resolutions and adds them together to form the final extra noise. Here you specify how many “different resolutions” to create. Default is 0; when 0, no extra noise is added. If you want to use it, 6 is recommended.

Multires noise discount

Used together with the Multires noise iterations option. This value weakens the noise amount at each resolution to some extent. It is a value between 0 and 1; the lower the number, the weaker the noise. The amount of attenuation differs by resolution, and low-resolution noise is attenuated the most. Default is 0; when 0, it is set to 0.3 when used. 0.8 is usually recommended. If the number of training images is relatively small, lowering it to about 0.3 seems to work well.

Dropout caption every n epochs

Normally, images and captions are trained in pairs, but for specific epochs it is possible to train only the “images without captions”, not using the captions at all. This option lets you specify “do not use captions every N epochs (dropout)”. For example, if you specify 2 here, image training without captions is performed every 2 epochs (the 2nd, 4th, 6th epoch, and so on). When training on images without captions, the LoRA is expected to learn more comprehensive image features. It can also be expected to avoid associating too many image features with specific words. However, if captions are dropped too often, the resulting LoRA may stop responding to prompts, so be careful. The default is 0, which means no caption dropout.
Rate of caption dropout

Similar to Dropout caption every n epochs above, but this trains a certain percentage of the entire training process as “images without captions”, not using the captions. Here you can set the percentage of images trained without captions: 0 means “always use captions during training”, and 1 means “never use captions during training”. Which images are trained as “images without captions” is random. For example, if 20 images are each read 50 times and LoRA training runs for 1 epoch, the total number of image training steps is 20 images x 50 times x 1 epoch = 1000. If the Rate of caption dropout is set to 0.1, then 1000 x 0.1 = 100 of those steps are trained as “images without captions”. Default is 0, which trains all images with their captions.
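The two dropout options above can be pictured as one small check done per image per step (illustrative Python; a sketch of the idea, not the actual implementation):

```python
import random

def use_caption(epoch, dropout_every_n_epochs=0, dropout_rate=0.1,
                rng=random.Random(0)):
    # Whole epochs without captions (0 disables this check).
    if dropout_every_n_epochs and epoch % dropout_every_n_epochs == 0:
        return False
    # Otherwise drop the caption for a random fraction of steps.
    return rng.random() >= dropout_rate

steps = [use_caption(epoch=1) for _ in range(1000)]
print(steps.count(False))   # roughly 100 of 1000 steps train without a caption
```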
VAE batch size

If you turn on the Cache latents option, the “compressed” image data is kept in main memory; VAE batch size sets how many images are “compressed” at a time. Since the number of images specified by Batch size is trained at once, it is normal to match the VAE batch size to it. Default is 0, in which case it is set to the same number as Batch size.

Save training state

LoRA training takes a long time if there are many training images, repeats and epochs. If you turn on this option, you can interrupt training in the middle and resume from where you left off at a later date. Intermediate training data is saved in a folder called “last-state”.

Resume from saved training state

Specify the location of the “last-state” folder here if you want to resume training that was interrupted. To resume training, the intermediate training state must have been saved.

Max train epoch

Specify the maximum number of epochs for training. It is standard to specify the number of epochs with the Epoch option, but training always ends when the number of epochs specified here is reached. Default is blank; you can leave this field blank.
Max num workers for DataLoader

This option specifies the number of CPU processes used when reading data for training. Increasing this number enables subprocesses and speeds up data loading, but increasing it too much can actually reduce efficiency. Note that no matter how large a number is specified, it will not exceed the number of concurrent threads of the CPU being used. The default is 0, which loads data only in the CPU’s main process.

WANDB API Key

There is a machine learning service called “WandB” (Weights & Biases). It displays training progress in graphs to help find optimal settings, and records and shares training logs online; kohya_ss can now use this service. However, you need an account for it. After creating one, you can get an API key from https://app.wandb.ai/authorize . If you enter that API key here, you are logged in automatically when training and can link with the WandB service. I won’t go into details about WandB, but if you want to become a “LoRA craftsman”, give it a try.

WANDB Logging

Here you can specify whether or not to record training progress logs using the WandB service. The default is off; when off, logs are recorded in the form of a tool called “tensorboard”.

Sample images config

If you want to check what image generation with the LoRA looks like during training, enter an image generation prompt here. However, since LoRA training is relatively short, there may not be much need for image generation tests.
Sample every n steps

Specify after how many steps to generate a sample image during training. For example, specifying 100 generates an image every 100 steps. Default is 0; if 0, no image is generated.

Sample every n epochs

Specifies every how many epochs to generate a sample image during training. For example, 2 generates an image every 2 epochs. Default is 0; if 0, no image is generated.

Sample sampler

Specifies the sampler used for image generation. Many of the samplers here are the same as those provided in the Stable Diffusion Web UI, so please refer to a Web UI explanation site for details. The default is euler_a.
Sample prompts

Enter the prompt here. You can also enter settings other than the prompt itself: specify a setting with two hyphens followed by a letter, such as “--n”. For example, if you want to put “white, dog” in the negative prompt, write “--n white, dog”. Here are some commonly used settings:

- --n: negative prompt
- --w: image width
- --h: image height
- --d: seed
- --l: CFG scale
- --s: number of steps

Default is blank. When the field is blank, an example is shown in faint color, so please refer to it.
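As an illustration only (the subject words and numbers are made up), a prompt line combining these settings might look like this:

```
masterpiece, 1girl, solo, looking at viewer --n low quality, worst quality --w 512 --h 512 --d 1234 --l 7.5 --s 28
```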