什么是 Vits

浏览 401 扫码分享 2025-01-08 08:52:12

演示地址
使用方式
- 在线使用
- 本地使用
论文精读">论文精读
训练

:::info

VITS（Variational Inference with adversarial learning for end-to-end Text-to-Speech）是一种结合变分推理（variational inference）、标准化流（normalizing flows）和对抗训练的高表现力语音合成模型。VITS通过隐变量而非频谱串联起来语音合成中的声学模型和声码器，在隐变量上进行随机建模并利用随机时长预测器，提高了合成语音的多样性，输入同样的文本，能够合成不同声调和韵律的语音。

:::

论文地址：

http://proceedings.mlr.press/v139/kim21f/kim21f.pdf

演示地址

Github 地址: https://github.com/jaywalnut310/vits

Audio Samples from “Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech”

https://github.com/SayaSS/vits-finetuning

使用方式

在线使用

本地使用

下载 MoeGoe：

https://github.com/CjangCjengh/MoeGoe/releases

下载 MoeGoeUI:

https://github.com/CjangCjengh/MoeGoe_GUI/releases/tag/v3.0.2

下载模型：

https://huggingface.co/spaces/zomehwh/vits-models

https://huggingface.co/spaces/zomehwh/vits-uma-genshin-honkai

下载对应配置文件

https://huggingface.co/spaces/zomehwh/vits-models/blob/main/config/config.json

什么是 Vits - 图1

示例生成文件：

论文精读

细读经典：VITS，用于语音合成带有对抗学习的条件变分自编码器

训练

【So-VITS/VITS】炼丹速通指南

快速训练：

GitHub - Plachtaa/VITS-fast-fine-tuning: This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion

若有收获，就点个赞吧

上一篇:

下一篇:

让时间为你证明

展开/收起文章目录