
Diffusion

Diffusion Principles

The goal of a generative model: given a set of data, build a distribution from which new data can be generated.

One idea is to start from a simple distribution (e.g. a Gaussian) and transform it into the target distribution.

Diffusion models are exactly such a framework: they turn one hard sampling problem into a sequence of easy ones. The core idea is that learning to reverse many small intermediate steps is much simpler than learning the whole transformation at once.

Gaussian Diffusion

Take a random variable $x_0$ that follows the target distribution (even though that distribution is still unknown at this point), and add a sequence of independent Gaussian noise terms to it. This is called the forward process:
$$
x_{t+1}=x_t + \eta_t ,\quad \eta_t \sim N(0, \sigma^2)
$$
It can be observed that after a very large number of steps, the marginal distribution $p_t$ of $x_t$ becomes extremely close to a Gaussian. We approximate it as a Gaussian, which we can then sample from directly.
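A tiny simulation of the forward process (the data distribution, step count T, and noise scale sigma below are arbitrary illustrative choices, not values from any real model):

import torch

T, sigma = 1000, 0.1
# Toy non-Gaussian data: x_0 is either -1 or +1
x = torch.sign(torch.randn(10_000))

for t in range(T):
    x = x + sigma * torch.randn_like(x)   # x_{t+1} = x_t + eta_t, eta_t ~ N(0, sigma^2)

# After many steps the marginal of x_T is very close to a Gaussian with mean 0
# and variance roughly 1 + T * sigma^2, so it can be sampled directly.
print(x.mean().item(), x.std().item())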

We then decompose the task into single steps: given the marginal distribution $p_t$, produce a sample from $p_{t-1}$. This is called a reverse sampler. Once we have a reverse sampler, we can start from Gaussian noise and diffuse back step by step to the original distribution $p_0$.
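As a sketch of how a reverse sampler would be used (reverse_sampler here is a hypothetical stand-in for whatever concrete sampler is constructed below):

import torch

def generate(reverse_sampler, T, shape):
    x = torch.randn(shape)         # p_T is approximately Gaussian, so start from pure noise
    for t in range(T, 0, -1):
        x = reverse_sampler(x, t)  # maps a sample of p_t to (approximately) a sample of p_{t-1}
    return x                       # approximately a sample of p_0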

DDPM

A common way to build a reverse sampler is DDPM: at step $t$, take as input a value $z$ distributed according to $p_t$, and output a value distributed according to the conditional distribution
$$
p(x_{t-1}|x_t=z)
$$
Learning a separate conditional distribution for every possible $x_t$ would be far too complex.

Instead we assume that, when the per-step noise $\sigma$ is very small, each step's conditional distribution is itself approximately Gaussian, i.e.
$$
p(x_{t-1}|x_t=z) \approx N(x_{t-1};\mu, \sigma^2)
$$
Reducing the conditional distribution to a Gaussian means we already know its shape; all that is left is to obtain its mean, and we then have the whole distribution.

And that mean can be obtained with a neural network trained by regression.
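A minimal sketch of that regression, under toy assumptions (1-D data, a small MLP, explicit simulation of the forward process; real DDPM implementations usually predict the added noise instead, which is related to the mean by a known affine formula):

import torch
import torch.nn as nn

T, sigma = 1000, 0.1
# Toy network that maps (x_t, t) to an estimate of the mean of p(x_{t-1} | x_t)
model = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x0 = torch.sign(torch.randn(256, 1))                                 # toy samples from p_0
    t = torch.randint(1, T + 1, (256, 1))                                # random step index
    x_prev = x0 + sigma * (t - 1).float().sqrt() * torch.randn(256, 1)   # x_{t-1} after t-1 forward steps
    x_t = x_prev + sigma * torch.randn(256, 1)                           # one more forward step

    # MSE regression against samples of x_{t-1}: the minimizer is E[x_{t-1} | x_t, t],
    # i.e. exactly the mean needed for the Gaussian reverse sampler.
    pred_mean = model(torch.cat([x_t, t.float() / T], dim=1))
    loss = ((pred_mean - x_prev) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()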

Flow Matching

SD3 replaces DDPM with Flow Matching.

Diffusion models are a special case of Flow Matching.

Flow Matching trains the model by matching its vector field to a target vector field; once trained, integrating the learned vector field transforms a simple distribution into the complex target distribution.

Flow: a family of time-indexed vector fields.
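A minimal sketch of the (conditional) Flow Matching objective, assuming the common straight-line interpolation path between noise and data (the toy model, data, and path choice are illustrative assumptions, not SD3's exact recipe):

import torch
import torch.nn as nn

# Toy velocity-field model v_theta(x_t, t)
model = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x0 = torch.randn(256, 1)               # samples from the simple (Gaussian) distribution
    x1 = torch.sign(torch.randn(256, 1))   # toy samples from the target distribution
    t = torch.rand(256, 1)

    x_t = (1 - t) * x0 + t * x1            # point on the straight-line path from x0 to x1
    target_v = x1 - x0                     # velocity of that path = the target vector field

    # Match the model's vector field to the target vector field
    pred_v = model(torch.cat([x_t, t], dim=1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Generation then integrates dx/dt = v_theta(x, t) from t=0 (noise) to t=1 (data),
# e.g. with a simple Euler solver.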

SDXL

Architecture

SD1.5 and SDXL are both UNet-based models.

SDXL is a two-stage cascaded diffusion model consisting of a Base model and a Refiner model. The prompt first goes through the Base model to produce image latents; the Refiner then further denoises the latents and improves detail; finally the VAE Decoder decodes them into an image (a usage sketch follows the list below).

The Base model is a complete image-generation model on its own, supporting T2I, I2I, and Inpainting:

  • UNet: an Encoder-Decoder backbone extended with Time Embedding, Cross-Attention, and Self-Attention
  • VAE
    • Encoder: image → latents
    • Decoder: latents → image
  • Two CLIP Text Encoders
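A hedged sketch of the two-stage Base → Refiner flow with diffusers (the denoising_end / denoising_start split and output_type="latent" follow the diffusers SDXL documentation; the step count and 0.8 split point are just typical values):

import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2, vae=base.vae, torch_dtype=torch.float16).to("cuda")

prompt = "a girl with blue hair"
# Base model handles the first 80% of denoising and returns latents instead of a decoded image
latents = base(prompt, num_inference_steps=40, denoising_end=0.8, output_type="latent").images
# Refiner continues denoising from 80% onward, then its VAE decodes the latents into an image
image = refiner(prompt, num_inference_steps=40, denoising_start=0.8, image=latents).images[0]
image.save("base_refiner.jpg")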

Image Generation

T2I

import torch
from diffusers import DiffusionPipeline

# Load the SDXL base model in fp16 and move it to the GPU
pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

prompt = "a blue hair girl"
image = pipe(prompt, num_inference_steps=45, guidance_scale=7.5, height=1024, width=1024).images[0]

image.save("output.jpg")

T2I LoRA

A LoRA can change the model's art style.

import torch
from diffusers import DiffusionPipeline

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")
# Load the LoRA weights on top of the base model
pipe.load_lora_weights("sd-gbf-lora")

prompt = "a blue hair girl"
lora_scale = 0.9  # how strongly the LoRA influences the result
image = pipe(prompt, num_inference_steps=45, guidance_scale=7.5, cross_attention_kwargs={"scale": lora_scale}, height=1024, width=1024).images[0]

image.save("output.jpg")

I2I LoRA

Convert an image to the LoRA's art style.

import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("sd-gbf-lora")

# Load the input image
input_image_path = "examples/lubi.jpg"
input_image = Image.open(input_image_path).convert("RGB")

prompt = "gbfhero"
negative_prompt = "low quality, bad quality"

with torch.no_grad():
    output_image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=7.5,
        cross_attention_kwargs={"scale": 0.9},
        height=1024, width=1024,
        image=input_image,
        strength=0.5  # controls how similar the result stays to the input image, from 0 to 1
    ).images[0]

output_image.save("outputs/1.jpg")

I2I LoRA Controlnet

Using I2I with a LoRA directly does not work very well; its control over the original image is weak. It can be combined with a ControlNet.

import os
import cv2
import torch
import numpy as np
from PIL import Image
from diffusers import StableDiffusionXLControlNetImg2ImgPipeline, ControlNetModel

output_folder = "outputs"
os.makedirs(output_folder, exist_ok=True)

# Load the base model and the ControlNet
pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "diffusers/controlnet-canny-sdxl-1.0"
controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(pipe_id, controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("sd-gbf-lora3")

# Load the input image
input_image_path = "examples/leishen.jpeg"
input_image = Image.open(input_image_path).convert("RGB")
np_image = np.array(input_image)
# Extract Canny edges as the ControlNet condition
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)
canny_image.save(f'{output_folder}/tmp_edge.png')

prompt = "gbfhero, clean background"
negative_prompt = "low quality, bad quality"
lora_scale = 0.9
image = pipe(prompt,
             negative_prompt=negative_prompt,
             guidance_scale=7.5,
             cross_attention_kwargs={"scale": lora_scale},
             controlnet_conditioning_scale=0.5,
             image=input_image,
             strength=0.9,
             control_image=canny_image,
             height=1024, width=1024).images[0]

image.save(f"{output_folder}/5.jpg")

SDXL LoRA Training

See train_text_to_image_lora_sdxl.py.

It takes several hundred to a few thousand images, trained for over a thousand steps, to get a reasonably good LoRA.

Tensorboard

TensorBoard is a UI for monitoring the training process.

pip install tensorboard

Find the training log folder, locate a file named like events.out.tfevents.xxxx.xxx.xxx.x, and run

tensorboard --logdir=log/xxxx --port=7861

This starts a service; open the printed link to see the current training status.

import numpy as np
from torch.utils.tensorboard import SummaryWriter

train_writer = SummaryWriter(log_dir=save_tensorboard_path)
# Log scalar metrics against the current training step
train_writer.add_scalar('valid/mse_loss', np.mean(valid_loss), train_step)
train_writer.add_scalar('train/mse_loss', np.mean(loss_running[-args.log_interval*5:]), train_step)
train_writer.add_scalar('profile/io_time', profile_times['io'], train_step)

accelerate

A multi-machine, multi-GPU training launcher from Hugging Face, similar to torchrun.

train.sh

# Single machine, multiple GPUs (4 GPUs)
export CUDA_VISIBLE_DEVICES='0,1,2,3'
export NCCL_IB_DISABLE=0
export NCCL_P2P_DISABLE=0
export NCCL_DEBUG=INFO
export NUM_PROCESSES=${MLP_WORKER_NUM}
export NPROC_PER_NODE=${MLP_WORKER_GPU}

accelerate launch \
--config_file deepspeed.config \
--multi_gpu \
train.py

deepspeed.config

distributed_type: DEEPSPEED
fsdp_config: {}
num_processes: 4
num_machines: 1
mixed_precision: 'fp16'
use_cpu: false
machine_rank: 0
main_training_function: main

DreamBooth LoRA Training

This approach is more recommended. DreamBooth uses rare-token identifiers: the instance prompt is mapped into a rarer region of the token space, for example by inserting extra characters so that "A dog" becomes "A [V] dog". Such a prompt sits in a rarer position for the tokenizer, is less affected by what the model already learned for ordinary prompts, and therefore picks up the new concept more easily.

See train_dreambooth_lora_sdxl.py. With 3~5 images and one shared prompt you can already get a good result.

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch \
--mixed_precision="fp16" \
train.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--pretrained_vae_model_name_or_path=$VAE_PATH \
--output_dir=$OUTPUT_DIR \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="0"
  • INSTANCE_DIR: path to the folder containing the images; the folder should contain only images
  • instance_prompt: a description of the subject shown in those images

Fixes

train_dreambooth_lora_sdxl.py has a few issues when training with fp16. For example, change the .to(device) code in the log_validation function so that no dtype is set when using fp16:

if args.mixed_precision == 'fp16':
    pipeline = pipeline.to(accelerator.device)
else:
    pipeline = pipeline.to(accelerator.device, dtype=torch_dtype)

Also, when the Dataset loads the instance data folder, it loads files of every type; it should only load image files:

filenames = sorted(os.listdir(instance_data_root))
# Keep only image files
filenames = list(filter(lambda file: file.endswith(('.jpeg', '.png', '.jpg')), filenames))
filenames = [os.path.join(instance_data_root, name) for name in filenames]
instance_images = [Image.open(path) for path in filenames]

And delete

check_min_version("0.33.0.dev0")

Flux

Flux is currently the best image-generation model, though its ecosystem is not yet fully mature.

Flux and SD3 use the DiT architecture.

Image Generation

import os
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("trained-flux-lora-gbf")

prompt = "gbfhero, a blue hair girl with a sword, gorgeous background, swimwear"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512
).images[0]
os.makedirs("outputs", exist_ok=True)
image.save("outputs/1.png")

ControlNet LoRA

import torch
from diffusers import FluxControlNetImg2ImgPipeline, FluxControlNetModel

controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16)
pipe = FluxControlNetImg2ImgPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("trained-flux-lora-gbf")
...
prompt = "gbf hero"
image = pipe(
    prompt,
    guidance_scale=3.5,
    image=input_image,
    strength=0.99,
    control_image=canny_image,
    control_guidance_start=0.2,
    control_guidance_end=0.8,
    controlnet_conditioning_scale=1.0,
    height=1024, width=1024
).images[0]
image.save(f"{output_folder}/c1.png")

DreamBooth LoRA Training

The Flux model is very large, so training easily runs out of GPU memory and RAM.

See train_dreambooth_lora_flux.py.

export MODEL_NAME="black-forest-labs/FLUX.1-dev"
export INSTANCE_DIR="gbf"
export OUTPUT_DIR="trained-flux-lora"

accelerate launch \
--mixed_precision="bf16" \
train.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="bf16" \
--instance_prompt="gbf hero" \
--resolution=1024 \
--train_batch_size=1 \
--guidance_scale=1 \
--gradient_accumulation_steps=4 \
--optimizer="prodigy" \
--learning_rate=1. \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of gbf hero" \
--validation_epochs=25 \
--seed="0"

The training script also has an issue: when training with bf16, you need to replace this line in log_validation:

# autocast_ctx = nullcontext()
autocast_ctx = torch.autocast(accelerator.device.type)
