Refactor code structure for improved readability and maintainability

2025-08-24 12:05:41 +02:00
parent 34f76242e6
commit c5ceea27b4
5 changed files with 4276 additions and 80 deletions

.python-version Normal file

@@ -0,0 +1 @@
3.11

README.md

@@ -1,35 +1,32 @@
<div align="center">
+MY FORK OF
+(modified to use uv and local models)
# 💀🔊 StableAudioWebUI 💀🔊
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-red)](https://huggingface.co/spaces/ameerazam08/stableaudio-open-1.0)
### A Lightweight Gradio Web interface for running Stable Audio Open 1.0
-By *[@drbaph](https://instagram.com/drbaph)*
+By _[@drbaph](https://instagram.com/drbaph)_
<br>
<br>
![image_2024-06-10_21-03-05](https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/b3f4bd5a-04ec-4802-aabc-dcaea4882f51)
<br>
<br>
![image_2024-06-10_21-02-10](https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/526d72f3-abf2-499c-af18-654025a305ba)
<br>
### Example
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/30063999-9ca6-4a86-8721-65e3cba4c87d
---
# ⚠ Disclaimer ⚠
@@ -39,6 +36,7 @@ https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/30063999-9ca6-4a8
---
### Recommended Settings
Prompt: Any <br>
Sampler: dpmpp-3m-sde <br>
CFG: 7 <br>
@@ -48,6 +46,7 @@ Duration: Max 47s <br>
Seed: Any <br>
### > Saves files in the following directory: Output/YYYY-MM-DD/ <br>
### > using the following schema: 'your_prompt.mp3' <br>
</div>
@@ -63,6 +62,7 @@ Seed: Any <br>
<br>
**Implemented Enhanced Filename Handling and Security Measures** <br>
- **Filename Length Control**: Truncated long prompts to a maximum of 50 characters for filenames, preventing excessively long filenames. <br>
- **Enhanced Sanitization**: Applied strict rules to replace non-alphanumeric characters with underscores (`_`), ensuring valid and safe filenames. <br>
- **Unique Filename Generation**: Introduced a system to append numeric suffixes to filenames to avoid overwriting existing files, ensuring each file is uniquely named. <br>
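For illustration, a minimal sketch of the scheme these three bullets describe (a hypothetical helper, not the repository's exact implementation):

```python
import os
import re

def make_safe_filename(prompt, out_dir, ext=".mp3", max_len=50):
    # Truncate long prompts to at most 50 characters
    name = prompt[:max_len]
    # Replace every non-alphanumeric character with an underscore
    name = re.sub(r"[^a-zA-Z0-9]", "_", name)
    # Append a numeric suffix until the filename no longer collides
    candidate = os.path.join(out_dir, name + ext)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, f"{name}_{counter}{ext}")
        counter += 1
    return candidate
```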
@@ -93,29 +93,28 @@ Seed: Any <br>
✅ Updated UI elements to include Advanced Parameters dropdown <br>
-*( CFG Scale, Sigma_min, Sigma_max )* <br>
+_( CFG Scale, Sigma_min, Sigma_max )_ <br>
✅ Added Use Half precision checkbox for Low VRAM inference <br>
-*( Float 16 )*
+_( Float 16 )_
✅ Added choice for all Sampler types <br>
-*( dpmpp-3m-sde, dpmpp-2m-sde, k-heun, k-lms, k-dpmpp-2s-ancestral, k-dpm-2, k-dpm-fast )* <br>
+_( dpmpp-3m-sde, dpmpp-2m-sde, k-heun, k-lms, k-dpmpp-2s-ancestral, k-dpm-2, k-dpm-fast )_ <br>
✅ Added link to the Repo <br>
</details>
---
### 📝 Note: For Windows builds with [Nvidia](https://github.com/Saganaki22/StableAudioWebUI/releases/download/latest/One-Click-Installer-GPU.bat) 30xx+ or a Float32-capable [CPU](https://github.com/Saganaki22/StableAudioWebUI/releases/download/latest/One-Click-Installer-CPU.bat), you can use the [One-Click-Installer.bat](https://github.com/Saganaki22/StableAudioWebUI/releases/tag/latest) to simplify the process, provided you have logged in to huggingface-cli and authenticated your token before running the batch script (Step 3; the huggingface-cli is used for obtaining the model file).
## Step 1: Start by cloning the repo:
    git clone https://github.com/Saganaki22/StableAudioWebUI.git
## Step 2: Use the deployment below (tested on 24GB of Nvidia VRAM, but it should work with 12GB too, since we have added the Load Half precision (Float16) option in the WebUI):
    cd StableAudioWebUI
@@ -124,7 +123,6 @@ Seed: Any <br>
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
## (Note: if you have an older Nvidia GPU, you may need to use CUDA 11.8)
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
@@ -136,6 +134,7 @@ Step 3: (Optional - read more): If you haven't got a hugging face account or hav
(paste your token and follow the instructions; the token will not be displayed when pasted)
## If you want to run it using CPU <br>
Omit 'pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121' and the steps after it, and just run
    pip install -r requirements1.txt
@@ -143,12 +142,12 @@ Step 3: (Optional - read more): If you haven't got a hugging face account or hav
## Step 4: Run
    python gradio_app.py
<br>

## ⭐ Bonus
If you are using Windows and followed my setup instructions, you can create a batch script that activates the environment and runs the app all in one. What you need to do is: <br>
<br>
Create a new text file in the same folder as gradio_app.py & paste this in the text file
@@ -165,18 +164,17 @@ then save the file as run.bat
(All with random seeds) <br>
Prompt: a dog barking <br>
CFG: 7 <br>
Sigma_Min: 0.3 <br>
Sigma_Max: 500 <br>
![image](https://github.com/Saganaki22/StableAudioWebUI/blob/main/assets/screenshot1.png) <br>
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/4ca9eb1b-2808-4f39-b7e3-f35c736eb7b7
#
<br>
Prompt: people clapping <br>
CFG: 7 <br>
@@ -185,13 +183,10 @@ Sigma_Max: 500 <br>
![image](https://github.com/Saganaki22/StableAudioWebUI/blob/main/assets/screenshot2.png) <br>
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1f333384-d4e6-4167-abec-5167e2f4822f
#
<br>
Prompt: didgeridoo <br>
CFG: 7 <br>
@@ -200,12 +195,8 @@ Sigma_Max: 500 <br>
![image](https://github.com/Saganaki22/StableAudioWebUI/blob/main/assets/screenshot3.png) <br>
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1cb7ce3b-7463-46a8-ba9a-3a5aa232d43a
---
## Model Details
@@ -218,6 +209,7 @@ https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1cb7ce3b-7463-46a
<div align="center">
#
![image](https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo-with-title.png)
### [Huggingface](https://huggingface.co/stabilityai/stable-audio-open-1.0) | [Stable Audio Tools](https://github.com/Stability-AI/stable-audio-tools) | [Stability AI](https://stability.ai/news/introducing-stable-audio-open)
@@ -226,5 +218,4 @@ https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1cb7ce3b-7463-46a
![drbaph](https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/13432252-e640-4c98-a7ab-4d57e6b56059)
</div>

gradio_app.py

@@ -2,7 +2,10 @@ import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
+from omegaconf import OmegaConf
+from stable_audio_tools.models.factory import create_model_from_config
from stable_audio_tools.inference.generation import generate_diffusion_cond
+from safetensors.torch import load_file as load_safetensors
from pydub import AudioSegment
import re
import os
@@ -11,12 +14,55 @@ import gradio as gr
# Define a function to toggle the visibility of the seed slider
def toggle_seed_slider(x):
-    seed_slider.visible = not x
+    return gr.Slider(interactive=not x)

# Define a function to set up the model and device
-def setup_model(model_half):
-    model, model_config = get_pretrained_model("audo/stable-audio-open-1.0")
+def setup_model(model_path, model_half):
+    """
+    Sets up the model and device.
+
+    Args:
+        model_path (str): Path to a local model .ckpt or .safetensors file. If empty, downloads the default model.
+        model_half (bool): Whether to use float16 half-precision.
+    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
+
+    # If no path is provided, or path doesn't exist, download the default model
+    if not model_path or not os.path.exists(model_path):
+        if model_path:
+            print(f"Warning: Model path '{model_path}' not found. Falling back to default model.")
+        model_id = "audo/stable-audio-open-1.0"
+        print(f"Loading default model from Hugging Face: {model_id}")
+        model, model_config = get_pretrained_model(model_id)
+    # Otherwise, load the model from the local filesystem
+    else:
+        print(f"Loading local model from: {model_path}")
+        # Find the model_config.json file in the same directory as the model
+        model_dir = os.path.dirname(model_path)
+        config_path = os.path.join(model_dir, "model_config.json")
+        if not os.path.exists(config_path):
+            raise FileNotFoundError(f"Error: Could not find 'model_config.json' in the same directory as the model: {model_dir}")
+        print(f"Loading model config from: {config_path}")
+        model_config = OmegaConf.load(config_path)
+
+        # Create the model structure from the config
+        model = create_model_from_config(model_config)
+
+        # Load the weights from the checkpoint
+        if model_path.endswith(".safetensors"):
+            print("Loading weights from .safetensors file.")
+            state_dict = load_safetensors(model_path)
+        elif model_path.endswith(".ckpt"):
+            print("Loading weights from .ckpt file.")
+            state_dict = torch.load(model_path, map_location="cpu")["state_dict"]
+        else:
+            raise ValueError("Unsupported model file type. Please use .safetensors or .ckpt")
+
+        model.load_state_dict(state_dict)

    model = model.to(device)

    # Convert model to float16 if model_half is True
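For reference, a hypothetical call to the new signature (the checkpoint path below is a placeholder; per the fallback logic above, a missing path triggers the default Hugging Face download):

```python
# Hypothetical usage of the new setup_model signature
model, model_config, device = setup_model(
    model_path="models/stable-audio-open-1.0.safetensors",  # placeholder local checkpoint
    model_half=True,  # load in float16 for low-VRAM inference
)
```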
@@ -92,7 +138,10 @@ def generate_audio(prompt, steps, cfg_scale, sigma_min, sigma_max, generation_ti
    return full_path

-def audio_generator(prompt, sampler_type, steps, cfg_scale, sigma_min, sigma_max, generation_time, random_seed, seed, model_half):
+def audio_generator(prompt, model_path, sampler_type, steps, cfg_scale, sigma_min, sigma_max, generation_time, random_seed, seed, model_half):
+    """
+    Main function called by the Gradio UI to orchestrate audio generation.
+    """
    try:
        print("Generating audio with parameters:")
        print("Prompt:", prompt)
@@ -107,7 +156,7 @@ def audio_generator(prompt, sampler_type, steps, cfg_scale, sigma_min, sigma_max
        print("Model Half Precision:", model_half)

        # Set up the model and device
-        model, model_config, device = setup_model(model_half)
+        model, model_config, device = setup_model(model_path, model_half)

        if random_seed:
            seed = torch.randint(0, 1000000, (1,)).item()
@@ -118,11 +167,66 @@ def audio_generator(prompt, sampler_type, steps, cfg_scale, sigma_min, sigma_max
        return str(e)

# Create Gradio interface
-with gr.Blocks() as demo:
+# with gr.Blocks() as demo:
+#     gr.Markdown("<h1 style='text-align: center; font-size: 300%;'>💀🔊 StableAudioWebUI 💀🔊</h1>")
+#     # Main input components
+#     prompt_textbox = gr.Textbox(lines=5, label="Prompt")
+#     sampler_dropdown = gr.Dropdown(
+#         label="Sampler Type",
+#         choices=[
+#             "dpmpp-3m-sde",
+#             "dpmpp-2m-sde",
+#             "k-heun",
+#             "k-lms",
+#             "k-dpmpp-2s-ancestral",
+#             "k-dpm-2",
+#             "k-dpm-fast"
+#         ],
+#         value="dpmpp-3m-sde"
+#     )
+#     steps_slider = gr.Slider(minimum=0, maximum=200, label="Steps", step=1, value=100)
+#     generation_time_slider = gr.Slider(minimum=0, maximum=47, label="Generation Time (seconds)", step=1, value=47)
+#     random_seed_checkbox = gr.Checkbox(label="Random Seed")
+#     seed_slider = gr.Slider(minimum=-1, maximum=999999, label="Seed", step=1, value=123456)
+#     # Advanced parameters accordion
+#     with gr.Accordion("Advanced Parameters", open=False):
+#         cfg_scale_slider = gr.Slider(minimum=0, maximum=15, label="CFG Scale", step=0.1, value=7)
+#         sigma_min_slider = gr.Slider(minimum=0, maximum=50, label="Sigma Min", step=0.1, value=0.3)
+#         sigma_max_slider = gr.Slider(minimum=0, maximum=1000, label="Sigma Max", step=0.1, value=500)
+#     # Low VRAM checkbox and submit button
+#     model_half_checkbox = gr.Checkbox(label="Low VRAM (float16)", value=False)
+#     submit_button = gr.Button("Generate")
+#     # Define the output components
+#     audio_output = gr.Audio()
+#     output_textbox = gr.Textbox(label="Output")
+#     # Link the button and the function
+#     random_seed_checkbox.change(fn=toggle_seed_slider, inputs=[random_seed_checkbox], outputs=[seed_slider])
+#     submit_button.click(audio_generator,
+#                         inputs=[prompt_textbox, sampler_dropdown, steps_slider, cfg_scale_slider,sigma_min_slider, sigma_max_slider, generation_time_slider, random_seed_checkbox, seed_slider, model_half_checkbox],
+#                         outputs=[audio_output, output_textbox])
+#     # GitHub link at the bottom
+#     gr.Markdown("<p style='text-align: center;'><a href='https://github.com/Saganaki22/StableAudioWebUI'>Github Repository</a></p>")

+with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("<h1 style='text-align: center; font-size: 300%;'>💀🔊 StableAudioWebUI 💀🔊</h1>")
+    with gr.Row():
+        with gr.Column(scale=2):
            # Main input components
-    prompt_textbox = gr.Textbox(lines=5, label="Prompt")
+            prompt_textbox = gr.Textbox(lines=5, label="Prompt", placeholder="A beautiful orchestral piece with violins, piano, and a choir...")
+
+            # NEW: Textbox for local model path
+            model_path_textbox = gr.Textbox(
+                label="Local Model Path (Optional)",
+                placeholder="e.g., /home/user/models/stable-audio-open-1.0.ckpt. Leave blank for default."
+            )
            sampler_dropdown = gr.Dropdown(
                label="Sampler Type",
                choices=[
@@ -136,33 +240,54 @@ with gr.Blocks() as demo:
                ],
                value="dpmpp-3m-sde"
            )
-    steps_slider = gr.Slider(minimum=0, maximum=200, label="Steps", step=1, value=100)
-    generation_time_slider = gr.Slider(minimum=0, maximum=47, label="Generation Time (seconds)", step=1, value=47)
-    random_seed_checkbox = gr.Checkbox(label="Random Seed")
-    seed_slider = gr.Slider(minimum=-1, maximum=999999, label="Seed", step=1, value=123456)
+            with gr.Row():
+                steps_slider = gr.Slider(minimum=10, maximum=200, label="Steps", step=1, value=100)
+                generation_time_slider = gr.Slider(minimum=1, maximum=47, label="Generation Time (seconds)", step=1, value=47)
+            with gr.Row():
+                random_seed_checkbox = gr.Checkbox(label="Random Seed", value=True)
+                seed_slider = gr.Slider(minimum=-1, maximum=999999, label="Seed", step=1, value=12345, interactive=False)

            # Advanced parameters accordion
            with gr.Accordion("Advanced Parameters", open=False):
-        cfg_scale_slider = gr.Slider(minimum=0, maximum=15, label="CFG Scale", step=0.1, value=7)
-        sigma_min_slider = gr.Slider(minimum=0, maximum=50, label="Sigma Min", step=0.1, value=0.3)
-        sigma_max_slider = gr.Slider(minimum=0, maximum=1000, label="Sigma Max", step=0.1, value=500)
+                cfg_scale_slider = gr.Slider(minimum=0, maximum=25, label="CFG Scale", step=0.1, value=7)
+                sigma_min_slider = gr.Slider(minimum=0.01, maximum=50, label="Sigma Min", step=0.01, value=0.3)
+                sigma_max_slider = gr.Slider(minimum=1, maximum=1000, label="Sigma Max", step=1, value=500)

            # Low VRAM checkbox and submit button
            model_half_checkbox = gr.Checkbox(label="Low VRAM (float16)", value=False)
-    submit_button = gr.Button("Generate")
+            submit_button = gr.Button("Generate", variant="primary")

+        with gr.Column(scale=1):
            # Define the output components
-    audio_output = gr.Audio()
-    output_textbox = gr.Textbox(label="Output")
+            audio_output = gr.Audio(label="Generated Audio")
+            output_textbox = gr.Textbox(label="Status", interactive=False)

    # Link the button and the function
    random_seed_checkbox.change(fn=toggle_seed_slider, inputs=[random_seed_checkbox], outputs=[seed_slider])
-    submit_button.click(audio_generator,
-                        inputs=[prompt_textbox, sampler_dropdown, steps_slider, cfg_scale_slider,sigma_min_slider, sigma_max_slider, generation_time_slider, random_seed_checkbox, seed_slider, model_half_checkbox],
-                        outputs=[audio_output, output_textbox])
+    # MODIFIED: Added model_path_textbox to the list of inputs
+    submit_button.click(
+        fn=audio_generator,
+        inputs=[
+            prompt_textbox,
+            model_path_textbox,
+            sampler_dropdown,
+            steps_slider,
+            cfg_scale_slider,
+            sigma_min_slider,
+            sigma_max_slider,
+            generation_time_slider,
+            random_seed_checkbox,
+            seed_slider,
+            model_half_checkbox
+        ],
+        outputs=[audio_output, output_textbox]
+    )

    # GitHub link at the bottom
-    gr.Markdown("<p style='text-align: center;'><a href='https://github.com/Saganaki22/StableAudioWebUI'>Github Repository</a></p>")
+    gr.Markdown("<p style='text-align: center;'><a href='https://github.com/Saganaki22/StableAudioWebUI' target='_blank'>Github Repository</a></p>")

# Launch the Gradio demo
demo.launch()
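The rewritten toggle_seed_slider follows Gradio's update pattern: mutating a component attribute after the UI is built (the old `seed_slider.visible = not x`) has no effect on the rendered page, whereas returning a component from an event handler updates the mapped output component's properties. A minimal self-contained illustration (hypothetical demo, not part of this commit):

```python
import gradio as gr

def toggle(random_seed):
    # Returning a component instance updates the properties
    # of the component mapped in `outputs`.
    return gr.Slider(interactive=not random_seed)

with gr.Blocks() as demo:
    checkbox = gr.Checkbox(label="Random Seed", value=True)
    slider = gr.Slider(minimum=-1, maximum=999999, step=1, value=12345, interactive=False)
    checkbox.change(fn=toggle, inputs=[checkbox], outputs=[slider])

demo.launch()
```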

pyproject.toml Normal file

@@ -0,0 +1,17 @@
[project]
name = "stableaudiowebui"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"einops>=0.8.1",
"gradio>=5.43.1",
"numba>=0.58",
"omegaconf>=2.3.0",
"pydub>=0.25.1",
"stable-audio-tools>=0.0.19",
"torch>=2.8.0",
"torchaudio>=2.8.0",
"torchvision>=0.23.0",
]
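With this project metadata and the committed uv.lock below, setting up and launching the app presumably reduces to the following (assuming uv is installed; uv picks up the Python 3.11 pin from the .python-version file added in this commit):

    uv sync
    uv run gradio_app.py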

uv.lock generated Normal file

File diff suppressed because it is too large