Refactor code structure for improved readability and maintainability

2025-08-24 12:05:41 +02:00
parent 34f76242e6
commit c5ceea27b4
5 changed files with 4276 additions and 80 deletions

.python-version Normal file

@@ -0,0 +1 @@
3.11

README.md

@@ -1,35 +1,32 @@
<div align="center">
+MY FORK OF
+(modified to use uv and local models)
# 💀🔊 StableAudioWebUI 💀🔊
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-red)](https://huggingface.co/spaces/ameerazam08/stableaudio-open-1.0)
### A Lightweight Gradio Web interface for running Stable Audio Open 1.0
-By *[@drbaph](https://instagram.com/drbaph)*
+By _[@drbaph](https://instagram.com/drbaph)_
<br>
<br>
![image_2024-06-10_21-03-05](https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/b3f4bd5a-04ec-4802-aabc-dcaea4882f51)
<br>
<br>
![image_2024-06-10_21-02-10](https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/526d72f3-abf2-499c-af18-654025a305ba)
<br>
### Example
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/30063999-9ca6-4a86-8721-65e3cba4c87d
---
# ⚠ Disclaimer ⚠
@@ -39,6 +36,7 @@ https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/30063999-9ca6-4a8
---
### Recommended Settings
Prompt: Any <br>
Sampler: dpmpp-3m-sde <br>
CFG: 7 <br>
@@ -48,6 +46,7 @@ Duration: Max 47s <br>
Seed: Any <br>
### > Saves files in the following directory: Output/YYYY-MM-DD/ <br>
### > using the following schema: 'your_prompt.mp3' <br>
</div>
@@ -63,6 +62,7 @@ Seed: Any <br>
<br>
**Implemented Enhanced Filename Handling and Security Measures** <br>
- **Filename Length Control**: Truncated long prompts to a maximum of 50 characters for filenames, preventing excessively long filenames. <br>
- **Enhanced Sanitization**: Applied strict rules to replace non-alphanumeric characters with underscores (`_`), ensuring valid and safe filenames. <br>
- **Unique Filename Generation**: Introduced a system to append numeric suffixes to filenames to avoid overwriting existing files, ensuring each file is uniquely named. <br>
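For illustration, a minimal sketch of the scheme these three bullets describe (a hypothetical helper, not the repository's exact implementation):

```python
import os
import re

def make_safe_filename(prompt, out_dir, ext=".mp3", max_len=50):
    # Truncate long prompts to at most 50 characters
    name = prompt[:max_len]
    # Replace every non-alphanumeric character with an underscore
    name = re.sub(r"[^a-zA-Z0-9]", "_", name)
    # Append a numeric suffix until the filename no longer collides
    candidate = os.path.join(out_dir, name + ext)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, f"{name}_{counter}{ext}")
        counter += 1
    return candidate
```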
@@ -93,29 +93,28 @@ Seed: Any <br>
✅ Updated UI elements to include Advanced Parameters dropdown <br>
-*( CFG Scale, Sigma_min, Sigma_max )* <br>
+_( CFG Scale, Sigma_min, Sigma_max )_ <br>
✅ Added Use Half precision checkbox for Low VRAM inference <br>
-*( Float 16 )*
+_( Float 16 )_
✅ Added choice for all Sampler types <br>
-*( dpmpp-3m-sde, dpmpp-2m-sde, k-heun, k-lms, k-dpmpp-2s-ancestral, k-dpm-2, k-dpm-fast )* <br>
+_( dpmpp-3m-sde, dpmpp-2m-sde, k-heun, k-lms, k-dpmpp-2s-ancestral, k-dpm-2, k-dpm-fast )_ <br>
✅ Added link to the Repo <br>
</details>
---
### 📝 Note: For Windows builds with [Nvidia](https://github.com/Saganaki22/StableAudioWebUI/releases/download/latest/One-Click-Installer-GPU.bat) 30xx+ or a Float32-capable [CPU](https://github.com/Saganaki22/StableAudioWebUI/releases/download/latest/One-Click-Installer-CPU.bat), you can use the [One-Click-Installer.bat](https://github.com/Saganaki22/StableAudioWebUI/releases/tag/latest) to simplify the process, provided you have logged in to huggingface-cli and authenticated your token before running the batch script (Step 3; the huggingface-cli is used for obtaining the model file).
## Step 1: Start by cloning the repo:
    git clone https://github.com/Saganaki22/StableAudioWebUI.git
## Step 2: Use the deployment below (tested on 24GB of Nvidia VRAM, but it should work with 12GB too, since we have added the Load Half precision (Float16) option in the WebUI):
    cd StableAudioWebUI
@@ -124,7 +123,6 @@ Seed: Any <br>
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
## (Note: if you have an older Nvidia GPU, you may need to use CUDA 11.8)
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
@@ -136,6 +134,7 @@ Step 3: (Optional - read more): If you haven't got a hugging face account or hav
(paste your token and follow the instructions; the token will not be displayed when pasted)
## If you want to run it using CPU <br>
Omit 'pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121' and the steps after it, and just run
    pip install -r requirements1.txt
@@ -143,12 +142,12 @@ Step 3: (Optional - read more): If you haven't got a hugging face account or hav
## Step 4: Run
    python gradio_app.py
<br>

## ⭐ Bonus
If you are using Windows and followed my setup instructions, you can create a batch script that activates the environment and runs the app all in one. What you need to do is: <br>
<br>
Create a new text file in the same folder as gradio_app.py & paste this in the text file
@@ -165,18 +164,17 @@ then save the file as run.bat
(All with random seeds) <br>
Prompt: a dog barking <br>
CFG: 7 <br>
Sigma_Min: 0.3 <br>
Sigma_Max: 500 <br>
![image](https://github.com/Saganaki22/StableAudioWebUI/blob/main/assets/screenshot1.png) <br>
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/4ca9eb1b-2808-4f39-b7e3-f35c736eb7b7
#
<br>
Prompt: people clapping <br>
CFG: 7 <br>
@@ -185,13 +183,10 @@ Sigma_Max: 500 <br>
![image](https://github.com/Saganaki22/StableAudioWebUI/blob/main/assets/screenshot2.png) <br>
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1f333384-d4e6-4167-abec-5167e2f4822f
#
<br>
Prompt: didgeridoo <br>
CFG: 7 <br>
@@ -200,12 +195,8 @@ Sigma_Max: 500 <br>
![image](https://github.com/Saganaki22/StableAudioWebUI/blob/main/assets/screenshot3.png) <br>
https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1cb7ce3b-7463-46a8-ba9a-3a5aa232d43a
---
## Model Details
@@ -218,6 +209,7 @@ https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1cb7ce3b-7463-46a
<div align="center">
#
![image](https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo-with-title.png)
### [Huggingface](https://huggingface.co/stabilityai/stable-audio-open-1.0) | [Stable Audio Tools](https://github.com/Stability-AI/stable-audio-tools) | [Stability AI](https://stability.ai/news/introducing-stable-audio-open)
@@ -226,5 +218,4 @@ https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/1cb7ce3b-7463-46a
![drbaph](https://github.com/Saganaki22/StableAudioWebUI/assets/84208527/13432252-e640-4c98-a7ab-4d57e6b56059)
</div>

gradio_app.py

@@ -2,7 +2,10 @@ import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
+from omegaconf import OmegaConf
+from stable_audio_tools.models.factory import create_model_from_config
from stable_audio_tools.inference.generation import generate_diffusion_cond
+from safetensors.torch import load_file as load_safetensors
from pydub import AudioSegment
import re
import os
@@ -11,12 +14,55 @@ import gradio as gr
# Define a function to toggle the visibility of the seed slider
def toggle_seed_slider(x):
-    seed_slider.visible = not x
+    return gr.Slider(interactive=not x)

# Define a function to set up the model and device
-def setup_model(model_half):
-    model, model_config = get_pretrained_model("audo/stable-audio-open-1.0")
+def setup_model(model_path, model_half):
+    """
+    Sets up the model and device.
+
+    Args:
+        model_path (str): Path to a local model .ckpt or .safetensors file. If empty, downloads the default model.
+        model_half (bool): Whether to use float16 half-precision.
+    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
+
+    # If no path is provided, or path doesn't exist, download the default model
+    if not model_path or not os.path.exists(model_path):
+        if model_path:
+            print(f"Warning: Model path '{model_path}' not found. Falling back to default model.")
+        model_id = "audo/stable-audio-open-1.0"
+        print(f"Loading default model from Hugging Face: {model_id}")
+        model, model_config = get_pretrained_model(model_id)
+    # Otherwise, load the model from the local filesystem
+    else:
+        print(f"Loading local model from: {model_path}")
+        # Find the model_config.json file in the same directory as the model
+        model_dir = os.path.dirname(model_path)
+        config_path = os.path.join(model_dir, "model_config.json")
+        if not os.path.exists(config_path):
+            raise FileNotFoundError(f"Error: Could not find 'model_config.json' in the same directory as the model: {model_dir}")
+        print(f"Loading model config from: {config_path}")
+        model_config = OmegaConf.load(config_path)
+
+        # Create the model structure from the config
+        model = create_model_from_config(model_config)
+
+        # Load the weights from the checkpoint
+        if model_path.endswith(".safetensors"):
+            print("Loading weights from .safetensors file.")
+            state_dict = load_safetensors(model_path)
+        elif model_path.endswith(".ckpt"):
+            print("Loading weights from .ckpt file.")
+            state_dict = torch.load(model_path, map_location="cpu")["state_dict"]
+        else:
+            raise ValueError("Unsupported model file type. Please use .safetensors or .ckpt")
+
+        model.load_state_dict(state_dict)

    model = model.to(device)

    # Convert model to float16 if model_half is True
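For reference, a hypothetical call to the new signature (the checkpoint path below is a placeholder; per the fallback logic above, a missing path triggers the default Hugging Face download):

```python
# Hypothetical usage of the new setup_model signature
model, model_config, device = setup_model(
    model_path="models/stable-audio-open-1.0.safetensors",  # placeholder local checkpoint
    model_half=True,  # load in float16 for low-VRAM inference
)
```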
@@ -92,7 +138,10 @@ def generate_audio(prompt, steps, cfg_scale, sigma_min, sigma_max, generation_ti
    return full_path

-def audio_generator(prompt, sampler_type, steps, cfg_scale, sigma_min, sigma_max, generation_time, random_seed, seed, model_half):
+def audio_generator(prompt, model_path, sampler_type, steps, cfg_scale, sigma_min, sigma_max, generation_time, random_seed, seed, model_half):
+    """
+    Main function called by the Gradio UI to orchestrate audio generation.
+    """
    try:
        print("Generating audio with parameters:")
        print("Prompt:", prompt)
@@ -107,7 +156,7 @@ def audio_generator(prompt, sampler_type, steps, cfg_scale, sigma_min, sigma_max
        print("Model Half Precision:", model_half)

        # Set up the model and device
-        model, model_config, device = setup_model(model_half)
+        model, model_config, device = setup_model(model_path, model_half)

        if random_seed:
            seed = torch.randint(0, 1000000, (1,)).item()
@@ -118,11 +167,66 @@ def audio_generator(prompt, sampler_type, steps, cfg_scale, sigma_min, sigma_max
        return str(e)

# Create Gradio interface
-with gr.Blocks() as demo:
+# with gr.Blocks() as demo:
+#     gr.Markdown("<h1 style='text-align: center; font-size: 300%;'>💀🔊 StableAudioWebUI 💀🔊</h1>")
+#     # Main input components
+#     prompt_textbox = gr.Textbox(lines=5, label="Prompt")
+#     sampler_dropdown = gr.Dropdown(
+#         label="Sampler Type",
+#         choices=[
+#             "dpmpp-3m-sde",
+#             "dpmpp-2m-sde",
+#             "k-heun",
+#             "k-lms",
+#             "k-dpmpp-2s-ancestral",
+#             "k-dpm-2",
+#             "k-dpm-fast"
+#         ],
+#         value="dpmpp-3m-sde"
+#     )
+#     steps_slider = gr.Slider(minimum=0, maximum=200, label="Steps", step=1, value=100)
+#     generation_time_slider = gr.Slider(minimum=0, maximum=47, label="Generation Time (seconds)", step=1, value=47)
+#     random_seed_checkbox = gr.Checkbox(label="Random Seed")
+#     seed_slider = gr.Slider(minimum=-1, maximum=999999, label="Seed", step=1, value=123456)
+#     # Advanced parameters accordion
+#     with gr.Accordion("Advanced Parameters", open=False):
+#         cfg_scale_slider = gr.Slider(minimum=0, maximum=15, label="CFG Scale", step=0.1, value=7)
+#         sigma_min_slider = gr.Slider(minimum=0, maximum=50, label="Sigma Min", step=0.1, value=0.3)
+#         sigma_max_slider = gr.Slider(minimum=0, maximum=1000, label="Sigma Max", step=0.1, value=500)
+#     # Low VRAM checkbox and submit button
+#     model_half_checkbox = gr.Checkbox(label="Low VRAM (float16)", value=False)
+#     submit_button = gr.Button("Generate")
+#     # Define the output components
+#     audio_output = gr.Audio()
+#     output_textbox = gr.Textbox(label="Output")
+#     # Link the button and the function
+#     random_seed_checkbox.change(fn=toggle_seed_slider, inputs=[random_seed_checkbox], outputs=[seed_slider])
+#     submit_button.click(audio_generator,
+#                         inputs=[prompt_textbox, sampler_dropdown, steps_slider, cfg_scale_slider,sigma_min_slider, sigma_max_slider, generation_time_slider, random_seed_checkbox, seed_slider, model_half_checkbox],
+#                         outputs=[audio_output, output_textbox])
+#     # GitHub link at the bottom
+#     gr.Markdown("<p style='text-align: center;'><a href='https://github.com/Saganaki22/StableAudioWebUI'>Github Repository</a></p>")

+with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("<h1 style='text-align: center; font-size: 300%;'>💀🔊 StableAudioWebUI 💀🔊</h1>")
+    with gr.Row():
+        with gr.Column(scale=2):
            # Main input components
-    prompt_textbox = gr.Textbox(lines=5, label="Prompt")
+            prompt_textbox = gr.Textbox(lines=5, label="Prompt", placeholder="A beautiful orchestral piece with violins, piano, and a choir...")
+
+            # NEW: Textbox for local model path
+            model_path_textbox = gr.Textbox(
+                label="Local Model Path (Optional)",
+                placeholder="e.g., /home/user/models/stable-audio-open-1.0.ckpt. Leave blank for default."
+            )
            sampler_dropdown = gr.Dropdown(
                label="Sampler Type",
                choices=[
@@ -136,33 +240,54 @@ with gr.Blocks() as demo:
                ],
                value="dpmpp-3m-sde"
            )
-    steps_slider = gr.Slider(minimum=0, maximum=200, label="Steps", step=1, value=100)
-    generation_time_slider = gr.Slider(minimum=0, maximum=47, label="Generation Time (seconds)", step=1, value=47)
-    random_seed_checkbox = gr.Checkbox(label="Random Seed")
-    seed_slider = gr.Slider(minimum=-1, maximum=999999, label="Seed", step=1, value=123456)
+            with gr.Row():
+                steps_slider = gr.Slider(minimum=10, maximum=200, label="Steps", step=1, value=100)
+                generation_time_slider = gr.Slider(minimum=1, maximum=47, label="Generation Time (seconds)", step=1, value=47)
+            with gr.Row():
+                random_seed_checkbox = gr.Checkbox(label="Random Seed", value=True)
+                seed_slider = gr.Slider(minimum=-1, maximum=999999, label="Seed", step=1, value=12345, interactive=False)

            # Advanced parameters accordion
            with gr.Accordion("Advanced Parameters", open=False):
-        cfg_scale_slider = gr.Slider(minimum=0, maximum=15, label="CFG Scale", step=0.1, value=7)
-        sigma_min_slider = gr.Slider(minimum=0, maximum=50, label="Sigma Min", step=0.1, value=0.3)
-        sigma_max_slider = gr.Slider(minimum=0, maximum=1000, label="Sigma Max", step=0.1, value=500)
+                cfg_scale_slider = gr.Slider(minimum=0, maximum=25, label="CFG Scale", step=0.1, value=7)
+                sigma_min_slider = gr.Slider(minimum=0.01, maximum=50, label="Sigma Min", step=0.01, value=0.3)
+                sigma_max_slider = gr.Slider(minimum=1, maximum=1000, label="Sigma Max", step=1, value=500)

            # Low VRAM checkbox and submit button
            model_half_checkbox = gr.Checkbox(label="Low VRAM (float16)", value=False)
-    submit_button = gr.Button("Generate")
+            submit_button = gr.Button("Generate", variant="primary")

+        with gr.Column(scale=1):
            # Define the output components
-    audio_output = gr.Audio()
-    output_textbox = gr.Textbox(label="Output")
+            audio_output = gr.Audio(label="Generated Audio")
+            output_textbox = gr.Textbox(label="Status", interactive=False)

    # Link the button and the function
    random_seed_checkbox.change(fn=toggle_seed_slider, inputs=[random_seed_checkbox], outputs=[seed_slider])
-    submit_button.click(audio_generator,
-                        inputs=[prompt_textbox, sampler_dropdown, steps_slider, cfg_scale_slider,sigma_min_slider, sigma_max_slider, generation_time_slider, random_seed_checkbox, seed_slider, model_half_checkbox],
-                        outputs=[audio_output, output_textbox])
+    # MODIFIED: Added model_path_textbox to the list of inputs
+    submit_button.click(
+        fn=audio_generator,
+        inputs=[
+            prompt_textbox,
+            model_path_textbox,
+            sampler_dropdown,
+            steps_slider,
+            cfg_scale_slider,
+            sigma_min_slider,
+            sigma_max_slider,
+            generation_time_slider,
+            random_seed_checkbox,
+            seed_slider,
+            model_half_checkbox
+        ],
+        outputs=[audio_output, output_textbox]
+    )

    # GitHub link at the bottom
-    gr.Markdown("<p style='text-align: center;'><a href='https://github.com/Saganaki22/StableAudioWebUI'>Github Repository</a></p>")
+    gr.Markdown("<p style='text-align: center;'><a href='https://github.com/Saganaki22/StableAudioWebUI' target='_blank'>Github Repository</a></p>")

# Launch the Gradio demo
demo.launch()
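The rewritten toggle_seed_slider follows Gradio's update pattern: mutating a component attribute after the UI is built (the old `seed_slider.visible = not x`) has no effect on the rendered page, whereas returning a component from an event handler updates the mapped output component's properties. A minimal self-contained illustration (hypothetical demo, not part of this commit):

```python
import gradio as gr

def toggle(random_seed):
    # Returning a component instance updates the properties
    # of the component mapped in `outputs`.
    return gr.Slider(interactive=not random_seed)

with gr.Blocks() as demo:
    checkbox = gr.Checkbox(label="Random Seed", value=True)
    slider = gr.Slider(minimum=-1, maximum=999999, step=1, value=12345, interactive=False)
    checkbox.change(fn=toggle, inputs=[checkbox], outputs=[slider])

demo.launch()
```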

pyproject.toml Normal file

@@ -0,0 +1,17 @@
[project]
name = "stableaudiowebui"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"einops>=0.8.1",
"gradio>=5.43.1",
"numba>=0.58",
"omegaconf>=2.3.0",
"pydub>=0.25.1",
"stable-audio-tools>=0.0.19",
"torch>=2.8.0",
"torchaudio>=2.8.0",
"torchvision>=0.23.0",
]
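With this project metadata and the committed uv.lock below, setting up and launching the app presumably reduces to the following (assuming uv is installed; uv picks up the Python 3.11 pin from the .python-version file added in this commit):

    uv sync
    uv run gradio_app.py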

uv.lock generated Normal file

File diff suppressed because it is too large