StableAudioWebUI

A lightweight Gradio web interface for running Stable Audio Open 1.0.

Generated files are saved to the directory Output/YYYY-MM-DD/
and are named after the prompt, following the schema 'prompt.mp3'.
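As an illustration of the naming scheme above, here is a minimal sketch of how such a path could be built. The function name `output_path` and the sanitizing step are assumptions made for this example; the actual logic in gradio_app.py may differ.

```python
import re
from datetime import date
from pathlib import Path


def output_path(prompt: str, day: date, root: str = "Output") -> Path:
    """Build Output/YYYY-MM-DD/<prompt>.mp3 for a given prompt and date."""
    # Assumption: characters that are unsafe in file names become "_".
    safe = re.sub(r"[^\w\- ]", "_", prompt).strip() or "untitled"
    # date.isoformat() yields the YYYY-MM-DD folder name.
    return Path(root) / day.isoformat() / f"{safe}.mp3"


print(output_path("a dog barking", date.today()))
```

Passing the date explicitly keeps the helper easy to test; a caller would normally pass `date.today()`.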

Recommended Settings

Prompt: Any
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500
Duration: Max 47s
Seed: Any
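The recommended values can be collected in one place, as in the sketch below. The helper `build_generation_settings` is hypothetical, and the key names (`cfg_scale`, `sigma_min`, `sigma_max`, `seconds_total`) are assumed to mirror the stable-audio-tools example from the model card; how gradio_app.py actually passes them is not shown in this README.

```python
from typing import Optional

MAX_DURATION_S = 47  # maximum duration listed in the recommended settings


def build_generation_settings(prompt: str,
                              duration_s: float = 30,
                              seed: Optional[int] = None) -> dict:
    """Collect the recommended settings into one dictionary (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Clamp the requested duration to the 47 s maximum.
    seconds_total = min(duration_s, MAX_DURATION_S)
    return {
        "conditioning": [{
            "prompt": prompt,
            "seconds_start": 0,
            "seconds_total": seconds_total,
        }],
        "cfg_scale": 7,
        "sigma_min": 0.3,
        "sigma_max": 500,
        # Assumption: None stands for "any seed", chosen by the caller.
        "seed": seed,
    }


settings = build_generation_settings("didgeridoo", duration_s=60)
print(settings["conditioning"][0]["seconds_total"])
```

Requests longer than the maximum are clamped rather than rejected here; raising an error instead would be an equally reasonable choice.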

Start by cloning the repo:

git clone https://github.com/Saganaki22/StableAudioWebUI.git

Then set up the environment (tested on an Nvidia GPU with 24 GB of VRAM):

cd StableAudioWebUI
conda create -n saowebui python=3.10
conda activate saowebui
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

(Note: if you have an older Nvidia GPU, you may need to use the CUDA 11.8 build instead)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

If you do not have a Hugging Face account or have not used huggingface-cli before, create an account and then authenticate with an access token (create one at https://huggingface.co/settings/tokens):

huggingface-cli login

(paste your token and follow the instructions; the token will not be displayed when pasted)

⚠ If you want to run it on CPU

Skip the 'pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121' step and run only:

pip install -r requirements.txt
pip install -r requirements1.txt
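Before launching, it can be useful to confirm which device the installed PyTorch build will use. The following is a small sketch, not part of the repository; `pick_device` is a hypothetical helper that relies only on `torch.cuda.is_available()`.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet, so nothing can run on the GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Generation would run on: {pick_device()}")
```

If this prints "cpu" on a machine with an Nvidia GPU, the installed PyTorch wheel is probably a CPU-only build or does not match the installed CUDA driver.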

Run

python gradio_app.py

Screenshots

(All with random seeds)

Prompt: a dog barking
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500

Prompt: people clapping
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500

Prompt: didgeridoo
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500

Model Details

Model type: Stable Audio Open 1.0 is a latent diffusion model based on a transformer architecture.
Language(s): English
License: See the LICENSE file.
Commercial License: to use this model commercially, please refer to https://stability.ai/membership
