2024-06-06 08:18:26 +01:00


StableAudioWebUI

A lightweight Gradio web interface for running Stable Audio Open 1.0.



image



Generated files are saved to the directory Output/YYYY-MM-DD/
using the naming scheme 'prompt.mp3'.
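
The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not the code in gradio_app.py: the function name build_output_path and the replacement of characters that are unsafe in file names are assumptions.

```python
import re
from datetime import date
from pathlib import Path


def build_output_path(prompt, day=None, root="Output"):
    """Return Output/YYYY-MM-DD/<prompt>.mp3 for the given prompt (hypothetical helper)."""
    day = day or date.today()
    # Assumption: characters that are invalid in file names become underscores.
    # The app's own sanitising rule, if any, may differ.
    safe = re.sub(r'[\\/:*?"<>|]+', "_", prompt).strip() or "untitled"
    return Path(root) / day.isoformat() / f"{safe}.mp3"


path = build_output_path("a dog barking", date(2024, 6, 6))
path.parent.mkdir(parents=True, exist_ok=True)  # create the dated folder if missing
print(path.as_posix())
```

Passing the date explicitly keeps the helper easy to test; leaving it out uses today's date.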

Prompt: Any
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500
Duration: Max 47s
Seed: Any
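
As a rough sketch, the settings listed above could be held in one small Python structure and checked before generation. The class name GenerationSettings, its field names, and the choice to treat the listed values as defaults are assumptions made for illustration; the ordering check on the two sigma values is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DURATION_S = 47  # maximum duration stated above


@dataclass
class GenerationSettings:
    """Hypothetical container for the settings listed in this README."""
    prompt: str
    cfg: float = 7.0
    sigma_min: float = 0.3
    sigma_max: float = 500.0
    duration_s: float = MAX_DURATION_S
    seed: Optional[int] = None  # None stands for "any", i.e. a random seed

    def validate(self):
        if not 0 < self.duration_s <= MAX_DURATION_S:
            raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
        # Assumption: sigma_min should be positive and below sigma_max.
        if not 0 < self.sigma_min < self.sigma_max:
            raise ValueError("expected 0 < sigma_min < sigma_max")
        return self


settings = GenerationSettings(prompt="didgeridoo", duration_s=30).validate()
print(settings)
```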


Start by cloning the repo:

git clone https://github.com/Saganaki22/StableAudioWebUI.git

Then set up the environment as follows (tested on an Nvidia GPU with 24 GB of VRAM):
cd StableAudioWebUI
conda create -n saowebui python=3.10
conda activate saowebui
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

(Note: if you have an older Nvidia GPU, you may need to use the CUDA 11.8 build instead.)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

If you do not have a Hugging Face account, or have not used huggingface-cli before, create an account and then authenticate with an access token (create one at https://huggingface.co/settings/tokens):
huggingface-cli login

(Paste your token and follow the instructions. The token is not displayed when pasted.)

⚠ If you want to run it on the CPU

Skip 'pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121' and run only:

pip install -r requirements.txt
pip install -r requirements1.txt
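
After either install path, a quick check like the one below can report which backend PyTorch will see. torch.cuda.is_available() is a standard PyTorch call; the helper name detect_backend is hypothetical, and whether gradio_app.py chooses its device this way is an assumption.

```python
def detect_backend():
    """Return 'cuda', 'cpu', or 'missing' depending on the local PyTorch install."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment yet.
        return "missing"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(detect_backend())
```

A result of 'cpu' on a machine with an Nvidia GPU usually means a CPU-only build of PyTorch was installed or the CUDA build does not match the driver.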

Run

python gradio_app.py

Screenshots

(All with random seeds)

Prompt: a dog barking
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500

image


Prompt: people clapping
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500

image


Prompt: didgeridoo
CFG: 7
Sigma_Min: 0.3
Sigma_Max: 500

image


Model Details

  • Model type: Stable Audio Open 1.0 is a latent diffusion model based on a transformer architecture.
  • Language(s): English
  • License: See the LICENSE file.
  • Commercial License: To use this model commercially, please refer to https://stability.ai/membership

Huggingface | Stable Audio Tools | Stability AI