TLDR: ATHENS, Greece: Stability AI released Stability Audio 3.0, extending AI song generation to 6 minutes 20 seconds and changing how its models are licensed and accessed.
Key Takeaways:
- The release builds on earlier Stability AI models such as Stable Audio Open, which was capped at 47 seconds; the new models push toward full compositions.
- The Stability Audio 3.0 suite includes small SFX and small models that generate up to 2 minutes of audio, plus medium and large models that reach 6 minutes 20 seconds.
- Open weights for the small SFX, small, and medium models should broaden adoption, while the large model remains paid and API-only, giving Stability more leverage over licensing terms.
- Deals with Warner Music Group and Universal Music Group, along with claims that the training data is fully licensed, signal an industry shift after ongoing fights with record labels.
Six minutes 20 seconds is the kind of target that makes AI feel less like a novelty and more like a rival workflow. Now the real contest becomes who can clear licensing at scale, not just who can generate the notes.
Q&A
If small and medium models are open weights, what stops them from being used in ways that trigger new licensing disputes?
Stability AI can publish open weights while still expecting downstream compliance. The weak link is enforcement across third-party users and fine-tuned datasets.
Will the move from short clips to 6-minute compositions change how music labels evaluate risk?
Longer outputs create more overlap with synchronization and distribution uses. That can push labels toward clearer rules on training data, similarity, and commercial use.
How might API-only access to the large model influence competition with Suno and Udio?
API gating can slow some experiments, but it helps Stability control quality, usage terms, and auditing. In fast-moving markets, that control can come at the cost of momentum.
Why would Stability hire a music industry executive as the product evolves?
Music teams translate capabilities into rights-safe workflows, licensing language, and partner value. The hire suggests that distribution and legitimacy matter as much as audio quality.
What happens to creators who already use 47-second tools, now that templates, arrangements, and song structures can extend much further?
They can move from loop-making to full session drafts, which increases output volume and raises the stakes for attribution, originality claims, and platform policies.