The Shocking Truth About OpenStack Cinder QoS Limitations

If you operate OpenStack seriously, block storage performance is not an afterthought.
It is a contract — between users, the scheduler, and the storage backend.

And yet, if you try to automate OpenStack Cinder QoS with Ansible today, you quickly run into an uncomfortable truth:

Ansible does not support Cinder QoS specs.

This is not a missing module waiting to be written. It’s a structural signal about where Ansible fits — and where it doesn’t — in OpenStack block storage automation.

The more interesting question is what happens when you stop trying to automate QoS through Cinder at all.


What QoS Means in This Context

In the context of block storage, Quality of Service (QoS) refers to the ability to control and guarantee storage performance characteristics — typically IOPS, bandwidth, and sometimes latency — for a given volume or workload. QoS exists to prevent noisy neighbors, enforce performance expectations, and align storage behavior with application requirements.

In OpenStack Cinder, QoS is expressed as intent via metadata and scheduler hints. On storage backends, QoS is enforced behavior, implemented directly by the array.
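On the Cinder side, that intent is captured with a QoS spec and an association to a volume type. A minimal sketch using the standard OpenStack CLI (the spec and type names here are illustrative, and property keys like total_iops_sec are consumer- and driver-specific):

```shell
# Create a QoS spec carrying performance intent.
# "front-end" means the hypervisor enforces it; "back-end" defers to the array.
openstack volume qos create --consumer front-end \
  --property total_iops_sec=10000 gold-qos

# Attach the intent to a volume type so the scheduler can see it.
openstack volume qos associate gold-qos gold
```

Note that nothing in these commands touches the array: they record intent that some other layer may, or may not, enforce.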


Why QoS Matters More Than Volume Types

In many OpenStack environments, volume types are treated as the primary abstraction:
gold, silver, bronze.

But volume types are labels.
QoS specs are where intent is supposed to become enforceable behavior.

QoS specs:

  • Influence scheduler placement
  • Encode performance expectations
  • Attempt to map OpenStack intent to backend enforcement

When QoS is wrong, nothing fails loudly. Performance just degrades — often long after the configuration change that caused it.

That makes QoS both operationally critical and easy to misuse.


The Reality: Ansible Cannot Manage Cinder QoS

The openstack.cloud Ansible collection provides modules for volumes, volume types, snapshots, and attachments.

What it does not provide:

  • QoS spec creation
  • QoS modification
  • QoS association
  • QoS validation

This is not an oversight.

Cinder QoS is API-level intent, tightly coupled to scheduler behavior and implemented very differently across storage drivers. Modeling that safely in Ansible’s idempotent task model would require guarantees Cinder does not — and cannot — provide.


Why Wrapping the OpenStack CLI Is the Wrong Answer

The usual workaround looks something like this:

```yaml
- name: Create QoS spec
  command: >
    openstack volume qos create gold
    --property maxIOPS=10000
```

This works right up until it doesn’t.

Problems appear quickly:

  • No real idempotency
  • Fragile output parsing
  • Poor error semantics
  • No visibility into backend enforcement

Most importantly, this automates mutation without validation. For block storage, that’s a dangerous trade.
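To see how quickly the workaround decays, consider what even approximate idempotency demands. The sketch below (spec name, property key, and JSON field names are illustrative) is the minimum scaffolding required just to avoid re-creating the spec on every run:

```yaml
- name: Check whether the QoS spec already exists
  command: openstack volume qos list -f json
  register: qos_list
  changed_when: false

- name: Create QoS spec only if missing
  command: >
    openstack volume qos create gold
    --property maxIOPS=10000
  when: "'gold' not in (qos_list.stdout | from_json | map(attribute='Name') | list)"
```

Every one of the problems listed above is visible here: the idempotency is hand-rolled, the `when:` clause depends on parsing CLI output, a failed create surfaces as an opaque shell error, and nothing verifies that the backend ever enforces the spec.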


Where the Abstraction Breaks

Cinder QoS expresses intent.
Storage arrays enforce reality.

Cinder does not throttle I/O.
Cinder does not enforce latency.
Cinder does not guarantee performance.

Those things happen on the storage backend. Once you accept that, the shape of a safer automation model becomes obvious.


Retype Is the Only Way to Change QoS in Cinder

In practice, the situation is simple and often misunderstood:

The only supported way to change QoS behavior for an existing Cinder volume is to perform a volume retype.

Key implications:

  • QoS is effectively immutable for an existing volume
  • Modifying a QoS spec on an in-use volume type does not reliably affect existing volumes
  • Retypes may involve migration, backend data movement, and scheduler re-evaluation

Because of this, retype operations:

  • Are not instantaneous
  • Frequently require application pauses
  • Are unsuitable for live performance tuning

Even when “in-place retype” is advertised, pausing applications is often the only safe way to avoid I/O stalls or latency spikes.

From an automation perspective, this is a hard boundary. Cinder QoS is a provisioning-time decision, not a runtime control.
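The retype itself is a single CLI operation; the cost is everything it can trigger behind it. A hedged example, with illustrative volume and type names:

```shell
# Retype an existing volume to a type carrying different QoS.
# "on-demand" permits migration between backends if the scheduler requires it.
openstack volume set --type gold --retype-policy on-demand volume-1234
```

With `--retype-policy never`, the retype fails rather than migrate; with `on-demand`, data movement is possible, which is exactly why this belongs in a maintenance window rather than an incident response.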


FlashArray: Enforcing QoS Where It Actually Lives

If you are using Pure Storage FlashArray, the automation story changes in an important way.

The FlashArray Ansible collection allows you to:

  • Directly control per-volume QoS
  • Query the actual enforced limits
  • Modify performance settings idempotently
  • Validate backend state after every change

All without:

  • Using the OpenStack CLI
  • Calling the Cinder API
  • Relying on scheduler metadata

You are no longer automating an abstraction. You are automating the enforcement layer itself.
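Validation is the piece Cinder cannot offer and the array can. A sketch of reading back enforced state with `purefa_info` (the connection variables `fa_url` and `fa_api_token` are assumptions about your inventory, not module defaults):

```yaml
- name: Gather enforced volume state directly from the array
  purestorage.flasharray.purefa_info:
    gather_subset: volumes
    fa_url: "{{ fa_url }}"          # assumed inventory variable
    api_token: "{{ fa_api_token }}" # assumed inventory variable
  register: array_facts

- name: Show what the array is actually enforcing
  debug:
    var: array_facts
```

Because this queries the enforcement layer itself, a follow-up assertion on the gathered facts can turn "we changed the config" into "we verified the behavior".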


Dynamic QoS Changes, No Application Impact

This is the critical operational difference.

With FlashArray, QoS changes:

  • Are applied dynamically
  • Do not require detaching volumes
  • Do not interrupt I/O
  • Have no impact on running applications

This turns QoS from a static guess made at provisioning time into a live operational control.

Workloads change. Incidents happen. Noisy neighbors appear. With backend-enforced QoS, automation can respond in real time instead of waiting for a maintenance window.


A Day-2 Incident Scenario

Imagine a production database VM experiencing latency spikes during peak business hours.

Cinder-centric reality:

  • QoS is tied to a volume type
  • Changing it affects every volume of that type
  • Retype may require migration and application pauses
  • The fix gets deferred until after hours

FlashArray + Ansible reality:

  • Identify the backing volume
  • Increase IOPS or bandwidth limits dynamically
  • Observe latency normalize immediately
  • Roll back later if needed, with no downtime

That difference is not academic. It’s the difference between configuration automation and operational automation.


Minimal FlashArray Ansible Example

```yaml
- name: Adjust QoS for a live Cinder-backed volume
  purestorage.flasharray.purefa_volume:
    name: volume-1234
    iops_qos: 20000
    bw_qos: 500M
    state: present
    fa_url: "{{ fa_url }}"          # array connection details,
    api_token: "{{ fa_api_token }}" # typically supplied via inventory or environment
```

This task is:

  • Safe to rerun
  • Applied immediately
  • Backend-enforced
  • Non-disruptive

This is what idempotency actually looks like for storage performance.


A Safer Automation Model

In mature OpenStack environments, the split is clean:

  • Cinder owns lifecycle
    Volume creation, attachment, and scheduling
  • The storage backend owns performance
    Enforcement, validation, and dynamic adjustment
  • Ansible orchestrates
    Preconditions, guardrails, and controlled mutation

Each layer owns what it can actually guarantee.
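Put together, the split looks something like the following sketch. The cloud name, volume names, and the mapping from Cinder volume to backing array volume are assumptions that depend on your deployment:

```yaml
- name: Cinder owns lifecycle - create the volume
  openstack.cloud.volume:
    cloud: mycloud          # assumed clouds.yaml entry
    name: app-data
    size: 100
    volume_type: gold

- name: The array owns performance - enforce QoS on the backing volume
  purestorage.flasharray.purefa_volume:
    name: "{{ backing_volume_name }}"  # derived from the Cinder volume ID;
    iops_qos: 20000                    # the naming scheme is driver-specific
    bw_qos: 500M
    state: present
    fa_url: "{{ fa_url }}"
    api_token: "{{ fa_api_token }}"
```

Ansible's role here is orchestration: sequencing the layers, checking preconditions, and validating the result, while each layer mutates only what it can actually guarantee.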


A Note on OpenStack 2025.2 (Flamingo)

Support for per-GB backend QoS with Pure Storage FlashArray was introduced in OpenStack 2025.2 (Flamingo). This allows FlashArray-backed Cinder volumes to derive performance limits based on volume size, enabling more consistent and predictable performance characteristics at provisioning time.

However, this does not change the fundamental behavior of Cinder QoS. OpenStack does not provide a mechanism to dynamically modify the base per-volume or per-GB QoS settings of an existing Cinder volume after it has been created.

Once a volume exists, those QoS characteristics remain static. Changing them still requires a volume retype, with the same operational implications — including potential migration, backend data movement, and possible application pauses.

In short, per-GB QoS improves how performance is derived at creation time, but Cinder QoS remains a provisioning-time construct, not a runtime control plane.


Final Thoughts

Cinder QoS is about intent, not live enforcement.
Retype is the escape hatch — and it comes with downtime risk.

If you want dynamic, non-disruptive QoS changes for running workloads, automation has to operate where performance is actually enforced.

The absence of Cinder QoS modules in Ansible isn’t a gap to paper over. It’s a signal telling you where automation belongs.

With the right storage platform, Ansible can manage QoS safely, dynamically, and without impacting applications — just not at the Cinder layer.
