OpenStack Cinder Scheduling: How to Stop Hard-Coding Storage Tiers

From Hardware Names to Workload Intent

The Cinder scheduler doesn’t get much love. Most people see it as boring infrastructure plumbing. But when you lean into its design, something interesting happens: it becomes a decision engine that quietly translates intent into placement, without hard-coding tiers or baking policy into application logic.

This is the first post in a three-part series on using Cinder volume types, capabilities, and filters to let the scheduler make smart, explainable choices across multiple storage backends. The examples use Pure Storage FlashArray platforms, but the ideas generalize to any driver that reports rich stats.

In this post, I’ll cover the fundamentals: how to describe your storage infrastructure, express workload requirements, and let the scheduler do the matching. In Part 2, I’ll explore a powerful scaling pattern using multiple arrays with a single backend name, and in Part 3 I’ll examine the final frontier of volume scaling: volume migration.

The Core Idea: Describe Reality, Then Express Intent

Cinder scheduling works best when two things are true:

  1. Backends describe what they are — their capacity, performance characteristics, and features
  2. Volume types describe what workloads want — the requirements and preferences

The scheduler’s job is simple: reconcile the two. CapacityFilter and CapabilitiesFilter prune the search space. Weighers break ties. Nothing magical—just disciplined metadata.
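That whole pipeline can be sketched in a few lines of Python. This is a toy model with simplified backend stats and spec names, not Cinder's actual code (the real logic lives in cinder.scheduler.filter_scheduler and the filter/weigher classes):

```python
# Toy model of the filter-then-weigh pipeline. Backend dicts and spec
# names are simplified assumptions for illustration.

def filter_backends(backends, request):
    """Prune backends that fail a hard requirement (the CapacityFilter
    and CapabilitiesFilter stage)."""
    return [
        b for b in backends
        if b["free_capacity_gb"] >= request["size_gb"]
        and all(b.get(k) == v for k, v in request["capabilities"].items())
    ]

def pick_backend(candidates):
    """Break ties by preferring the most free capacity (the
    CapacityWeigher stage)."""
    return max(candidates, key=lambda b: b["free_capacity_gb"])

backends = [
    {"name": "pure_perf", "free_capacity_gb": 2000,
     "array_model": "FlashArray//X"},
    {"name": "pure_capacity", "free_capacity_gb": 9000,
     "array_model": "FlashArray//C"},
]
request = {"size_gb": 500, "capabilities": {"array_model": "FlashArray//C"}}

survivors = filter_backends(backends, request)
winner = pick_backend(survivors)
print(winner["name"])  # pure_capacity
```

Note the order: filtering is absolute (pure_perf never competes because its model doesn't match), and weighing only ranks what survives.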

Two Arrays, Two Personalities

Let’s start with a concrete scenario. Imagine you have two backends:

  • A FlashArray//X tuned for performance
  • A FlashArray//C optimized for capacity and efficiency

Both export standard Cinder stats like free capacity, thin provisioning support, and oversubscription ratios. On top of that, the driver surfaces (or you inject) performance clues: latency, throughput, dedupe ratio, maximum volume size.

The result? Each backend has a recognizable profile. The //X is fast and tight. The //C is spacious and slower, but still predictable. The scheduler doesn’t guess—it evaluates.

Declaring Backend Capabilities

In cinder.conf, each backend is defined normally, but with extra capabilities added to make scheduling decisions explicit. Here’s a simplified example:

[pure_perf]
volume_backend_name = pure_perf
volume_driver = cinder.volume.drivers.pure.PureISCSIDriver
san_ip = 10.1.0.10
pure_api_token = <secret>
use_chap_auth = False

# Driver-reported capacity stats are automatic.
# Custom capabilities provide scheduling clarity.
extra_capabilities = {
  "array_model": "FlashArray//X",
  "avg_latency_ms": 0.35,
  "avg_throughput_mbps": 5200,
  "dedupe_ratio": 4.5,
  "compression_ratio": 1.7,
  "max_volume_size_gb": 8000
}

[pure_capacity]
volume_backend_name = pure_capacity
volume_driver = cinder.volume.drivers.pure.PureISCSIDriver
san_ip = 10.1.0.20
pure_api_token = <secret>
use_chap_auth = False
extra_capabilities = {
  "array_model": "FlashArray//C",
  "avg_latency_ms": 1.1,
  "avg_throughput_mbps": 2100,
  "dedupe_ratio": 3.2,
  "compression_ratio": 1.4,
  "max_volume_size_gb": 32000
}

These values don’t replace the dynamic statistics reported by the driver—they complement them by giving the scheduler stable semantic hooks.

Driver-Reported Metrics: Beyond Static Capabilities

While extra_capabilities provides stable, admin-defined attributes, the Pure Storage Cinder driver goes further by reporting live performance metrics directly to the scheduler with every stats update. These metrics are dynamic, reflecting the current state of the array.

The Pure driver exposes the following additional data points:

Workload metrics:

  • total_volumes – Number of volumes currently on the array
  • total_snapshots – Number of snapshots currently on the array
  • total_hosts – Number of hosts connected to the array
  • total_pgroups – Number of protection groups configured

Performance metrics:

  • writes_per_sec – Current write operations per second
  • reads_per_sec – Current read operations per second
  • input_per_sec – Current input bandwidth (bytes/sec)
  • output_per_sec – Current output bandwidth (bytes/sec)

Latency metrics:

  • usec_per_read_op – Average microseconds per read operation
  • usec_per_write_op – Average microseconds per write operation
  • queue_usec_per_read_op – Average queue time per read (microseconds)
  • queue_usec_per_write_op – Average queue time per write (microseconds)
  • queue_usec_per_mirrored_write_op – Average queue time for mirrored writes (microseconds)

Using Driver Metrics in Scheduling Decisions

These metrics can be referenced in volume type extra specs just like static capabilities, using the comparison syntax the CapabilitiesFilter understands (numeric operators are <=, >=, ==, and !=; bare < and > are not supported). For example:

# Avoid overloaded arrays
openstack volume type set performance-tier \
  --property capabilities:total_volumes="<= 1000" \
  --property capabilities:writes_per_sec="<= 50000"

# Target arrays with low latency
openstack volume type set ultra-low-latency \
  --property capabilities:usec_per_read_op="<= 500" \
  --property capabilities:usec_per_write_op="<= 800"

# Balance load across less-utilized arrays
openstack volume type set balanced-load \
  --property capabilities:total_hosts="<= 100" \
  --property capabilities:reads_per_sec="<= 30000"
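Under the hood, each spec string is parsed into an operator and an operand, then compared against the capability the backend reported. Here is a hedged Python analogue of that matching; Cinder's real implementation (the extra_specs_ops module) supports more operators and edge cases than this sketch:

```python
# Simplified analogue of extra-spec operator matching. Only a subset of
# operators is modeled; anything without a recognized operator falls
# back to plain string equality, as with array_model specs.
import operator

OPS = {"<=": operator.le, ">=": operator.ge,
       "==": operator.eq, "!=": operator.ne}

def match_spec(spec, capability):
    op, _, value = spec.partition(" ")
    if op in OPS:
        return OPS[op](float(capability), float(value))
    return str(capability) == spec

print(match_spec("<= 1000", 750))                    # True: backend passes
print(match_spec(">= 3000", 2100))                   # False: backend filtered out
print(match_spec("FlashArray//X", "FlashArray//X"))  # True: string equality
```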

Dynamic vs. Static: When to Use Each

Use static extra_capabilities for:

  • Array model/generation (FlashArray//X vs //C)
  • Maximum supported volume sizes
  • Feature flags (replication support, NVMe support)
  • Data center location or failure domain

Use driver-reported metrics for:

  • Load balancing across backends
  • Avoiding saturated arrays
  • Meeting latency SLAs
  • Workload isolation (keeping high-IOPS workloads off busy arrays)

The combination gives you both placement policy (where volumes should go based on capability) and intelligent load distribution (where volumes can go based on current conditions).

A Word of Caution

Driver-reported performance metrics are point-in-time snapshots. They fluctuate. If you set overly aggressive constraints on metrics like writes_per_sec or usec_per_read_op, you may inadvertently create scheduling failures during brief load spikes.

Use these metrics for:

  • Broad guidance (avoiding consistently overloaded arrays)
  • Soft preferences via GoodnessWeigher scoring
  • Preventing pathological cases (arrays already hosting 2000+ volumes)

Avoid using them for:

  • Hard requirements that must be met every single time
  • Sub-second precision guarantees

Keep the Scheduler Boring (on Purpose)

Here’s where restraint pays off. A minimal scheduler configuration often works best:

[DEFAULT]
scheduler_default_filters = CapacityFilter,CapabilitiesFilter
scheduler_default_weighers = CapacityWeigher
capacity_weight_multiplier = 1.0

This setup does exactly three things:

  1. Filters out backends that can’t satisfy hard requirements
  2. Ensures the volume physically fits
  3. Gently prefers backends with more free capacity

No hidden heuristics. No surprising behavior.
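Even "the volume physically fits" is less trivial than it sounds once thin provisioning is involved: the filter reasons about oversubscription, not just raw free space. The sketch below is a simplified Python view of that arithmetic, loosely modeled on Cinder's capacity filter; the field names are standard driver stats, but the real checks are more involved:

```python
import math

def passes_capacity_check(stats, request_gb):
    """Simplified view of the kind of arithmetic the CapacityFilter
    performs. Not Cinder's actual implementation."""
    reserved = math.floor(
        stats["total_capacity_gb"] * stats["reserved_percentage"] / 100)
    free = stats["free_capacity_gb"] - reserved
    if stats.get("thin_provisioning_support"):
        # Thin backends are judged on the oversubscription ratio the
        # request would produce, not on raw free space alone.
        ratio = ((stats["provisioned_capacity_gb"] + request_gb)
                 / stats["total_capacity_gb"])
        return ratio <= stats["max_over_subscription_ratio"] and free > 0
    return free >= request_gb

stats = {"total_capacity_gb": 10000, "free_capacity_gb": 4000,
         "reserved_percentage": 5, "thin_provisioning_support": True,
         "provisioned_capacity_gb": 18000,
         "max_over_subscription_ratio": 20.0}
print(passes_capacity_check(stats, 500))  # True
```

A thin-provisioned backend can accept a volume even when provisioned capacity already exceeds physical capacity, which is exactly why dedupe and compression ratios belong in your capability metadata.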

Volume Types as Workload Personas

This is where intent lives. Instead of naming volume types after hardware (“flasharray-x-tier”), name them after behavior (“high-perf”, “large-capacity”).

Here are several concrete examples using OpenStack volume type extra specs:

# Create volume types
openstack volume type create high-perf
openstack volume type create gold-efficient
openstack volume type create large-capacity
openstack volume type create balanced

# High performance tier – pins to FlashArray//X
openstack volume type set high-perf \
  --property capabilities:array_model="FlashArray//X" \
  --property capabilities:avg_latency_ms="<= 1" \
  --property capabilities:avg_throughput_mbps=">= 3000"

# Gold efficient tier – favors strong data reduction
openstack volume type set gold-efficient \
  --property capabilities:dedupe_ratio=">= 4" \
  --property capabilities:avg_latency_ms="<= 0.6"

# Large capacity tier – naturally gravitates to FlashArray//C
openstack volume type set large-capacity \
  --property capabilities:array_model="FlashArray//C" \
  --property capabilities:max_volume_size_gb=">= 10000"

# Balanced tier – allows either backend when conditions are reasonable
openstack volume type set balanced \
  --property capabilities:avg_latency_ms="<= 1.5" \
  --property capabilities:dedupe_ratio=">= 3" \
  --property free_capacity_gb=">= 1000"

Notice what these specs define: constraints, not destinations. When multiple backends qualify, weighers decide the winner.

When Filtering Isn’t Enough: Enter the Goodness Weigher

Binary filtering is sometimes too blunt. You need to rank candidates, not just accept or reject them. That’s where Cinder’s DriverFilter and GoodnessWeigher come in. Instead of a global weigher configuration, each backend declares its own filter_function and goodness_function expressions:

[DEFAULT]
scheduler_default_filters = DriverFilter,CapacityFilter,CapabilitiesFilter
scheduler_default_weighers = GoodnessWeigher

[pure_perf]
# filter_function answers pass/fail; goodness_function returns 0-100.
filter_function = "volume.size <= 8000"
goodness_function = "100 * (capabilities.free_capacity_gb / capabilities.total_capacity_gb)"

Each backend receives a goodness score between 0 and 100, derived from live metrics and static attributes. Highest score wins.
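As a rough illustration of what a composite score can encode, here is a Python sketch. The weights, tier values, and latency normalization are assumptions invented for this example, not anything Cinder ships:

```python
# Illustrative composite scoring in the spirit of a goodness function.
# All weights and tier values below are assumptions for the sketch.
TIER_SCORES = {"platinum": 1.0, "gold": 0.8, "silver": 0.5}

def goodness(stats, w_capacity=0.4, w_perf=0.4, w_tier=0.2):
    capacity = stats["free_capacity_gb"] / stats["total_capacity_gb"]
    # Lower read latency scores higher; clamp at 1000 microseconds.
    perf = 1.0 - min(stats["usec_per_read_op"] / 1000.0, 1.0)
    tier = TIER_SCORES.get(stats["tier"], 0.0)
    return w_capacity * capacity + w_perf * perf + w_tier * tier

fast_but_full = {"free_capacity_gb": 2000, "total_capacity_gb": 10000,
                 "usec_per_read_op": 300, "tier": "platinum"}
slow_but_empty = {"free_capacity_gb": 8000, "total_capacity_gb": 10000,
                  "usec_per_read_op": 900, "tier": "silver"}

# The low-latency platinum array outscores the emptier silver one.
print(goodness(fast_but_full), goodness(slow_but_empty))
```

The point of weighting is visible immediately: the fuller array still wins because latency and tier outweigh its capacity deficit. Change the weights and the decision flips, with no application code involved.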

A Concrete Example

Imagine a volume type that encodes both functional and performance intent, and a volume created against it. (Note that scheduling intent must live in the volume type’s extra specs; metadata set at volume-create time is invisible to the scheduler. The tier capability here assumes you advertised it via extra_capabilities.)

openstack volume type create platinum-replicated
openstack volume type set platinum-replicated \
  --property capabilities:tier="platinum" \
  --property replication_enabled="<is> True"

openstack volume create --size 500 --type platinum-replicated mybigvol

Multiple backends may pass the filters. The GoodnessWeigher then scores each survivor using its goodness_function, factoring in capacity headroom and current load. Tie-breaking is automatic and repeatable.

The Scheduler as a Quiet Matchmaker

At its best, the Cinder scheduler is invisible. You describe the physics of your storage systems. You define workload personas. The scheduler matches them, consistently and defensibly.

Below is a conceptual diagram illustrating how backend capabilities, filters, and weighers interact during scheduling:

Figure: Cinder Scheduler Decision Flow

No mysticism. No hard pinning. Just metadata, math, and a system doing exactly what it was designed to do.

If you let it.


Why This Backend Lost: Debugging Scheduler Decisions

When a backend looks eligible but doesn’t receive the volume, the scheduler is usually behaving correctly—it’s just being quiet about its reasoning unless you ask.

The most common failure modes map directly to scheduler stages.

1. CapabilitiesFilter Rejection

If a backend never appears in the candidate list, it failed a hard requirement.

Typical causes:

  • Missing or mismatched extra_capabilities
  • String vs numeric comparison errors in extra_specs
  • Properties requested by the volume type that the backend never advertised

Scheduler logs will show entries like:

CapabilitiesFilter: backend flasharray2 does not satisfy capabilities: tier=platinum

At this stage, the backend is eliminated permanently for that request.

2. CapacityFilter Rejection

If capabilities match but capacity doesn’t, the backend is filtered later:

CapacityFilter: backend flasharray1 has insufficient free capacity

This is purely dynamic. A backend that loses today may win tomorrow.

3. GoodnessWeigher Tie-Breaking

The most subtle case is when multiple backends pass all filters, but one still loses.

Here the backend was good enough—just not best.

Enable debug logging for the scheduler:

[DEFAULT]
debug = True

You’ll then see computed goodness scores:

GoodnessWeigher: backend flasharray1 score=0.64
GoodnessWeigher: backend flasharray4 score=0.72

Nothing is wrong. The scheduler ranked candidates and chose deterministically.
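When you want to summarize those scores across a busy log, a small parser goes a long way. This is a hypothetical helper, not a Cinder tool; the regex assumes the illustrative log format shown above and will need adjusting to match your deployment's actual cinder-scheduler output:

```python
# Hypothetical helper: rank backends from weigher debug lines.
import re

LOG = """\
GoodnessWeigher: backend flasharray1 score=0.64
GoodnessWeigher: backend flasharray4 score=0.72
"""

# Map each backend name to its reported score string.
scores = dict(re.findall(r"backend (\S+) score=([\d.]+)", LOG))
winner = max(scores, key=lambda name: float(scores[name]))
print(winner)  # flasharray4
```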

4. The Important Mental Model

A backend that “lost” was not rejected—it was outscored.

That distinction matters operationally:

  • Rejections imply misconfiguration
  • Losses imply healthy competition

If you find yourself surprised by outcomes, the fix is rarely code. It’s almost always:

  • Adjusting capability semantics
  • Tuning weights
  • Clarifying workload intent

Key Takeaways

Scheduler decisions are explainable when logs are enabled:

  • Filters answer “can this backend work?”
  • Weighers answer “which backend is best right now?”
  • Losing a scheduling decision is not a failure condition

That’s how Cinder stops being plumbing and becomes an auditable policy engine.


What’s Next

In Part 2 of this series, I’ll explore a powerful horizontal scaling pattern: configuring multiple identical arrays with the same volume_backend_name. This approach enables automatic load balancing across arrays while keeping volume types simple and users blissfully unaware of which physical array serves their volumes.

The scheduler becomes not just a matchmaker, but an intelligent load distributor—all without changing a single line of application code.
