Manage Drift with Ansible

A long time ago on a blog site far, far away, I wrote about using Ansible to monitor and manage your storage infrastructure.

In the intervening years the modules I referenced back then have become better and more idempotent, and their functionality has expanded. There are also new and cool apps that can be linked with Ansible.

So let’s rehash that original blog post…

Specifically I’m going to be referring to the Ansible Collections for Pure Storage FlashArray and FlashBlade platforms and how you can use these to keep infrastructure drift on those devices to a minimum.

The FlashArray and FlashBlade modules are included in all Ansible versions from 2.9 all the way up to the latest (6.3 at the time of writing), and are provided as Collections.

Note: Pure publishes new Collections at a different cadence to Ansible releases, so always check you have the latest versions…
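One way to make that check routine is to list the Collections in a requirements file and upgrade them explicitly. A minimal sketch follows; the file name requirements.yml is the common convention, not something the rest of this post depends on.

```yaml
# requirements.yml: the Pure Storage Collections used by the drift role
collections:
  - name: purestorage.flasharray
  - name: purestorage.flashblade
```

Running `ansible-galaxy collection install -r requirements.yml --upgrade` (the `--upgrade` flag is available from ansible-core 2.10) pulls the newest published versions, and `ansible-galaxy collection list` shows which versions are installed.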

For the rest of this blog I will reference the following directory structure, which contains the role (hereinafter called drift).

```
drift
├── tasks
│   ├── fa.yaml
│   ├── fb.yaml
│   └── main.yaml
└── vars
    └── main.yaml
```
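The role still needs a playbook to call it. A minimal sketch, assuming a playbook file (the name drift.yaml is hypothetical) placed alongside the drift directory, where Ansible will find the role. Because the Pure Storage modules talk to each platform's REST API from the control node, the play can simply target localhost:

```yaml
# drift.yaml (hypothetical name): runs the drift role from the control node.
# No inventory of storage hosts is needed; the modules connect to each
# array or blade over its management URL using the API token.
- name: Storage drift control
  hosts: localhost
  connection: local
  gather_facts: false
  roles:
    - drift
```

It would then be run with `ansible-playbook drift.yaml`, manually or from a scheduler.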

With any drift control mechanism you need a baseline, and in this Ansible scenario it is provided by the variables file drift/vars/main.yaml within the role.

The baseline configuration details in the variables file contain the following items:

  • Connectivity details for storage devices
  • Baseline values for items including NTP, DNS, Active Directory, alerting, etc.

Here is an example of what the data may look like:

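```yaml
arrays:
  - url:        # FlashArray management IP address or FQDN
    api: 89a9356f-c203-d263-8a89-c629486a13ba
    dc: London
  - url:
    api: 41238831-2b9d-89e2-b5f2-936e0a03ffb6
    dc: NewYork
blades:
  - url:        # FlashBlade management IP address or FQDN
    api: T-68618f31-0c9e-4e57-ba44-5306a2cf10e3
    dc: London
```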

You can see that the first part contains the connectivity details for each storage platform plus, as an added bonus in this case, the datacenter in which the platform is located. You could extend this with other variables, such as type: prod or type: dev, but you would need to add these carefully into the task files for them to have any effect.

A more comprehensive example file can be found here.

It is highly likely that in a large organization there will be multiple DCs with different configuration parameters. The variables file can deal with that by having a parameter section for each DC, such as London_ntp_servers and NewYork_ntp_servers.
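As a sketch of such a per-DC section, the baseline values could look like the following. The variable names follow the <dc>_ntp_servers, <dc>_dns_domain and <dc>_dns_address pattern that the FlashArray tasks look up with vars[dc + '...']; the server names and addresses are placeholders (example.com names and documentation IP ranges), not real values.

```yaml
# drift/vars/main.yaml (continued): per-datacenter baselines.
# Placeholder values only; the prefixes must match the dc values exactly.
London_ntp_servers:
  - ntp1.london.example.com
  - ntp2.london.example.com
London_dns_domain: london.example.com
London_dns_address:
  - 192.0.2.10
  - 192.0.2.11

NewYork_ntp_servers:
  - ntp1.newyork.example.com
  - ntp2.newyork.example.com
NewYork_dns_domain: newyork.example.com
NewYork_dns_address:
  - 198.51.100.10
  - 198.51.100.11
```

Because the tasks build each variable name from dc, the dc value in the connectivity section must match the prefix exactly, including case: London, not london.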

For each specified FlashArray or FlashBlade, these baseline variables are compared with the real-time configuration of that storage platform.

Should any of these comparisons fail, you can either fix the drift immediately or raise an alert about the discrepancy.

Now we create the main tasks file that runs when the role is called. This is drift/tasks/main.yaml and contains the following:

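```yaml
- name: Check latest SDKs installed
  ansible.builtin.pip:
    name:
      - purity_fb
      - purestorage
      - py-pure-client
    state: latest

# Run fa.yaml once per FlashArray defined in the "arrays" list
- name: Drift Control | FlashArray
  ansible.builtin.include_tasks:
    file: "fa.yaml"
  vars:
    url: "{{ item.url }}"
    api: "{{ item.api }}"
    dc: "{{ item.dc }}"
  with_items: "{{ arrays }}"
  no_log: True   # keep API tokens out of the output

# Run fb.yaml once per FlashBlade defined in the "blades" list
- name: Drift Control | FlashBlade
  ansible.builtin.include_tasks:
    file: "fb.yaml"
  vars:
    url: "{{ item.url }}"
    api: "{{ item.api }}"
  with_items: "{{ blades }}"
  no_log: True
```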

The first task is there because we are using the Pure Storage Ansible Collections. It is good practice to ensure that the latest Pure Storage Python SDKs are installed so that the Collection modules being invoked can make the most of them.

Next we call the FlashArray tasks by including the file fa.yaml, located in the drift/tasks directory, and then do the same for the FlashBlade tasks with fb.yaml. The with_items parameter is what makes Ansible iterate through each of the array or blade entries defined in the variables file.
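On newer Ansible releases, loop is the recommended replacement for with_items. Because the included task files may contain loops of their own, it is also worth giving the outer loop variable its own name with loop_control, so it cannot collide with an inner item (a collision Ansible warns about). A sketch of the FlashArray include written that way, where storage is an arbitrary name chosen for illustration:

```yaml
# drift/tasks/main.yaml: FlashArray include using loop instead of with_items
- name: Drift Control | FlashArray
  ansible.builtin.include_tasks:
    file: fa.yaml
  vars:
    url: "{{ storage.url }}"
    api: "{{ storage.api }}"
    dc: "{{ storage.dc }}"
  loop: "{{ arrays }}"
  loop_control:
    loop_var: storage   # avoids clashing with an inner loop's "item"
  no_log: true
```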

The meat of this drift control is in the two included task files, which do the hard work. Here are links to download examples of both fa.yaml and fb.yaml.

Let’s look at sections of one of these files to see what is happening.

The first task in the file is to get the real-time configuration information from the storage platform:

  - name: Drift Control | FlashArray | Get facts for {{ url }}
      fa_url: "{{ url }}"
      api_token: "{{ api }}"
      - config
      - minimum
    register: array_info

  - name: Drift Control | FlashArray | Set facts
      array_name: "{{ array_info.purefa_info.default.array_name }}"
      full_array_name: "{{ array_info.purefa_info.default.array_name + '[' + dc + ']' }}"
      array_version: "{{ array_info.purefa_info.default.purity_version }}"

The second task just sets a few variables that will be used later in the playbook.

Then come the actual tasks that check each configuration parameter to be monitored for drift.

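```yaml
# Enforce the DNS baseline for this array's datacenter
- name: Drift Control | FlashArray | DNS for {{ full_array_name }}
  purestorage.flasharray.purefa_dns:
    domain: "{{ vars[dc + '_dns_domain'] }}"
    nameservers: "{{ vars[dc + '_dns_address'] }}"
    fa_url: "{{ url }}"
    api_token: "{{ api }}"

# Enforce the NTP baseline for this array's datacenter
- name: Drift Control | FlashArray | NTP for {{ full_array_name }}
  purestorage.flasharray.purefa_ntp:
    ntp_servers: "{{ vars[dc + '_ntp_servers'] }}"
    fa_url: "{{ url }}"
    api_token: "{{ api }}"
```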

These two examples will check the DNS and NTP configuration of a FlashArray and then set them back to the baseline if they are not as expected.
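The downloadable fb.yaml is not reproduced here, but as a minimal sketch the FlashBlade equivalents of these two checks could use the purefb_dns and purefb_ntp modules from the purestorage.flashblade Collection. It assumes the FlashBlade baseline reuses the same <dc>_ variable names, and that dc is passed into fb.yaml, which the FlashBlade include in main.yaml does not do as written.

```yaml
# drift/tasks/fb.yaml: minimal sketch of FlashBlade DNS and NTP checks.
# Assumes dc is passed in, e.g. by adding dc: "{{ item.dc }}" to the
# vars of the FlashBlade include in main.yaml.
- name: Drift Control | FlashBlade | DNS for {{ url }}
  purestorage.flashblade.purefb_dns:
    domain: "{{ vars[dc + '_dns_domain'] }}"
    nameservers: "{{ vars[dc + '_dns_address'] }}"
    fb_url: "{{ url }}"
    api_token: "{{ api }}"

- name: Drift Control | FlashBlade | NTP for {{ url }}
  purestorage.flashblade.purefb_ntp:
    ntp_servers: "{{ vars[dc + '_ntp_servers'] }}"
    fb_url: "{{ url }}"
    api_token: "{{ api }}"
```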

If you would rather be alerted to the drift than fix it, you can run the tasks in check_mode (which reports what would change without actually making any changes) and add a task that sends an email or even a message to Slack.

Here I’m changing the NTP task to use check_mode and register its result in a new variable, ntp_change. Whether that result reports a change then decides whether the alerting task runs. I’m using a task that sends a message to Slack…

```yaml
# Report-only NTP check: check_mode means nothing is changed on the array
- name: Drift Control | FlashArray | NTP for {{ full_array_name }}
  purestorage.flasharray.purefa_ntp:
    ntp_servers: "{{ vars[dc + '_ntp_servers'] }}"
    fa_url: "{{ url }}"
    api_token: "{{ api }}"
  check_mode: yes
  register: ntp_change

# Only fires when the check above found drift
- name: Drift Control | FlashArray | Alert NTP drift for {{ full_array_name }}
  community.general.slack:
    token: "{{ slack_token }}"
    msg: "Array {{ full_array_name }} has incorrect NTP configuration"
  when: ntp_change.changed
```
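If you want to switch between fixing and alerting without maintaining two sets of task files, one option is to drive check_mode from a variable, since the check_mode task keyword accepts a templated boolean. In this sketch drift_remediate is a hypothetical variable name: left unset (or false), the NTP task only reports drift; set to true, it fixes it.

```yaml
# NTP check driven by a hypothetical drift_remediate flag:
# unset/false = report only (check mode), true = fix the drift.
- name: Drift Control | FlashArray | NTP for {{ full_array_name }}
  purestorage.flasharray.purefa_ntp:
    ntp_servers: "{{ vars[dc + '_ntp_servers'] }}"
    fa_url: "{{ url }}"
    api_token: "{{ api }}"
  check_mode: "{{ not (drift_remediate | default(false) | bool) }}"
  register: ntp_change
```

Remediation could then be enabled per run, for example by passing `-e drift_remediate=true` to ansible-playbook. In that mode ntp_change.changed is still true whenever drift was found (and fixed), so the Slack task still fires.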

Versions of the task files with alerting built in are also available here.

In these examples I am alerting via Slack. To find out how to set this up, check out this blog post.
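If email suits your team better than Slack, the same registered result can drive a mail task instead. The sketch below uses the community.general.mail module; it assumes the check-mode NTP task has registered its result as ntp_change, as in the Slack example above, and the SMTP host and addresses are placeholders.

```yaml
# Hypothetical email alert reusing the ntp_change result.
# SMTP host and addresses are placeholders.
- name: Drift Control | FlashArray | Email NTP drift for {{ full_array_name }}
  community.general.mail:
    host: smtp.example.com
    port: 25
    sender: ansible@example.com
    to: storage-team@example.com
    subject: "Drift detected on {{ full_array_name }}"
    body: "Array {{ full_array_name }} has incorrect NTP configuration"
  when: ntp_change.changed
```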

Obviously there are many variations on how this method can be implemented, with different configuration parameters to check and many different alerting mechanisms. It may also be possible to apply it to other infrastructure that needs drift control monitoring, but you would need to check whether the vendor provides Ansible Collections or modules that can perform these checks.
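As one example of an extra parameter from the baseline list, alert email recipients could be checked with the purefa_alert module. This sketch assumes a hypothetical <dc>_alert_emails list (for example London_alert_emails) in drift/vars/main.yaml. Note that it only ensures the listed addresses are present and enabled; it will not remove recipients that exist on the array but are missing from the baseline.

```yaml
# Hypothetical alert-recipient baseline, read from <dc>_alert_emails
- name: Drift Control | FlashArray | Alert recipients for {{ full_array_name }}
  purestorage.flasharray.purefa_alert:
    address: "{{ alert_address }}"
    enabled: true
    state: present
    fa_url: "{{ url }}"
    api_token: "{{ api }}"
  loop: "{{ vars[dc + '_alert_emails'] }}"
  loop_control:
    loop_var: alert_address   # keeps the outer include's "item" intact
```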

I hope this blog gives you some ideas on managing and alerting for infrastructure drift. I’d love to hear if anyone runs with this and applies it in the real world.
