Ansible Inventory 2.0 design rules

This is my third post in the Ansible Inventory series. See the first and the second posts for some background information.

Preliminary note: in this post, I try to give an example of a typical deployment inventory, and what rules we might need the inventory to abide by. Whilst I tried to keep things as simple as possible, I still feel this gets too complex. I’m not sure if it’s just a complex matter, if that’s me overcomplicating things, or if Ansible just should not try to handle things in a more convoluted way.

Let’s have an example of how inventory could be handled for a LAMP setup. Let’s assume we have these 3 sets of applications:

an Apache + PHP setup
a MySQL cluster setup
an Apache reverse proxy

We also have 3 environments: development, testing and production.

We have 4 different PHP applications, A, B, C and D. We have 2 MySQL cluster instances, CL1 (for dev and testing) and CL2 (for production). We have a single reverse proxy setup that manages all environments.

The Apache PHP application gets installed on one of three nodes, 1 per environment: node1 (dev), node2 (test) and node3 (prod).

For each role in Ansible (assume 1 role per application here), we have to define a set of variables (template) that gets applied to the nodes. If we focus on the apache-php apps for this example, the apache-php varset template gets instantiated 4 times, 1 for each of A, B, C and D. Assume the URL where the application gets published is part of each varset.

Each application gets installed on each node, respectively in one of the three environments. Each Apache-PHP node will need a list of those 4 applications, so it can define the needed virtual host, and set each application in its subdirectory. Where each application was just a set of key-value pairs defining a single PHP app, we now need to listify those 4 varsets into a list that can be iterated over at the Apache config level.
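To make "listify" a bit more concrete, here is a minimal Python sketch of what I have in mind. The varset keys (`name`, `context`) and the `listify` helper are made up for illustration; nothing like this exists in Ansible today.

```python
# Hypothetical varsets: one dictionary of key-value pairs per PHP application.
# The keys "name" and "context" are illustrative assumptions.
app_varsets = {
    "A": {"name": "appA", "context": "/appA"},
    "B": {"name": "appB", "context": "/appB"},
    "C": {"name": "appC", "context": "/appC"},
    "D": {"name": "appD", "context": "/appD"},
}


def listify(varsets):
    """Turn a mapping of named varsets into a list, sorted by key for stable output."""
    return [varsets[key] for key in sorted(varsets)]


# The list an Apache-PHP node could iterate over to build its configuration.
apache_php_apps = listify(app_varsets)

for app in apache_php_apps:
    print(f"{app['name']} is served from {app['context']}")
```

The point is only that four independent varsets end up as one iterable list on the node.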

Also, each Apache-RP node will need a list of applications, even when those applications are not directly installed on the reverse proxy nodes. The domain part (say contoso.com) is a specific domain for your organisation. Each application gets published beneath a specific context subfolder (contoso.com/appA, ..). For each environment we have a dedicated subdomain. We finally get 12 frontends: {dev,test,prod}.contoso.com/{appA,appB,appC,appD}. These 12 values must become part of a single list, and be exported to the reverse proxy nodes, together with the endpoint of the respective backend. (1)
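As a rough sketch, the 12 frontends are just the product of environments and applications. The node-per-environment mapping comes from the example above; the `build_frontends` helper and the `node/app` form of the backend endpoint are assumptions for illustration.

```python
from itertools import product

DOMAIN = "contoso.com"
ENVIRONMENTS = ["dev", "test", "prod"]
APPS = ["appA", "appB", "appC", "appD"]
# One Apache-PHP node per environment, as in the example above.
BACKEND_NODES = {"dev": "node1", "test": "node2", "prod": "node3"}


def build_frontends(domain, environments, apps, backend_nodes):
    """Combine every environment with every application into one frontend entry."""
    frontends = []
    for env, app in product(environments, apps):
        frontends.append({
            "frontend": f"{env}.{domain}/{app}",
            # Assumed endpoint format: the backend node plus the app context.
            "backend": f"{backend_nodes[env]}/{app}",
        })
    return frontends


# The list that would have to be exported to the reverse proxy nodes.
frontends = build_frontends(DOMAIN, ENVIRONMENTS, APPS, BACKEND_NODES)
print(len(frontends))
```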

Similarly, CL1 needs a list of the applications in dev and test, and CL2 needs a list of applications in prod. We need a way to say that a particular variable that applies to a group of nodes needs to be exported to a group of other nodes.
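A small sketch of such an export, assuming the environment-to-cluster mapping from the example. The `export_to_clusters` function and the shape of the exported entries are hypothetical.

```python
APPS = ["A", "B", "C", "D"]
ENVIRONMENTS = ["dev", "test", "prod"]
# CL1 serves dev and testing, CL2 serves production (from the example above).
CLUSTER_FOR_ENV = {"dev": "CL1", "test": "CL1", "prod": "CL2"}


def export_to_clusters(apps, environments, cluster_for_env):
    """Collect, per cluster, the application instances that cluster has to serve."""
    exported = {}
    for env in environments:
        cluster = cluster_for_env[env]
        for app in apps:
            exported.setdefault(cluster, []).append({"app": app, "env": env})
    return exported


exported = export_to_clusters(APPS, ENVIRONMENTS, CLUSTER_FOR_ENV)
for cluster, instances in exported.items():
    print(cluster, len(instances))
```

The variables are defined on the application groups, but they end up on hosts that never had the application installed.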

So, the initial var sets we had at the app level get merged at some point when applied to a node. In this example, merging means making a list out of the different single applications. It also means overruling: the environment value gets overruled by membership of a certain environment group (like for the subdomain part).

Something similar could happen for the PHP version. One app could need PHP 5, whilst another would need PHP 7, which could bring in a constraint that gets the applications deployed on separate nodes within the same environment.

Of course, this can get very complicated, very quickly. The key is to define some basic rules the inventory needs (merge dictionaries, listify varsets, overrule vars, export vars to other hosts) and try to keep things simple.

Allow me to summarize a bunch of rules I came up with.

inventory is a group tree that consists of a set of subtrees, each of which instantiates some meaningful organisational function; typical subtrees are
- organisation/customer
- application
- environment
- location
variable sets define how they get merged
a subtree basically starts where a var set is defined to some child group
all groups are equal, rules for groups are set by the variable sets assigned to them and how those should be inherited
those rules typically kick in when a group has multiple parents, when it’s a merge group
lookup plugins could be re-invented at this (merge) level to create new lists
an inventory tree typically has subtrees, and each subtree is the initial source for some variable sets (typically child group of an application subtree)
not clear yet: how to import and map an external inventory (dynamic script) into the local inventory scheme
a variable is part of a variable set, and is defined by a schema; variables can merge by doing a hash merge, by listifying a var, or by adding lists and defining a form of precedence (a weight, assigned to group subtrees, not by group depth any more)
- it is namespaced by the variable set (could be linked to a specific application, perhaps maps onto an Ansible role)
- it has a name
- a type (single value, string, int, .. or a list or a dictionary…)
- define a merge strategy (listify, merge list, add list, dictionary merge, deep merge, …)
- when applied to a group (subtree), it defines a weight, check that no trees have the same weight!
- it has a role: it is a parameter (a plain inventory variable), a runtime variable (feedback from playbook execution), or a fact (the latter two could perhaps be the same)
- track its source (applied to a group, some external inventory, …)
- define a group_by rule, grouping/listifying it for several hosts (like Puppet's exported resources)
- track which node is a master node
merge groups could also be “cluster groups” = the groups that hold and instantiate nodes that are part of a common application pool
whilst nodes can host different applications and hence be part of multiple cluster/merge groups, they can also be part of multiple other trees (think of separate nodes of a cluster that are part of different racks, or datacenters?)
merging variables can happen everywhere a node or group is member of different parents that hold the same variable set; hence at group level or at node level
nodes are children of merge groups and of groups in other subtrees
nodes can be members of multiple cluster/merge groups
which node in a cluster group is the master node is related to a var set
being the master can be initially a plain parameter, but is overruled by its runtime value (think of master fail over)
when applying var sets to groups, define a weight; when merging vars within the same subtree, look at a merge strategy; hash merges might need a weight too?
variable sets are defined in a group in some subtree, and can be overridden in groups from other trees
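
To get a feel for how the variable rules above could fit together, here is a minimal Python sketch. Everything in it is hypothetical: the `VarAssignment` class, its field names, the strategy names and the weights are my own illustration, not an existing Ansible API. It only covers namespace, name, merge strategy, weight and source; types, roles, group_by and master tracking are left out. I also chose to enforce the "no equal weights" check only where precedence matters (override and dictionary merge).

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class VarAssignment:
    """One value of a variable, as assigned to a group in some subtree (hypothetical)."""
    varset: str    # namespace, e.g. the application or role
    name: str      # variable name within the varset
    value: Any
    strategy: str  # "override", "listify", "add_list" or "dict_merge"
    weight: int    # weight of the subtree the assignment comes from
    source: str    # where the value was assigned, kept for tracking


def merge(assignments):
    """Merge all assignments of one variable according to its merge strategy."""
    if not assignments:
        raise ValueError("nothing to merge")
    strategies = {a.strategy for a in assignments}
    if len(strategies) != 1:
        raise ValueError("a variable must have a single merge strategy")
    strategy = strategies.pop()
    # Lowest weight first, so the heaviest subtree is applied last and wins.
    ordered = sorted(assignments, key=lambda a: a.weight)
    weights = [a.weight for a in ordered]
    if strategy in ("override", "dict_merge") and len(set(weights)) != len(weights):
        raise ValueError("two subtrees share the same weight")
    if strategy == "override":
        return ordered[-1].value
    if strategy == "listify":
        return [a.value for a in ordered]
    if strategy == "add_list":
        return [item for a in ordered for item in a.value]
    if strategy == "dict_merge":
        merged = {}
        for a in ordered:
            merged.update(a.value)  # shallow merge only
        return merged
    raise ValueError(f"unknown merge strategy: {strategy!r}")


# The environment subtree (weight 20) overrules the application subtree (weight 10).
subdomain = merge([
    VarAssignment("apache_php", "subdomain", "www", "override", 10, "application/appA"),
    VarAssignment("apache_php", "subdomain", "dev", "override", 20, "environment/dev"),
])
print(subdomain)  # dev
```

Whether weights should also apply to list merges, or whether a deep merge is needed, is exactly the kind of question the rules above leave open.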