Some first design ideas for an Ansible Inventory 2.0

[update] next post in this series: Ansible Inventory 2.0 design rules

In my previous post “Current state of the Ansible inventory and how it might evolve” I explained some parts of the Ansible Inventory internals and pointed out some features I would like to improve.

Whilst this exercise might be interesting to Ansible and specifically its internal inventory, it might also just be an idea for an external application that yields a flattened inventory (through an inventory plugin or dynamic script), or it might be interesting to see if other configuration management tools could make use of it, as some sort of central or “single source of truth”.

Whereas currently the inventory has simple groups that hold child groups, have parent groups, and can contain hosts, I believe a more rigid structure with more meta information would be beneficial: not only to manage group trees, but also to manage the variables assigned and defined in those groups, and to manage their values throughout the parent-child inheritance.

Next up are some design ideas I have been playing with. A big part of this is that, to me, managing inventory is much more about managing variable sets and their values than about just grouping hosts.

  1. inventory starts top level with a special root group. A bit like the all group we currently have. The root group is the only one that has 0 parents, and has one or more normal child groups. These child groups are the root groups for a subtree;
  2. a subtree holds sets of variables. ideally, a particular variable only lives in one single subtree;
  3. a normal group has 1 parent, and one or more child groups;
  4. a merge group is a special group that can have more than one parent group, but each parent must be a member of a different subtree;
    • a merge group would typically merge sets of variables from different subtrees;
    • ideally a var does not exist in different parent trees, so as not to have to deal with arbitrary precedence;
    • but maybe such a var holds e.g. a virtual host, and should at merge time become a list of virtual hosts, to be deployed on an apache instance;
    • care should be taken when a particular variable exists in different trees;
  5. a merge group could also be a cluster or instance group, or have such groups as a child; a cluster or instance group has no child groups, but holds only hosts;
    • merge groups could also be dynamic: a child of postgres group and child of testing group would yield a postgres-test group
    • those groups need to track which subtrees they have in their ancestors
    • instead of tracking subtrees, perhaps track variable sets (and have a rule where a var can only exist in one set)
  6. a cluster group could keep track of which host in its group is a master (e.g. it’s a mysql master-slave cluster); such a property is of course dynamic; this would help to write playbooks that only have to run once per cluster, and on the master;
  7. a host can be a member of different merge or cluster groups, e.g. when that host holds multiple roles. e.g. as a single LAMP stack, it runs mysql (with different databases) and apache (with different virtual hosts)
    • inheriting from multiple groups that are members of the same subtree means something like having multiple instances of an application, or virtual hosting applied on a host
    • this might be where the description for an application gets translated to what is needed to configure that application on one or more hosts
    • multiple app instances, can be bundled on a host, and more of them can be spread on multiple hosts
    • a single variable might need to become a list on a specific instance
  8. merging groups is actually about merging the variables they hold
  9. a variable set is (meta) defined in a subtree; some vars might have a default, and some vars need to be updated when that default changes (perhaps a new dns server in your DC), whilst others may not be updated (the Java version your application was deployed with);
  10. at some point I tinkered with the idea of location groups/trees, which might be a thing more separate from classic organisational and application focused groups, to manage things like geographic location, datacenter etc., but I’m not sure this still warrants a special kind of group;
    • a geographical group membership could perhaps change the domain name of a URL
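
To make the group rules above a bit more concrete, here is a minimal Python sketch of rules 1 to 4. None of this is existing Ansible code: the class names `Group` and `MergeGroup`, the helper `inherited_vars`, and the choice to raise an error when a variable shows up in two subtrees are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """A normal group: exactly one parent (only the root has none)."""
    name: str
    parent: Optional["Group"] = None
    variables: dict = field(default_factory=dict)

    def subtree(self):
        """Return the direct child of the root that this group lives under."""
        node = self
        while node.parent is not None and node.parent.parent is not None:
            node = node.parent
        return node


def inherited_vars(group):
    """Variables of a group after inheritance inside its own subtree."""
    chain = []
    node = group
    while node is not None and node.parent is not None:  # stop before the root
        chain.append(node)
        node = node.parent
    result = {}
    for ancestor in reversed(chain):  # subtree top first, closest group last
        result.update(ancestor.variables)
    return result


@dataclass
class MergeGroup:
    """A group with several parents, each from a different subtree."""
    name: str
    parents: list
    hosts: list = field(default_factory=list)

    def __post_init__(self):
        roots = [parent.subtree().name for parent in self.parents]
        if len(set(roots)) != len(roots):
            raise ValueError(f"{self.name}: parents must come from different subtrees")

    def merged_vars(self):
        merged = {}
        for parent in self.parents:
            for key, value in inherited_vars(parent).items():
                if key in merged:  # assumed policy: refuse instead of guessing precedence
                    raise ValueError(f"variable {key!r} is defined in more than one subtree")
                merged[key] = value
        return merged


root = Group("root")
apps = Group("apps", parent=root)
envs = Group("environments", parent=root)
postgres = Group("postgres", parent=apps, variables={"pg_port": 5432})
testing = Group("testing", parent=envs, variables={"env": "test"})
postgres_test = MergeGroup("postgres-test", parents=[postgres, testing],
                           hosts=["db1.example.com"])
print(postgres_test.merged_vars())
```

The sketch deliberately leaves out cluster groups, dynamic merge groups and list-valued variables; it only shows that the "one variable, one subtree" rule is cheap to check at merge time.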

But the point of all this is primarily to manage variables in the inventory. To be able to parametrize an application, to describe that application in perhaps a more high level way. Inventory should then allow you to transpose those values in a way that they easily apply to the host based execution level (the playbooks, and roles). This also includes a way to Puppet style “export” resources to other hosts.

Roles can be written and used in two ways, when deploying multiple instances of an application: (1) a role defines a basic application, and is called multiple times, perhaps as a parameterized role (but role: with_items: might be needed and that is not possible currently in Ansible); and (2) the role itself loops over a list of instances, where inventory translates membership of multiple apache virtualhosts instances to a list of virtual hosts per Ansible host.

The latter might be a more generic way of exporting resources. An example. Some subtree manages the setup of a single apache site. At some point multiple sites are defined. Sites will be grouped and installed on one of multiple apache setups. Here you happen to export virtual hosts into a list of virtualhosts for one apache. In a next step, *all* those virtualhosts get exported in a big list that configures your load balancer.

We need some generic way to create lists of things grouped by a certain parameter.
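
As a rough illustration of what such grouping could look like, the following Python sketch takes a flat list of site definitions, builds one list of virtual hosts per apache setup, and then exports all of them as one list for a load balancer. The helper `group_by`, the field names (`server_name`, `apache`, `port`) and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by(items, key):
    """Collect items into lists keyed by the value each item has for `key`."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item[key]].append(item)
    return dict(grouped)


# Hypothetical site definitions, each assigned to one apache setup.
sites = [
    {"server_name": "blog.example.com", "apache": "web1", "port": 8080},
    {"server_name": "shop.example.com", "apache": "web1", "port": 8081},
    {"server_name": "wiki.example.com", "apache": "web2", "port": 8080},
]

# Step 1: one list of virtual hosts per apache setup.
vhosts_per_apache = group_by(sites, "apache")

# Step 2: every virtual host exported into one flat list for the load balancer.
lb_backends = [
    {"server_name": site["server_name"], "backend": f'{site["apache"]}:{site["port"]}'}
    for site in sites
]

print(sorted(vhosts_per_apache))
print(len(lb_backends))
```

A role could then loop over `vhosts_per_apache["web1"]` with a plain list loop, without needing to know how the list was assembled.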

Variables get inherited throughout the inventory trees. This could happen in a way where some precedence makes one value overwrite another, or in a way where multiple values become a list of values. This might be part of some schema for variable sets in a specific tree? Another idea might be to not care about group types, and just apply rules to groups via the variable sets they carry, track which sets a group inherits from, perhaps namespace them. Define how variable sets should merge, listify, or are not allowed to be combined.
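
One way to picture such a schema is a per-variable merge policy. The Python sketch below is only an assumption of how that could work: the policy names `overwrite`, `listify` and `forbid`, the function `merge_variables`, and the sample variables are invented for this example.

```python
def merge_variables(schema, layers):
    """Merge variable layers (least specific first) using a per-variable policy.

    `schema` maps a variable name to "overwrite", "listify" or "forbid";
    variables without an entry are treated as "overwrite".
    """
    merged = {}
    for layer in layers:
        for name, value in layer.items():
            policy = schema.get(name, "overwrite")
            if policy == "listify":
                # Every value seen so far is kept, in layer order.
                merged.setdefault(name, []).append(value)
            elif policy == "forbid" and name in merged:
                raise ValueError(f"{name!r} may not be defined in more than one set")
            else:
                # "overwrite", or the first definition of a "forbid" variable.
                merged[name] = value
    return merged


schema = {
    "dns_server": "overwrite",
    "virtual_hosts": "listify",
    "java_version": "forbid",
}
layers = [
    {"dns_server": "10.0.0.1", "virtual_hosts": "blog.example.com"},
    {"dns_server": "10.0.0.2", "virtual_hosts": "shop.example.com", "java_version": "8"},
]
result = merge_variables(schema, layers)
print(result)
```

Here the later `dns_server` wins, both virtual hosts end up in one list, and a second definition of `java_version` in another layer would be rejected.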

How do we plug external data into this model? Should the equivalent of current dynamic inventory scripts be mapped on a subtree? Or span multiple locations? Be mapped on a specific variable set? Hard to say in a general rule. Lots of those inventory scripts focus on hosts and groups, and perhaps some facts, whilst this model has a bigger focus on managing variables.

Putting some more logic in the inventory could also mean that part of the manipulation that lookup plugins perform could happen in inventory. This would greatly simplify how we write loops in roles, by being able to do everything with a simple standard with_items.

As Dag Wieërs summarised his view on inventory to me, a new inventory should allow us to

  1. combine data from different sources, into a single source of truth
  2. do dynamic facts manipulation
  3. have a deterministic but configurable hierarchy

Another thing users tend to do in different ways is where the host creation happens. Some start by defining the host in the Ansible inventory, then create it with e.g. a vmware role; others import the host list from an external inventory, e.g. ec2. The way we import inventory data from external sources should be well defined, including how we map external groups, hosts and variables into this inventory model. Of course a new inventory should have a more elaborate API, not only internally, but also exposed in the JSON API for dynamic inventory scripts.
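
For reference, a dynamic inventory script today returns JSON with one key per group (holding `hosts`, `vars` and `children`) plus a `_meta.hostvars` section. The Python sketch below shows one possible first step of such a mapping: flattening that JSON into per-host group membership and host variables. The function name `load_dynamic_inventory`, the output shape and the sample data are assumptions, and group variables are deliberately left out.

```python
import json


def load_dynamic_inventory(text):
    """Return {host: {"groups": [...], "vars": {...}}} from `--list` style JSON."""
    data = json.loads(text)
    hostvars = data.get("_meta", {}).get("hostvars", {})
    groups = {}
    for name, body in data.items():
        if name == "_meta":
            continue
        if isinstance(body, list):  # older form: a bare list of hosts
            body = {"hosts": body}
        groups[name] = body

    def hosts_of(name, seen=()):
        """All hosts in a group, including those of its child groups."""
        if name in seen:  # guard against cyclic children
            return set()
        body = groups.get(name, {})
        found = set(body.get("hosts", []))
        for child in body.get("children", []):
            found |= hosts_of(child, seen + (name,))
        return found

    result = {}
    for name in groups:
        for host in hosts_of(name):
            entry = result.setdefault(
                host, {"groups": [], "vars": dict(hostvars.get(host, {}))}
            )
            entry["groups"].append(name)
    for entry in result.values():
        entry["groups"].sort()
    return result


# Hypothetical output of an ec2-like inventory script.
sample = json.dumps({
    "ec2": {"children": ["eu-west-1"]},
    "eu-west-1": {"hosts": ["i-0a1", "i-0b2"], "vars": {"region": "eu-west-1"}},
    "_meta": {"hostvars": {"i-0a1": {"ansible_host": "10.0.0.5"}}},
})
hosts = load_dynamic_inventory(sample)
print(hosts["i-0a1"])
```

Where the resulting groups would be attached in the new model (a subtree, a variable set, or something else) is exactly the open question raised above.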

Now, all of this probably sounds overly complex, and overdoing this new design is a serious risk. But I do hope to come to a model with just some basic simple rules that allow us to implement all these ideas. If you have ideas on this, feel free to comment here or get in touch with me to further discuss this!

 

1 Comment

  • Dan says:

    I salute the initiative. One of the challenges I’ve had with the current state of the inventory and variables precedence order, was to explain to people how to easily go around having 4 environments that have the same application deployed in them and that application having a clustered resource (i.e. 4x PHP nodes).
    Gotchas like parsing group_vars in alphabetical order when one wants to target the web nodes of application X in environment Y… Sometimes the static grouping of dynamic inventories is simply not cutting it anymore. Will keep an eye on the progress. Thanks for sharing your thoughts so far on this!
