Ansible Inventory 2.0 design rules

This is my third post in the Ansible Inventory series. See the first and the second posts for some background information.

Preliminary note: in this post, I try to give an example of a typical deployment inventory, and the rules we might need the inventory to abide by. Whilst I tried to keep things as simple as possible, I still feel this gets too complex. I’m not sure if it’s just a complex matter, whether I’m overcomplicating things, or whether Ansible simply shouldn’t try to handle things in such a convoluted way.

Let’s look at an example of how inventory could be handled for a LAMP setup. Assume we have these three sets of applications:

  • an Apache + PHP setup
  • a MySQL cluster setup
  • an Apache reverse proxy

We also have 3 environments: development, testing and production.

We have 4 different PHP applications, A, B, C and D. We have 2 MySQL cluster instances, CL1 (for dev and testing) and CL2 (for production). We have a single reverse proxy setup that manages all environments.

The Apache + PHP applications get installed on one of three nodes, one per environment: node1 (dev), node2 (test) and node3 (prod).

For each role in Ansible (assume one role per application here), we have to define a set of variables (a template) that gets applied to the nodes. If we focus on the apache-php apps for this example, the apache-php varset template gets instantiated four times, once for each of A, B, C and D. Assume the URL where the application gets published is part of each varset.

Each application gets installed on each node, one per environment. Each Apache-PHP node will need a list of those four applications, so it can define the needed virtual host and set up each application in its subdirectory. Where each application was just a set of key-values defining a single PHP app, we now need to listify those four varsets into a list that can be iterated over at the Apache config level.

Also, each Apache-RP node will need a list of applications, even when those applications are not directly installed on the reverse proxy nodes. The domain part (say contoso.com) is a specific domain for your organisation. Each application gets published beneath a specific context subfolder (contoso.com/appA, ..). For each environment we have a dedicated subdomain. We finally get 12 frontends: {dev,test,prod}.contoso.com/{appA,appB,appC,appD}. These 12 values must become a list of 12 entries, and be exported to the reverse proxy nodes, together with the endpoint of the respective backend. (1)
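To make that concrete, here is a minimal sketch of what such a listified, exported varset could look like on the reverse proxy nodes. All key names here are hypothetical, just to illustrate the shape of the data:

# hypothetical group_vars for the apache-rp group, after the inventory
# has listified and exported the per-application varsets
rp_frontends:
  - url: dev.contoso.com/appA
    backend: http://node1.contoso.com/appA
  - url: dev.contoso.com/appB
    backend: http://node1.contoso.com/appB
  # ... 12 entries in total: {dev,test,prod} x {appA,appB,appC,appD}
  - url: prod.contoso.com/appD
    backend: http://node3.contoso.com/appD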

Similarly, CL1 needs a list of the applications in dev and test, and CL2 needs a list of the applications in prod. We need a way to say that a particular variable that applies to one group of nodes needs to be exported to another group of nodes.

So the initial varsets we had at the app level get merged at some point when applied to a node. In this example, merging means making a list out of the different single applications. It also means overruling: a value like the subdomain part gets overruled by membership of a certain environment group.

Something similar could happen for the PHP version. One app could need PHP 5, whilst another needs PHP 7, which could bring in a constraint that gets the applications deployed on separate nodes within the same environment.

Of course, this can get very complicated very quickly. The key is to define some basic rules the inventory needs (merge dictionaries, listify varsets, overrule vars, export vars to other hosts) and try to keep things simple.

 

Allow me to summarize a bunch of rules I came up with.

  • inventory is a group tree that consists of a set of subtrees, each of which instantiates some meaningful organisational function; typical subtrees are
    • organisation/customer
    • application
    • environment
    • location
  • variable sets define how they get merged
  • a subtree basically starts where a var set is defined on some child group
  • all groups are equal, rules for groups are set by the variable sets assigned to them and how those should be inherited
  • those rules typically kick in when a group has multiple parents, i.e. when it’s a merge group
  • lookup plugins could be re-invented at this (merge) level to create new lists
  • an inventory tree typically has subtrees, and each subtree is the initial source for some variable sets (typically child group of an application subtree)
  • not clear yet: how to import and map an external inventory (dynamic script) into the local inventory scheme 
  • a variable is part of a variable set, and is defined by a schema (see the sketch after this list); variables can merge by doing a hash merge, by listifying a var, or by adding lists, and define a form of precedence (a weight, assigned to group subtrees, not by group depth any more)
    • it is namespaced by the variable set (could be linked to a specific application, perhaps maps onto an Ansible role)
    • it has a name
    • a type (single value, string, int, .. or a list or a dictionary…)
    • define a merge strategy (listify, merge list, add list, dictionary merge, deep merge, …)
    • when applied to a group (subtree), it defines a weight; check that no trees have the same weight!
    • it has a role: parameter (a plain inventory variable), runtime variable (feedback from playbook execution), or fact (the latter two could perhaps be the same)
    • track its source (applied to a group, some external inventory, …)
    • define a group_by rule, grouping/listifying it for several hosts (like the Puppet exported resources)
    • track which node is a master node
  • merge groups could also be “cluster groups” = the groups that hold and instantiate nodes that are part of a common application pool
  • whilst nodes can host different applications and hence be part of multiple cluster/merge groups, they can also be part of multiple other trees (think of separate nodes of a cluster that are part of different racks, or datacenters)
  • merging variables can happen everywhere a node or group is a member of different parents that hold the same variable set; hence at group level or at node level
  • nodes are children of merge groups and of other subtrees’ groups
  • nodes can be members of multiple cluster/merge groups
  • which node in a cluster group is the master node is related to a var set
  • being the master can initially be a plain parameter, but it gets overruled by its runtime value (think of master failover)
  • when applying var sets to groups, define a weight; when merging vars within the same subtree, look at a merge strategy; hash merges might need a weight too?
  • variable sets are defined in a group in some subtree, and can be overridden in groups from other trees
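As a sketch of what such a variable set schema could look like — this is invented notation, not existing Ansible syntax, and all names are hypothetical:

# schema for one variable set, illustrating the rules above
varset: apache-php        # namespace; maps onto the apache-php role
weight: 20                # precedence of the subtree this set is applied to
vars:
  app_url:
    type: string
    role: parameter       # plain inventory variable
    merge: listify        # single values become a list when merged
    export_to: apache-rp  # export the merged list to another group
  php_version:
    type: string
    role: parameter
    merge: override       # highest weight wins
  cluster_master:
    type: string
    role: runtime         # overruled by feedback from playbook execution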

 

Overview in a nutshell:

(1) This is probably the point where service discovery becomes a better option.

Some first design ideas for an Ansible Inventory 2.0

[update] next post in this series: Ansible Inventory 2.0 design rules

In my previous post “Current state of the Ansible inventory and how it might evolve” I explained some parts of the Ansible inventory internals, and pointed out some features I would like to improve.

Whilst this exercise might be interesting to Ansible and specifically its internal inventory, it might also just be an idea for an external application that yields a flattened inventory (through an inventory plugin / dynamic script), or it might be interesting to see if other configuration management tools could make use of it, as some sort of central “single source of truth”.

Whereas the current inventory has simple groups that hold child groups, have parent groups, and can contain hosts, I believe a more rigid structure with more meta information would be beneficial. Not only to manage group trees, but also to manage the variables assigned and defined in those groups, and to manage their values throughout the parent-child inheritance.

Next up are some design ideas I have been playing with. A big part of this is that, to me, managing inventory is much more about managing variable sets and their values than just grouping hosts.

  1. inventory starts at the top level with a special root group, a bit like the all group we currently have. The root group is the only one that has 0 parents, and has one or more normal child groups. These child groups are the roots of the subtrees;
  2. a subtree holds sets of variables; ideally, a particular variable lives in one single subtree only;
  3. a normal group has 1 parent, and one or more child groups;
  4. a merge group is a special group that can have more than one parent group, but each parent must be a member of a different subtree (see the INI sketch after this list);
    • a merge group would typically merge sets of variables from different subtrees;
    • ideally a var does not exist in different parent trees, so as not to have to deal with arbitrary precedence;
    • but maybe such a var holds e.g. a virtual host, and should at merge time become a list of virtual hosts, to be deployed on an apache instance;
    • care should be taken when a particular variable exists in different trees;
  5. a merge group could also be a cluster or instance group, or have such groups as children, which means it has no child groups, but holds only hosts;
    • merge groups could also be dynamic: a child of the postgres group and of the testing group would yield a postgres-test group
    • those groups need to track which subtrees they have in their ancestors
    • instead of tracking subtrees, perhaps track variable sets (and have a rule where a var can only exist in one set)
  6. a cluster group could keep track of which host in the group is the master (e.g. it’s a mysql master-slave cluster); such a property is of course dynamic; this would help to write playbooks that only have to run once per cluster, and on the master;
  7. a host can be a member of different merge or cluster groups, e.g. when that host holds multiple roles: e.g. as a single LAMP stack, it runs mysql (with different databases) and apache (with different virtual hosts)
    • inheriting from multiple groups that are members of the same subtree means something like having multiple instances of an application, or virtual hosting applied on a host
    • this might be where the description for an application gets translated to what is needed to configure that application on one or more hosts
    • multiple app instances can be bundled on a host, and more of them can be spread over multiple hosts
    • a single variable might need to become a list on a specific instance
  8. merging groups is actually about merging the variables they hold
  9. a variable set is (meta) defined in a subtree; some vars might have a default, and some vars need to be updated when that default changes (perhaps a new DNS server in your DC), whilst others may not be updated (the Java version your application was deployed with);
  10. at some point I tinkered with the idea of location groups/trees, which might be something more separate from the classic organisational and application focused groups, to manage things like geographic location, datacenter, etc., but I’m not sure this still warrants a special kind of group;
    • a geographical group membership could perhaps change the domain name of a URL
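In today’s INI terms, a merge group is roughly what you get when you manually make one group a child of one parent per subtree (group and host names hypothetical):

# parent in the application subtree
[postgres:children]
postgres-test

# parent in the environment subtree
[testing:children]
postgres-test

# the merge group itself holds the hosts
[postgres-test]
db1.test.example.com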

But the point of all this is primarily to manage variables in the inventory: to be able to parametrize an application, to describe that application in a perhaps more high-level way. Inventory should then allow you to transpose those values in a way that easily applies to the host-based execution level (the playbooks and roles). This also includes a way to “export” resources to other hosts, Puppet style.

Roles can be written and used in two ways when deploying multiple instances of an application: (1) a role defines a basic application, and is called multiple times, perhaps as a parameterized role (but role: with_items: might be needed, and that is not currently possible in Ansible); or (2) the role itself loops over a list of instances, where inventory translates membership of multiple apache virtual host instances into a list of virtual hosts per Ansible host.

The latter might be a more generic way of exporting resources. An example: some subtree manages the setup of a single apache site. At some point multiple sites are defined. Sites will be grouped and installed on one of multiple apache setups. Here you effectively export virtual hosts into a list of virtual hosts for one apache. In a next step, *all* those virtual hosts get exported into one big list that configures your load balancer.
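A sketch of the resulting data, with hypothetical variable and host names. Step one yields a per-apache-node list, step two a global list for the load balancer:

# on one apache node: its own sites, listified by inventory
vhosts:
  - { name: site1.example.com, port: 8081 }
  - { name: site2.example.com, port: 8082 }

# on the load balancer node: all virtual hosts from all apache nodes
lb_backends:
  - { name: site1.example.com, server: web1.example.com, port: 8081 }
  - { name: site2.example.com, server: web1.example.com, port: 8082 }
  - { name: site3.example.com, server: web2.example.com, port: 8083 }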

We need some generic way to create lists of things, grouped by a certain parameter.

Variables get inherited throughout the inventory trees. This could happen in a way where some precedence makes one value overwrite another, or in a way where multiple values become a list of values. This might be part of some schema for variable sets in a specific tree? Another idea might be to not care about group types, and just apply rules to groups via the variable sets they carry: track which sets a group inherits from, perhaps namespace them, and define how variable sets should merge, listify, or are not allowed to be combined.

How do we plug external data into this model? Should the equivalent of current dynamic inventory scripts be mapped on a subtree? Or span multiple locations? Be mapped on a specific variable set? Hard to say as a general rule. Lots of those inventory scripts focus on hosts and groups, and perhaps some facts, whilst this model has a bigger focus on managing variables.

Putting some more logic in the inventory could also mean that part of the manipulation that lookup plugins perform could happen in inventory. This would greatly simplify how we write loops in roles, by being able to do everything with a simple standard with_items.
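A role could then stay as simple as this, with inventory having pre-computed the vhosts list (task and variable names hypothetical, but the syntax is standard Ansible):

- name: configure one virtual host per instance
  template:
    src: vhost.conf.j2
    dest: /etc/apache2/sites-available/{{ item.name }}.conf
  with_items: "{{ vhosts }}"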

As Dag Wieers summarised his view on inventory to me, a new inventory should allow us to

  1. combine data from different sources, into a single source of truth
  2. do dynamic facts manipulation
  3. have a deterministic but configurable hierarchy

Another thing that users tend to handle in different ways is where host creation happens. Some start by defining the host in the Ansible inventory, then create it with e.g. a vmware role; others import the host list from an external inventory, e.g. ec2. The way we import inventory data from external sources should be well defined: how we map external groups, hosts and variables into this inventory model. Of course a new inventory should have a more elaborate API, not only internally, but also exposed as the JSON API for dynamic inventory scripts.

Now, all of this probably sounds overly complex, and overdoing this new design is a serious risk. But I do hope to come to a model with just some basic, simple rules that allows implementing all these ideas. If you have ideas on this, feel free to comment here or get in touch with me to discuss this further!

 

Current state of the Ansible inventory and how it might evolve

[update] Follow up article: Some first design ideas for an Ansible Inventory 2.0

[update] Second follow-up post: Ansible Inventory 2.0 design rules

This is an introductory post about the inventory in Ansible, where I look at the current design and implementation, some of its internals, and where I hope to spark some discussion and ideas on how it could be improved and extended. A recent discussion at Devopsdays Ghent last week re-spawned my interest in this topic, with some people actively showing interest to participate. Part of that interest is about building some standard inventory tool with an API and frontend, similar to what Vincent Van der Kussen started (and also lots of other, often now abandoned, projects) but going way further. Of course, that exercise would be pointless without looking at what parts need to happen upstream, and what parts fit better in a separate project, or not at all. That’s also why my initial call to people interested in this didn’t focus on immediately bringing that discussion to one of the Ansible mailing lists.


In its current state, the Ansible Inventory – at the time of writing 2.2 was just released – hasn’t changed since how it was modelled during its early 0.x releases. This post tries to explain a bit how it was designed, how it works, and what might be its limits. I might oversimplify some details whilst focusing on the internal model and how data is kept in data structures.


Whilst most people tend to see the inventory as just the host list, and a way to group hosts, it is much more than that, as parameters to each host – inventory variables – are also part of the inventory. Initially, those group and host variables were implemented as vars plugins, and whilst the documentation still seems to imply this, it hasn’t been true since a major bugfix and update to the inventory in the 1.7 release, now over two years ago; this part is now a fixed part of the ansible.inventory code. As far as I know, nobody seems to use custom vars plugins. I’d argue that the part where one manages variables – parameters to playbooks and roles – is the most important part of inventory. Structurally, the inventory model comes down to this:

The inventory basically is a list of hosts, which can be members of one or more groups, and where each group can be a child of one or more other (parent) groups. One can define variables on each of those hosts and groups, with different values for each. Groups and hosts inherit variables from their parents.

The inventory is mostly pre-parsed at the beginning of an Ansible run. After that, you can consider the inventory as being a set of groups where hosts live, and each host has a set of variables with one specific value attached to it. A common misunderstanding that comes up on the mailing list every now and then is thinking a host can have a different value for a specific variable depending on which group was used to target that host in a playbook. Ansible doesn’t work like that. Ansible takes a hosts: definition and calculates a list of hosts; in the end, which exact group was used to get to that host doesn’t matter anymore. Before hosts are touched, and variables are used, those variables are always calculated down to the host. Assigning different values to different groups is how you can manage those, but in the end, you could choose to never use group_vars, put everything yourself, manually, in host_vars, and get the same end result.

Now, if a host is a member of multiple groups, and the same variable is defined in (some of) those groups, the question is: which value will prevail? Variable precedence in Ansible is quite a beast, and can be quite complex, or at least daunting, to both new and experienced users. The Ansible docs overview doesn’t explain all the nifty details, and I couldn’t even find an explanation of how precedence works within group_vars. The short story here is: the more specific and further down the tree wins. A child group wins over a parent group; host vars always win over group vars.

When host kid is a member of a father group, and that father group is a member of a grandfather group, then the kid will inherit variables from grandfather and father. Father can overrule a value from grandfather, and kid can overrule his father and grandfather if he wants. Modern family values.
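In classic INI and group_vars terms, that family looks like this (a minimal sketch, variable name hypothetical):

[grandfather:children]
father

[father]
kid

# group_vars/grandfather.yml:  ntp_server: ntp.example.com
# group_vars/father.yml:       ntp_server: ntp.dc1.example.com
# result: host kid gets ntp.dc1.example.com, the deepest group wins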

There are also two special default groups: all and ungrouped. The former contains all groups that are not defined as a child group of another group; the latter contains all hosts that are not made a member of a group.


But what if I have two application groups, app1 and app2, which are not parent-child related, and both define the same variable? In this case, both groups app1 and app2 live on the same level, and have the same ‘depth‘. Which one will prevail depends on the alphanumerical sorting of both names – IIRC – but I’m not even sure of the details.


That depth parameter is actually an internal parameter of the Group object. Each time a group is made a member of a parent group, that group gets the depth of its parent + 1, unless that group’s depth was already bigger than the newly calculated depth. The special all group has a depth of 0, app1 and app2 both have a depth of 1, and app21 has a depth of 2. For a variable defined in all those groups, the value in app21 will be inherited by node2, whilst node1 will get the value from either app1 or app2, which is more or less undefined. That’s one of the reasons why I recommend not defining the same variables in multiple group “trees”, where a group tree is one big category of groups. It’s already hard to manage groups within a specific subtree whilst keeping precedence (and hence depth) in mind; it’s totally impractical to track that amongst groups from different subtrees.
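A reconstruction of that example in INI form — my assumption of the original layout, based purely on the description above:

[app1]
node1

[app2]
node1

[app2:children]
app21

[app21]
node2

# depths: all=0, app1=1, app2=1, app21=2
# node2 gets the value from app21 (depth 2); for node1, which of
# app1 and app2 (both depth 1) wins is more or less undefined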

Oh, if you want to generate a nice graphic of your inventory, to see how it lays out, Will Thames maintains a nice project (https://github.com/willthames/ansible-inventory-grapher) that does just that. As long as graphviz manages to generate a small enough graph that fits on a sheet of paper whilst remaining readable, you probably have a small inventory.

Unless you write your own application to manage all this, and write a dynamic inventory script to export data to Ansible, you are stuck with mostly the INI style hosts files, and the YAML group_vars and host_vars files, to manage those groups and variables. If you need to manage lots of hosts and lots of applications, you probably end up with lots of categories to organise these groups, and then it’s very easy to lose any overview of how those groups are structured, and how variables inherit over those groups. It becomes hard to predict which value will prevail for a particular host, which doesn’t help to ensure consistency. If you were to change default values in the all group, whilst some hosts have an overruling value defined in child groups, but not all, you suddenly change values for a bunch of hosts. That might be your intention, but when managing hundreds of different groups, you know such a mistake can easily happen when all you have are the basic Ansible inventory files.

Ansible tends to (or at least often used to) recommend not doing such complicated things – “better remodel how you structure your groups” – but I think managing even moderately large infrastructures can quickly put complex demands on how you structure your data model. Different levels of organisation (subtrees) are often needed in this concept. Many of us need to manually create intersections of groups, such as app1 ~ development => app1-dev, to be able to manage different parameters for different applications in different environments. This quickly becomes cumbersome with the INI and YAML file based inventory, as that doesn’t scale. Maybe we need a good pattern for dynamic intersections implemented upstream? Yes, there is hosts: app1:&dev, but that is parsed at run time, and you can’t assign vars to such an intersection. An interesting approach is how Saltstack – which doesn’t have the notion of groups in the way Ansible does – lets you target hosts using grains, a kind of dynamic groups filtering hosts based on facts. Perhaps this project does something similar?
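The manual intersection pattern mentioned above looks something like this: maintain an explicit intersection group by hand, as a child of one group per tree (host name hypothetical):

[app1:children]
app1-dev

[development:children]
app1-dev

# vars can now be assigned in group_vars/app1-dev.yml
[app1-dev]
host1.dev.example.com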

Putting more logic on how we manage the inventory, could be beneficial in several ways. Some ideas.

Besides scaling the current model by having better tooling, it could for example allow writing simpler roles/playbooks, e.g. where the only lookup plugin we need for with_ iterations would be the standard with_items, as the other ones are actually used to handle more complex data and generate lists from it. That could be part of the inventory, where the real infrastructure-as-data is modelled, giving a better decoupling of code (roles) and config (inventory). Possibly a way to really describe infrastructure as a higher level data model, describing a multi-tier application, that gets translated into specific configuration and actions on the node level.

How about tracking the source of (the value of) a variable? That would allow distinguishing between a variable that inherits a default and updates when that default changes in a more generic group (e.g. the DNS servers for the DC), as opposed to a variable that is only instantiated from a default, and does not change afterwards by inheriting that default (e.g. the Java version development started out with during the very first deploy).

Where are gathered facts kept? For some, just caching those as currently happens is more than enough. For others, especially for more specific custom facts – I’d call those run time values – that can have a direct relationship with inventory parameters, it might make more sense to keep them in the inventory. Think about the version of your application: you configure that as a parameter, but how do you know if that version is also the one that is currently deployed? One might need to keep track of the deployed version (runtime fact) and compare that to what was configured to be deployed (inventory parameter).

An often needed pattern when deploying clusters is the concept of a specific cluster group. I’ve seen roles acting on the master node in a cluster by conditionally running with when: inventory_hostname == groups['myapplicationcluster'][0].

How many of you are aware that the order of the node list of a group was reversed somewhere between two 1.x releases? It was initially reverse alphanumerically ordered.

This should nicely illustrate the problem of relying on undocumented and untested behaviour. This pattern also makes you hard-code an inventory group name in a role, which I think is ugly, and makes your role less portable. Doing a run_once can solve a similar problem, but what if your play targets a group of multiple clusters, instead of a specific cluster where you need to run_once per cluster? Perhaps we need to introduce a special kind of group that can handle the concept of a cluster? Metadata on groups, hosts and their variables?

Another hard pattern to implement in Ansible is what Puppet solves with their exported resources. Take a list of applications that are deployed on multiple (application) hosts, and put that in a list so we can deploy a load balancer on one specific node. Or monitoring, or ACLs to access those different applications. As Ansible in the end manages just vars per host, doing things amongst multiple hosts is hard. You can’t solve everything with a delegate_to.
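There is a partial workaround today: collect a variable from every host in a group via hostvars, e.g. to build the load balancer’s backend list. Group and variable names here are hypothetical; the extract filter exists since Ansible 2.1:

# e.g. in the load balancer's group_vars
lb_backends: "{{ groups['appservers'] | map('extract', hostvars, 'app_url') | list }}"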

At some point, we might want to parse the code (the roles) that will be run, so as to import a list of the variables that are needed, and build some front-end, so the user can instantiate his node definitions and parameterize the right vars in inventory.

How about integrating a well managed inventory with external sources? Currently, that happens by defining the Ansible inventory as a directory, and combining INI files with dynamic inventory scripts. I won’t get started on the merits of dir.py, but let’s say we really need to redesign that into something more clever, that integrates multiple sources, keeps track of metadata, etc. William Leemans started something on this after some discussion at Loadays in 2014, implementing specific external connectors.

Perhaps as a side note, I have also been thinking a lot about versioning. Versioning of the deploy code, and especially roles here. I want to be able to track the life cycle of an application, its different versions, and the life cycle and versions of its deployment. Imagine I start a business where I offer hosted Gitlab for multiple customers. Each of those customers potentially has multiple organisations, and hence Gitlab instances. Each of them potentially wants to always run the latest version, whilst some want to remain for ages on the same enterprisey first install, and others are somewhere in between. Some might even have different DTAP environments – Gitlab might be a bad example here, as not all customers will do custom Gitlab development, but you get the idea. In the end you really have LOTS of Gitlab instances to manage. Will you manage them all with one single role? At some point the role needs development. Needs testing. A new version of the role needs to be used in production, for a specific customer’s instance. How can I manage these different role versions in the Ansible eco-system? Doing that in the inventory sounds like the central source of information?

Lots of ideas. Lots of things to discuss. I fully expect to hear about many other use cases I never thought of, or never even would need. And as to not re-invent the wheel, insights from other tools are very welcome here!

 

[update] Follow up article: Some first design ideas for an Ansible Inventory 2.0

New GPG Key

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1,SHA512

Date: 22 JUNE 2014

For a number of reasons[0], I've recently set up a new OpenPGP key,
and will be transitioning away from my old one.

The old key will continue to be valid for some time, but I prefer all
future correspondence to come to the new one. I would also like this
new key to be re-integrated into the web of trust. This message is
signed by both keys to certify the transition.

the old key was:

sec 1024D/0x8CC387DA097F5468 2004-07-14
Key fingerprint = 0FAC 6A6C D9D5 134C C87E 4FF3 8CC3 87DA 097F 5468

And the new key is:

sec 4096R/0xD08FC082B8E46E8E 2014-06-22 [expires: 2019-06-21]
Key fingerprint = F744 94B0 7042 6B14 BB90 D283 D08F C082 B8E4 6E8E

To fetch the full key from a public key server, you can simply do:

gpg --keyserver keys.riseup.net --recv-key 0xD08FC082B8E46E8E

If you already know my old key, you can now verify that the new key is
signed by the old one:

gpg --check-sigs 0xD08FC082B8E46E8E

If you don't already know my old key, or you just want to be double
extra paranoid, you can check the fingerprint against the one above:

gpg --fingerprint 0xD08FC082B8E46E8E

If you are satisfied that you've got the right key, and the UIDs match
what you expect, I'd appreciate it if you would sign my key. You can
do that by issuing the following command:

**
NOTE: if you have previously signed my key but did a local-only
signature (lsign), you will not want to issue the following, instead
you will want to use --lsign-key, and not send the signatures to the
keyserver
**

gpg --sign-key 0xD08FC082B8E46E8E

I'd like to receive your signatures on my key. You can either send me
an e-mail with the new signatures (if you have a functional MTA on
your system):

gpg --export 0xD08FC082B8E46E8E | gpg --encrypt -r '$your_fingerprint' --armor | mail -s 'OpenPGP Signatures' serge@vanginderachter.be

Additionally, I highly recommend that you implement a mechanism to keep your key
material up-to-date so that you obtain the latest revocations, and other updates
in a timely manner. You can do regular key updates by using parcimonie to
refresh your keyring. Parcimonie is a daemon that slowly refreshes your keyring
from a keyserver over Tor. It uses a randomized sleep, and fresh tor circuits
for each key. The purpose is to make it hard for an attacker to correlate the
key updates with your keyring.

I also highly recommend checking out the excellent Riseup GPG best
practices doc, from which I stole most of the text for this transition
message ;-)

https://we.riseup.net/debian/openpgp-best-practices

Please let me know if you have any questions, or problems, and sorry
for the inconvenience.

If you have a keybase account and if you are into it, you can also check my
keybase page[1].

Serge van Ginderachter

0. https://www.debian-administration.org/users/dkg/weblog/48
1. https://keybase.io/svg

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iKYEARECAGYFAlOm06hfFIAAAAAALgAoaXNzdWVyLWZwckBub3RhdGlvbnMub3Bl
bnBncC5maWZ0aGhvcnNlbWFuLm5ldDBGQUM2QTZDRDlENTEzNENDODdFNEZGMzhD
QzM4N0RBMDk3RjU0NjgACgkQjMOH2gl/VGh5QgCdE2dKZly+MECXFfH0WCje9Rpo
/HoAoL+6jQ15wWq0FMrisRx24dX5OtOeiQJ8BAEBCgBmBQJTptOoXxSAAAAAAC4A
KGlzc3Vlci1mcHJAbm90YXRpb25zLm9wZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRG
NzQ0OTRCMDcwNDI2QjE0QkI5MEQyODNEMDhGQzA4MkI4RTQ2RThFAAoJENCPwIK4
5G6OnB0P/jw77jLrcLlP6GUXTrfui1Pbrk/W6hysplKHPzh53OoVJB0Bq6NARlOz
yptDwRW2LivFNz2M9FxObij+2CDGQ/FoOWdWlKatg9bvqhkQwglpMFNyGDQ/EOxV
a2ObFOySsGU2hXnYvVSUUCc1SMt5M3RKw3264ZsxqIda8o2lqF7ZO9qDijY7peHy
Ll0aPPxlFiqUjN0Q5P4PzoQcWbHDFLDO1Mm+P52gyod/Rh0PrWKOk2kwMEHFBwUd
tgi2jT+W4wv7yAOdvrIwiRpdqAM4be9MPDmXDjYrEHsJrKwqkXfDRRV53ZRFo8f3
bKXSnAV0i2svIEOscNWHhNrpmk5iqzyvr5CeJse7nEjXAP7HntTxPFIvWs2c3dvt
HItslcDcU2ZIrCh3rIi+fv7pcjX6JE/A0CzZkTo294wnGexoIRiRcC7wojS5e3PV
v83NZPRBz7tpPVQaMP74UiXvQpTm2GEiIXYtFkyZFEtyxwfEOY8L50QpMAJ0HXPm
7xH+XIaCcBljgeVoP0VlUecGW6aJubTryNTUimIBUnL7ItWjNLl7uJtDlGdjsOZV
QVgpQ6G3Tx8lDp+qo4SD4YI8zoWK59Ef9MUCSJn3ngWI0dG5jElONqOOY/W1zcyA
ce2wJs8ua79HJV/GXiadtlSCJpG8XfanyvhrvePSCp9O/5mZLnWs
=evlZ
-----END PGP SIGNATURE-----

Packt Publishing Ansible Configuration Management review

Around late November 2013, I – too – got contacted by Packt Publishing, asking me to review Ansible Configuration Management. I was a bit surprised, as I had declined their offer to write that book, which they had asked me exactly two months earlier. Two months seemed like a short period of time to write a book and get it published.

Either way, I kind of agreed, got the book in PDF, printed it out, started some reading, lent it to a colleague (we use Ansible extensively at work), and just recently got it back so I could finish looking at it.

“Ansible Configuration Management” is an introductory book for beginners. I won’t introduce Ansible here; there are a lot of good resources on that, just duck it. Ansible, being relatively new, has evolved quite a bit in the previous year, releasing 1.4 by the end of November. The current development cycle focuses more on bug fixes and under-the-hood stuff, and less on new syntax, which was quite the opposite when going from 0.9 through 1.2, and up until the then current 1.3.

Knowing what major changes would get into 1.3 was easy when you followed the project. One of the major changes is the syntax for variables and templates: basically, don’t use $myvar or ${othervar} any more, but only use {{ anicevar }}. If you know Ansible, you know this is an important thing. I was very disappointed to notice the author didn’t stress this. Whilst most examples use the new syntax, at one point all syntaxes are presented as equally possible – which was correct for the then latest 1.3, but it was well known at the time that the old ones would be deprecated.
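For reference, the change boils down to this (a trivial example, reusing the variable names from above):

# old pre-1.2 style, on its way out:
- debug: msg=$myvar
- debug: msg=${othervar}

# the 1.2+ style, the only one to keep using:
- debug: msg={{ anicevar }}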

Of course, a tech book on a rapidly evolving open source tool will always be somewhat outdated by the time it gets published. But I think this should be anticipated: a good book on such a subject should of course focus on the most recent possible release, but also try to mention the newer features that are to be expected. Especially for a publisher that focuses on open source.

A quirk is how code snippets are discussed. Some of the longer snippets are printed across more than one page, and the book refers to certain line numbers. This is confusing, even unreadable, when the snippets don’t have line numbers. Later in the book, line numbers are sometimes used, but not in a very standard way:

 

code snippet with weird line number

Whilst most of this book has a clear layout at first sight, things like this don’t feel very professional.

This book gives a broad overview and discusses several basic things in Ansible. It goes from basic syntax, over inventory, small playbooks and extended playbooks, and also mentions custom code. It gives lots of examples, discusses special variables, modules, plugins… and many more. Not all of them, but that is not needed, given the very good documentation the project publishes. This book is an introduction to Ansible, so focusing on the big principles is more important at this point than having a full inventory of all features. As it’s a relatively short book (around 75 pages), it’s small enough to be appealing as a quick introduction.

It’s a pity the publisher and the author didn’t pay more attention to details. The less critical reader, with little to no previous Ansible experience, will however get a good enough introduction with this book, with some more hand-holding and overview than what can easily be found freely online.

Git and Github: keeping a feature branch updated with upstream?

Git and github, you gotta love them for managing and contributing to (FLOSS) projects.

Contributing to a Github hosted project becomes very easy: fork the project to your personal Github account, clone your fork locally, create a feature branch, make some patch, commit, push back to your personal Github account, and issue a pull request from your feature branch to the upstream (master) branch.


# clone your fork, naming the remote after your github user
git clone -o svg git@github.com:sergevanginderachter/ansible.git
cd ansible
# track the upstream repository
git remote add upstream git://github.com/ansible/ansible.git
# create the feature branch and make the patch
git checkout -b user-non-unique
vi library/user
git add library/user
git commit -m "Add nonunique option to user module, translating to the -o/--non-unique option to useradd and usermod."
# push the feature branch to your fork
git push --set-upstream svg user-non-unique
[go to github and issue the pull request]

Now, imagine upstream (1) doesn’t approve your commit and asks for a further tweak, and (2) you need to pull in newer changes (upstream changes that were committed after you created your feature branch).

How do we keep this feature branch up to date? Merging the newest upstream commits is easy, but you want to avoid creating a merge commit, as that won’t be appreciated when pushed to upstream: you are then effectively re-committing upstream changes, and those upstream commits will get a new hash (as they get a new parent). This is especially important, as those merged commits would be reflected in your Github pull request when you push those updates to your personal github feature branch (even if you do that after you issued the pull request.)

That’s why we need to rebase instead of merging:


git checkout devel    # devel is ansible's HEAD aka "master" branch
git pull --rebase upstream devel
git checkout user-non-unique
git rebase devel

Both the rebase option and the rebase command to git will keep your tree clean, and avoid having merge commits.
But keep in mind that it’s your first commits (those with which you issued your pull request) that are being rebased, and which now have a new commit hash, different from the original hashes that are still in your remote Github repo branch.

Now, pushing those updates out to your personal Github feature branch will fail here, as both branches differ: the local branch tree and the remote branch tree are “out of sync”, because of those different commit hashes. Git will tell you to first git pull --rebase, then push again, but this won’t be a simple fast-forward push, as your history got rewritten. Don’t do that!

The problem here is that you would again fetch your first changed commits as they were originally, and those will get merged on top of your local branch. Because of the out of sync state, this pull does not apply cleanly. You’ll get a b0rken history where your commits appear two times. When you would push all of this to your github feature branch, those changes will get reflected on the original pull request, which will get very, very ugly.

AFAIK, there is actually no totally clean solution to this. The best solution I found is to force push your local branch to your Github branch (actually forcing a non-fast-forward update):

As per git-push(1):

Update the origin repository’s remote branch with local branch, allowing non-fast-forward updates. This can leave unreferenced commits dangling in the origin repository.

So don’t pull, just force push like this:

git push svg +user-non-unique

This will actually plainly overwrite your remote branch with everything in your local branch. The commits that are in the remote stream (and caused the failure) will remain there, but will be dangling commits, which eventually get deleted by git-gc(1). No big deal.
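For a single refspec like this, the leading + is the per-ref equivalent of a forced push, so this would do the same:

git push --force svg user-non-unique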

As I said, this is AFAICS the cleanest solution. The downside is that your PR will be updated with those newest commits, which get a later date, and could appear out of sync in the comment history of the PR. No big problem, but it could potentially be confusing.

bash redirection target gets funky

Can anybody explain to me how this funky behaviour in bash works?

find /root  >output 2>error 3

Yes, that’s just “error” followed by a space followed by “3”.

serge@goldorak:~/tmp$ ls -l
total 8
-rw-rw-r-- 1 serge serge 71 Aug 30 13:57 error 3
-rw-rw-r-- 1 serge serge 6 Aug 30 13:57 output
serge@goldorak:~/tmp$

Let’s create a file with a space in it:

serge@goldorak:~/tmp$ touch "test 1"
serge@goldorak:~/tmp$ ls -l
total 8
-rw-rw-r-- 1 serge serge 71 Aug 30 13:57 error 3
-rw-rw-r-- 1 serge serge 6 Aug 30 13:57 output
-rw-rw-r-- 1 serge serge 0 Aug 30 13:58 test 1
serge@goldorak:~/tmp$

Using bash completion, I get:

serge@goldorak:~/tmp$ ls -l test 1
-rw-rw-r-- 1 serge serge 0 Aug 30 13:58 test 1
serge@goldorak:~/tmp$ ls -l error 3
-rw-rw-r-- 1 serge serge 71 Aug 30 13:57 error 3
serge@goldorak:~/tmp$

It seems the space in “error 3” is not a space but some other char?
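One way to find out would be to dump the raw bytes of the file names with standard tools (output depends on your locale and system):

ls | od -c    # show the exact bytes of each file name
ls -b         # GNU ls: C-style escapes for nongraphic characters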