DevOps | 5 min read
Gitfs Pillar and other SaltConf19 Updates
Last autumn, ngworx.ag got the chance to attend SaltConf19 in Salt Lake City, UT. Here you will learn the latest news and stories from this trip.
SaltConf19 was much smaller than previous SaltConfs, and there were rumours that it was not even certain it would take place, or that things were organized on short notice. Anyhow: it took place at its traditional location in the Salt Palace Convention Center, and I got the chance to talk to many developers, trainers and users. One big piece of news: while Utah definitely has a lot to offer, it still involves a 10h flight from Switzerland. In 2020, there will be a SaltConf in Europe, hosted by Sue.nl from the Netherlands! From Zurich, it takes 8h30m by train, or 1h40m for people not affected by flight shame.
When it comes to software, I’m a very impatient guy. I hate laggy tools, slow build processes, slow CI pipelines, slow deployment processes, latency for 2FA, blocking garbage collectors and slow automation. While Salt is generally pretty responsive, its gitfs pillar fits my definition of a laggy and fragile tool and is a hassle to debug.
The gitfs pillar is a pretty powerful way to store configuration data for Salt: it supports multiple branches (e.g. dev, test and prod) and maps them to Salt environments, and authentication can be password- or key-based.
Using gitfs as a pillar
Since about 2013, Salt has supported loading pillar data directly from git repositories (gitfs_pillar). This simplifies sharing data and allows us to apply common development workflows (branches, tests, merging) to pillar data.
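As a rough sketch (the repository URLs and key paths below are placeholders), a git-backed pillar is configured on the master roughly like this; each branch maps to a Salt environment of the same name:

```yaml
# /etc/salt/master -- hypothetical excerpt
ext_pillar:
  - git:
    # branch name followed by the remote; the branch maps to the
    # pillar environment of the same name
    - master https://git.example.com/org/salt-pillar.git
    - dev https://git.example.com/org/salt-pillar.git
    # key-based authentication can be configured per remote
    - prod git@git.example.com:org/salt-pillar.git:
      - pubkey: /root/.ssh/id_rsa.pub
      - privkey: /root/.ssh/id_rsa
```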
We thought this was a very elegant approach and have used it since we first touched Salt in 2016.
But there are some issues:
- It’s slow
- It takes a lot of time
- One has to wait (oh I mentioned that before)
Gitfs pillars are only synchronized by polling the repository during the master's maintenance process runs, so by default there is a random latency of up to 60s.
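One mitigation is to shorten the polling interval in the master configuration, at the cost of more frequent fetches. A hypothetical excerpt (option availability depends on the Salt release):

```yaml
# /etc/salt/master -- hypothetical excerpt
# Newer Salt releases fetch git pillar remotes every
# git_pillar_update_interval seconds (default 60), which is where the
# up-to-60s latency comes from.
git_pillar_update_interval: 10
# The general maintenance loop interval (default 60s)
loop_interval: 10
```

Assuming a Salt version that ships the git_pillar runner, a fetch can also be forced manually with salt-run git_pillar.update instead of waiting for the next poll.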
Furthermore, we often ran into trouble when using Jinja templating and the Jinja rendering failed: this can leave the Salt cache broken, so that Salt does not directly load and render an updated git commit. When that happens, the cache has to be cleared:
- Soft approach: salt '*' saltutil.cache_clear
- Hard approach: sv stop salt-master && rm -rf /var/cache/salt/master && sv start salt-master
- Extreme approach: we've experienced situations where we had to clear the cache on the minions as well. So: salt '*' cmd.run_bg 'systemctl stop salt-minion && rm -rf /var/cache/salt/minion && systemctl start salt-minion', then continue with the hard approach.
Unfortunately, the extreme approach was pretty common in the past.
Some lessons learned:
- Try to avoid Jinja; create sls files using e.g. Python instead. Python is much more powerful, more readable and easier to test. Remember: "Programs must be written for people to read, and only incidentally for machines to execute." (Harold Abelson)
- Run the salt-master with the log level set to debug. It then shows the rendering times, but it's pretty noisy, i.e. not recommended in production.
- Reduce the number of git branches that have to be cached (i.e. delete them in the repo, or restrict them in the gitfs configuration)
- Rebase if the git history becomes too long
And finally: Rethink the architecture. Do you really need dynamic mapping of git branches to Salt environments? Think about using regular git repositories on the salt-master’s file system instead of using gitfs. This allows for manual sync, reduces latency, speeds up synchronization and makes debugging easier.
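The "avoid Jinja" tip can look like this in practice. Below is a minimal sketch of a pillar sls file using Salt's py renderer (selected by the #!py shebang); the host names and pillar keys are invented for the example:

```python
#!py
# Pillar sls rendered by Salt's py renderer. Salt calls run() and uses
# its return value as the pillar data, so the logic is plain, testable
# Python instead of Jinja templating.

def run():
    # Hypothetical host list; in a real setup this could be derived
    # from grains or options injected by the renderer.
    hosts = ["web1", "web2", "db1"]
    return {
        "ntp": {"servers": ["0.ch.pool.ntp.org", "1.ch.pool.ntp.org"]},
        # db hosts get backups enabled, web hosts don't -- a dict
        # comprehension instead of nested Jinja loops
        "backup": {host: {"enabled": host.startswith("db")} for host in hosts},
    }
```

Because run() is an ordinary function, its output can be unit-tested without a running master, which is exactly what Jinja makes hard.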
Trainings
This year, the training courses were more SaltStack Enterprise oriented than previously, so it was tough to select trainings when one is only interested in using Salt in a non-Windows, non-Enterprise environment. Unfortunately, some trainings didn't work as planned (lab environment not working, not accessible, or different from the documentation), but this left more time for throwing questions at the pretty competent staff 🙂
There was an option to take the SaltStack Certified Engineer exam, but I already have that in my pocket (#484, ID DE17699F; not sure if the certification status toggles if one tries again…).
Python 3 migration
Python 2.x finally reached EOL at the beginning of 2020, so we had to upgrade all Salt installations to Python 3.x. Unfortunately, this collided with some repository issues at SaltStack (the Python 3 yum repository was empty), but finally: we're up to date, even on RHEL 7 systems. In addition, some minor optimizations to our in-house code base were necessary as well.
Contributing to Salt
Oh, and by the way: is anybody else using the Salt event bus to transport InfluxDB metrics?
I've contributed multiple small patches to Salt modules in the past, and PRs were usually accepted quite quickly. Until recently, when I tried to fix an issue with triggering the influxdb Salt module from the scheduler (kwargs contains Salt-internal keys that InfluxDB doesn't understand, so it gives up).
The reason: missing tests. Well, yes, I didn't write tests for this 4-line change (the module currently has no tests for its 700 LOC). In theory, I agree with the decision to improve test coverage by moderating PRs more strictly, but this is how it ended: I'm now maintaining my own _modules directory.
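The fix itself is small: when a module function is triggered from the scheduler, Salt injects bookkeeping keys (double-underscore prefixed, e.g. __pub_fun) into kwargs, and these have to be stripped before the arguments reach the InfluxDB client. A minimal sketch of the pattern (Salt ships a similar helper in salt.utils.args; the key names here are illustrative):

```python
def clean_kwargs(**kwargs):
    """Drop Salt-internal keys (prefixed with '__', e.g. '__pub_fun',
    '__pub_jid') that the scheduler injects into kwargs, so the remaining
    arguments can be passed safely to a backend such as the InfluxDB
    client, which rejects unknown keyword arguments."""
    return {key: val for key, val in kwargs.items() if not key.startswith("__")}
```

Usage would then be something like client.query(q, **clean_kwargs(**kwargs)) inside the module function.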
At SaltConf19, it was announced that the test toolkit is being migrated to pytest, but back then this was still an ongoing process, so it will probably take some more time until things stabilize and people start writing tests for the new environment.
Heist
Maybe the name will change for marketing reasons, but in late November 2019, the newest addition to the SaltStack ecosystem was called 'Heist' (for non-native speakers: a heist is an armed robbery). It is a monolithic salt-minion that loads itself onto a node via an SSH tunnel, does its work and disappears. It seems similar to what Ansible does, and more powerful than salt-ssh. As there was no working demo, I can't say much about it yet. It was said that there would be a packaging tool that lets one select features, which are then bundled and sent to the clients.