Dissolve the whole heap in water? Or should I read the article to learn this isn’t a physics question ;-) ?
Yeah, I think that's the right answer. Dissolve it in water and run it through a smallish filter. Other impurities in the salt can clog the filter sometimes.
So close, it was in fact a philosophy question ..
https://plato.stanford.edu/entries/sorites-paradox/
"How many grains of sand change a heap of salt into a pile of manure"
...none? manure requires organic material
Yes, none is correctly wrong.
Theirs is certainly an impressive environment and I don’t mean to do Cloudflare’s achievements a disservice, but I strongly encourage engineers building these kinds of systems to treat their infrastructure as actual code, and avoid the temptation to dip in and out of wire text formats like JSON or YAML as much as possible.
The worst-case scenario, in terms of engineering, is one piece of Python using Jinja-templated YAML only for another piece of Python (also written by you!) to parse that output. Every time this happens it proves to be — as the article points out — a seized opportunity to get caught out by syntax errors, and a missed opportunity to have static analysis (mypy et al., basically) find errors before they happen at runtime, which it could have done had all the logic been kept in pure Python without dipping in and out of structured text.
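To make that concrete, here is a toy sketch (the zone names and TTLs are made up): the same data kept as typed Python, rather than rendered through a Jinja YAML template and parsed back, so the type checker gets a chance to see the mistake before anything runs.

    from dataclasses import dataclass

    # The round-trip to avoid: render a Jinja template to YAML, then parse it
    # back in the next piece of Python. A typo in the template or a key name
    # only shows up at runtime as a YAML error or a KeyError, e.g.:
    #
    #   rendered = env.get_template("zones.yaml.j2").render(zones=zones)
    #   config = yaml.safe_load(rendered)      # hope the indentation survived
    #   ttl = config["zones"][0]["tttl"]       # KeyError, discovered at runtime
    #
    # Keeping the intermediate as plain typed Python instead:

    @dataclass
    class Zone:
        name: str
        ttl: int

    zones = [Zone(name="example.com", ttl=300)]

    ttl = zones[0].ttl
    # The equivalent typo, `zones[0].tttl`, is flagged by mypy/pyright before
    # anything runs at all.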
In the Cloudflare system the fundamental unit of action is configuration driving Python functions through GitOps. My preferred version of these systems is pure Python at the top emitting execve() calls, sh scripts, and file writes over ssh or local transports, or into Dockerfiles, possibly with very small sh functions on the far side, kept minimal in size and scope, with everything being purely declarative.
(It’s certainly an anti-pattern to return data back from the host to decide what to do next. The Python end is only allowed to declare that a package be installed, and the rest of the system ensures that is the case. People think this is limiting, but the majority of these configuration systems, in my experience, hinge 90% on data structures declaring how the system ought to be — IPAM arithmetic, building config files from lists of domains and accounts, processing key material, etc. — and only 10% on the logic to install things, much of which is very simple given a good base OS like Debian, where many packages split their config into .d directories with helper scripts to enable things.)
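A toy version of what I mean, with hypothetical hosts and package names: the Python end only declares what should be true, a small idempotent sh script is shipped over ssh, and nothing comes back to branch on.

    import shlex
    import subprocess

    # Desired state is plain data: the Python end declares, it never interrogates.
    WANT = {
        "web-01.example.net": {          # hypothetical host
            "packages": ["nginx", "postfix"],
            "files": {"/etc/motd": "managed by the config repo\n"},
        },
    }

    def apply(host: str, spec: dict) -> None:
        # Build one small idempotent sh script per host and ship it over ssh;
        # the far side only executes, it never reports state back for branching.
        script = ["set -eu"]
        if spec.get("packages"):
            script.append("apt-get -y install "
                          + " ".join(map(shlex.quote, spec["packages"])))
        for path, content in spec.get("files", {}).items():
            script.append(f"printf %s {shlex.quote(content)} > {shlex.quote(path)}")
        subprocess.run(["ssh", host, "sh", "-s"],
                       input="\n".join(script).encode(), check=True)

    for host, spec in WANT.items():
        apply(host, spec)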
PS: I wonder if the authors have had experience with Ansible? It was my own experience with that tool’s slowness and inflexibility that prompted a lot of my opinion-forming in this area. Lots of good ideas have been born of having first been exposed to Ansible and, alas, coming up against its limits.
IME you end up in roughly the same place regardless of which direction you go.
So, Pulumi?
Ansible is only slow when run in a remote, push-based fashion. As a local config management solution, it can be quite fast. Ultimately, any push-based CM solution will end up slow and failure-prone.
I think it's fair to consider remote push-based as the "default" Ansible setup against which one measures. In my experience, the #1 talking point people use to praise Ansible is that you don't need to install anything on the managed hosts; you just push configs to them over ssh. Therefore, it seems fair to consider that the typical Ansible setup. Maybe the community has pivoted, but in the past at least that was my experience.
Having worked with Salt and Ansible and Puppet extensively, there really is no good argument to be made for the sort of push architecture the article here is struggling with. At one large SaaS company I worked for, we replaced a mix of push-based Ansible, Salt, and Puppet with a fully pull-based Ansible system that solved most of the problems of these centrally-controlled push-based systems. It was lightning-fast and far easier to manage at a growing scale.
The fact that Cloudflare sysadmins were desperately chasing Salt logs between minions and masters in recent memory is a shocking failure of imagination (or investment) on their part.
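For anyone who hasn't seen the pull side: the usual shape is ansible-pull run from cron or a systemd timer on every host, so each machine fetches and applies its own configuration rather than waiting for a central controller to push over ssh. A minimal sketch (the repo URL and playbook name are made up):

    #!/usr/bin/env python3
    """Run from cron or a systemd timer on each host: fetch the config repo
    and apply it locally, instead of being pushed to from a controller."""
    import subprocess
    import sys

    REPO = "https://git.example.com/infra/config.git"   # hypothetical
    PLAYBOOK = "local.yml"   # the conventional ansible-pull entry point

    # ansible-pull clones/updates the repo, then runs the playbook against
    # localhost over a local connection; --only-if-changed skips the run
    # when the checkout hasn't moved since last time.
    result = subprocess.run(
        ["ansible-pull", "--url", REPO, "--only-if-changed", PLAYBOOK],
        check=False,
    )
    sys.exit(result.returncode)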
Do you have any good references/example/docs/keywords about the difference between setting up and running "a fully pull-based Ansible system" compared to "centrally-controlled push-based systems"? I'm fairly certain I'm doing what you'd call "centrally-controlled push-based Ansible", but I'm in the planning stages of formalising and operationalising our ongoing configuration management policies, SOPs, internal docs, and dev training - I'd love to know just how I'm "doing it wrong"...
(Note: we are not even in the same universe as Cloudflare, fleet size wise. Think perhaps a few dozen hosts, not thousands or tens of thousands. We've only just barely embraced the "cattle, not pets" stage here.)
I never got Ansible to scale to more than 100 servers. Its design assumes things will mostly work; above a few hundred servers, things will fail all day, every day. Whereas I have seen Salt easily manage 6000+ servers.