I’m in the early exploration phase for an orchestration layer that I’ve been gauging support for at work for a few weeks.
I told everyone I wanted it. I knew the idea had been beaten to death before, but there were some hurdles that had blocked it each time. After a chat, at my boss’s recommendation, with someone on the architecture team, it was time to start researching and build a proposal draft.
So I started researching, getting a feel for what kinds of considerations need to be made, reaching out to the architects, and explaining what I was about to research and that I hadn’t secured backing for it, nor should I try to until I’d researched it thoroughly enough to write a credible proposal. I don’t suggest new things on a whim, and I never want to be seen as someone who does.
Later, it came up in the context of another issue today that one of the benefits of an orchestration layer is automation, and, in a subsequent phase of the concept’s evolution, event-driven orchestration (automatic remediation of known events or conditions that fix an expected problem, like restarting a service that tends to flop). The speaker looked at me, I gave a nod of approval, and when it came to event response, I interjected that to build that we’d need what I call an “Event Abstraction Layer” on top of an Orchestration Layer.
I would like to build one for myself as well once I have orchestration in place.
Since it isn’t something I’m building for work, as I’m sure the talks I’m having will end with an out-of-the-box solution, I can go ahead and lay some framework down for the one I’m going to build for myself. I’ll call it CRD.
CRD: One man’s EAL Implementation
To implement an EAL, I would start with a base component:
A modular daemon, which for now I’ll call CRD (Condition/Response Daemon), that:
- Checks for a condition.
- Responds to that condition as a result of the check.
Operational Flow:
The daemon would run on a periodic schedule configured in /etc/CRD/crd.conf and call /opt/CRD/modules/${Module}/check for every ${Module}, each of which returns a “0” or a “1” exit code. A “0” exit code causes the daemon to run /opt/CRD/modules/${Module}/response. These can be simple bash scripts with a proper shebang line, small (or even large) compiled C applications, or just about anything else the system knows how to execute.
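The real daemon would presumably end up as a compiled program with proper logging and error handling, but the core loop is simple enough to sketch in shell. This is just a sketch: the interval key in crd.conf is a name I’m assuming for illustration, and everything else uses the paths described above.

#!/bin/bash
# crd-loop.sh -- a minimal sketch of the CRD main loop, not the real daemon.

CONF="/etc/CRD/crd.conf"
MODULE_DIR="/opt/CRD/modules"

# Read the polling interval from the config, defaulting to 30 seconds.
INTERVAL=$(grep -E '^interval=' "$CONF" 2>/dev/null | cut -d= -f2)
INTERVAL=${INTERVAL:-30}

while true; do
    for module in "$MODULE_DIR"/*/; do
        check="${module}check"
        response="${module}response"
        [ -x "$check" ] || continue

        # Per the design: a 0 exit code from check means the condition is
        # present, so the module's response gets executed.
        if "$check"; then
            [ -x "$response" ] && "$response"
        fi
    done
    sleep "$INTERVAL"
done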
This second component, response, is what responds to the event. It could be a remediation action on the local system, a feeder to a central service I’ll call an Event Collector, or a REST API call to an orchestration layer component, which would integrate with the orchestration layer directly and bypass an EAL completely. It can be anything you script or code it to do in response to the condition that makes that module’s check return a 0 exit code.
In my case, I’d see machines running CRD with response scripts that all pump into an MQ, and then, on an event collector server running the MQ, another CRD whose check scripts process the messages in the queue and whose response scripts trigger actions in the orchestration layer. Super clean.
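As a sketch of the edge-node half of that, a feeder-style response might look like the following. I’m using MQTT’s mosquitto_pub purely as a stand-in for whatever queueing tech the collector actually runs, and the broker hostname, topic, and message format are all made up.

#!/bin/bash
# A feeder-style response: instead of fixing anything locally, publish an
# event onto the queue for the collector's CRD to pick up and act on.
# mosquitto_pub, the broker hostname, the topic, and the message format are
# placeholders for whatever MQ setup actually gets used.
EVENT="$(hostname) $(date -Is) TestModule condition-met"
mosquitto_pub -h event-collector.example.local -t crd/events -m "$EVENT"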
It’s a super simple daemon. It’s super flexible. And it’s super easy to build modules if you can do any scripting at all. It also separates the processes of building checks and responses, meaning you can delegate the work for each to different parts of the organization or team to accommodate a wide range of organizational processes.
So, let’s walk through module creation.
mkdir /opt/CRD/modules/TestModule
touch /opt/CRD/modules/TestModule/check
touch /opt/CRD/modules/TestModule/response
Make check and response executable.
Put a condition you want to respond to in check, like “the size of /var/log/custom/ is larger than 20 MB”, in bash, or sh, or perl. Add a proper shebang line.
Have check return 0 if the condition you want to respond to is true.
Have check return 1 if the condition you want to respond to is false.
Have response do rm -rf /var/log/custom/*
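Spelled out, those two scripts could be as small as this. I’m using du for the size check and treating 20 MB as 20480 KB:

#!/bin/bash
# /opt/CRD/modules/TestModule/check
# Condition: the contents of /var/log/custom/ total more than 20 MB.
SIZE_KB=$(du -sk /var/log/custom/ | awk '{print $1}')
if [ "$SIZE_KB" -gt 20480 ]; then
    exit 0   # condition is true -- run the response
else
    exit 1   # condition is false -- do nothing
fi

#!/bin/bash
# /opt/CRD/modules/TestModule/response
# Remediation: wipe the directory's contents.
rm -rf /var/log/custom/*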
A value in /etc/CRD/crd.conf tells the running daemon to wake up every 30 seconds. Every 30 seconds, CRD goes down the list of modules and executes TestModule’s check executable. When /var/log/custom is larger than 20 MB, provided you don’t suck at scripting, the daemon kicks off the response executable and wipes the contents of /var/log/custom.
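I haven’t settled on a config format, but for the purposes of this walkthrough something this bare would do; the interval key name is just an assumption on my part:

# /etc/CRD/crd.conf (hypothetical format)
interval=30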
Obviously I’d want to add a feature for some initialization, say, a script that runs when installing the module, but this is an atomic example. The simpler a design is, the more scalable it is.
If I were to build this, I think the best solution would be to also add a support module feature:
From here, for larger examples involving event collectors, I’d probably want CRD to have an -x switch that would execute support modules.
Support modules would be obtained by calling CRD with an --install long switch, which would go either to a public repository for CRD, which users contribute support modules to for peer review prior to inclusion, or to a GitHub location specified with a -l switch.
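On the command line, that would look roughly like this, where mq_publish is a hypothetical support module name and the repository URL is a placeholder:

# Pull a support module from the public CRD repository:
crd --install mq_publish

# Or pull it from a specified git location instead:
crd -l https://github.com/example/crd-support-modules --install mq_publish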
A support module would not have the check or response components; it would consist primarily of an assembly or script, bearing the same name as the module, that performs some more complex task.
Support modules should also have an install script or assembly that prepares the system to use the module on installation: directory creation, package dependency checks, and so on. If install returns a non-zero exit code, CRD should roll back all changes by running rollback, another assembly or script that performs that task (a sketch of all of this follows the summary below):
Support Modules:
- install script
- rollback script
- assembly or script of the same name as module
- CRD gets -x, -l, and --install switches: execution of installed support modules (or of check scripts in user modules), remote retrieval of modules from a specified git repository, and remote retrieval from the public CRD repository.
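To make that concrete, here’s a rough sketch of a hypothetical mq_publish support module. The on-disk location, the module name, and the dependency being checked are all assumptions on my part:

/opt/CRD/support/mq_publish/
    install       # prepares the system; a non-zero exit triggers rollback
    rollback      # undoes whatever install managed to change
    mq_publish    # the actual script/assembly, same name as the module

The install and rollback pair could start as simply as:

#!/bin/bash
# install -- prepare the system for the mq_publish support module.
set -e                                  # any failure yields a non-zero exit, triggering rollback
mkdir -p /var/spool/crd/mq_publish      # working directory for the module
command -v mosquitto_pub >/dev/null     # dependency check: fail if no MQ client is present

#!/bin/bash
# rollback -- undo install's changes if it failed part-way through.
rm -rf /var/spool/crd/mq_publish
exit 0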
From here, a user module’s check and response components, when they want to use the functionality provided by a support module, can issue crd -x $moduleName as a system call. I am certain that future versions would also have RPC support for more graceful integration, since this design should accommodate almost any language imaginable, and can even mix and match languages for different parts and still work.
So, let’s say you had a support module for pumping into a message queue that your event collector reads events from. All of a sudden, the response in your user module can integrate with an MQ in a one-liner after installing that module.
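Sticking with the hypothetical mq_publish support module from earlier, the whole response for a user module could shrink to:

#!/bin/bash
# response -- hand the event off to the MQ via the mq_publish support module.
# How event details get passed along (arguments, stdin, environment) is an
# open design question at this point.
crd -x mq_publish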
With that kind of flexibility, the event collector is just a product of your desired configuration and components. It could be a database. It could be an existing system like logstash, or sumologic, or syslog-ng, with checks that pull from the MQ and dump into a logging system, or with module responses that trigger changes in orchestration, like puppet configuration updates for a class of servers. It’s whatever you want it to be. This should keep it open enough to be a snap-in component for enterprise and home alike, and create a market for developers to contribute dashboard solutions to work with it.
CRD, a linux daemon for EAL implementation.
Copyright (C) 2016 Chris Punches <punches.chris@gmail.com>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.