Maintainability: Operability: Making Life Easy For Operations
The majority of the cost of software is not in its initial development, but in its ongoing maintenance:
Fixing bugs
Keeping systems operational
Investigating failures
Adapting it to new platforms
Modifying it for new use cases
Repaying technical debt
Adding new features
To minimize the pain of maintenance, software can be designed around three principles:
Operability. Make it easy for operations teams to keep the system running smoothly.
Simplicity. Make it easy for new engineers to understand the system by removing as much
complexity as possible from it. (Note that this is not the same as simplicity of the user
interface.)
Evolvability. Make it easy for engineers to change the system in the future, adapting it
for unanticipated use cases as requirements change. Also known as extensibility,
modifiability, or plasticity.
Good operations can often work around the limitations of bad (or incomplete)
software, but good software cannot run reliably with bad operations.
While some aspects of operations can and should be automated, it is still up to humans to set up that
automation in the first place, and to make sure it's working correctly.
Operations teams are vital to keeping a software system running smoothly. A good operations team
typically does the following:
Monitoring the health of the system, and quickly restoring service if it goes into a bad state;
Tracking down the cause of problems, such as system failures or degraded performance;
Keeping software and platforms up-to-date, including security patches;
Keeping tabs on how different systems affect each other, so that a problematic change can be
avoided before it causes damage;
Anticipating future problems and solving them before they occur, e.g. capacity planning;
Establishing good practices and tools for deployment, configuration management, and more;
Performing complex maintenance tasks, such as moving an application from one platform to
another;
Maintaining security of the system as configuration changes are made;
Defining processes that make operations predictable and help keep the production
environment stable;
Preserving the organization's knowledge about the system, even as individual people come
and go.
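Capacity planning, listed above as a way of anticipating problems before they occur, can be illustrated with a deliberately naive projection. The following Python sketch assumes that storage grows linearly at a constant daily rate and that new capacity takes a known lead time to provision. The function names and figures are hypothetical; real capacity planning would also have to account for seasonality, load peaks, and safety headroom.

```python
import math


def days_until_full(capacity_gb, used_gb, daily_growth_gb):
    """Naive linear projection of how many days remain before storage fills up.
    Assumes constant daily growth; returns math.inf if usage is not growing."""
    if used_gb >= capacity_gb:
        return 0.0
    if daily_growth_gb <= 0:
        return math.inf
    return (capacity_gb - used_gb) / daily_growth_gb


def needs_expansion(capacity_gb, used_gb, daily_growth_gb, lead_time_days):
    """True if storage is projected to fill up before new capacity,
    which takes lead_time_days to provision, could be in place."""
    return days_until_full(capacity_gb, used_gb, daily_growth_gb) <= lead_time_days


if __name__ == "__main__":
    # 3,000 GB free, growing 50 GB/day -> 60 days of headroom.
    print(days_until_full(10_000, 7_000, 50))      # 60.0
    print(needs_expansion(10_000, 7_000, 50, 90))  # True: provisioning takes longer than the headroom
```

The point of the comparison with lead time is that the problem must be solved while there is still time to act, not when the disk is already full.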
Good operability means making routine tasks easy, allowing the operations team to focus their effort
on high-value activities. Data systems can do various things to make routine tasks easy, including:
Providing visibility into the runtime behavior and internals of the system, with good
monitoring;
Good support for automation and integration with standard tools;
Avoiding dependency on individual machines (allowing machines to be taken down for
maintenance while the system as a whole continues running uninterrupted);
Good documentation and an easy-to-understand operational model ("if I do X, Y will
happen");
Good default behavior, but also giving administrators the freedom to override defaults when
needed;
Self-healing where appropriate, but also giving administrators manual control over the
system state when needed;
Predictable behavior, minimizing surprises.
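Three items in the list above (good defaults that administrators can override, self-healing with manual control, and visibility into internal state) can be sketched together. The following is a minimal Python illustration under assumed names: the `DEFAULTS` dictionary, the `APP_<KEY>` environment-variable convention, and the `Supervisor` class are all hypothetical, and "restarting" a worker is simulated by flipping a flag rather than managing a real process.

```python
import os

# Hypothetical defaults for an illustrative service; names and values are
# assumptions, not taken from any real system.
DEFAULTS = {
    "max_connections": 100,
    "auto_restart": True,
}


def load_config(env=None):
    """Start from DEFAULTS; let an administrator override any key through an
    environment variable named APP_<KEY> (a hypothetical convention)."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = env.get("APP_" + key.upper())
        if raw is None:
            continue  # no override: keep the default
        if isinstance(default, bool):  # check bool before int (bool subclasses int)
            config[key] = raw.strip().lower() in ("1", "true", "yes")
        else:
            config[key] = type(default)(raw)
    return config


class Supervisor:
    """Restarts a failed worker automatically (self-healing) unless an
    administrator has placed a manual hold, e.g. during maintenance."""

    def __init__(self, auto_restart=True):
        self.auto_restart = auto_restart
        self.manual_hold = False
        self.worker_up = True
        self.restarts = 0

    def hold(self):
        # Administrator takes manual control: no automatic restarts.
        self.manual_hold = True

    def release(self):
        # Hand control back to the automatic policy.
        self.manual_hold = False

    def on_worker_failure(self):
        """Record a failure and restart automatically if the policy allows.
        Returns whether the worker is up afterwards."""
        self.worker_up = False
        if self.auto_restart and not self.manual_hold:
            self.restarts += 1
            self.worker_up = True  # simulated restart; a real one would respawn a process
        return self.worker_up

    def status(self):
        # Expose internal state so operators can see what the system is doing.
        return {
            "worker_up": self.worker_up,
            "restarts": self.restarts,
            "manual_hold": self.manual_hold,
        }


if __name__ == "__main__":
    cfg = load_config({"APP_MAX_CONNECTIONS": "250"})
    sup = Supervisor(auto_restart=cfg["auto_restart"])
    sup.on_worker_failure()  # healed automatically
    sup.hold()
    sup.on_worker_failure()  # stays down while the administrator is in control
    print(cfg, sup.status())
```

Passing the environment in as a dictionary keeps the sketch easy to test. The design choice worth noting is that the manual hold always wins over the automatic policy, so automation never fights an administrator who is deliberately keeping a component down, which keeps the system's behavior predictable.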