Tech start-ups and disruptive innovations are on the rise, keeping the established players on their toes. It’s a trend that must continue if we are to discover the next new thing that moves us beyond what we currently see as the norm.
Because technology is a long way off from giving up its place as the major economic driver in today’s world, it is vitally important to keep it sane. Keeping it sane means being able to manage the engine room, which I call “The Software Development Room”. How we design, code, test, deploy, version-control and monitor needs to follow an order; with the right culture, the common problems associated with this engine room can be greatly reduced or prevented entirely, because errors made in this room tend to affect product quality or even damage the reputation of a company.
The solution is DevOps (Development and Operations). As described in The DevOps Handbook, it is a world where product owners, Development, IT Operations and InfoSec work together, not only to help each other but also to ensure that the overall organisation succeeds. For this to happen, we in the software development world must consider the tools, the culture and the processes in place to ensure a fast flow of work while maintaining the highest possible quality. Going into DevOps, an organisation will need to think about the IT operations in place, the internal culture for dealing with issues, and finally the promises and commitments it makes to its customers. This article focuses on the DevOps idea within technology-driven organisations, i.e. the basic concepts organisations need to have in place before embarking on their DevOps journey.
It’s important to note that DevOps is not separate from agile; rather, it can be seen as a natural continuation, since its foundation originates from Lean principles, the Theory of Constraints and the Toyota Kata movement, which draw on values like servant leadership, organisational change and a high-trust management culture. These should eventually lead to quality, fast-paced delivery, stability and reliability within the tech-driven organisation. It may be obvious, but it is interesting to see how neglecting any of the above can lead to the downfall of a technology firm. Staying on top of them means avoiding the downward spiral:
- Avoid operating without working middleware, IT operations and support systems
- Avoid making false promises, as they eventually lead to technical debt and workarounds, which end up affecting quality
- Avoid tightly coupled work, i.e. as much as possible, deal with dependencies early, before starting the work.
Having dealt with these cultural necessities, it’s fair to say the organisation has prepared the ground for a DevOps model. There is no hard and fast rule on how organisations must implement this, but the key areas that must be included in your DevOps model are:
- The Principle of Flow
- The Principle of Feedback
- The Principle of Continual Learning and Experimentation
1) The Art of Flow:
Have you ever been involved in a project and yet been in the dark about where things stand? Maybe you are the project manager, but it’s always an uphill task to get information from Dev or IT about what the issue is or what stage development has reached. Information is scarce even among colleagues working on the same project. Everyone has a different opinion on what is wrong, the blame game goes around, and sometimes this confusion creeps all the way from the development room to the boardroom. This is a clear indication, and a good reason, to start practising DevOps, and the principle of flow should partly resolve this issue.
When we say “flow”, we are talking about visibility and how we manage work through its phases: who sees what, why and when, and the value attained from that visibility.
Thinking about flow, we think of the smooth movement of value from Development to Operations. Two key factors must be considered and taken seriously within the art of flow: visibility, and how we manage each stage of the flow (WIP, batch sizes, hand-offs and constraints).
i) Visibility:
Visibility is not only for or amongst developers. Visibility is for everyone involved in the project, which ties in clearly with the fourth principle behind the Agile Manifesto: “Business people and developers must work together daily throughout the project.”
Consider cross-functional or even component teams where work has to be passed from one area or person to another. Perhaps the product owner has passed a business idea to the development team without complete information or correct acceptance criteria; Development then passes the work to QA/Testers, who trace back to acceptance criteria that were never completed or properly documented in the first place, eventually leading to rework by the developers or the product owner simply because information was missing somewhere.
There are also cases where IT Operations is handed completed work from Development to deploy to a production environment, and because IT Operations wasn’t part of the development in the first place, they have no idea what tool or function was added or how it may impact the infrastructure or services currently in production. They deploy to production anyway, only to find that something no longer works. In the worst cases, there might even be an outage that severely impacts customers.
Flow should involve everyone who has a part to play in the product. Nothing moves from one stage to the next unless it is completely closed down, its dependencies are resolved and the relevant compliance checks are done. This means Development, Quality Assurance, IT Operations and Security all working together, staying informed and having input as work moves from one stage to the next.
In practice, many organisations use software tools like Jira to manage this flow. Others may simply have a physical Kanban board clearly visible for all to see, monitor and discuss the flow of work. The most important thing is that everyone involved can see and discuss this movement, closing down dependencies and setting out mitigation plans for potential uncertainties.
ii) Limit Work in Progress (WIP)
We avoid multitasking as much as we can by limiting the number of work items team members should be working on at any point in time. There are cases where developers are in the middle of a piece of work and something urgent suddenly arrives through an unofficial channel, prompting a member or two to abandon what they were doing to deal with it. Such situations often hurt the quality of what they were previously working on: by the time they come back to it, they need to re-establish their understanding, which takes more time, and in many cases both the new urgent work and the previous work end up incomplete. Studies have shown that the time to complete even the simplest task of sorting geometric shapes degrades significantly when multitasking, and the same applies to the world of software development.
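As a minimal, hypothetical sketch of the idea (the column name and limit are made up for illustration), a board column that enforces a WIP limit forces the team to finish work before starting more:

```python
class Column:
    """One Kanban board column that refuses new work once its WIP limit is hit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = []

    def pull(self, item):
        # Enforce the WIP limit: something must be finished before new work starts.
        if len(self.items) >= self.wip_limit:
            raise RuntimeError(
                f"WIP limit of {self.wip_limit} reached in '{self.name}'; "
                "finish an item before pulling more work."
            )
        self.items.append(item)

    def finish(self, item):
        self.items.remove(item)
        return item


in_progress = Column("In Progress", wip_limit=2)
in_progress.pull("Feature A")
in_progress.pull("Feature B")
# in_progress.pull("Urgent interruption")  # would raise: finish A or B first
```

Most board tools let you configure exactly this kind of column limit; the point is that the limit is visible and enforced, not negotiated away every time something “urgent” appears.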
iii) Reduce Batch Sizes
There is, and will always be, the temptation to take on more work because, during planning, one or two team members may be feeling very optimistic. Far too often that work never gets done, and we ought to ask ourselves why.
In order to create a smooth and fast flow of work, it’s important to reduce batch sizes so that team members work with the least possible pressure and hence the best possible chance of delivering quality.
This ties in with a key lesson from Lean: shrinking batch sizes leads directly to an increase in quality and a reduction in lead time. This means small-batch releases, i.e. we take a single piece of functionality from conception to deployment, then add another, and another, until the whole product exists in production and is exposed to end users.
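To make the lead-time effect concrete, here is a small, hypothetical sketch (the numbers are invented for illustration) comparing how long finished work waits before reaching users when released in one big batch versus one feature at a time:

```python
def average_release_delay(features, batch_size, days_per_feature=1):
    """Average days a finished feature sits waiting before its batch is released."""
    delays = []
    for i in range(features):
        finished_on = (i + 1) * days_per_feature
        batch_released_on = ((i // batch_size) + 1) * batch_size * days_per_feature
        delays.append(batch_released_on - finished_on)
    return sum(delays) / features

# Ten features, each taking one day of work.
print(average_release_delay(10, batch_size=10))  # 4.5 days of finished work sitting idle
print(average_release_delay(10, batch_size=1))   # 0.0 days: each feature ships when done
# Smaller batches also mean the first user feedback arrives after day 1, not day 10.
```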
iv) Reduce the number of Hand-offs
Multiple departments or roles can be involved in moving code or completed work along the value stream, e.g. functional, integration or regression testing, database changes, server administration, networking, load balancing and perhaps security.
As work passes from team to team, there are explanations, communications, documentation, calls and sometimes handover meetings. This can create a queue, as work waits to be attended to by the relevant team or person who, in many cases, is busy with other work. When the individual meant to do the work is eventually ready, the person handing it over might be busy with something else, or might need more time to pass the knowledge on to the next team, all of which means longer lead times and destabilised work.
The tendency to lose vital information, and the probability of human error, can be addressed in DevOps by automating work wherever possible.
Another way to significantly reduce these handover activities is to empower the team with the knowledge to hand products over directly to the customer. Cross-functional teams are ideal for this; otherwise, automate as much as possible.
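As a hypothetical sketch of what that automation might look like (the `make` targets are placeholders, not a prescription for any particular toolchain), a single scripted pipeline can carry a change from build to deployment without a person-to-person hand-off:

```python
import subprocess
import sys

# Illustrative stages; in practice these map onto your own build, test and
# deployment tooling rather than a person-to-person hand-off.
STAGES = [
    ("build", ["make", "build"]),
    ("test", ["make", "test"]),
    ("deploy", ["make", "deploy"]),
]

def run_pipeline():
    for name, command in STAGES:
        print(f"Running stage: {name}")
        if subprocess.run(command).returncode != 0:
            # Fail fast instead of silently passing broken work downstream.
            sys.exit(f"Stage '{name}' failed; stopping the pipeline.")
    print("All stages passed; the change was delivered without a manual hand-off.")

if __name__ == "__main__":
    run_pipeline()
```

The knowledge that used to be transferred in handover meetings lives in the pipeline itself, so nothing is lost in translation between teams.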
v) Identify and elevate constraints
Dr Goldratt’s Theory of Constraints, the inspiration behind “The Phoenix Project”, teaches that in any value stream there is typically a single constraint that limits the flow of work, and that everything else should be organised around it. To illustrate, imagine a sandwich buffet made up of four rows of people moving along as they make their sandwiches. The four rows move in parallel past the bread, sausage and salad tables, but all of them finally meet at a single ketchup table, causing a massive queue and defeating the purpose of having many rows. Ideally, ketchup would be available in each row for better efficiency and speed. Software delivery is the same: we want the relevant capabilities to be available whenever they are needed in our quest to deliver software to production.
No matter how fast or efficient we make the development team, if their deployments are still blocked by infrastructure or production support, their efficiency is no good to anyone. This is where DevOps comes in: all relevant parties constantly working together to deliver that single product. As much as we may want to avoid constraints in our workflow, as in the ketchup example above, we must do everything in our power to deal with any constraint we identify. The recommended five steps to make this happen are listed below, with a small sketch of a bottlenecked pipeline after the list:
- Identify the system’s constraint
- Decide how to exploit the system’s constraint
- Subordinate everything else to the above decision
- Elevate the system’s constraint
- If, in the previous steps, a constraint has been broken, go back to step one, but do not allow inertia to become the system’s constraint.
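Here is a minimal, hypothetical sketch (stage names and capacities are invented for illustration): the value stream delivers only as fast as its constraint, and elevating that constraint simply moves the constraint elsewhere, which is why step five sends us back to step one:

```python
# Hypothetical work-items-per-day capacity of each stage in a delivery pipeline.
stage_capacity = {
    "development": 10,
    "code review": 8,
    "testing": 6,
    "deployment": 2,   # the constraint: manual, ops-gated deployments
}

# Step 1: identify the constraint. The value stream delivers only as fast as it does.
constraint = min(stage_capacity, key=stage_capacity.get)
print(f"Constraint: {constraint}, throughput: {stage_capacity[constraint]} items/day")

# Steps 2-4: exploit, subordinate and elevate (e.g. automate deployments).
stage_capacity["deployment"] = 12

# Step 5: the constraint has moved (testing, at 6/day), so go back to step 1.
new_constraint = min(stage_capacity, key=stage_capacity.get)
print(f"New constraint: {new_constraint}, throughput: {min(stage_capacity.values())} items/day")
```

Speeding up any stage other than the current constraint changes nothing for the customer, which is exactly the point of the ketchup table.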
2) The Principle of Feedback:
From the design phase all the way to when the solution is in production, problems are found: some are resolved immediately, some later, and some never get resolved because the risk is considered acceptable. This can result in a pile-up of technical debt, which in the long run affects quality, but whatever the case, the engineering action here is feedback: the principle of feedback.
Because we know failure is inherent and inevitable in the world of software development, our delivery models should be designed so that every single stage has a proper feedback loop and response actions in place for whatever happens.
We know that doing the same thing twice does not necessarily lead to the same result, hence the need for efficient feedback loops that detect issues as early as possible, long before they can lead to the kind of catastrophic event I have often seen throughout my career in software development.
In order to embed this into our model as we should, there are some key understandings we need to have, as recommended in The DevOps Handbook:
i) See the problems as they occur:
Speed, reliability and relevance are the key words for how we want to see problems within our systems or value chain. The faster we see and resolve these issues, the better it is for everyone. We need feedback and feed-forward loops that enable quick detection from a reliable source in order to trigger the relevant action. Having this in place not only provides us with the relevant feedback but also tells us how to prevent these issues from occurring again.
If the same issues keep repeating themselves, that’s a clear sign that we do not have a working feedback loop embedded in our framework; if we did, we would have known about them and, at the very least, been working on how to prevent them from happening again.
A tech-driven organisation adopting DevOps should embed this into the core of its delivery practices. Some highly committed organisations even go as far as creating a role solely for building feedback cycles, because the value this brings outweighs the cost of trying to fix an issue no one knows or understands how it occurred in the first place.
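As a small, hypothetical sketch of such a loop (the health endpoint, interval and threshold are illustrative assumptions, not a real service), a simple monitor can surface a problem the moment it occurs instead of waiting for a customer to report it:

```python
import time
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # illustrative endpoint
MAX_FAILURES = 3                              # alert after three consecutive failures

def is_healthy(url):
    """Return True if the service answers HTTP 200 within two seconds."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:
        return False

def feedback_loop():
    failures = 0
    while True:
        if is_healthy(HEALTH_URL):
            failures = 0
        else:
            failures += 1
            if failures >= MAX_FAILURES:
                # In practice this would page the team or open an incident so the
                # people closest to the work see the problem as it occurs.
                print("ALERT: service unhealthy; swarm on the problem now")
                failures = 0
        time.sleep(30)

if __name__ == "__main__":
    feedback_loop()
```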
ii) Swarm and solve problems as they occur
It is one thing to have a very efficient feedback loop; it is another to deal with the issues it surfaces. The people involved in resolving an issue may be different from the people who found it. Whoever they are, the most important thing is that the delivery model is designed so that everyone involved in the project swarms to resolve the issue as quickly as possible, without spending too much time talking, blaming or pointing fingers at whoever might have caused it.
There are cases where team members come up with a workaround, but the problem is that we create technical debt to be paid later, and if we make a habit of relying on workarounds, we pile up technical debt that always comes back to bite us.
iii) Push for quality closer to the source
As mentioned earlier, the people who find an issue are not necessarily the ones who get to resolve it, depending on how the organisation operates, and this happens especially often in the world of software development and delivery.
If we are practising DevOps, or thinking of doing so, we need to start bringing resolutions closer to the problems, i.e. as close to the source as possible. Most of us have experienced situations where there is an issue in one of the environments and, even though it is seriously affecting delivery, we have to go and find an approver, in most cases a manager with very little understanding of the issue, to give the go-ahead. The effectiveness of any approval process decreases the further we move from where the work is performed, which is why we need solutions and approvals nearest to the problem.
We want solutions to be provided at the source. The closer approvals are to the problem, the better we prevent loss of knowledge and can deal with issues while they are still fresh in everyone’s mind, which ultimately reduces the gap between “who should do something” and “who is actually doing something”.
The types of quality control we want to avoid as we push quality back to the source are listed below, with a small sketch of an at-source automated check after the list:
- Avoid situations where you require another team to do work that can easily be automated, e.g. requiring the infrastructure team to perform deployments that could have been automated by the development team.
- Avoid models that require approvals from busy people who are often very distant from the work, forcing them to make decisions on things they know little about.
- Avoid creating large volumes of documentation about the issue, which ends up confusing people and becomes obsolete in the long run.
- Avoid pushing large batches of work, instead of very small ones, to teams for approval or takeover; they take longer to understand and waste time within delivery.
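As a hypothetical sketch of pushing quality to the source (pytest, flake8 and bandit are used purely as examples of checks a team might choose), the authoring team runs its own quality gate before any external approval is requested:

```python
import subprocess
import sys

# Example checks a team might run itself before asking anyone for sign-off;
# the tools listed here are only illustrative choices.
CHECKS = [
    ("unit tests", ["pytest", "-q"]),
    ("linting", ["flake8", "."]),
    ("security scan", ["bandit", "-r", "src"]),
]

def quality_gate():
    failed = [name for name, cmd in CHECKS if subprocess.run(cmd).returncode != 0]
    if failed:
        # Feedback lands directly with the people who wrote the code,
        # while the change is still fresh in their minds.
        sys.exit(f"Quality gate failed: {', '.join(failed)}")
    print("Quality gate passed at the source; no distant approver involved.")

if __name__ == "__main__":
    quality_gate()
```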
3) The Principle of Continual Learning and Experimentation:
Fear is one thing every organisation or team should avoid and do well to drive out of its working environment, especially if that organisation or team claims to be practising DevOps.
A culture that promotes continual learning and experimentation without fear creates healthy teams that drive the organisation forward at a much faster rate.
The objective is to create a high-trust culture that reminds us all that we are learners who must take some risk in our daily work as we continually push ourselves a little beyond the norm.
In doing this, we learn from our successes and failures, which makes us a more efficient and quality-driven organisation.
The tech industry changes rapidly, and we are bound to be left behind if we do not encourage a culture of experimentation and learning, which ultimately keeps the organisation competitive in the marketplace. To make this happen, the key things we must embed in our delivery model are:
i) Institutionalise the improvement of daily work:
It can be very dangerous when people get too comfortable with routines, as this often leads to boredom, and sometimes we lose very important people because they don’t feel challenged enough. I have seen this happen often, and in most cases the reason is simply the lack of DevOps practice within the organisation. We improve daily work by explicitly reserving time for dealing with technical debt, resolving defects, refactoring and improving problematic areas of our code. For teams practising Scrum, this type of work should be in the backlog and included in sprint planning.
ii) Enable organisational learning and a safety culture:
Avoiding pathological systems (driven by fear and threats), where people end up withholding information for political reasons, and bureaucratic systems (too many rules and processes), where failure usually triggers blame and people spend enormous amounts of time talking about the problem and whom to punish instead of thinking about solutions, is a start in creating an effective learning culture.
What DevOps recommends is a generative system, in which individuals are encouraged to actively seek and share information to better the organisation as a whole. This eventually leads to a high-quality value stream.
iii) Transform local discoveries into global improvements:
A single team learning something new from resolving an issue is usually not enough. The resolution might have uncovered something that could benefit the organisation as a whole, and although the problem may not yet have shown itself in other components or areas of the product or system, as long as it could recur under a similar scenario, a DevOps-practising organisation will do well to make the fix or improvement a global one, thereby fixing issues before they even occur.
Conclusion:
It’s important to note that this article is not a promotion of DevOps over agile; rather, it shows that the two are not separate from each other, and hence every agile-practising organisation should be implementing DevOps as well, since the two work hand in hand. A culture of resilience, and a model that drives leaders to reinforce learning and team building, will show its benefits not just in the end product but also in a healthy team environment.
By M.M Arrey
Software Delivery Consultant