Episode 52: DevOps in Small Companies with Jonathan Hall [EN]

In this episode our guest is Jonathan Hall of Tiny DevOps. We invited him because he specialises in a very interesting and little-talked-about niche: DevOps for small companies.
DevOps is usually viewed in the context of large organisations. But what does DevOps look like in small organisations? Does DevOps make sense in this context? Is it even feasible?

In this episode, Luca and Falko from „DevOps auf die Ohren und ins Hirn“ are joined by Jonathan Hall from TinyDevOps. Jonathan shares his insights on adapting DevOps practices for small teams, emphasizing the importance of cooperation, continuous deployment, and cultural shifts in organizations. He challenges common perceptions about the role of QA and automation in DevOps, advocating for a more integrated and responsible approach by developers. The episode covers various aspects of implementing DevOps in small companies, including overcoming fears and resistance to change, the role of QA in a DevOps environment, and the effectiveness of continuous deployment.

Contents

  • Introduction and Background of Jonathan Hall
  • Jonathan’s Definition of DevOps and its Importance in Small Teams
  • Continuous Deployment and Its Impact on Developer Responsibility
  • The Role of QA in DevOps: From Testing to Advisory
  • Challenges of Implementing DevOps in Small Companies
  • The Concept of Blameless Postmortems and Learning Culture
  • Automating Deployment First vs. Automating Tests
  • Real-life Example of Transforming QA Roles
  • Recommendations for Small Companies Adopting DevOps

Show notes

Find Jonathan at: https://jhall.io
He writes daily posts at: https://jhall.io/daily/

Transcript (automatically generated; anyone who finds errors may keep them)

Welcome to a new episode of DevOps auf die Ohren und ins Hirn.
Or in English, DevOps from your ears straight to your brain.
Usually this is a German language podcast, but we sometimes make exceptions for international guests.
Today is one such exception.
My name is Luca Ingianni, and I host this podcast together with my colleagues Dirk Söllner and Falko Werner.
We’re DevOps consultants, trainers and coaches trying to get teams to work together better and create better products for their customers.
Today it’ll be Falko and me running the episode and we’re joined by Jonathan Hall of TinyDevOps,
who specializes in bringing DevOps not to big corporations, but to small organizations or even solo practitioners.
Hi Jonathan, thanks for being on the show today.
Hi Luca, thanks for having me. It’s a pleasure.
Did I summarize your line of work well or is there something you would like to correct me on?
Yeah, I think you did.
A lot of attention is paid to DevOps at FAANG companies, Google and Netflix.
And that’s fun and exciting, but most of us never work in companies that large or that small.
So I try to focus on the other end of the spectrum, the small companies, you know, maybe a company of five people or ten people or even like you said, a solo practitioner.
I try to help those sorts of groups and those people do DevOps as well as possible.
Interesting. Now we have a standing feature on the show where everybody who comes on the show must give us their definition of DevOps.
So Jonathan, what’s yours?
It’s a team and it’s an engineer, of course.
I’m kidding. It’s not.
It’s neither of those things.
My definition of DevOps, I like to, it’s an oversimplification, but I think it gets at the heart of what DevOps is about.
It’s one word: cooperation.
DevOps to me is about Dev and Ops and other teams, QA and data science or whatever, cooperating towards the same business goal of delivering software.
I hear a lot of people talk about automation as DevOps, or CI/CD pipelines.
But you can do those things, and if you are still not cooperating with each other, you’re not really doing DevOps.
So to me, it boils down to are your different teams or different functions working together towards the same goal?
That’s how I like to define DevOps.
Oh, nice focus point.
Further, I’d like to say: if you hear somebody talk about DevOps, or see it in a job description, replace the word DevOps with cooperation.
And if it doesn’t make sense, they probably don’t understand DevOps.
Oh, that’s an awesome trick.
I’ve never heard of that.
That is not a bad way of looking at it, right?
Because, you know, I like to quip that, especially in job adverts, oftentimes DevOps is just a shorthand for an admin who can code badly.
Right.
Yeah.
If you see a job description for a DevOps engineer, replace the word DevOps.
Is it a cooperation engineer?
Not really.
They’re not engineering cooperation.
So that’s probably not really a DevOps engineer either.
It’s probably an admin, like you said, or an operations engineer or a CI/CD pipeline specialist or something like that.
And there’s nothing wrong with either of these things.
Of course not.
Yeah.
Just don’t call it DevOps.
Exactly.
Yeah.
If someone talks about a person as a DevOps engineer, I visualize them sitting on the wall of confusion between Dev and Ops, helping from one side to lower the height of the wall.
And maybe punching holes in it.
And maybe also, yeah, heaving packages that need to be deployed from the development side to the operations side, which is kind of like what cooperation or collaboration is about, right?
Exactly.
Yeah.
Yeah.
But that brings me to a good question, I guess, which is: does a small organization have walls of confusion? Do small organizations even need DevOps?
Because it sounds like they shouldn’t.
Yeah.
I see what you’re saying.
We’ve already mentioned briefly the idea of solo practitioners doing DevOps.
And you can ask yourself the question, does it make sense if I’m a solo, maybe a solo founder, or maybe working on a hobby project, can I do DevOps?
Can I literally do it as a single person?
Am I cooperating with myself?
And maybe in the strictest sense, not really.
I don’t know.
It’s up to the dictionary writers to decide whether a person can cooperate with themselves.
But I certainly think you can take the mindset of DevOps, which is that Dev and Ops should have their goals aligned to produce value for the customer.
I think you can do that.
And maybe in another sense, you automatically do that if you’re a solo practitioner.
I suppose you could imagine a way to throw something over that wall of confusion if you’re by yourself.
I don’t know what that would look like.
Maybe you write your code.
And then you leave for a week and come back and then deploy it.
I don’t know.
So in that sense, maybe everybody who’s a solo practitioner already does DevOps in the strictest sense.
But if we go up just a step from there to two to five people, I definitely see this wall of confusion in companies I work with.
And part of it often is that these companies are trying to be bigger than they are.
They want to look like an enterprise.
So they set up their DevOps team or their operations team.
They have a team of two people or one person, maybe.
And they have their developers on another team.
And the developers write their code.
And then they package it.
And they send it over to the operations guys.
Maybe they don’t need that.
But they feel like that’s the way enterprises do it.
So that’s the way they should do it.
So they sort of adopt this oversized mentality.
If it works for Google, it works for me, right?
Right, right.
Yeah.
And I think you bring up a good point about cooperating with yourself.
If it’s really just a one-person endeavor, then, you know, there’s still present you having to work together with future you.
Definitely.
Definitely.
Yeah, I really like that concept.
I’m glad you brought that up.
That you never really work alone, even if you’re working alone, right?
There’s always the present, the past, and the future yourself.
And when I write code today, one of my best teachers as a coder has been my past self and how stupid I was six months ago.
Oh, six months.
Try six hours.
Fair enough.
But, you know, what was I thinking back then when I did it that way?
Why did I design that code that way?
You know, that’s a great teacher when it comes to documenting your code and building even a CI/CD pipeline for a solo project.
It’s easy to think, oh, I don’t need CI/CD.
This is a simple thing.
I only change it once every three months.
And when I do, I can type the commands myself and upload it to the server or whatever.
Maybe it’s a hobby website or something.
But three months later, you’re going to forget what those commands were.
Maybe you put a readme or something like that.
But then you’re going to forget to update that readme when you change it.
Just put it in CI/CD as a pipeline.
It works as living documentation.
So that’s kind of what I focus on.
Those sorts of things when I’m working at the solo practitioner level.
You can still do these things because, as you said, you are working with your future self, if nobody else.
And your future self will really appreciate that CI/CD pipeline when it comes time to deploy something and you’re in a hurry and you don’t remember what that thing was that you did last time that fixed it.
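The "pipeline as living documentation" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step descriptions, the `make build` and `rsync` commands, the host `deploy@example.com`, and the paths are hypothetical placeholders, not anything mentioned in the episode. The point is simply that the commands live in version control next to the code instead of in someone’s memory or an outdated readme.

```python
import shlex
import subprocess

# Hypothetical deployment steps for a hobby website. The commands, host and
# paths are illustrative placeholders, not taken from the episode.
DEPLOY_STEPS = [
    ("build the site", "make build"),
    ("upload the files", "rsync -az public/ deploy@example.com:/var/www/site/"),
]


def deploy(dry_run=True):
    """Run every step in order and return the commands that were (or would be) run."""
    executed = []
    for description, command in DEPLOY_STEPS:
        print(f"{description}: {command}")
        if not dry_run:
            # check=True aborts the deployment at the first failing step.
            subprocess.run(shlex.split(command), check=True)
        executed.append(command)
    return executed


if __name__ == "__main__":
    # Default to a dry run so the script can be read and tried safely.
    deploy(dry_run=True)
```

Calling `deploy(dry_run=True)` only prints the steps, which is already useful three months later when the details have been forgotten; a CI/CD job would call the same function with `dry_run=False`.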
I was just wondering, do you have something…
Do you have an example to illustrate this situation where you really have a lack of communication or where you have walls of confusion within even, I don’t know, a five-person shop or something?
Let me think about that.
Yes.
So a client I was working for…
Just a few years ago, a couple of years ago, actually.
They were a small startup.
They were growing quickly.
But when I joined them, I think there were about eight engineers.
So it wasn’t quite a five-person shop.
But it was very compartmentalized.
The CTO had designed the architecture of the system and designed it for a team of probably 100 people.
So there were maybe too many moving parts, too many microservices.
But that did mean that we had these teams working in small, isolated areas.
And what should have been an API contract between teams really wasn’t.
It was a human barrier.
So when they needed to do something, they had to ask another person, can you do this thing for me or can you tell me how to do it?
And that other person was, if you’ve read the Phoenix Project, what was the character’s name?
Brent?
Brent, yes.
This was the Brent of our company.
Everybody went to Brent for these things rather than an API contract or whatever.
So that definitely happened in that scenario.
And the solution, to the extent that we found a solution, was adopting some of these DevOps types of practices.
We did institute a sort of platform team.
We called it an SRE team.
Some people incorrectly called it a DevOps team.
But they managed the infrastructure.
They also provided the core libraries that were used for logging and monitoring and things like that.
So they provided that sort of infrastructure that everybody else then could use.
Okay.
So since we’re already starting down that road.
So how would you do DevOps at a small organization, especially in contrast to a large organization where you probably have like a bunch of teams, maybe one per microservice or something?
If you’re just three guys, what does DevOps look like?
Yeah.
Well, before I answer that question, I’d like to talk about a related topic that I think is relevant to the answer.
And that is…
Yes, please.
I see two really, really high level ways that DevOps is usually implemented.
And if you come from a Scrum background, you’ve already had it drilled into your head that every team should be cross-functional.
And that tends to be the way a lot of Scrum practitioners approach DevOps.
So they put a DevOps engineer, so to speak, an operations person on each Scrum team.
And that’s fine.
That works.
But it only works for really small organizations.
As soon as you get past three to five teams, this cross-functional DevOps team model starts to break down because you have your operations knowledge scattered around all these teams.
And that’s when I think it makes the most sense to approach more of a platform team, which is more like what Google does with their SRE model.
The book Team Topologies talks about this approach.
And that’s what I did at this company I just mentioned, is we implemented an operations team that handled this platform and provided services to developers.
So, now to your question, how does DevOps differ between small and large organizations?
In large organizations, you almost certainly need to adopt some sort of platform approach.
You simply…
That doesn’t mean you can’t do a hybrid.
You might also have an operations person on some or even every team.
But they need some sort of central knowledge repository that I refer to as a platform team.
So large companies like Google and Netflix must do that.
If you’re a small company, maybe below 30, maybe even 50 engineers, you can maybe get away with this cross-functional approach.
If you’re three, you almost certainly should do the cross-functional approach.
It doesn’t probably make sense to have…
In fact, I’ll go out on a limb and I’ll say it doesn’t make sense if you’re only three engineers to split the responsibilities that way.
Because you’re just…
You’re going to have a bus factor of one if you do that.
I mean, either you have one developer and two operations people or you have one operations person and two developers.
Either way, you’re in trouble if somebody goes on holiday.
So if you’re that small, you really need to do the cross-functional approach.
Which maybe you still have one of those three who is the primary operations person.
Maybe they’re the one who knows how to do the deployments the best.
Maybe they write the CICD scripts and so on.
But the others should at least have a basic understanding of how it works.
So that when that Brent is on holiday, they know how to fix things.
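The bus-factor argument can be made concrete with a small Python sketch. The team members and skill names below are hypothetical; the code just counts how many people can perform each task and lists the tasks that only one person can do.

```python
def bus_factor(skills, skill):
    """Number of people who can perform `skill`; 1 means a single point of failure."""
    return sum(1 for person_skills in skills.values() if skill in person_skills)


def single_points_of_failure(skills):
    """Return, sorted by name, every skill that exactly one person has."""
    all_skills = set().union(*skills.values())
    return sorted(s for s in all_skills if bus_factor(skills, s) == 1)


# Hypothetical three-person team with a strict Dev/Ops split.
split_team = {
    "dev_a": {"develop"},
    "dev_b": {"develop"},
    "ops": {"deploy"},
}

# The same team after one developer has learned the basics of deploying.
cross_functional_team = {
    "dev_a": {"develop", "deploy"},
    "dev_b": {"develop"},
    "ops": {"deploy"},
}

print(single_points_of_failure(split_team))
print(single_points_of_failure(cross_functional_team))
```

In the split team, deployment depends on one person; once a second person has at least a basic understanding, no task is left with a bus factor of one.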
So that’s probably the biggest difference, at least from the 10,000-foot view.
The biggest difference between large and small organizations is that when you start to get really small, cross-functional makes perfect sense.
At some point, you have to start transitioning.
If you’re going to grow, you have to start transitioning to that platform model.
How many levels of platform do you think is appropriate?
Can you map this to kind of organization size?
Yeah, that’s a good question.
I don’t think there’s a single answer.
I think it depends a lot on what you’re doing and how complex your platform is.
Let’s just say you decide to go with Kubernetes.
You probably need to go to that platform level sooner than if you’re deploying to an SFTP server, for example.
The management overhead to just SFTP something to a website.
And it runs some PHP scripts.
That’s pretty low overhead in terms of the operations side of things.
If you’re doing Kubernetes, you probably need…
You have more overhead there.
You get a lot of benefits as well, of course.
But you have more overhead.
You probably want more people who have an intimate understanding of Kubernetes to do that sort of thing.
So it’s going to depend a little bit on what tech stack you choose.
And hopefully that’s driven by your business needs and not just because you think Kubernetes is a good thing.
So it depends a lot on what your business needs are.
How much availability do you need?
Do you need high availability?
Is it okay if your website goes down during upgrades?
If it is, then just SFTP, you’re probably fine.
If it must be up all the time, even during upgrades, you need some sort of rolling upgrade mechanism.
And that requires additional manpower.
And if you need alerting…
It depends a ton on what you need from a business standpoint.
And how you choose to implement those.
But if I try to answer the question a little less like a consultant…
I’m just going to pull some numbers out of the air here.
But I would say less than 25 or so engineers, cross-functional, is probably a good idea.
If you expect to grow beyond that, you should start looking at a platform team.
Once you get to the point that your platform team is too large to function well as a single team, then you want to start looking at multiple platform teams.
And how you split that is going to depend on a lot of things also.
For example, the company I mentioned earlier…
We had a single platform team, but then we also had a data team that also managed some of their own infrastructure.
So maybe that’s a logical split in your case, to have a data platform team as the second platform team.
Maybe you split your Kubernetes management from your monitoring or something like that.
There are many ways you can split.
And I don’t even want to try to tell anybody who I haven’t met and spoken to what they should do.
Because it so depends on business needs and what problems you’re facing.
But the same rule applies as with your Scrum team size.
You don’t want more than 5 to 8.
12 is probably an absolute maximum.
If you have 12 people on a team, it’s already hard to manage.
So the same sorts of team size rules apply to your platform team.
Start to split that when they grow.
Just to ask the opposite question.
How small should a platform team be allowed to get?
I think you should have two minimum.
And that’s why I was talking earlier about if you have three engineers total, you probably should do cross-functional.
And the main reason for that is this bus factor idea.
You don’t want to be high and dry when your only operations person goes on holiday.
Or when they quit and you can’t hire somebody to replace them for three months.
So I would say absolute minimum two.
If you have fewer than two, you’re taking on some risk.
Maybe that’s appropriate for some business situations.
But you’re taking on a high level of risk.
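The rules of thumb from this part of the conversation can be written down as a tiny Python function. The thresholds are the ones Jonathan explicitly says he is pulling "out of the air" (about 25 engineers, at least two people on a platform team, splitting at roughly 5 to 8), so this is a sketch of his heuristic, not a rule; the function name and return strings are made up for illustration.

```python
# Rough thresholds quoted in the conversation; they are rules of thumb only.
CROSS_FUNCTIONAL_LIMIT = 25   # "less than 25 or so engineers"
MIN_PLATFORM_TEAM = 2         # "absolute minimum two"
MAX_COMFORTABLE_TEAM = 8      # "you don't want more than 5 to 8"


def suggest_team_model(engineers, platform_team_size=0):
    """Return a rough suggestion for how to organise operations knowledge."""
    if engineers < CROSS_FUNCTIONAL_LIMIT:
        return "cross-functional teams"
    if platform_team_size < MIN_PLATFORM_TEAM:
        return "platform team of at least two people"
    if platform_team_size > MAX_COMFORTABLE_TEAM:
        return "split into multiple platform teams"
    return "single platform team"


print(suggest_team_model(3))
print(suggest_team_model(40, platform_team_size=5))
```

As the discussion makes clear, the real answer depends on the tech stack and the business needs, so numbers like these are at best a starting point for a conversation.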
All right.
Assuming that there is a company that wants to move towards DevOps.
You know, what’s the best way to do that?
What are the typical pitfalls?
What are the typical ways of messing this up?
The easiest way to mess it up is just give somebody a title.
Say they’re the DevOps guy.
And then don’t change anything about what they’re doing.
And I’ve worked with companies that do this.
They just think, oh yeah, DevOps, it’s just a new word for deploying software.
So he’s our DevOps guy.
That’s not DevOps.
If it works for you, of course, fine.
Go ahead.
But don’t fool yourself into thinking you’re doing DevOps.
Yeah.
Just to jump in.
I’ve met companies that, you know, they really meant well.
And they were trying to, you know, move towards DevOps.
But they essentially never got past that stage of giving somebody a new title.
But they didn’t really get as far as changing the way of working.
I don’t even know why out of reluctance or out of thoughtlessness.
They thought, yeah, you know, now we’ve given this guy a new title.
And now he’s doing Kubernetes.
So we’re done, right?
Yeah.
You know, I have to be sympathetic to people who do that.
Yeah.
You can’t expect every hiring manager, every team lead, every CEO, every CTO to understand what DevOps is.
Especially considering how terribly abused that word has become.
It’s probably worse even than Agile.
And Agile is terribly abused.
So you have my sympathy.
If you’re listening to this and thinking, what?
I changed the guy’s title to DevOps.
I thought that I was done.
You have my sympathy.
I’m not picking on you.
I’m sorry that you were.
Hopefully we can help you get back on track.
It’s so easy to do that.
Because that’s the buzzword these days.
All the recruiters are looking for DevOps engineers.
And they’re trying to blah, blah, blah, blah, blah.
Back to the original comment I made.
DevOps is about cooperation.
If you have a DevOps guy, and you have people in your organization saying, I’m waiting for the DevOps guy, or that’s the DevOps guy’s job, or I’m going to give this to the DevOps guy,
That’s not DevOps.
The DevOps is about cooperation.
If you are not cooperating, you’re not doing DevOps.
So how can you start to approach this?
I have two answers to that.
And I think they can run in parallel.
And these are the two approaches I use whenever I go into a new company.
I try to do these simultaneously.
The first one is the hardest.
And the most complicated.
And the scariest.
But I’ll mention that.
And we can talk about the details in a moment.
If you want.
And that is to get continuous deployment working as soon as possible.
Continuous deployment, the way I define it, is the idea that as soon as the developer hits the merge button on GitHub or Bitbucket or whatever version control you’re using, an automated series of events occurs that puts that software in front of live customers immediately.
There’s no additional manual check.
You don’t hit the merge button.
And then a week later, the QA guy comes along and says, yes, this is good.
And then we’ll put it in front of customers.
You go from merge to in front of customers as quickly as possible.
You can still have all your manual QA.
You can do all of that before the merge button is pressed.
There are some technical complications there.
But we can talk about that.
But the basic idea is get this continuous deployment in place as quickly as possible.
And the reason for that is because it changes the mindset of the developers.
For one thing, it gives you faster feedback.
And that’s always useful.
And that’s one of the core ideas behind Agile and DevOps: to get our stuff in front of customers quickly so we can get feedback from our customers quickly and we can make changes if we need to.
That’s all important.
That’s great.
And I love that.
But for the purpose of DevOps, the main reason I like to get continuous deployment in place immediately is that it changes the mindset of developers. As soon as they know that when they hit that merge button, customers are going to see their changes within five or 10 or 20 minutes, it puts a new weight on their shoulders.
They take on a new level of responsibility and a new sense of ownership for that code that they’re writing.
And, you know, I’m not trying to pick on anybody.
I’m saying this from my own heart, from my own experience as a developer: when I know that my code is not going in front of customers for another week because it’s going to be batched up with a bunch of other changes,
I’m a little bit lazier and I don’t check those changes quite as closely. On the other hand, when I do know that it’s going to be in front of customers in five minutes,
I will review my own code one more time.
I will do those extra checks.
I just am a little bit more careful.
I hesitate a little bit longer before I hit that merge button.
Did I forget anything?
And so it’s that extra sense of responsibility and ownership that I think is most important for the purpose of adopting DevOps.
With that new sense of ownership in place, the developers will then do everything they can, at least we hope, just the responsible ones will do everything they can to make sure that code is of high quality.
That it’s been properly tested, that it’s gone through all the checks, the configurations are correct and so on and so forth.
It goes back to how a lot of people describe DevOps: you write it, you ship it.
So it gets you into sort of that mode.
So that’s the first thing I like to start to get in place is continuous deployment.
Once that’s in place, then we can go back through and start filling in gaps where maybe we can automate some testing.
Maybe we need to do other things.
But to start with, get the path from that merge button to deployment automated. Everything before the merge button can include manual QA, if you need that or you want that; it can include writing documentation, it can include feature flags, it can include a million other things.
And those might even be slow and painful, but make that merge button the last human interaction before that code goes live.
And that is one of the first things I do.
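The rule "make the merge button the last human interaction" can be expressed as a simple check. In this Python sketch, the `Step` type and the example step names are hypothetical; the check only says that a process counts as continuous deployment, in the sense defined above, when no manual step sits after the merge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    manual: bool       # does a human have to act for this step?
    after_merge: bool  # does the step happen after the merge button is hit?


def manual_steps_after_merge(steps):
    """Names of the human gates that still sit between merge and production."""
    return [s.name for s in steps if s.after_merge and s.manual]


def is_continuous_deployment(steps):
    """True when merging is the last human interaction before code goes live."""
    return not manual_steps_after_merge(steps)


# Hypothetical process with a QA gate after the merge.
gated_process = [
    Step("automated build", manual=False, after_merge=True),
    Step("manual QA sign-off", manual=True, after_merge=True),
    Step("deploy to production", manual=False, after_merge=True),
]

# The same checks, with the manual QA moved in front of the merge.
cd_process = [
    Step("manual QA sign-off", manual=True, after_merge=False),
    Step("automated build", manual=False, after_merge=True),
    Step("deploy to production", manual=False, after_merge=True),
]

print(is_continuous_deployment(gated_process))
print(is_continuous_deployment(cd_process))
```

Note that nothing was removed between the two examples: the manual QA step is still there, it has only moved to the other side of the merge.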
The second thing I do, and I do this in parallel, is I advise companies to start doing blameless postmortems.
As soon as they have an outage.
I’ve been at a couple of companies where I was sort of waiting on the edge of my seat for the next outage because I was excited to pull everybody into a postmortem.
I won’t tell you those companies because I don’t want those companies to come after me later and say, I can’t believe you were waiting for us to fail.
But that’s the second approach.
And the reason I like postmortems... well, let me contrast those first to retrospectives.
If you’ve been doing Scrum, you’re already familiar with the concept of retrospectives, and I believe those are really valuable, too.
But retrospectives are often dry and boring.
And I think that’s why we often have lots of scrum masters trying to find new ways to gamify retrospectives.
And they give you these new charts and these new ways to do it.
And that’s all fine and that’s good.
And I think it’s important to do retrospectives on a regular cadence, typically every two weeks in Scrum; even more often is great.
But I like postmortems even more.
In fact, if I had to choose, if I joined a company that was not doing either one, I would do postmortems first and I would use that to build into retrospectives.
The reason is because postmortems by definition happen after something is broken and everybody’s already eager to figure out how to solve that problem.
With a retrospective, you often have people trying to think what’s something that we could have improved last week.
I don’t know.
Let me think about it.
Oh, yeah, there’s that one thing.
So, you know, there’s this sort of you have to dig sometimes to find things to improve.
And with a retrospective, I’m sorry, with a postmortem, nobody has to do that because we know the database went down yesterday and we lost 10,000 euros of sales or whatever the thing is.
It’s fresh in everybody’s mind.
We know what happened.
We know what the problem was.
And we hopefully know how we fixed it.
Hopefully it’s been fixed by then.
So it’s fresh.
It’s clear.
And we have an objective.
And the objective is, let’s make sure this doesn’t happen again.
So that’s the second thing I do: I lead a team through doing postmortems.
Every postmortem needs follow-up.
And that follow-up becomes a perfect retrospective if you’re not already doing those.
So that’s my two-pronged approach.
Get CD in place as quickly as possible and start doing blameless postmortems as quickly as possible.
And those two things snowball on each other.
And they help you to start developing this culture of learning and improvement and help you to adopt the DevOps cooperation mindset.
Yeah, that’s great.
That’s really interesting because it also, I think, dovetails into, you know, building out essentially the Three Ways of Gene Kim, right?
Getting into the habit of doing systems thinking, building feedback loops and opening yourself up to experiments, even if those experiments might be inadvertent and they come in the form of, you know, things going wrong.
Yeah, definitely.
Those things all apply to the Three Ways.
I mean, the first way, of course, is about the flow of work.
Work should flow in one direction only.
That’s related to the CD idea.
If you have code that’s quote-unquote ready, but then it has to go through approvals, whether it’s some sort of change board or QA or who knows what, some product owner has to approve it.
And then maybe they don’t approve it.
It goes back to the developer to do some tweaks.
And, you know, you have this back and forth.
That’s not a good flow approach, according to the first way of DevOps.
The second way is all about feedback loops.
Both of these.
I mean, the postmortem is an obvious example of a feedback loop.
Something’s gone wrong.
We’ve investigated it and we’re trying to fix it.
So, you know, that’s feedback and acting on feedback.
And the third way is experimentation and learning.
And both of these, I think, fit into that.
I mean, the CD idea itself.
I mean, you could just implement CD and stop.
But that’s a starting point, not an end state.
Right.
So once you have my idea of this sort of lean CD approach in place, you start building on top of that.
You don’t stop.
You then start to iterate and improve the process, find ways to make it faster, to reduce the feedback times, to reduce the manual toil.
Exactly.
And I think especially once you set up, for example, CD, it comes kind of naturally to want to improve on it and automate more things.
You don’t have to prod people and say, you know, what else could we do now?
They’ll be bursting with ideas, if your customers are anything like mine.
Absolutely.
And, you know, the book Accelerate talks about this, that the companies that do CD have a more mature culture in terms of these sorts of things.
And they report higher workplace happiness.
They report, you know, obviously also better business outcomes in terms of faster software delivery and things like that.
And maybe that’s the reason that a CEO wants to implement that.
But it has other impacts that are at least as meaningful in my opinion.
You know.
It feels good as a developer.
I get a little bit of a dopamine hit every time I release something into production.
If I release something and it just sits there in a bucket for three months before it’s released, that dopamine, there’s nothing there.
I don’t care anymore.
But if I can release something now and I can see it in front of customers in ten minutes, that feels good.
And I like that.
And I want more of that.
And I think that relates to something that I think is really important and one of the main reasons I like to do CD before automating tests.
I don’t know how everybody thinks about this, but when I first learned about CI and CD, my approach (and I’ve seen this in many companies, and it seems to be the general thinking around it) was: well, let’s start writing some tests.
You know, so imagine you have a code base and everything is manual.
You still do manual zip files to deploy.
Everything’s 100 percent manual.
But you want to move to CI/CD.
If you ask a hundred developers or a hundred engineers, I think most of them will say, well, the first step is to start writing some automated tests, maybe adopt TDD, get some good code coverage in place, automate that so that whenever you make new changes, the CI pipeline runs and gives you a green check mark.
Do that first.
Once you’re confident that you have a good code coverage, you can fire all your manual QA people.
I don’t know if that’s a good idea, but that’s kind of the mentality right now.
You can fire your QA people because we have our automated tests.
They’re up.
They’re good enough.
Now we have 98 percent coverage or whatever.
Okay, now that’s in place.
Now let’s look at automating our deployments.
And I think that’s entirely backwards.
I think it makes much more sense to automate your deployments first and then worry about the tests.
Because for one thing, maybe firing your manual QA isn’t a good idea in the first place.
Maybe they provide some value.
Who knows?
But I mean, whatever checks you’re doing right now with your manual process, whatever it takes to become confident in the next release, you can still do every single one of those things; just do it before the developer hits the merge button, not after.
Just change your thinking in that small way and you have continuous deployment.
That’s all it takes.
I say that’s all it takes.
It’s really scary to do that.
It’s a big mindset shift.
It sounds simple, but it’s much easier said than done.
But if you can get over that hump and start doing your CD, regardless of everything that comes before, and you can still have your manual QA checks if you want to, then that puts you in that mindset where that dopamine hit starts coming through, that extra sense of ownership comes through for those developers, and so does the desire to start improving everything else. It becomes much more clear what the next step for improvement is.
I’m just wondering: if you do everything else before hitting the merge button, you have to
have some kind of environment where testers can do manual tests.
What do you have to have in mind that’s relevant to be done in the usual case before merging and doing the continuous deployment?
That’s an excellent point.
Yes, you definitely need the ability to do your testing before merge in that case.
Ideally manual testing. And whether you have dedicated QA people or not,
your engineers, your developers, should be able to do manual testing on the code they’re writing anyway.
So, yes, you need some sort of test environment.
The ideal, the sort of gold standard, in my opinion, is that every time you create a pull request,
it should fire up a temporary review environment that you can test against.
Now, that’s not applicable in all situations.
If you’re building a mobile app, maybe that’s not even possible,
because you need to install that mobile app on a mobile device or an emulator or something like that.
But that’s the gold standard, and you should aim for as close to that as you can get.
For many small teams, it’s good enough to just have a single test environment.
Maybe you have your production and a test environment,
and the developers can, whenever they want to, push a code change to test, and then test it there.
That’s usually good enough for a small team, maybe three to five developers.
Once you get beyond that size and you start having contention and developers are fighting over,
I’m waiting for Bob to finish his testing before I do mine,
that’s when it’s time to start finding a solution.
Maybe you add a second test environment, or maybe you can, depending on their tech stack and so on,
maybe you can build these temporary environments that come up per developer or per pull request or something like that.
But yes, the core answer is yes, you need a test environment.
In some cases, you can maybe test on your local machine.
In many cases, not. It depends on what you’re doing.
You need a test environment that the developers at least,
and if you have manual QA people, that they can use as well.
One thing I’d like to point out, when people start exhibiting skepticism about this idea,
“humans shouldn’t be doing this testing, it needs to be automated,” or whatever:
Remember, humans are Turing-complete machines.
Anything a computer can do, we can also do.
It takes longer, no doubt, but we can do it.
I mean, the first Turing machine was invented by a human, right?
It happened to be Turing.
Since then, every other Turing machine can emulate every other Turing machine.
So, you know, anything you can automate, a human can also do.
And so that’s actually the first step I advocate when implementing this, is write a checklist.
And that’s your human programming instruction, so to speak.
Just write a checklist of the things that should happen before you hit merge.
QA should sign off. We should write documentation.
We need to run the linter. We need to run the unit tests.
You know, whatever. Your list could be 50 things long, it might be five things, it doesn’t matter.
Just write a checklist.
And over time, you can start to automate those and turn those into machine-readable instructions instead of human-readable instructions.
That’s fine.
But remember, anything you can automate, humans can do.
In fact, anything you can automate, humans probably should do before you automate it to make sure you’re doing it right.
And make sure that it needs to be automated.
Don’t fall into the trap of automating things just because you can, if there’s no business value.
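Editor’s illustration: the pre-merge checklist described above can be sketched in a few lines of Python. This is only a minimal sketch, not something shown in the episode. The names `ChecklistItem`, `outstanding`, `ready_to_merge`, and `new_checklist` are hypothetical, and the four items are simply the examples mentioned in the conversation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistItem:
    """One line of the pre-merge checklist, ticked off by a human."""
    description: str
    done: bool = False


def outstanding(checklist: List[ChecklistItem]) -> List[str]:
    """Return the descriptions of all items that are not ticked off yet."""
    return [item.description for item in checklist if not item.done]


def ready_to_merge(checklist: List[ChecklistItem]) -> bool:
    """The merge button may be pressed only when nothing is outstanding."""
    return not outstanding(checklist)


def new_checklist() -> List[ChecklistItem]:
    """Build a fresh checklist; the items are examples, a real list will differ."""
    return [
        ChecklistItem("QA has signed off"),
        ChecklistItem("Documentation is written"),
        ChecklistItem("Linter has been run"),
        ChecklistItem("Unit tests have been run"),
    ]
```

A team would create a fresh checklist per change, tick items off as people complete them, and merge only once `ready_to_merge` returns `True`. Nothing here is automated yet; the list is the human-readable program.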
Okay. Your checklist that you just mentioned sounds very much like a definition of done to me.
What’s your point?
I agree.
I have an email course on this topic.
And the first lesson or the second one is build a checklist.
If you already have a DOD, start with that.
So definitely, it is a definition of done.
Sometimes your definition of done will vary a little bit depending on how you write your definition of done.
But yeah, they’re essentially the same thing.
Put in your document whatever you need.
And I hear people complain, or not complain, but express worry, like,
well, we can’t do continuous deployment because we need to let our partners know about upcoming changes.
Okay, make that part of your DOD.
Make that part of your checklist.
Send an email blast that says, this is what’s going to change when I hit merge tomorrow, or whatever it is.
So sometimes the definition of done might be exactly what you need.
It probably needs to be adjusted a little bit.
Just try it a few times and see what’s missing.
What’s that one step that you did that isn’t documented?
Edit the document.
Yeah.
And I think good teams are the ones that improve.
And you often see that in changes in the definition of done.
So things that you automate can be taken out of the manual part of the definition of done
and moved into the deployment pipeline part of the definition of done.
And when you see new things like outcomes out of a blameless postmortem, for example,
you can add a new checklist point to the definition of done.
And when you see you have to do it very often and it takes time,
it’s good to automate.
Put it into the automation part.
Yeah.
And just to point out, there doesn’t even have to be a separate automated and manual part.
I’m not sure whether you’re familiar with the term null automation.
It’s one of my favorite tiny, tiny tricks that make such a big difference, which is,
you know, if you want to
write an automated CD script or something, but you for some reason don’t want to automate
all the steps yet, either because you don’t really know how to do it yet or it’s too much
effort or you just want, you know, you just want to brain dump all of the steps into the
script.
You can just have the script and have essentially a print statement, you know, echo “send an
email blast” and then echo “do the acceptance test” or something.
And so you can stick that into your CD and obviously like it doesn’t do anything but give
you a checklist and then over time you can take some of those elements and replace them
by actual automation.
And that’s just such an awesome trick to use to also maybe take away some of the fear of
automation and say, you know, this is such a big deal and so much effort and should we
really, you know, just do the simple thing, just do null automation.
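Editor’s illustration: null automation as described here could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not a script from the episode. The helper `manual`, the stand-in `run_linter`, and the `PIPELINE` list are all hypothetical names, and `run_linter` does not call any real linter.

```python
from typing import Callable, List

# A step is anything callable that reports what it did (or what a human must do).
Step = Callable[[], str]


def manual(instruction: str) -> Step:
    """Null automation: build a step that only tells a human what to do."""
    def step() -> str:
        message = f"MANUAL: {instruction}"
        print(message)
        return message
    return step


def run_linter() -> str:
    """Hypothetical stand-in for a step that has already been automated."""
    # A real pipeline would invoke the team's actual linter here.
    message = "AUTOMATED: linter ran"
    print(message)
    return message


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run every step in order and collect what each one reported."""
    return [step() for step in steps]


# The whole process lives in one script; some steps are still only reminders.
PIPELINE: List[Step] = [
    run_linter,
    manual("send an email blast"),
    manual("do the acceptance test"),
]
```

Calling `run_pipeline(PIPELINE)` prints the full procedure in order. Replacing a `manual(...)` entry with a real function later turns a reminder into automation without changing the shape of the script.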
I love it.
I love it.
Okay.
So now it’s easy, right?
Now everybody can just do DevOps.
Ha!
The hard part is never the technology.
I know.
I know.
As long as we’re working with other people, it’s never going to be easy.
That doesn’t mean it’s not rewarding, but it’s not easy.
Always those pesky humans.
Sorry.
Sorry.
And the more people…
Yeah, the more people you add, the harder it gets, too.
But yeah, that’s a common theme I have seen for years.
The hard part is not the technology.
I mean, we spent 20 minutes talking about how to do CD without automation.
You know, if I can describe it in 20 minutes, it can’t be that hard.
The hard part is convincing somebody that they ought to do it and that it’s a good idea
and that it’s not going to destroy their business.
These are the hard parts.
And then once you’ve done that, convincing the developers on the team that this way
works, even though it’s different from what they did at the last company.
These are the hard parts.
Building the culture of learning and experimentation,
just making people feel safe to experiment with these things,
is half of the battle, I think.
Whether that person is your CTO or your boss or your colleague sitting next to you in the cubicle,
everybody involved needs to feel safe to experiment with these things
or they’re not going to do it.
They’re going to keep doing what they’ve been doing,
even if what they’ve been doing doesn’t work very well,
because they’re comfortable with it.
So that’s the biggest part.
Make it safe for people to express their opinions and their concerns.
Give them the confidence that what you’re doing will either work or can be reversed if it doesn’t.
And yeah, make people feel safe, both emotionally and technically.
And that’s the hardest part.
For all my years doing this, that’s always been the hardest part.
What’s your trick?
Keep trying.
Yeah, I don’t have a silver bullet.
Persistence.
Persistence, absolutely.
You have to say the same things over and over again, sometimes to the same people.
Sometimes you just have to take somebody by the hand and say,
I understand how you feel.
Maybe not literally, that’s kind of creepy, unless you’re working with your spouse or something.
But take them by the hand figuratively and explain,
I understand how you feel.
I understand why this is scary.
I’ve done it before a couple of times and nothing disastrous has happened.
I can talk about the time I fired my QA team and the world didn’t blow up.
So it helps to have some experience under your belt.
But nobody else has had the same experience I have had or that you have had.
So you can’t assume, of course, that everybody else understands where you’re coming from.
And communicating that to someone is difficult,
especially because they’re in a different emotional state than you.
So, yeah, learn communication skills, learn empathy and patience.
Those are the best pieces of advice I can offer in that regard.
Yeah, that’s really the magic, isn’t it?
If you want to make those kinds of changes,
you really have to acknowledge everyone’s humanity
and acknowledge that they might feel apprehensive about the changes
and that they don’t want to mess up because, you know,
everybody wants to do a good job and everybody wants to feel effective.
It doesn’t matter whether you’re four years old or 40.
That’s, I guess, just how humans are, isn’t it?
Exactly.
But I’m so curious about the story about when you fired your QA team.
Didn’t you just say that we need them?
Yes. So I didn’t fire all of them.
There’s a spoiler alert right there.
So I was working with a company.
It was an e-commerce department in a retail company.
So the developers were responsible for building an e-commerce platform
that sold their products online.
And we had…
I think we had five scrum teams when I joined.
And they were doing releases every couple of months or something like that.
So it was a bad situation to begin with.
Part of the makeup of the team was we had a QA person on each scrum team.
So five scrum teams, five QA people.
Only one of those QA people was an in-house employee.
The others were offshore from another country.
So a completely different office.
They all worked in the same office as each other, but in a different office from ours.
And payroll…
They were all paid by a different company.
And over time, the problems with this arrangement started to become more evident.
On the one hand, there’s a bit of an adversarial relationship,
which I think often happens when you have QA doing manual acceptance testing.
Because you end up with a situation where developers are trying to push code through,
and QA acts as sort of a goalie.
And they maybe kick things back every now and then.
And…
Even if…
You have the best personal relationship between your dev and your QA,
there’s a little bit of an adversarial relationship there.
Even if it’s professional and friendly.
It wasn’t always professional and friendly in this case.
We had in particular one developer and one QA who just did not get along well.
And it didn’t work.
Aside from that, perhaps the bigger problem…
We can just chalk that up to personality differences if we want to.
But the problem that didn’t come down to that was that at the beginning of every sprint,
the developers were busy building their new features.
And the QA people were bored with nothing to do.
Because the stories hadn’t been completed yet.
And then the second half of the sprint, the QA was busy testing things away.
And the developers were bored because they’d already finished all their work.
So we had this huge disconnect and complete inefficiency in terms of the flow of work.
We spent several months worrying about this
and trying to find ways to tweak this.
And nothing worked.
Eventually I made the decision to bring in a freelance test automation engineer
for a six-month contract.
She came in and helped us build a test automation platform with Selenium.
And I don’t remember all the tools she used.
And trained our in-house QA on how to use these new tools
and to write tests in Cucumber and Gherkin and all this stuff.
Of course, everybody was apprehensive about this change.
We’ve talked about this: fear is a big problem with these sorts of changes.
And that was definitely the biggest problem here.
Everybody was concerned.
All the POs were concerned.
Once the QAs are gone, our quality is going to plummet.
We’re going to have all these customer complaints.
It’s going to be terrible.
So we set up.
So I don’t remember the timing exactly,
but whatever date was the last day for our offshore QAs.
Of course, we had a little farewell party for them, a virtual one.
And then the next week we scheduled regular meetings with
me and the two remaining QAs,
the permanent one and the freelance one who was there for another month.
We were going to have regular meetings to discuss all of the crap that went wrong
and how to respond to it quickly.
We canceled the meeting after two weeks because nothing was happening.
Nothing broke.
Nothing went wrong.
Nobody’s hair caught on fire.
We had nothing to do.
Everything was fine.
Now, the developers on these teams were a little bit apprehensive.
And after the fact, some of them told me,
it was a little bit annoying that we had to write tests ourselves
because we really wanted to keep focusing on writing our code.
But it was so much better than before when we had to wait on the QA.
Okay, that’s cool.
Great.
So we still kept our in-house QA.
And then, of course, the freelancer, her contract expired and she went on to another company.
So we still had one dedicated QA in-house and he took on a support role.
So his role was essentially, he became this platform team.
I said earlier, you shouldn’t have a platform team of one.
We did in this case.
He was our QA partner.
We were a platform team.
He maintained the testing infrastructure and he was available to help the developers if they had questions.
How do I write this sort of test or how do I test this thing?
He was available to help them with that, but he didn’t do the testing for them.
And that was the key.
So he took on a support role, maintained the platform, and provided training and assistance to the developers who were writing their own tests.
So interesting to hear you say that, because you’re probably not aware, but Falko and I wrote a big training.
It’s called DevOps Specify and Verify on, well, on exactly those topics, right?
And one of the things we keep banging on about during the training is that you shouldn’t have a QA person who actually writes tests.
Instead, you should have somebody who’s just like you described, some kind of a QA advisor.
Somebody who takes care of the plumbing, somebody who is available to answer questions on how do you test this?
Is this tested well enough?
Are you happy with the coverage that you’re seeing, et cetera, et cetera?
And that’s just such a game changer, isn’t it?
Definitely.
And it frees up the developer to, once again, to own what they’re doing.
They can feel that ownership and that responsibility.
They’re responsible for the quality.
Who would hire a plumber that says, okay, I finished installing the toilet.
Now let me have my tester come in and see if it’s working right.
Nobody’s going to do that.
Why do we have such low standards of developers that we don’t expect them to write code that works?
I feel like if you can’t, as a developer, be at least relatively confident that your code works because you’ve tested it,
you shouldn’t be calling yourself a developer yet.
You’re still learning.
So I completely agree.
I look at the dev-QA relationship as almost a parallel to the dev-ops relationship that DevOps is supposed to solve.
Your operations people shouldn’t be doing the deployment for your developers.
They should be supporting the deployment for your developers.
They maintain the infrastructure of the platform and they support them.
If a developer doesn’t know how to write a Kubernetes manifest, we don’t blame them.
The operations people can help with that, but they’re not going to do it for you.
But it’s interesting.
I teach a lecture on software QA at the local university.
And so this is a course for students who study in the evenings and they work at regular companies during the day.
I always have this little survey in the beginning about who of you software developers
writes tests, and there’s like one or two hands, very tentatively going up out of, I don’t know, 25 people.
And I just, I could never work like this. It feels so unsafe once you’ve gotten used to having a solid test suite behind you.
I can understand the debate between test first and test after.
I’m a TDD guy myself, but I understand a lot of people aren’t, that’s fine, but no tests at all, no tests at all.
And that’s essentially the way it is everywhere.
Are they doing tests?
Are they doing manual tests?
Are they sitting there running through their thing a hundred times manually?
Because if you are, you’re a developer, the first thing you should think is.
True, right?
I mean, that’s one of the, one of the virtues of the programmer, right?
What is it?
Laziness, confidence and hubris or something?
I forget.
Anyway.
Yes.
I suppose the point is.
For example, with you arguing to start out with CD, it makes it easy to be lazy in a very productive way.
I say, why on earth should I do this twice?
I’ll do it manually once and then I’ll automate it and never worry about it again.
Exactly.
And it gives you clearly defined boundaries for where we should automate things.
We know when we start writing code, we know when it’s deployed, everything in the middle, optimize that.
If you go the other way and you start with, “I’m writing code, I’m going to start automating things until someday I’m confident to deploy automatically,”
you’ll never be confident.
Oh, wow.
This is so interesting.
I feel that we’ve given our listeners so much stuff to chew on.
Especially those of you, dear listeners, who maybe come from a smaller company and have been wondering how to take all of those great ideas and make them work for you, especially given that you don’t really have all that many people to work with.
Right.
So hopefully Jonathan has given you some really good food for thought.
So I’m wondering, Jonathan, I guess you do this sort of thing for a living. If people want to find you and ask you to help them, how would they do that?
Where could they find you?
Good question.
My website is jhall.io.
J-H-A-L-L.
Just like my first initial, last name, dot io.
And you can find me there.
If you’re interested, I have a daily mailing list.
I’d love to be in contact with you there: jhall.io/daily. Sign up and I’ll send you nice little funny stories about firing QA teams every now and then, straight to your inbox.
Wonderful.
So Jonathan, once again, thanks so much for being on the show.
This was a really fun episode.
Thank you very much.
Thank you.
I love talking about this.
So excellent.
I enjoyed it.
So maybe you can come back another time and we can have round two of that, Jonathan.
Sounds great.
Thank you very much.
Goodbye.
Thank you.
Bye.
Bye.