In this episode, Luca talks to Amit Eliav of GetAI to get a feel for the kinds of changes that AI may bring to DevOps.
We explore both the technical and the human communication side of applying AI tools as they exist today.
In this episode, the guest, an expert in generative AI, delves into the integration and impact of artificial intelligence in the field of DevOps. The conversation covers various aspects such as the distinction between AI and automation, the history and development of AI, and its practical applications in current DevOps tasks. They discuss the advancements in AI tools like ChatGPT and their potential to automate and enhance various processes in software development and operations, while also addressing the challenges and limitations of AI in this field. The episode concludes with a discussion on future possibilities and the importance of understanding and effectively integrating AI into DevOps practices.
- The role and evolution of AI in DevOps
- Defining AI and its distinction from automation and data science
- Historical perspective on AI development and its mainstream adoption
- Current applications of AI in DevOps tasks
- Potential and limitations of AI in automating DevOps processes
- Future expectations and trends in AI technology
- Practical steps for integrating AI in organizational processes
- The importance of validating AI outputs in production
Amit’s website: https://getai.consulting
Transcript (auto-generated; anyone who finds errors may keep them)
Welcome to a new episode of DevOps auf die Ohren und ins Hirn.
Or in English, DevOps from your ear straight to your brain.
Usually this is a German-language podcast, but we sometimes make exceptions for international guests.
Today is one such exception.
Welcome to the show.
A few years ago, I researched generative AI at university, before it was even called generative AI, and I also built a generative AI venture.
Even before that, as a software engineer in the military, I built AI products for big companies.
So how long have big companies been using AI then?
So back in 2018, I worked for Microsoft.
And I can attest to that:
I was part of an AI team trying to improve their recommendation system for Xbox.
I’m not sure how long companies had been using it before that.
And that was used in production.
So it wasn’t just a research project at Microsoft or something.
And it was not something new back then in 2018.
That brings me to a question that I’ve been itching to ask.
But I think we’ll take this question second.
And first, we’re going to honor the tradition of this podcast of having every guest give us their own personal definition of DevOps.
So Amit, what’s yours?
So I thought about it.
And my definition of DevOps is doing the hard, necessary work back in the shadows.
Work that doesn’t get the praise it deserves when everything goes well,
but gets all the spotlight when things go wrong.
So if I’m hearing you say this, it feels like you’re talking very, very specifically about the production side.
You know, the operations side of IT, is that right?
The question that I’ve been wanting to ask is maybe actually similar to the question we just answered: to come up with a definition of AI.
Like, what’s the difference between an AI and an automation?
Like, I suppose we can agree that Jenkins is not an AI, but where does AI begin?
And also where do maybe adjacent approaches live?
You know, what, what’s the difference between an AI and just a heuristic?
Or what’s the difference between AI and just data science?
Because like I, I imagine that what you did for, for this Xbox recommendation engine, you know, might have been just, just a regular statistics engine, right?
You find correlations, and you just spit them out.
Where does AI actually begin?
It began at the research level very, very long ago, but around 2012 neural nets became more mainstream.
Well, not really in the mainstream, but they got more powerful together with advances in computation.
I think AI started very long ago, in the 1980s or 1970s.
Yeah, but I didn’t mean when does AI begin in terms of time, but in terms of, I don’t know, complexity or what distinguishes an AI from a statistical model?
That’s a really funny question.
A few years ago, I read the book Gödel, Escher, Bach, which is a very old book that talks about AI.
And the definition there is very funny, actually.
The definition of AI changes all the time, because when computers become able to do something we previously thought was impossible, then that becomes AI for a few years.
But then people say, everyone can do that.
That’s not AI.
That’s just computer vision.
Back then, Alan Turing defined AI as a program that can pass the Turing test when you talk to a computer.
Arguably, Eliza could do that back in the 60s or 70s, I guess.
Yeah, they passed it long ago.
But in conclusion, the definition of AI changes all the time with each new development.
And what differentiates it from just machine learning?
Actually, all machine learning is AI, but it’s not necessarily the other way around.
You can just write a very complex algorithm, and it can also be AI.
So what is the answer?
And the answer is, nobody knows?
Is that what it boils down to?
Or is it anything that looks like witchcraft?
It’s a good question.
AI is basically artificial intelligence.
So everything that has intelligence and is artificial is AI.
And the definition of what’s really intelligent, it changes all the time.
Another question that I wrote down for you, and which I suppose I can already sense the answer to,
is what’s different about the AI craze that we see today.
What’s different about it than the first AI craze of the 1970s?
And then the one in the 1980s, and then the one in the 1990s?
I can’t remember one in the 2000s, but that might just be me.
But it feels like every 10 or 15 years, there’s a new AI craze.
So what’s different about the present one?
So about this, I think a few things are different.
The most obvious thing is that now it is very generally available.
The ChatGPT by OpenAI was, I think, the quickest product to get to 100 million users.
So it’s very generally available.
And it is very, very good.
It’s very powerful.
It holds knowledge, not like a real human, but close to it.
And you don’t have to be a pro data scientist to use it.
So people just started to tell each other, oh, I saw this new product.
It helps me do my work better.
And just the word spread like crazy.
So to summarize, you don’t need to be a pro to use it.
Yeah, I think that might be the big difference, right?
I imagine back in the, from what I read back in the 70s, only a handful of people could use it.
And probably it was able to do surprising things even back then.
But it never moved to the mainstream, right?
Because I suppose computers weren’t really mainstream back then.
Yeah, actually, the huge development came even a few years before ChatGPT,
with the GPT model itself.
But it didn’t really get to the mainstream because of the UI that exposed it.
ChatGPT really resonated with people because it is very intuitive.
It’s just a chat.
Everyone can use it: just log in and interact with the language model that it is.
Yeah, that’s interesting, right?
That the power of an AI engine is not just influenced by the actual power of the neural network or whatever other thing is behind it,
but also, as you said, by how approachable it is for regular humans,
and whether you can actually ask it the right questions and then understand its answers.
And that’s arguably one of the bigger strokes of genius by OpenAI.
I’ve never used AI in production.
And I’m really curious to talk through some approaches with you.
And so let’s just try to maybe think about ways in which AI might be usable presently in, you know, DevOps work.
So what, maybe you can discuss it a little bit.
What do you think are the biggest pains in the DevOps work currently?
Well, of course, it depends on who you ask and it depends on the specifics.
But, you know, one thing that I find is really difficult to get right is to have good shared visibility.
And so it’s really a matter of human communication.
And I’m wondering in what ways AI would be helpful there.
Or another thing that I’m seeing is how to ensure continued high quality of our product.
So I’m thinking in terms of, you know, supporting production.
Or I’m thinking in terms of having a solid library of tests.
Or I’m thinking in terms of being able to discover value for our customers.
So it’s, I’m sorry, it’s quite broad.
But there must be something juicy in there where you can say, oh, yeah, this feels like a task that today’s AI could be helpful for, you know, for real cross-functional DevOps teams.
So for the first thing you said, you could just take all the documentation of the organization.
And that includes even code, like Java code,
data from onboarding presentations, and the history of Slack messages.
And have a bot that is the first responder to questions about the org’s data.
What it does is scan through all of the org’s data
and come up with an answer.
If the answer is good enough, you can just mark the question as done.
You don’t need a real human doing a machine’s job in that way.
And if it’s not good enough, a human can step in and provide better support.
And the second thing you said is ensuring the continuous quality of the product, right?
What do you mean?
Like automating tests?
You know, could I use, quite specifically, could I use the AI to derive test cases from like user stories?
Yeah, you can do that.
And could I trust a present AI system to come up with valuable, useful and correct test cases?
Or would a human still need to double check?
In general, when generating code, it’s highly recommended to use a human-in-the-loop approach,
because there will always be small misunderstandings
and small errors a human needs to check and fix.
But it can definitely do the first go.
And let it do the boring work that’s possible to automate, while you add real value on top of the AI.
So I would feed it a user story.
And it would come back to me with a suite of tests where I would, I suppose, still need to fill in a couple of blanks,
a couple of misunderstandings.
Essentially just like a code review, right?
You know, I don’t actually have to write all of this myself.
I can read it through and say, yeah, this looks great.
This looks great.
Ah, you know, there’s something dodgy going on over there.
But only making corrections and not generating the test cases myself.
So I would expect it to generate some good test cases,
some that would not be good,
and some that go in not the right direction at all.
And I’d expect you to generate some tests yourself, or ask ChatGPT to come up with more test cases that fit better what you need.
So I could even iterate with ChatGPT, reference the old test cases that it made, and say, well, those are not cutting it.
Make better ones.
It’s quite good at that.
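As an illustration of the iteration loop just described, here is a minimal sketch using the OpenAI Python client (assumes openai>=1.0; the model name and the `derive_tests` helper are illustrative assumptions, not anything named in the conversation):

```python
# Sketch: asking a chat model to derive pytest test cases from a user story,
# with an optional feedback string for the "those are not cutting it" iteration.
# Assumes openai>=1.0 and OPENAI_API_KEY in the environment.

def build_test_prompt(user_story: str, feedback: str = "") -> str:
    """Build the prompt; pass `feedback` to iterate on earlier attempts."""
    prompt = (
        "You are a QA engineer. Derive pytest test cases for the user story below.\n"
        "Return only Python code.\n\n"
        f"User story:\n{user_story}\n"
    )
    if feedback:
        prompt += f"\nThe previous test cases were not good enough: {feedback}\n"
    return prompt


def derive_tests(user_story: str, feedback: str = "") -> str:
    from openai import OpenAI  # imported lazily so the prompt builder works offline

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": build_test_prompt(user_story, feedback)}],
    )
    return resp.choices[0].message.content
```

As discussed, a human still reviews the generated suite; this only does the first pass.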
And to me, that is something that feels very important to enable ourselves to close those feedback loops quickly.
Because I feel like that is still something that is a struggle for many organizations to really, you know, not just build the forward direction,
but also ensure that tests get created early enough, in high enough volume, that they are run often enough.
So that’s very interesting.
Perhaps even more interesting than doing it all by hand.
Ah, that’s an interesting idea.
Could you have an AI do an on-call work and have the AI respond if something, you know, if the monitoring fires an alert or something?
Yeah, actually I thought about it a little bit yesterday.
I wouldn’t trust it to be the only on-call guy.
I would currently trust it to ease the work of the on-call engineer.
Maybe surface real insights that say where the problem lies, with a suggestion of what to do to fix it.
But I wouldn’t hire an AI and risk all the reputation of a company because maybe there will be a bug and maybe it won’t be 100% accurate.
Sometimes it just makes stuff up.
We all face it, right?
Oh, that’s, that’s interesting.
Talk more about this making stuff up.
Yeah, so how it’s trained is to produce output that looks good, but it does not have to be correct.
It’s really, really good at completing what you ask it to do.
So if you ask it to do something and it doesn’t know how to do it,
it might not tell you and just make up stuff that doesn’t exist.
Like we’ve just seen, it makes up names of scientific papers that don’t exist, just to support its case.
And its output looks good, but sometimes it just looks good and it’s just false.
And wouldn’t there be a way of having the AI be honest and say, well, I don’t know, instead of just making something up?
Yeah, you can ask it.
You can really, really stress it.
It works because it’s really good at following instructions.
And if you give it the possibility and it’s a good practice to always give it a way out,
like instead of asking it to classify a sentence between true or false,
give it also the neutral option when it’s not sure or tell it if you’re not sure, say, I don’t know.
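That “give it a way out” practice can be made concrete with a small sketch; the prompt wording and the fallback parser below are illustrative assumptions:

```python
# Sketch: a classification prompt with an explicit escape hatch (NEUTRAL),
# plus a parser that treats anything unexpected as "not sure".

def build_classification_prompt(sentence: str) -> str:
    return (
        "Classify the following statement as exactly one of: TRUE, FALSE, or NEUTRAL.\n"
        "If you are not sure, answer NEUTRAL. Do not guess.\n\n"
        f"Statement: {sentence}\nAnswer:"
    )


def parse_label(model_output: str) -> str:
    """Normalize the model's reply; anything unexpected falls back to NEUTRAL."""
    words = model_output.strip().upper().split()
    if not words:
        return "NEUTRAL"
    label = words[0].rstrip(".")
    return label if label in {"TRUE", "FALSE", "NEUTRAL"} else "NEUTRAL"
```

The fallback in `parse_label` matters in production: the model can return any string at all, so unexpected replies are funneled into the safe option rather than crashing the pipeline.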
I wouldn’t currently give an AI all the responsibilities.
I would just give it the ability to assist the on-call engineer.
And do you expect that AI will sort of remain in this assistive role or can you, do you expect AI to grow as far as actually replacing, for example, an on-call engineer for the ordinary on-call tasks?
It’s really interesting with the rise of AutoGPT.
I’ll recap quickly what it means.
AutoGPT is just an agent, an AI agent, where you send it to do something
and give it some tools, like searching the web, searching Wikipedia, or using a calculator. It chooses which skills or tools to use and comes back to you with a solution once it has one.
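The agent pattern can be illustrated with a toy loop; here hard-coded keyword rules stand in for the LLM’s tool choice, and both tools are stubs:

```python
# Toy illustration of the AutoGPT-style agent pattern: given a task, pick a
# tool and return its result. In a real agent an LLM makes the tool choice.

def search_web(query: str) -> str:
    return f"[web results for: {query}]"  # stand-in for a real search call


def calculator(expression: str) -> str:
    # Toy only: restricted eval; never evaluate untrusted input like this.
    return str(eval(expression, {"__builtins__": {}}, {}))


TOOLS = {"search": search_web, "calculate": calculator}


def run_agent(task: str) -> str:
    """Pick a tool for the task; an LLM would make this decision in a real agent."""
    if any(ch.isdigit() for ch in task) and any(op in task for op in "+-*/"):
        return TOOLS["calculate"](task)
    return TOOLS["search"](task)
```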
So in the future, I’m guessing yes, because it’s such a pain being on-call, right?
And waking up in the middle of the night, having to understand what’s wrong and saving the infrastructure of the company.
Basically, I think it’s an important enough problem that it’s expected to be solved.
One of the things that characterizes technological development is its exponential pace.
Basically and generally, I think it’s very unintuitive how fast technology will move in the future.
Like we’ve all seen during COVID, when we all came to understand exponential math a little better.
Now, all this technological advancement only makes the pace go faster and faster in a way that’s very unintuitive.
So to recap, yeah, I think in the future, it will be solved, unintuitively and before we know it.
So what I’m getting a sense of is that AIs do better the better their training data is, and maybe the more regular it is.
So I wonder if something like helpdesk work should essentially be
a perfect task for an AI, right?
The typical set of problems is fairly limited.
The resolution is often the same.
There’s hopefully good documentation and so on.
Yeah, yeah, yeah.
That’s a very closed problem.
And I think even today you can put in an AI, and companies are doing it already,
putting AI as a first responder and letting a human deal only with the more complicated cases.
How would I actually do that?
Like, let’s assume that I want to ease the burden of my helpdesk workers by applying an AI in production.
Is that a stage in a Jenkins pipeline?
How do I use an AI in this capacity?
So by helpdesk, what do you mean?
Like general questions about IT and…
Yeah, that sort of thing.
Oh, my printer is not working.
Oh, my password.
Maybe those are even too simplistic, because that can literally be handled by an FAQ.
But yeah, sort of just your regular helpdesk type conversations.
So the ones that can be handled by an FAQ definitely can be solved today by just giving a summarized answer from the FAQ documentation from the organization.
I guess there are products that do that.
But you can also develop something for free on your own data using a Python library called Langchain, a new library that takes all the organization’s knowledge (it can be anything textual) and indexes it in a way that can be queried very, very efficiently.
And once it receives a question from someone seeking support from the help desk, it searches for the relevant part of the organization’s knowledge or documents and can just summarize the answer. And it’s really easy to use, very quick.
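The index, retrieve, and summarize flow just described can be sketched in a few lines; a naive word-overlap score stands in for Langchain’s efficient index, and `summarize_answer` is a stub where the model call would go:

```python
import re

# Toy sketch of the helpdesk flow: index org documents, retrieve the most
# relevant one for a question, then summarize. The word-overlap scoring is a
# stand-in for a real vector index such as the one Langchain provides.

ORG_DOCS = {
    "vpn": "To reset the VPN, open the client, click Settings, then Reconnect.",
    "printer": "Printer issues: check the queue, then power-cycle the device.",
}


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question."""
    q = _words(question)
    return max(ORG_DOCS.values(), key=lambda doc: len(q & _words(doc)))


def summarize_answer(question: str, context: str) -> str:
    # An LLM call would go here; for the sketch we just echo the context.
    return f"Based on our docs: {context}"
```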
That brings me to an interesting, different thought. Like, maybe you’re familiar with the idea of the wall of confusion.
Okay, so that is a concept that is often used in DevOps too.
To describe what goes on between, let’s call it a traditional development team and a traditional operations team, where each has their own job, you know, the development team’s job is to write code.
The operations team’s job is to take this code and have it run in production somehow.
The problem with this separation is that you get conflicts of interest, right?
The developers are essentially only concerned with churning out lots of new code, lots of curly brackets.
The operations guys, of course, say, well, we don’t actually care about new stuff.
Our role is to guarantee stability.
So please don’t change anything.
Please don’t rock the boat.
That way we can guarantee stability.
And the whole premise of DevOps was to remove this conflict of interest.
And in fact, this sort of almost language barrier or certainly barrier of perception by saying, you know what?
You all together are responsible.
You are responsible for continuously delivering value to the customer.
So if, you know, the system goes down, that’s not just operations’ problem.
That’s everybody’s problem.
And conversely, if we are too slow at iterating and creating new value, that’s not just development’s problem.
That’s everybody’s problem.
And that was a long introduction.
My question was: how could AI and ChatGPT maybe help overcome this language barrier?
So I was thinking, you know, could I, as maybe an ops engineer,
use ChatGPT to help me figure out where in the code a particular bug might lurk?
So not give definitive answers, but something of the nature of, oh, yeah, look over there.
That’s, you know, that’s a likely candidate.
Is that something that would be doable?
I guess, you know, you could have the AI read the entire code base and then make assumptions about where something happens.
That’s absolutely doable.
I think Copilot X by GitHub is going to do that.
So I don’t suggest implementing it from scratch, but feel free if you want.
You can just ask GPT or ChatGPT to spot a bug in your code.
It comes up with good answers sometimes.
And yeah, it’s for sure doable.
It can even ease the burden of a code review, beyond, you know, just linting rules.
So instead of just enforcing linting rules, it can spot bugs or ask for changes that are sometimes a little awkward to ask for.
You know that it’s a little awkward,
so as a code reviewer you say, ah, I’ll let that pass,
I already posted enough comments.
But if it’s a bot, it’s just a bot, and it doesn’t feel bad.
So it can ensure better code for everybody,
and even better relationships.
Sometimes people take it personally.
Yeah, that’s, that’s true.
So how would I, I guess I interrupted you in answering that question before.
How would I actually practically do that?
How would I integrate an AI into my workflow?
Let’s call it AI Linter; could it just be a pipeline stage?
Sounds like it, doesn’t it?
Yeah, it can be, it can be.
If you want to develop a custom solution, then it will be.
But I don’t recommend it.
If you want to do it for fun, go ahead.
But I think free, low-cost solutions are going to be very high quality.
Do you expect them to stay low cost?
Because there is currently a race in the world.
Arguably, last week there was a real AI breakthrough,
because Google came out with their own good version of AI.
People say the iPhone revolution didn’t happen when the iPhone came out, but when Google came out with Android.
So if you can compare the two, right now is the iPhone revolution of AI.
Because now we have competition.
We have all the big companies trying to come up with their own large language models.
And when there is big competition with a lot of players, economics says
the prices will go down to the cost of serving those services,
and it will just become a commodity.
And you expect that to happen quickly?
I don’t know how quickly, but I expect that to happen, yes.
But I wonder if it’s simply cheap to sort of bait people into it and get them used to the idea.
And then eventually everybody will ratchet up the prices to actually be covering their cost.
Because it can’t be cheap to run all of those huge server farms.
But I really don’t know.
But, you know, since we are already looking at the crystal ball,
like, what do you expect to become possible in the near future?
And maybe also, what do you expect to never become possible?
I expect writing code will be easier and easier over time.
Right now, it already is quite easy.
People can go into a project they’ve never seen before, in a programming language they don’t know,
and it’s possible for them to start delivering quite quickly, even today.
And I think that trend will continue.
I think AI will get into every aspect of our life where it can give us value and where we want it.
But that’s only a guess.
I don’t really stand firmly behind it.
It’s hard to predict the future.
Let’s go back to the present.
How would you practically go about building something like a GPT-based app?
If we’d want to build the kind of app that we discussed before with the knowledge base,
it’s really easier to discuss something concrete.
I’d set up, for example, a web page or a Slack bot.
If the organization is using Slack.
Serve up a backend using Langchain in Python,
and index all the organization’s data.
Let the Slack bot be the first responder to support requests.
And what should I expect in terms of resources that I need to set aside for such an AI?
Do I need my own server farm now?
Your own server farm for that?
You’d need your own server to serve the backend.
It depends if you’re on your own server or using the Slack or Teams add-ons.
I don’t know.
But yeah, I’m thinking also in terms of how big of a support system do I need?
Do I need a single machine?
Do I need an entire farm?
Do I need…
A really small machine.
Because if you use, for example, OpenAI’s models via an API,
the machine doesn’t need to be strong.
All the embeddings of the organization’s knowledge also come from an API.
If you want to reduce costs a little bit and use open-source software,
then you’d need a bigger server, maybe with a GPU.
But it really depends on the case.
And then I just, you know, have this chat interface or something
that is included somewhere in my app.
That sounds deceptively easy.
What am I missing?
It really is easy.
But I think, you know, just thinking about this,
I can clearly see how you can be very valuable to companies
because you help me, for one, navigate all of this confusing stuff,
and essentially give me confidence that, yes, you can actually do this,
and yes, it’s going to work.
That in itself is already worth a lot.
We believe in delivering value and bringing this revolution
to the world and to companies, to get the word out
and let everyone enjoy this value.
So we talked before about how to phrase your requests to ChatGPT.
And I was wondering if you could maybe expand on that a little bit
and make it even more broad and concrete.
So, for instance, is it better to try to come up with a perfect question
right off the bat?
Or should I just start with something really,
really murky and then refine in conversation?
What’s the better approach?
And do I need to be worried about, like, leading the AI towards an answer
that might sound pleasing but is not actually helpful?
It really depends on the use case.
If you are going to use GPT in production,
you really need to make sure the prompt and the answer is really robust
because you’re going to have to handle
every single answer that it gives you.
So, initially, it can return any string that exists on Earth.
It can come back quoting Shakespeare.
And you have to make sure you can handle each and every possible output.
So, if you just need a yes-no question to be answered,
you have to make sure and test it on a really diverse set of examples.
So you know what’s expected out of that prompt.
So, really make sure before you deploy.
And, fitting for a DevOps podcast,
you need to shift left and test your prompts early and often,
before you design the code architecture,
because the architecture really depends on the outputs the model can give you.
So, I encourage doing that first.
So, what you’re saying is that before I actually introduce it
or include it in my product,
I should sit down and just play with the prompt a little bit
and see what kind of requests I ought to be making
and how the responses feel.
Is that what you’re saying?
Even more than that,
I suggest more than just eyeballing a few examples
in the OpenAI playground,
I say run it on hundreds of examples.
And you need to be very damn sure about the robustness of the prompt
before you deploy it into production.
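A validation harness of the kind described here might look like this minimal sketch; the stub model and the pass-rate bookkeeping are illustrative assumptions:

```python
# Sketch: run a prompt's model over many labeled examples and collect the
# failures (the ones a tool would mark in red). `model` is pluggable: a
# deterministic stub here, a real API call in practice.

def validate_prompt(model, examples):
    """examples: list of (input_text, expected_output). Returns (pass_rate, failures)."""
    failures = []
    for text, expected in examples:
        output = model(text)
        if output.strip().lower() != expected.strip().lower():
            failures.append((text, expected, output))  # would be shown in red
    return 1 - len(failures) / len(examples), failures


def stub_model(text: str) -> str:
    # Deterministic stand-in for an LLM classifying alert texts as yes/no.
    return "yes" if "error" in text.lower() else "no"
```

In practice you would swap `stub_model` for a function that sends the prompt to the model, then eyeball the failure list before trusting the prompt in production.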
Actually, I built a free tool for that.
You can put it in the comments.
That sounds like I need to actually do rather extensive automated tests
of my AI prompts.
Is that what you’re saying?
Yeah, I say you have to validate.
Could I even automate this?
Because it feels like I can, of course, replay a natural language prompt,
but how could I even parse the response without using another AI?
So at this point, am I back to using a human to validate the AI,
just so I can save effort for a human?
Yeah, that’s a good point,
but there actually is a solution to this.
You can use an AI to validate,
but the validation is not 100% accurate.
So, for example, the tool I published
lets you see all the outputs from the prompt across hundreds of examples,
and lets you choose the engine that produces the outputs.
And it also helps you to validate, in natural language,
that the outputs are correct.
So the validation itself can be incorrect,
but it mostly is correct, because you test really well
that the validation is correct.
And from then on, you can just play with the prompt
and make sure it’s good.
You can do just a quick check to see that the validation is good.
There are, I guess, other tools out there,
but this tool just marks in red
the outputs that are not correct
and in green the ones that are correct,
and it just assists you in the job.
It’s not completely automated.
You need to validate the validator a little bit,
but from experience, it really does the job.
Let’s think this through.
If I wanted to concretely add AI to, let’s say, my product development process.
Let’s not even talk about the actual customer-facing product,
but just within something that is a little more controllable,
such as my own organization.
What do I even need and where should I start?
Do I need an account at OpenAI
or do I use a local AI implementation?
What’s the best starting point?
And it sounds like I need a validation tool,
like the one you wrote, to help me hone the prompts.
What else do I need?
What’s sort of the minimal set of things
I should need to download,
create accounts for, and so on and so on?
Yeah, so the minimal is set up an account
in a company that provides a large language model,
such as OpenAI, and get an API key.
You can just use it to run your prompts via an API,
provided, of course,
that you want to build custom software for your organization.
We all know how to use ChatGPT’s UI, yeah.
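For reference, the minimal setup just described boils down to very little code (assumes openai>=1.0; the model name is an illustrative assumption, and the key is read from the environment):

```python
# Minimal sketch of calling a hosted model via the API once you have a key.

def build_request(question: str) -> dict:
    return {
        "model": "gpt-4o-mini",  # illustrative; use whatever model you have access to
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str) -> str:
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(**build_request(question))
    return resp.choices[0].message.content
```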
But to use the API, you just need an account.
Okay, and that’s all I need.
I don’t actually need your validation tool then, do I?
Or do I still need it?
It’s not a must.
If you use it in production,
I really encourage you to validate the prompts,
because, as I said before,
the range of outputs that comes out
might really influence the architecture of the code.
In the future.
So I encourage you to shift left with this.
Is this something that a person can do in a day,
in a week, in a month?
It depends on the quality and all that,
but it can be a few days, a week, two weeks.
Okay, so really all I need is an API key for OpenAI.
Oh, by the way, that means that I give all of my company secrets
to this third party and to this AI,
and I suppose I can’t be sure that the AI,
which doesn’t know any better,
won’t just regurgitate it to others, will it?
I think there’s a story that Samsung accidentally revealed
a couple of their company secrets
through the use of OpenAI, I think.
Yeah, I think they used the free version.
You, of course, need to…
This podcast will be here for a while,
so what I say now may change in the future.
Right now, using the API of OpenAI,
it won’t reveal private information.
But if you can’t share customer data outside of your cloud,
it’s also possible to deploy those models in your own cloud.
At the same quality as, say, OpenAI?
Yeah, you can deploy it on Azure,
which is run by Microsoft,
which owns 49% of OpenAI.
That was a fascinating trip into a world
that I frankly don’t know very much about yet.
Before we close the podcast:
is there something that I forgot,
something that you wish I had asked you?
Something I wish you had asked me?
I’m not sure.
I can’t think of anything.
I guess that’s a good sign.
I mean, thank you so much for joining this show
and telling us about this fascinating field.
If people want to learn more about you
and maybe make use of your services,
where could people find you?
You can check out our website
and book an appointment.
What’s the website?
That’s actually a pretty cool URL.
I guess we’ll get links to some of the products you mentioned
and stick them in the show notes.
As well as obviously a link to getai.consulting.
Thank you so much for being here.
This was very fascinating.
I wish you a wonderful day.
Thank you so much.