The long road to ruby 3.0 vs. the short road to ruby 3.1
I am here to talk to you about how we at Cleo upgraded from ruby 2.7 to ruby 3.0. And that took a long time. And then how we very quickly then upgraded from ruby 3.0 to 3.1.
I’m Murray. I’m an engineering manager here. I didn’t really do any of this work but I couldn’t convince any of the engineers who did to be here on a Monday night, which is stupid because they could’ve got free beer and pizza!
What are we gonna talk about?
What are we actually going to talk about?
How we upgraded to ruby 3.0
- what we did
- how it went well
- how it went poorly
Then I’ll talk about how we upgraded to ruby 3.1
- how it went well
- how it went poorly
Then I will inspire you with some amazing messianic vision of what you should learn from this talk.
Who are we?
First up, who are we?
We are Cleo. We are building a financial AI assistant that understands your situation and gives you advice and coaching on how to get out of your financial situation. Because, probably, if you’ve downloaded one of these it’s not great. We have a fun and punchy tone of voice and visual theme that sets us apart from the rest of the financial advice apps out there.
We are mostly targeted in the USA. I say mostly, we’re entirely targeted in the USA at the moment. We hope to come back to the UK later but not right now. We are a start-up. We have limited … everything. So focus is key.
Background & Context
It is a ruby on rails backend, it’s a react-native frontend, a bunch of python doing machine learning in the backend of the backend. It’s backends all the way down I assume.
Some background and context about our app. I’m going to tell you how we did this upgrade and it’s important to know:
- what is our background and context
- what is our app
Because it may be different to your app and everything I’m saying is relevant to us – it may not be relevant to you. I think there are some general-purpose understandings that you can take from this, but this is not a one-size-fits-all thing.
We have a ruby on rails monolith. All our rails app is one app, there are no microservices in the mix in terms of the backend, it’s just one giant chunk of rails.
We also have a monorepo. The frontend react-native code also lives in the same repository. There is tooling there that is run using ruby, like cocoapods and fastlane, to help build the app and it runs with the same version of ruby.
We are in a multiple team situation. Depending on the timeframe we have 20 to 30 devs split across 5 to 8 teams. All working on this one monorepo, on this one monolith, all at the same time.
So that is some context about why we made the decisions of how we would do this stuff to our app.
Upgrading to ruby 3.0
How we upgraded to ruby 3.0
How did we get from here to here?
This is the story of how we got from April 2021, where Cassie raised the draft PR for ruby 3.0 (basically “I’ve done some stuff, ruby 3 doesn’t work, how do we get this to a place where it does work?”), to May 2022 and our final PR for ruby 3.0 that we merged and deployed, and then we were running ruby 3.0.
You don’t need to try and read what’s on most of the slides but it’s just there for some flavour.
There’s a hint here as to how we did it: slowly, with lots of people involved.
April 2021
So let’s go into a bit of detail on Cassie’s first PR. Cassie raised this and already … it didn’t even boot. We had to comment out some gems that we relied on because they straight up wouldn’t easily just work with ruby 3. We unpinned some gems from our Gemfile to say “just get whatever version actually works”.
So this PR got us to: the app not booting on ruby 3. Which is … good? … but we couldn’t ship it.
May 2022
So May 2022. It was merged two days after it was raised. All it was, was a bunch of changes in 14 places (but we won’t talk about that) to change 2.7.6 to 3.0.4. We’d done all of the hard work so that this PR was just changing some numbers. Then we merged it, and shipped it and it was all good.
How did we get from a PR where the thing didn’t even boot to just a tiny set of changes?
Problem Gems
And that is what I’m gonna talk to you about.
The basic problem is gems. We use a lot of gems and we couldn’t be where we are without using a lot of gems. So we had to identify the ones that didn’t work and we had to break them down into three basic categories.
The first is the gems that just needed a simple version bump to say “oh this will now work with ruby 3”. Semver may or may not exist, but you have to review your dependencies. We couldn’t just bump it and be like “Yep! all fine!”, we wanted to review what was actually changing between version one that worked with ruby 2 and version two that worked with ruby 3. That was to check whether it was more than just a ruby 3 fix: it may have also included a whole bunch of extra stuff or breaking changes.
Then there were ones that required a more complex version bump. Yes, we had to do all that “review the semver changes”, but there were compilation problems or they had incompatibility with other gems. It was fine on the version we were on, but when we upgraded … oh that one also needs you to bump this one. So they were a bit more gnarly to deal with.
And then there were the ones that flat out did not work with ruby 3.0, but also there was no version … yet? (he said optimistically) that did work with ruby 3.0. So we had to decide what to do with them.
We had a list of gems that didn’t work. We wrote down a list of them. We spoke about them in our general-purpose channels for devs and said “Can you investigate that one and upgrade it? And can you investigate that one and upgrade it?”.
- we had a list
- people decided to do stuff
- we started making progress
Eventually we got to a point where the app booted. That meant we could run our test suite. CI is running our test suite on ruby 3, it’s not good, but we have got some progress. It’s now running on ruby 3, albeit throwing a load of errors.
Problem Gems
Which is the secret fourth category of gem.
These are the ones where there is no obvious version bump available to us, but we don’t even know we need a version bump because it’s not until we run it that it, sometimes, when we go to certain code paths, tells us “I don’t work with ruby 3!”
And so that then meant CI gave us a massive list of Problems To Deal With.
Build output
So we put them in a spreadsheet!
Same idea as with the gems; we had a list of things, we put them in the spreadsheet. You don’t need to attempt to read this, but that is the complete list. There are 59 entries on there so it’s not huge. We have done some aggregation because obviously we have more than 59 tests in our suite and some of them throw out the same problem because of the same call-site being “you’re not doing this in the ruby 3 way”.
So we follow the same pattern, we have this list, we portion them out and said “hey you fix this one, you fix this one”, we slowly started chipping away at it so we could get to a green build.
ArgumentError: wrong number of arguments
And most of these are ArgumentError: wrong number of arguments because ruby 3 changed how it handles that final hash-style keyword argument. In ruby 2.7 if your last argument was a hash it would sort of explode it out and say “these are keyword arguments, yay!”. And in ruby 3 it says “I’m not doing that for you anymore, you’re probably trying to pass an actual hash” and we’re like “No! We’re relying on the old behaviour!”.
So most of the PRs that we raised were doing this – they were taking a final argument that is a hash and double splatting it to say “it’s not a hash anymore, it’s keyword arguments”. It’s very boring work and there’s a lot of it.
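As a minimal sketch of that change (the method and option names here are made up for illustration, they are not from our codebase), this is roughly what one of those call-sites looks like before and after the double splat:

```ruby
# Hypothetical method, for illustration only.
def send_reminder(user, channel: :push, urgent: false)
  "#{user} via #{channel}#{urgent ? ' (urgent)' : ''}"
end

options = { channel: :email, urgent: true }

# ruby 2.7 turned the trailing hash into keyword arguments (with a
# deprecation warning). ruby 3.0 treats it as a second positional
# argument, so this call raises ArgumentError there.
begin
  send_reminder("sam", options)
rescue ArgumentError => e
  puts "ruby 3 says: #{e.message}"
end

# The fix: double splat the hash. This behaves the same on 2.7 and 3.x,
# which is why this kind of change can land on a 2.7 main branch.
puts send_reminder("sam", **options)
```

Because the double-splatted call is valid on both versions, each fix like this can be merged on its own without waiting for the version bump.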
Work lands on the main branch
What we realised by doing this is: all of this work could live on our main branch, because all of this work was still compatible with ruby 2.7. So instead of inflating this original PR to loads of gem upgrades and loads of API changes, we could land it all on our main branch which meant our original PR got smaller and smaller and smaller. And also we got fewer and fewer deprecation warnings about ruby 3 on main. We were able to see progress as we were doing stuff. And that’s enjoyable and helps you … make progress (I should have written this down).
What next?
The problem is, we just kind of fizzled out. This story takes us to about June / July 2021. We just sort of fizzled out a bit. We’d upgraded some gems, we’d fixed some of the ArgumentErrors, we were making progress but we didn’t finish it. As I said we are a start-up, we have finite availability for doing things, and focus on:
- squad goals
- company OKRs
- shipping other interesting things that are not going to take us forever to do
Those take priority.
And there was no one really running this, it was a “tragedy of the commons” – we were all responsible for this so fundamentally none of us were responsible for it. We were all like “oh we should make some progress on that ruby 3…”, but we didn’t. So we kind of just fizzled out on it, and that’s a shame.
A new year! A new you!
Thankfully 2022 rolls around and everyone got a bunch of excitement about “let’s make progress on those things we didn’t make progress on and be better”. Also ruby 3.1 came out and we’re now one point version behind what is legit. Rails 7 is saying: “we are going to support ruby 2.7 but rails 8 will come out in a year’s time and it won’t”. So already, yeah, we need to get onto ruby 3 again. We got a bit more energised and excited about it but we learned from the past – if we just say we’ll do it, we won’t do it.
So we formed what we call an action group!
Action Group
We use action groups quite a lot at Cleo. We have our squads working on squad work and action groups are what we use to solve that “tragedy of the commons” thing of a cross-cutting thing that will solve a problem for the whole team, but no single squad is going to be on the hook to actually do it.
For us an action group is something that has a singular focus. We’ve used it for:
- creating a better onboarding experience for engineers
- a ruby 3 upgrade
- fixing an auth problem between the frontend and the backend
Things that help everyone but no one squad is in charge of. It’s also important that these things have a finish state. We know when they’re done, they’re not just going to run on forever.
It’s important that they’re regular. We commit to doing something on a specific cadence. That could be every couple of days, that could be every week, every two weeks, every month, whatever it is, but we commit to doing something. Your squad is not gonna miss you for one hour a week. Your squad is not going to miss you for one hour every two weeks. So you can always make some progress and we commit to being there. And we check in on what we’re doing. We make sure that … what did we say we were going to do, did we actually do it?
And the last thing is that they are contained. In the first PR that we raised we spoke about it in our general-purpose tech slack channel. For this one we created its own slack channel, so it didn’t get lost in the thousands of things we say to each other every day. It is in a special place, so that if you want to know what’s going on with the ruby 3 upgrade you go to the ruby 3 upgrade slack channel. If you want to know what’s going on with the frontend auth problem you go to the frontend auth slack channel. They’re also smaller groups. There are maybe five people who say “I will commit to doing this” rather than 23 people who say “we should definitely do something about this” but no one actually does it. When there are five people it’s a lot easier to be guilted into actually making progress.
#ruby-the-third
So “#ruby-the-third” is the name of our slack channel for getting us to ruby 3. Our goal is:
just get it done, let’s get onto ruby 3
It’s not a requirement for rails 7, but will make it easier. We were interested in exploring types, but we didn’t say “and also we should be using types as part of this” we just need to be running on ruby 3. That made it feel achievable.
We agreed to meet every fortnight and spend an hour. We decided that it would not be a talking meeting. We would check in on what we said we had done, but we’d very quickly break out into mobbing, or pairing, or splitting up into breakout groups who would do smaller versions of that. We would use that time to make progress, explicitly in the editor, on fixing our ruby 3 problem.
And this meant we were able to make constant progress. Every couple of weeks some PR landed that got us a little bit further towards our goal of shipping ruby 3.
Spreadsheet as WIP tracker
We really liked the idea of the spreadsheet from the first attempt. Pulling all the information and making it clear that if you hadn’t been to a session for a couple weeks you can just come along and ask “What’s not got a name against it? What’s not got a yes? We could do that this week,” which meant it was really easy.
- We pulled some data from the old spreadsheet
- We pulled some data from CI
- We pulled some data with extra information from CI and combination of stuff
- We wrote a little guide in the spreadsheet saying “you should go to CI and you should download this file and da-da-da-da”
And that was a bit of a nightmare. So of course once you’ve written down manual steps someone can automate it…
Spreadsheet as editor
…and someone did! And they put the code into the spreadsheet.
Developers like to debate: vim! vscode! emacs! Real, real, developers code in Google spreadsheets. You cannot argue with Rubocop if there is no way of running Rubocop in your editor.
Process
We settled on a process for getting this stuff done:
- We checked the spreadsheet and picked a task
- We created a new branch from main
- We locally changed it to run ruby 3
- We did whatever was needed to complete that task
- We pushed it up to CI with ruby 3 change and made sure that passed
- We then took the ruby 3 change away and pushed it back up to CI to make sure it still passed on ruby 2.7
- Then we can stick that back into our main branch
- Then we rebase the ruby 3 branch
- And update the spreadsheet
This is boring but mechanical – you just do it. You’re just making progress. And it meant all the work is landing on the main branch which is ruby 2.7 and we’re getting there.
Some gotchas
There were some just gotchas about this process.
postgres. We were using the postgres gem at v1.1, which was not ruby 3 compatible. At the time we just did a bundle update, but v1.3, which was current, is not ruby 2.7 compatible. Luckily v1.2 also exists and worked on both, with a bunch of deprecation warnings, but that was enough for us to make progress. We’ll deal with those after, as a reminder for us to upgrade to v1.3 once we shipped ruby 3 to get rid of the deprecation warnings.
We use a gem called apipie which is for helping us generate swagger docs so that our frontend colleagues can understand our APIs via a DSL. It didn’t work with ruby 3. It didn’t have a fix for ruby 3 at the time, but did have a PR that was open that said it fixed it. So we forked it, pointed our Gemfile at our fork, applied the fix and shipped it. Luckily by the time we actually merged ruby 3 there was a better fix that was released for apipie, so we just got rid of our fork and shipped that.
And then paperclip, which had been deprecated for a long time. Paperclip is a file upload solution. It had been deprecated for a long time in favour of rails’ own “you should use activestorage for this”. We always knew we would have to deal with that at some point, but file uploading is not an important part of our app so we never dedicated time to doing it. Luckily we didn’t have to dedicate time to doing it this time either! There is a community fork that says: “cool, you’ve deprecated it but maybe it could run on ruby 3 and still be deprecated?”. So we switched to that.
These three things point out the beauty of the rails and ruby gem ecosystem. There’s always someone else who has done the work, you just have to choose your risk. Which work do you choose?
- the community fork
- the unapplied fix
- the previous version that you hope doesn’t have a security problem
As long as you make a note of these things and signal them in your Gemfile you’ll be alright.
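A sketch of what that signalling could look like. This is not our actual Gemfile: the gem names are my assumption of the published names (pg for the postgres gem, apipie-rails for apipie, kt-paperclip for the community fork), and the fork URL and branch name are placeholders.

```ruby
# Gemfile (fragment). Gem names, versions, the fork URL and the branch
# name are assumptions for illustration.
source "https://rubygems.org"

# RISK: previous version. Works on ruby 2.7 and 3.0 but emits deprecation
# warnings. TODO: upgrade to 1.3 once ruby 3 has shipped.
gem "pg", "~> 1.2.0"

# RISK: unapplied fix. Points at a fork carrying an open upstream PR.
# TODO: return to the released gem when upstream ships a ruby 3 fix.
gem "apipie-rails",
    git: "https://github.com/example-org/apipie-rails.git", # placeholder URL
    branch: "ruby-3-fix"                                     # placeholder branch

# RISK: community fork of a deprecated gem.
# TODO: migrate file uploads to Active Storage.
gem "kt-paperclip"
```

The comments are the point: whoever next opens the Gemfile can see which risk was chosen for each gem and what would let them undo it.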
and that’s kinda it? slow and steady progress
And so that’s kind of it. You make slow and steady progress. We were ready in about mid-April of this year – about a year after Cassie raised the original “could we use ruby 3?” PR. We had a completely green CI build on our ruby 3 branch.
Planning to deploy
We didn’t want to just hit merge. We have continuous deployment, so if we hit merge on a branch, it goes onto main, time passes, and then it’s on production – no human involved! We didn’t really want to just deploy it, just in case.
So we decided to do some more robust testing. We put the branch on one of our staging environments and told all the dev teams and their product owners: “Go and poke this. Poke your critical path flows. Give it a good checking over, make sure it works.”
We turned on the deprecation warnings in production so that ruby 2.7 would give us all of the information about all the ruby 3 things that we hadn’t done yet. Although those warnings were on by default in ruby 2.7.0, they were turned off by default in ruby 2.7.2 because they were too noisy, and we were running ruby 2.7.6, so we weren’t getting them by default. We turned them on and started tailing our logs. Luckily, because of all the work we’d done, there weren’t that many so we didn’t have much more to fix.
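For reference, a minimal sketch of one way to switch those warnings back on. The talk doesn’t say exactly how we did it, so treat the placement (an initializer that runs early in boot) as an assumption.

```ruby
# Hypothetical early-boot file, e.g. config/initializers/ruby_deprecations.rb.
# ruby 2.7.2 and later keep deprecation warnings off unless asked;
# this turns them back on for the running process.
Warning[:deprecated] = true

# The same effect without a code change: start ruby with
#   RUBYOPT="-W:deprecated"
puts "deprecation warnings enabled: #{Warning[:deprecated]}"
```

With this in place, call-sites that still rely on ruby 2.7 behaviour show up as warnings on stderr, which is what you would then look for in the logs.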
And most importantly we let people know we were doing this. We said, “We are ready to deploy ruby 3, you probably want to install ruby 3, so that when we do it’s smooth for you!” We checked in with our frontend colleagues to make sure that all of their infrastructure still worked. We had been focused on getting the app booting, but they have to build a native app using cocoapods and fastlane, which is ruby infrastructure. That still needed to work for them, and as a bunch of backend nerds we didn’t really know how that works, so we just hoped it did. We got in touch and, luckily, we were right – it did just work! So we didn’t have to do anything!
May 2022
And so in May 2022, Ignacio raises a PR that is just number changes. He put co-authored-by for the 14 devs that had been involved throughout this whole year-long process of doing it.
We ummed and ahhed about it and decided to just merge it.
We merged it.
CI deployed it.
Nothing broke.
This was a job well done by everyone involved!
Upgrading to ruby 3.1
So let’s talk about upgrading to ruby 3.1!
May 2022
It’s very similar. A couple of weeks later Murad raised a PR saying “What if we used ruby 3.1.2 now?”
It’s a very similar story. There were some gems that meant we couldn’t boot the app because ruby 3.1 removed some stuff from the standard library. They are now gems and we had to put them into our Gemfile. There were a handful of gems that just needed another upgrade. There were a handful of test failures. So we did the same thing – we portioned it all out, we got everyone to fix those things, and a couple of days later that was all green. That had all been landed on main and we were ready to ship this.
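To make the standard library point concrete: ruby 3.1 moved net-ftp, net-imap, net-pop, net-smtp, matrix, prime and debug from default gems to bundled gems, so an app running under Bundler has to list any of them it needs. A sketch of that kind of Gemfile addition follows; the talk doesn’t say which ones we added, so this selection is an assumption.

```ruby
# Gemfile (fragment). Which of these an app needs depends on what it, or
# its dependencies, require. The selection below is illustrative only.
gem "net-smtp", require: false # commonly needed by mail-sending code
gem "net-imap", require: false
gem "net-pop", require: false
gem "matrix"
gem "prime"
```

The usual symptom of a missing one is a LoadError at boot, which matches the “couldn’t boot the app” experience above.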
So we did!
A couple of days later we just launched another version of ruby.
Unforch, that’s where the similarities end
Unfortunately that is where the similarities end; this did not go smoothly!
We didn’t tell anyone we were going to do it. All of our frontend colleagues asked, “Why am I being asked to install another version of ruby two weeks after the last one?” and we replied, “Oh, we merged another update. That’s cool right?”. Normally this would be fine, but we hadn’t told them, they weren’t prepped and it blocked them from doing some stuff.
Some builds failed locally because of macOS problems. The person who had done it had all the updates ready but other people didn’t. So we weren’t really aware that rmagick specifically was a nightmare for some people to build locally.
Rubocop turned on some ruby 3.1 syntax cops. We didn’t really know that it was going to do that because on CI we run rubocop only on the diff from main. So our ruby 3.1 branch wasn’t going to see these; it was only as people touched files that had what rubocop deemed unacceptable ruby 3.1 code in them that it would start complaining. We had a big debate about whether or not we like the new valueless hash syntax. Someone turned it on by default, someone said “no!!!!”. We had a big debate about that.
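If you haven’t met the valueless hash syntax yet, here is a small sketch of what the debate was about. The variable and method names are invented for the example, and it needs ruby 3.1 or later to parse.

```ruby
# Hypothetical values, for illustration only.
name = "cleo"
plan = "plus"

# Before ruby 3.1 you had to repeat yourself.
explicit = { name: name, plan: plan }

# ruby 3.1 lets you omit the value when it is a local variable (or method)
# with the same name as the key.
shorthand = { name:, plan: }

puts explicit == shorthand

# The same shorthand works for keyword arguments.
def describe(name:, plan:)
  "#{name} is on #{plan}"
end

puts describe(name:, plan:)
```

As far as I know, the rubocop setting behind this is EnforcedShorthandSyntax on the Style/HashSyntax cop, which can insist on the shorthand, forbid it, or allow either.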
And our CI builds got slow because we weren’t using the ruby version as part of the cache key, and sometimes when you bump a ruby version you need to re-compile some of your gems if they’re native. Ruby is smart enough to know “I can’t use this one, I’ll recompile it when you bundle install”. It was doing that every time because it wasn’t caching, so we had to sort that out so that we didn’t breach our SLAs on our build time on CI.
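The idea behind the fix, sketched in ruby. Our CI config isn’t shown in this talk, so the helper name and the key format below are hypothetical; the only real point is that the ruby version is one of the inputs to the key.

```ruby
require "digest"

# Hypothetical helper: build a cache key for installed gems from the
# lockfile contents plus the ruby version and platform, so that bumping
# ruby produces a fresh cache instead of reusing stale native builds.
def gem_cache_key(lockfile_contents, ruby_version: RUBY_VERSION, platform: RUBY_PLATFORM)
  lock_digest = Digest::SHA256.hexdigest(lockfile_contents)[0, 16]
  "gems-#{platform}-ruby-#{ruby_version}-#{lock_digest}"
end

lockfile = "GEM\n  specs:\n    rmagick (4.2.5)\n" # stand-in for Gemfile.lock
puts gem_cache_key(lockfile, ruby_version: "3.0.4")
puts gem_cache_key(lockfile, ruby_version: "3.1.2")
```

Same lockfile, different ruby version, different key: the first build after a bump recompiles the native gems once and every later build reuses them.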
We just hadn’t really paid attention to this piece of work this time, and there were lots of problems.
Postmortem
A week later, the dust had settled:
- we had fixed ruby 3.1
- we had apologised to our colleagues
- we had fixed the local build problems
- we were still, I think, debating rubocop, but we were okay with that
We ran a postmortem on the process. We like running postmortems at Cleo. If there’s anything that we think, “hmm that could’ve gone a bit better,” we run a postmortem. We have a template, we run through it. Basically the idea is… no blame, but what went wrong and how do we learn from this and get better?
And so, fundamentally what came out of this was a bunch of minor quality-of-life fixes for the specific problems we hit this time round, in case they happened again:
- we would make sure rubocop didn’t run if you weren’t actually changing any ruby files so that our frontend colleagues weren’t getting a problem with it
- we put the ruby version into the cache key on CI, so that problem went away
But a major one was we decided that we would write an “upgrading ruby …” and then actually, what if you’re upgrading rails or what if you’re upgrading react-native … so “upgrading any major infrastructure” playbook with hints about what you should do, who you should talk to, how you should test it. For people doing future upgrades. But as I said, we are a start-up so we have deferred actually writing it until the next time we do one of these because there’s nothing like just-in-time compilation.
Inspirational reckons
So that got us to ruby 3.1.
Now I’m gonna give you some inspirational reckons about the point of this talk.
Be intentional about making progress
We were intentional about making progress. When we first tried to do this we were like “hey, it’d be nice if we used ruby 3,” and that petered out because we didn’t have a goal, we didn’t know who was doing anything. Whatever you are doing, if it’s not something that lives within a single team try and be intentional about it and get people together. It won’t just happen if people don’t show up. But don’t rely on people just showing up randomly for an ad-hoc or nebulous goal. Give them something they can make progress on.
Break big work up into lots of small work
Break your big work up into lots of small work that is easy to do. That’s what we did with our ruby 3 upgrade, it’s what we did with the ruby 3.1 upgrade as well. We landed stuff on main as regularly as we could.
If you’re working in any form of agile, this is what you’re doing. Someone doesn’t come to you and say “build me a hat delivery service!” and you go away for six months and come back with an amazing hat delivery service. You break it down into lots of tiny shippable things, so that months later you have a hat delivery service. Even when it’s not a product feature, think about how you can break things up into small, achievable, iterable things.
Treat small work like big work
Treat your small work like big work. Ruby 3.1 was a really easy upgrade for us. There was a handful of things to do, so we didn’t think it was going to be a problem. We didn’t worry about all the stuff that we worried about for ruby 3, because ruby 3 was A Big Scary Upgrade. Ruby 3.1 wasn’t so we just sort of YOLO-ed it and everything went wrong because we hadn’t thought about it properly. So treat your small work like big work – go through your checklist to make sure you are Being Careful I guess.
Reflect on mistakes without blame
And last of all, reflect on your mistakes without blame. We ran a postmortem. We’re very big on no-blame culture because we need to know what went wrong in order for us to learn from it. When you reflect on what went wrong it has to be blame-free or your people won’t tell you what really happened and you will put in mitigations for things that won’t matter because that’s not what actually happened, it’s not the real problem. When people feel safe to tell you what really happened then you can really learn from it and really stop it happening again. If you don’t, you’ll keep making the same mistakes.
End
Those were my inspirational reckons, thank you for coming to this talk.
[thunderous applause unfortunately not captured by the microphone]
It’s finished.