h-lame.com

Re-interpreting data ∋ RubyConf 2023

Introduction

I gave this talk at RubyConf ’23 held in San Diego in November 2023.

I was proud of the version of this I gave at LRUG in January 2020, but I knew I had rushed the slides and was convinced there was a better version lurking in it. I decided to submit it for RubyConf 2023 and I was surprised and delighted to be selected. This was the push I needed to work on that better version. The major differences are that I expanded on the text heavy slides in the original by adding graphics to help explain the concepts, and I provided a slightly more coherent conclusion focussing on encouraging curiosity and fun while coding.

There is a video of this talk and as you can see from the video I was mostly reading from my presenter notes, but I did go “off script” occasionally. In this write-up I’ve combined the precision of my notes with the least egregious of my ad libs from the transcript.

Video

A video of this talk is available from Ruby Central:

a screenshot of the youtube video of the talk filmed by Ruby Central

Who am I?

text: Re-interpreting data, Murray Steele, Cleo, @hlame@ruby.social, RubyConf San Diego ’23

Hi,

I’m Murray, thanks for coming to my talk. I’m an Engineering Manager at Cleo. We’re based in the UK but our customers are in the US. We’re empowering people to build a life beyond a paycheck and we do that with an AI assistant that understands your banking information in order to give you personalised and relevant advice on your personal finances, and, for a fee, access to a range of services to actively help you improve your situation.

Buuut… I’m not talking about anything related to that, so if it sounds interesting, we are hiring and we can help with visas and relocation so, come and find me later. I have this face. You’ll find me.

What I am here to talk to you about is files and data and… well, let’s just get started.

A standard downloads folder

Screenshot of macos finder window showing some example filenames. text: Embarrassing Medial Procedure.ics; Endless Screaming.mp3; Favourite Meme.gif; Game of Thrones S08E06 (less disappointing edit).mp4; Hot Selfie.jpg; Secret World Domination Plans.rtf; Someone Else’s Intellectual Property.zip; Tax Return 2019 (evasion version).pdf

Here is a screenshot of a fairly standard downloads folder1. All these different types of files: pictures, movies, documents, audio. The file names include the title of the file and after the . what’s known as a file extension that tells you what kind of file it is.

In a modern graphical operating system, you don’t have to parse that extension yourself: your OS does it and gives you a handy icon to tell you what the file is, and maybe even gives you a hint as to what application will be used to open the file if you double click it.

Renaming a file extension

Screenshot of macos file rename warning dialog. text: Are you sure you want to change the extension from “.pdf” to “.wav”?  If you make this change, your document may open in a different application. Keep .pdf; Use .wav

Indeed, if you rename the file to change the extension, this will very likely change the icon and the application that will open it. Your OS might even warn you about that.

When I first started using computers, I thought this was all that was involved: rename the file and you’ll be able to use it. Of course it’s not. When I tried renaming a file from .doc to .txt (because I didn’t have Microsoft Word at the time) I couldn’t read it in Notepad – it was just a stream of nonsense.

I did try the example in the screenshot – I hoped that if I renamed a PDF to a WAV it would magically be able to read all the words in the PDF out to me. Obviously, it didn’t, it just gave me an error.

So there’s more to it than just the file extension; what’s going on?

The file command and how it works

man page for the file command

On Unix systems there’s a command called file: give it a file and it will do its best to tell you what that file actually is.

Interestingly, one of the things it does is open the file and look at some of its contents to take a guess. It doesn’t just take the name of the file on blind faith and say that a .txt file is a text file if it’s actually a zip file.
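To give a flavour of what “looking inside” means, here is a minimal Ruby sketch that guesses a file type from its first few bytes. The SIGNATURES table and the sniff helper are hypothetical names made up for this illustration; the real file command works from a far larger set of rules.

```ruby
require "tempfile"

# A tiny, illustrative table of well-known leading byte signatures.
SIGNATURES = {
  "%PDF-".b      => "PDF document",
  "PK\x03\x04".b => "Zip archive",
  "GIF8".b       => "GIF image",
  "RIFF".b       => "RIFF container (WAV files start like this)",
}.freeze

# Guess what a file is from its contents, ignoring its name entirely.
def sniff(path)
  head = File.binread(path, 16) || "".b
  SIGNATURES.each do |magic, description|
    return description if head.start_with?(magic)
  end
  "unknown data"
end

# A zip file that calls itself a text file is still a zip file.
Tempfile.create(["totally-text", ".txt"]) do |file|
  file.binmode
  file.write("PK\x03\x04rest of the archive".b)
  file.flush
  puts sniff(file.path)
end
```

The file name never enters into the decision, which is the whole point.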

Why renaming files doesn’t work

Screenshot of macos file rename warning dialog. text: Are you sure you want to change the extension from “.pdf” to “.wav”?  If you make this change, your document may open in a different application. Keep .pdf; Use .wav

Going back to my youthful attempts to open files without the relevant (usually expensive) software, this explains why it didn’t work. You can call your file whatever you want, and the OS may use that for some hints as to what application will open the file, but it’s what’s actually inside the file that really matters. Renaming a .doc to a .txt won’t let you get at the words in the doc, renaming a PDF file to a WAV file won’t let you listen to the contents of that PDF file. I’m wiser now and understand my youthful folly, but…

The WAV file specification

Screenshot of a part of a webpage describing the WAV file format specification, including an image of the file structure; url: http://soundfile.sapp.org/doc/WaveFormat/


At some point in my early terminally online life I came across a website that described the data structure for WAV files.

WAV files, if you don’t know, are simple, uncompressed sound files, storing a digital representation of a recording of an actual sound. The 1s and 0s of the data are the numerical values that represent that sound wave – hence the name.

Exploring the WAV file format

A box representing a WAV file called

And they are very simple.

Exploring the WAV file format: The data part

A box representing a WAV file called

They’re made up of a header part, and a data part.

The data part is just that: raw bytes that are the digital representation of the sound wave. Just a stream of numbers really. There’s lots of different ways to interpret those numbers, and that’s where the header part comes in. It tells us how to interpret the data.

Exploring the WAV file format: The header part

A box representing a WAV file called

The header part is split in two.

  1. the first part tells the world “Hi, I’m a WAV file and I’m this long” – it’s very short and it’s basically there to tell other software “if you don’t know what a WAV file is, you can stop now”2
  2. the second part tells the audio software how to interpret the data that follows. It describes things like:
    • how many channels the sound has (is it mono or stereo, etc),
    • how many samples there are per second for the sound,
    • how many bits there are per sample

Together these determine how detailed the data is: more channels, more samples per second and more bits per sample mean the sound is more accurate, but also mean you need much more data to represent the same length of sound.
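As a rough illustration of that trade-off, this sketch works out how many bytes one second of sound needs. The bytes_per_second helper is a made-up name; the two sets of numbers are the low-fi settings used in the demo later on and the usual CD-quality settings.

```ruby
# Bytes of sound data needed for each second of audio.
def bytes_per_second(channels:, sample_rate:, bits_per_sample:)
  channels * sample_rate * bits_per_sample / 8
end

low_fi = bytes_per_second(channels: 1, sample_rate: 22_050, bits_per_sample: 8)
cd     = bytes_per_second(channels: 2, sample_rate: 44_100, bits_per_sample: 16)

puts low_fi # => 22050
puts cd     # => 176400
```

So even at the low-fi settings, a file of a thousand bytes or so is only a small fraction of a second of sound.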

Could renaming work?

Screenshot of macos file rename warning dialog. text: Are you sure you want to change the extension from “.pdf” to “.wav”?  If you make this change, your document may open in a different application. Keep .pdf; Use .wav

Ok. So. We can’t rename a file from tax return.pdf to tax return.wav and expect to be able to listen to it.

But, given how simple the WAV file format is, we could take a PDF file, and put a WAV file header on top of it and then we can listen to it.

How to convert PDF to WAV

How?

Well, a WAV file is header + data. The data part is easy, we just take the entire contents of the PDF and smoosh it onto the bottom of our WAV file.

The header is more complicated, but not by much. We can calculate what we need for the first part of the header just by looking at the size of our source file. By making some choices about the sample rate and bits per sample we can calculate the second half of the header.

Creating the WAV header in ruby

Snippet of code needed to create a WAV header in ruby. source: https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/waves.rb#L76-L98

Source for code in slide

Here’s some ruby code that constructs the header. Let’s go through it.

Creating the WAV header in ruby: identifier & length

Snippet of code needed to create the identifier & length part of the WAV header in ruby. source: https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/waves.rb#L76-L83

Source for code in slide

The 1st stanza builds the whole of the identifier & length part of the header, i.e. “I’m a WAV file, and I’m this big, including the size of the header”. That’s what the magic 36 is for.

Creating the WAV header in ruby: data format details

Snippet of code needed to create the data format details part of the WAV header in ruby. source: https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/waves.rb#L85-L92

Source for code in slide

The 2nd stanza constructs the first part of the data format details header. Our arbitrary choices are in some instance variables (e.g. what the sample rate and bits per sample will be) and we combine those with some calculations on file size to explain how to interpret our data. There are some magic numbers in here too – but trust me, while they’re important they’re also boring.

Creating the WAV header in ruby: final data size

Snippet of code needed to create the final data size part of the WAV header in ruby. source: https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/waves.rb#L94-L96

Source for code in slide

The 3rd stanza constructs the second part of the data format details header which again uses the file size to explain how much data there is.

As we saw, after this header we just have the raw bytes that make up the actual sound data.

What’s missing from this code snippet is how you actually copy the data around between files and write the header to a file. I’m sure you could all imagine that, so I’m not going to show it.
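For anyone who would rather not imagine it, here is a minimal sketch of the whole idea: build the three header stanzas, then copy the source bytes in after them. It is written from the format description linked above rather than copied from stegosaurus, so wav_header, wav_from and the default settings are assumptions made for illustration.

```ruby
require "tmpdir"

def wav_header(data_size, sample_rate: 22_050, channels: 1, bits_per_sample: 8)
  bytes_per_frame  = channels * bits_per_sample / 8
  bytes_per_second = sample_rate * bytes_per_frame

  # Stanza 1: "I'm a WAV file and I'm this big". The header is 44 bytes, but
  # the length doesn't count the first 8 of them, hence 36 + data size.
  identifier = "RIFF".b + [36 + data_size].pack("V") + "WAVE".b

  # Stanza 2: how to interpret the data. The 16 and the 1 are the fixed
  # values the format description gives for plain uncompressed audio.
  format = "fmt ".b + [16, 1, channels, sample_rate,
                       bytes_per_second, bytes_per_frame,
                       bits_per_sample].pack("VvvVVvv")

  # Stanza 3: how much sound data follows.
  data_size_part = "data".b + [data_size].pack("V")

  identifier + format + data_size_part
end

# Write the header, then stream the source file in after it, untouched.
def wav_from(source_path, target_path)
  File.open(target_path, "wb") do |target|
    target.write(wav_header(File.size(source_path)))
    File.open(source_path, "rb") { |source| IO.copy_stream(source, target) }
  end
  target_path
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "source.bin")
  File.binwrite(source, "not really a sound" * 100)
  target = wav_from(source, File.join(dir, "source.wav"))
  puts File.size(target) - File.size(source) # => 44
end
```

The only thing the output file adds to the source is those 44 header bytes.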

Using Array#pack

Snippet of code needed to create a WAV header in ruby, highlighting the calls to the `pack` method.  source: https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/waves.rb#L82 & https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/waves.rb#L92 & https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/waves.rb#L96

Source for highlighted code in slide: 1, 2, 3

All that’s really interesting about this code is the pack method. I don’t know about you, but in my day-to-day coding life I’d never encountered it before, so here’s what it does:

Called on an array of numbers, and passed a format string, pack will convert those numbers into bytes. Why’s that interesting? Aren’t numbers bytes anyway? Well, as it turns out, there’s lots of different ways to represent a number in bytes and pack lets you control that. So you can say: represent this number as a 4 byte number, or a 2 byte big endian number, or a 2 byte signed integer. You don’t need to know what those words mean but trust me it’s important when you’re thinking about bytes.

What a WAV file cares about is that some of the header is a 4 byte number, some is a 2 byte number, etc… That’s what the pack statements are doing: V means a 4 byte little-endian number and v means a 2 byte little-endian number3.
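A quick way to see the difference is to pack the same number a few different ways and look at the bytes that come out. This is a small standalone illustration, not code from the talk; N and n are the big-endian counterparts of V and v.

```ruby
# The same number, 258 (0x0102), laid out as bytes in different ways.
p [258].pack("V").bytes # => [2, 1, 0, 0]  4 bytes, little-endian
p [258].pack("v").bytes # => [2, 1]        2 bytes, little-endian
p [258].pack("N").bytes # => [0, 0, 1, 2]  4 bytes, big-endian
p [258].pack("n").bytes # => [1, 2]        2 bytes, big-endian

# Several numbers can be packed in one go, one directive per number,
# and String#unpack reverses the process.
packed = [22_050, 8].pack("Vv")
p packed.bytesize     # => 6
p packed.unpack("Vv") # => [22050, 8]
```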

And, that’s basically all there is to it.

An aside on rubocop-magic_numbers

text: Aside: https://github.com/meetcleo/rubocop-magic_numbers

https://github.com/meetcleo/rubocop-magic_numbers

As an aside: that code had a lot of magic numbers in it. At work, we’ve got a custom rubocop rule (published as a gem) that shouts at us if we have any magic numbers in our code and says to define named constants for them. But this code is personal and for fun, so best practices be damn’d, right?

That said… coming back to this code I did wonder what that 16 was for. Is the 16 special because we’re talking about bytes and it’s a multiple of 8, or is it something about the WAV file format? I do not remember, sorry! If only I’d given it a name maybe I would. So, maybe, best practices are good, actually?4

WAV Demo

text: WAV Demo

Anyway, we’ve seen all the code, so why don’t we try it out!

Demo time: WAV (pt. 1)

irb(main):001:0> require './lib/stegosaurus'
=> true
irb(main):002:0> w = Stegosaurus.waves
=> #<Stegosaurus::Waves:0x000000010a2d1788 @bps=8, @buffer_size=128, @channels=:mono, @sample_rate=22050>
irb(main):003:0> f = w.make_from './README'
=> "artefacts/README.wav"
irb(main):004:0> Stegosaurus.open_wave f
=> ""

First we require the library.

It’s called stegosaurus because I thought this code might be a steganography tool (for hiding data in other data) and, also, dinosaurs are cool!

We’ve got a library called stegosaurus and we can grab an object from it. It’s a waves object because we’re building WAV files.

Then we call make_from to make a WAV from a file…

Oh, we’ll need a file. I have a README in the repo that’s 1,000 bytes or so and that should give us something interesting. Let’s use that.

That gives us the filename for our new WAV file which I’ll open in VLC with a convenient little helper method.

The stegosaurus README as a WAV file.5

Okay, it’s very short. As I said the WAV format needs a lot of data to let you hear something, so we’re going to need a bigger file.

Demo time: WAV (pt. 2)

irb(main):005:0> f = w.make_from RbConfig.ruby
=> "artefacts/ruby.wav"
irb(main):006:0> Stegosaurus.open_wave f
=> ""

There are probably bigger files lying around, but the big file I know that I definitely have is the ruby interpreter itself.

Ruby comes with a nice little module called RbConfig that gives us lots of details about how ruby was built. The important method that I care about is RbConfig.ruby which gives you the path to the currently running ruby interpreter!6
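If you want to check what you would be listening to before trying this at home, a small sketch like the following prints the interpreter’s path and size. The duration estimate assumes the demo’s settings (8 bits per sample, mono, 22,050 samples per second), and the size you see depends entirely on how your Ruby was built.

```ruby
require "rbconfig"

interpreter = RbConfig.ruby
size = File.size(interpreter)

puts interpreter
puts "#{size} bytes"
# At 8 bit, mono, 22,050 samples per second, each second needs 22,050 bytes.
puts "about #{(size / 22_050.0).round(1)} seconds of sound"
```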

If we stick that into our library we’ll be able to hear the ruby interpreter itself as a WAV file.

I hope you’re ready for this!

The ruby 3.2.2 interpreter as a WAV file.7

It’s basically unlistenable white noise, right? Fair, I mean, what did we expect? Although… if you’re old and dusty like me, you can thank me later for the nostalgia trip you just had about loading software from tapes or connecting to the internet over dialup.

What is kind of interesting is that as we skip through it there’s some structure: different parts of the file sound different. You definitely don’t want to listen to it, but it’s interesting that there is some structure to it.

Are WAVs the only way?

text: that was audio, what about visuals?

We explored the WAV file a bit and there’s some structure. That’s interesting, right?

I’m not listening to all that white noise to get attuned to the differences though. What if there was another way to explore the structure of the file?

A visual way maybe?

If we can listen to our files as WAVs, is there a similar shaped file format to let us look at them too?

The BMP file specification

Screenshot of a part of the webpage describing the BMP file format specification, including an image of the file structure; url: https://paulbourke.net/dataformats/bmp/


Yes! There’s the BMP image format (Bitmap). It’s got a header and then pixel data, so we should be able to do basically the same as we did with WAV files: calculate the header and write out our source file as the pixel data.

Exploring the BMP file format

A box representing a BMP file called

Let’s look at BMPs like we did WAVs.

Exploring the BMP file format – the data part

A box representing a BMP file called

They’re made of a header and data.

The data, as you might expect, is the pixel data. It’s colour values for each pixel in the image.

Exploring the BMP file format – the header part

A box representing a BMP file called

The header can be split into 3 parts:

  1. An opening segment with the BMP identifier and the length of the file
  2. A second segment with information about the image: width, height, colour depth, DPI resolution (e.g. if you want to print the bitmap out this scales pixels to inches), etc…
  3. A final segment that describes the colours used in the image. At colour depths of 8 bits or fewer, BMP is an indexed colour format: the pixels are not red, green, blue values, they’re numbers that point to an entry in the colour table.
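Those three segments can be sketched in Ruby with the same pack tricks we used for WAV. This is an illustration based on the common 14 byte file header plus 40 byte info header layout, not the stegosaurus code: bmp_header and colour_table are hypothetical names, the greyscale colour table is an arbitrary choice, and 2835 pixels per metre (roughly 72 DPI) is an assumed resolution.

```ruby
FILE_HEADER_SIZE = 14
INFO_HEADER_SIZE = 40

# Segment 3: a greyscale colour table with one 4 byte blue, green, red,
# reserved entry per colour. Only depths of 8 bits or fewer need one.
def colour_table(bits_per_pixel)
  return "".b if bits_per_pixel > 8

  colours = 2**bits_per_pixel
  (0...colours).map { |index|
    level = index * 255 / (colours - 1)
    [level, level, level, 0].pack("C4")
  }.join
end

def bmp_header(width:, height:, bits_per_pixel:, pixel_data_size:)
  table = colour_table(bits_per_pixel)
  pixel_data_offset = FILE_HEADER_SIZE + INFO_HEADER_SIZE + table.bytesize

  # Segment 1: the identifier, the file length and where the pixels start.
  identifier = "BM".b +
               [pixel_data_offset + pixel_data_size, 0, 0, pixel_data_offset].pack("VvvV")

  # Segment 2: width, height, planes (always 1), colour depth, compression
  # (0 = none), pixel data size, resolution, and two colour counts where 0
  # means "all of them".
  image_info = [INFO_HEADER_SIZE, width, height, 1, bits_per_pixel, 0,
                pixel_data_size, 2835, 2835, 0, 0].pack("Vl<l<vvVVl<l<VV")

  identifier + image_info + table
end

puts bmp_header(width: 3, height: 3, bits_per_pixel: 24, pixel_data_size: 36).bytesize # => 54
puts bmp_header(width: 5, height: 5, bits_per_pixel: 8, pixel_data_size: 40).bytesize  # => 1078
```

The 8-bit header is so much bigger because it carries 256 colour table entries of 4 bytes each.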

It’s nothing we’ve not done before with WAV files. Except the pixel data part is a little more complex. We can’t just throw all the data at the end of the header and expect it to work. There’s 3 problems we have to solve.

BMP pixel data problems: 1. Colour Depths – 1-bit

Showing how many pixels a 25 byte file would create with a 1-bit colour depth: 200; text: 1. Colour depths; 1-bit: 1 byte = 8 pixels; 25 byte file; 200 pixel image

The first is colour depth (i.e. how many colours the image has). One of the arbitrary choices we make is to choose a colour depth for our image and this has an impact on the amount of data we need. If we want a monochrome image we choose 1-bit colour depth and each byte of our file is equal to 8 whole pixels.

So a 25 byte file becomes a 200 pixel image. Nice.

BMP pixel data problems: 1. Colour Depths – 8-bit

Showing how many pixels a 25 byte file would create with a 8-bit colour depth: 25; text: 1. Colour depths; 8-bit: 1 byte = 1 pixel; 25 byte file; 25 pixel image

If we want more colour (256 to be exact – shout out to VGA) we can choose 8-bit colour where 1 byte = 1 pixel.

Our 25 byte file is a 25 pixel image.

BMP pixel data problems: 1. Colour Depths – 24-bit

Showing how many pixels a 25 byte file would create with a 24-bit colour depth: 8⅓; text: 1. Colour depths; 24-bit: 1 byte = ⅓ pixel; 25 byte file; 8⅓ pixel image

What if we want even more colours? We could choose 24-bit colour (AKA: true colour – 16 million colours, your eyes will not be able to cope with that!). Our 25 byte file is now a glorious 8 and ⅓ pixels. If we don’t have a complete pixel, a BMP renderer is just not going to show the image 😞
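The arithmetic behind those three slides can be sketched in a few lines. The pixels_for helper is a made-up name for this illustration; it just divides the bits in the file by the bits each pixel needs.

```ruby
# How many whole pixels does a file give us, and how many bits are left over?
def pixels_for(file_size, bits_per_pixel)
  (file_size * 8).divmod(bits_per_pixel)
end

[1, 8, 24].each do |bits_per_pixel|
  whole, spare_bits = pixels_for(25, bits_per_pixel)
  puts "#{bits_per_pixel}-bit: #{whole} whole pixels, #{spare_bits} bits left over"
end
# 1-bit: 200 whole pixels, 0 bits left over
# 8-bit: 25 whole pixels, 0 bits left over
# 24-bit: 8 whole pixels, 8 bits left over
```

Those 8 spare bits are the lonely ⅓ of a pixel.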

Oh, what do we do?

BMP pixel data problems: 1. Colour Depths – null to the rescue!

Showing how to add `null` bytes to a 25 byte file to complete the final 24-bit pixel to create complete image; text: 1. null to the rescue; 25 byte file + 2 null bytes = 9 pixel image

Pretty easy – unlike almost any other programming problem, it’s null to the rescue! We just add null bytes to the end of the data!

We can work out how many whole pixels the data would create and how many padding bytes we need to complete the last pixel – it’s a function of file size and colour depth.
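As a sketch of that function (pixel_padding is a hypothetical name, and this version assumes each pixel is a whole number of bytes, as with 24-bit colour):

```ruby
# How many null bytes do we need to complete the last pixel?
def pixel_padding(file_size, bytes_per_pixel)
  (bytes_per_pixel - file_size % bytes_per_pixel) % bytes_per_pixel
end

bytes_per_pixel = 24 / 8 # 24-bit colour

padding = pixel_padding(25, bytes_per_pixel)
pixels  = (25 + padding) / bytes_per_pixel

puts padding # => 2
puts pixels  # => 9
```

The outer modulo is there so that a file which already ends on a pixel boundary gets 0 padding bytes rather than a whole extra pixel.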

Problem solved!

BMP pixel data problems: 2. Width x Height – simple squares

Showing how we can arrange the 9 pixels of a 25 byte file into a rectangle; text: 2. Width x Height; 25 byte file (+ 2 padding bytes) = 9 pixels

Our second problem is one of rectangles – width and height. Images have to be rectangles, so we need to arrange our pixels into rectangles.

Our padding example is really simple. We’ve got a 25 byte file which we pad to 9 pixels.

9 pixels is a lovely 3x3 square. Easy.

BMP pixel data problems: 2. Width x Height – annoying rectangles

It’s not always that simple. What if it was a 28 byte file? That gives us 10 pixels.

Yes…

…you can re-arrange that into a 5x2 rectangle.

Buuut. Factoring huge file sizes to find convenient rectangles that will also be reasonable to look at on a screen might be painful. We don’t want wide and short or tall and skinny images. The simplest thing is just to work with squares.

BMP pixel data problems: 2. Width x Height – null to the rescue!

Showing how to calculate how many `null` pixels you need to add to a 10 pixel file to create a square image of 16 pixels; text: 2. null to the rescue!; 28 bytes file (+ 2 padding bytes) = 10 pixels; ⌈√10⌉ = 4 pixel square = 16 pixels; 16 - 10 = 6 padding pixels needed; 10 pixels + 6 padding pixels = 16 pixel image

So it is null to the rescue again. Hurrah!

We have a simple “algorithm” for calculating the smallest square that can contain all our pixels.

  1. Get the number of pixels – 10
  2. Take the square root – 3.something
  3. Round that up – 4
  4. Square it – 16
  5. Subtract the number of pixels you have – 16 minus 10 = 6

Now we know how many padding pixels to stick onto the bottom of our image to make a square.

It might be inefficient in terms of extra bytes, but it is simple as it only uses two or three methods from Math.
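Those five steps translate almost word for word into Ruby. The square_padding name is an assumption for this sketch; it returns the side of the square along with the number of padding pixels.

```ruby
# Smallest square that holds all our pixels, and how many padding pixels
# it takes to fill it.
def square_padding(pixels)
  side = Math.sqrt(pixels).ceil # steps 2 and 3: square root, rounded up
  [side, side**2 - pixels]      # steps 4 and 5: square it, subtract
end

side, padding_pixels = square_padding(10)
puts side           # => 4
puts padding_pixels # => 6
```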

BMP pixel data problems: 3. Scan lines – a valid pixel per row count

An example of BMP scan lines for a 16 pixel image – 4 rows of 4 pixels each – 4 pixels is 12 bytes (3 bytes per pixel); text: 3. Scan lines; 28 byte file (+ 2 padding bytes + 6 padding pixels) = 16 pixels; 16 pixels; 4 scan lines; 4 pixels = 12 bytes

Our 3rd problem is scan lines. You could probably have anticipated the first two if you thought about it, but scan lines are a quirk of the BMP format.

A scan line is a single row of pixels from our image. Our 28 byte file is a 16 pixel image which is 4 by 4 – so it has 4 scan lines containing 4 pixels each.

For reasons, the BMP spec says a scan line must be a multiple of 4 bytes long.

This is fine for our 28 byte file, because our rows are 4 pixels long, each pixel is 3 bytes, and that adds up to 12 which is a multiple of 4.

BMP pixel data problems: 3. Scan lines – an invalid pixel per row count

But, if we go back to our 25 byte file, that’s 9 pixels, a 3x3 square, which is 9 bytes per scan line…

…which is not a multiple of 4.

BMP pixel data problems: 3. Scan lines – null to the rescue!

So, happily, it is also null to the rescue again.

We could rearrange the pixel data to make scan lines that are multiples of 4 bytes long and then pad the end of the file with null bytes to complete the square. This’ll work.

We will get valid scan lines.

BMP pixel data problems: 3. Scan lines – null to the rescue?

However, I kinda think we’ve wasted some of our data.

If we look at it as bytes and pixels you’ll see what I mean.

When I rearranged the pixels into valid scan lines some of those “pixels” aren’t pixels anymore. They’re not visible – they’re just there to appease the scan line rule. We can’t see those pixels.

If I apply a “visible” overlay it’ll be clearer.

It annoys me that 6 whole bytes of my file are stuck at the end of scan lines where I can’t see them and they’re not showing me the structure of the file.

An important rule

text: An important rule: don’t waste source file bytes

I gave myself a rule: I didn’t want to waste any source file bytes. It’s my project, and I can do what I want, even if it creates problems to solve. Which, I guess, is why we’re all here: to solve problems with programming even if they’re self-imposed and silly. I want to use as much data from my file as possible to make my image so we’re going to have to rethink.

BMP pixel data problems: 3. Scan lines – null to the rescue!

Instead of adding null bytes to the end of the file, what if we add null bytes to the end of each scan line?

This gives me valid scan lines of 12 bytes, which is good.

And without wasting any source data, which is also good.

That’s that problem solved too.
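The per-line padding is another small calculation. In this sketch scan_line_padding is a made-up name, and it assumes pixels that are a whole number of bytes, as they are with 24-bit colour.

```ruby
# How many null bytes does each scan line need to reach a multiple of 4 bytes?
def scan_line_padding(width_in_pixels, bytes_per_pixel)
  line_bytes = width_in_pixels * bytes_per_pixel
  (4 - line_bytes % 4) % 4
end

puts scan_line_padding(4, 3) # => 0 (12 bytes is already a multiple of 4)
puts scan_line_padding(3, 3) # => 3 (9 bytes needs 3 more to reach 12)
```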

BMP pixel data problems: Total padding – pt. 1

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – stage 1 – the 17 byte file; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file

To recap here’s all the padding we need for a single file.

24 bit colour, so that’s 3 bytes per pixel.

Let’s say we have a 17 byte file…

BMP pixel data problems: Total padding – pt. 2

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – arranging the bytes into groups of 3; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file

…arranged as groups of 3 bytes…

BMP pixel data problems: Total padding – pt. 3

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – highlighting complete groups of 3 bytes as pixels; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file

…this gives us 5 complete pixels and 2 bytes left over.

BMP pixel data problems: Total padding – pt. 4

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – completing the last group of 3 bytes by adding a `null` byte; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte

So we add 1 null byte to complete the last pixel…

BMP pixel data problems: Total padding – pt. 5

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – highlighting the now complete last group of bytes as a pixel; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte

…giving us 6 pixels.

BMP pixel data problems: Total padding – pt. 6

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – highlighting that the 6 pixels are not a square; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte + 9 bytes (3 pixels) + 9 bytes (3 scan lines)

That’s not a square, annoyingly…

BMP pixel data problems: Total padding – pt. 7

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – adding 3 `null` pixels to make a 3x3 square of pixels; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte

…so we add 3 null pixels to get a 9 pixel square.

BMP pixel data problems: Total padding – pt. 8

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – showing that each `null` pixel is made of 3 `null` bytes; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte + 9 bytes (3 pixels)

These new pixels are made up of 3 null bytes each, so we’re adding another 9 bytes.

BMP pixel data problems: Total padding – pt. 9

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – showing the 3x3 pixels as incomplete scan lines; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte + 9 bytes (3 pixels)

Our 9 pixel square image means we have 3 rows of 3 pixels each, which is 9 bytes…

BMP pixel data problems: Total padding – pt. 10

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – adding 3 `null` bytes to each incomplete scan line; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte + 9 bytes (3 pixels) + 9 bytes

…so we add 3 bytes per line to get to our multiple of 4.

BMP pixel data problems: Total padding – pt. 11

A worked example of the total padding needed for 24-bit colour depth with a 17 byte file – showing the rows of pixels + `null` bytes as complete scan lines; text: Total Padding; 24-bit colour with 17 byte source file; 17 byte file + 1 byte + 9 bytes (3 pixels) + 9 bytes (3 scan lines)

This completes our scan lines.

Total padding – 19 bytes. Yes, this is more than the original source file which is inefficient, but this is an extreme example given the small input file size. You probably don’t have any 17 byte files on your computer these days – they’re all huge, right?

What’s interesting about this is we can see that pixel + rectangle padding just go onto the end of the source file data, but line padding has to go inside the source file data at the end of each line. It’s interleaved with the source file data.
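Here is the worked example above as one small sketch, so the three kinds of padding can be checked together. The total_padding helper and the keys of the hash it returns are assumptions made for this illustration, and it only handles whole-byte pixels.

```ruby
# All three kinds of padding for a file of a given size, in bytes.
def total_padding(file_size, bytes_per_pixel: 3)
  # 1. complete the last pixel
  pixel_bytes = (bytes_per_pixel - file_size % bytes_per_pixel) % bytes_per_pixel
  pixels = (file_size + pixel_bytes) / bytes_per_pixel

  # 2. complete the square
  side = Math.sqrt(pixels).ceil
  square_bytes = (side**2 - pixels) * bytes_per_pixel

  # 3. complete every scan line
  line_bytes = ((4 - (side * bytes_per_pixel) % 4) % 4) * side

  { pixel: pixel_bytes, square: square_bytes, scan_lines: line_bytes,
    total: pixel_bytes + square_bytes + line_bytes }
end

padding = total_padding(17)
p padding.values_at(:pixel, :square, :scan_lines, :total) # => [1, 9, 9, 19]
```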

Writing BMP source data as pixels

Snippet of code for writing BMP pixel data by interleaving source bytes and scan line padding bytes.  Source: https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/bumps.rb#L252-L266

Source for code in slide

I said the WAV code for writing the source data was uninteresting, but the BMP equivalent isn’t! Here’s a snippet of it.

We pull bytes from the source file based on how wide the image will be in pixels and we write them to the target file, and then we add the scan line padding bytes. And we repeat this until we’ve exhausted the source file.
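In case the slide is hard to picture, here is a rough sketch of the interleaving. It is not the stegosaurus code: write_pixel_data is a hypothetical name, it loops once per row of the square rather than until the source runs out, and it fills any short or missing rows with null bytes as it goes.

```ruby
require "stringio"

# Copy the source into the target one scan line at a time, adding the
# scan line padding after every line.
def write_pixel_data(source, target, side:, bytes_per_pixel: 3)
  line_bytes   = side * bytes_per_pixel
  line_padding = "\0" * ((4 - line_bytes % 4) % 4)

  side.times do
    line = source.read(line_bytes) || "" # nil once the source is exhausted
    target.write(line.ljust(line_bytes, "\0"))
    target.write(line_padding)
  end
end

source = StringIO.new("a" * 17)
target = StringIO.new(String.new)
write_pixel_data(source, target, side: 3)

puts target.string.bytesize # => 36
```

That is the 17 source bytes plus the 19 padding bytes from the worked example, with the scan line padding sitting after every 9th byte.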

Using Array#pack to get null bytes

Snippet of code for writing BMP pixel data by interleaving source bytes and scan line padding bytes, highlighting the call to the `pack` method.  Source: https://github.com/h-lame/stegosaurus/blob/24ec34dff57062ac9edd075163d1c9b8c2c26d08/lib/stegosaurus/bumps.rb#L253

Source for code in slide

What’s most interesting here is the top line where we construct the scan line padding.

Our friend pack is back, but we’re packing from an empty array? What’s going on there?

Well, it turns out, if you want null bytes you can create an array of the right number of 0s and pack with the appropriate format string depending on how many bytes you want a 0 to take up, or you can use an x in your format string. If you follow that x with a number, pack will generate that many null bytes. Neither an x nor an x<somenumber> in your format string will use up any of the numbers in your array.

Here we only want null bytes so we call pack on an empty array. We’re getting something from nothing, that’s neat!
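A few standalone examples (not from the talk’s code) show both approaches side by side:

```ruby
# Null bytes from an array of zeros: each 0 is used up by a directive.
p [0, 0, 0].pack("C3").bytes # => [0, 0, 0]

# Null bytes from nothing: x doesn't consume anything from the array.
p [].pack("x").bytes  # => [0]
p [].pack("x3").bytes # => [0, 0, 0]

# x can sit between other directives without using up their numbers.
p [7, 9].pack("Cx2C").bytes # => [7, 0, 0, 9]
```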

An aside on pack and Idiosyncratic Ruby

text: Aside: https://idiosyncratic-ruby.com/4-what-the-pack.html

As another aside: there’s a whole lot more that pack can do but, spoilers, this was our last outing with it in this talk. To learn more I recommend this post on Jan Lelis’ Idiosyncratic Ruby blog. The whole blog is great, though, and you should go read it all.

I don’t think you’re in the room, but thanks Jan!

All the BMP generator code

Source for code in slide (not that you were really meant to read all that code)

So, there’s lots more code in this one – but mostly it’s uninteresting.

The code for dealing with all the other padding we calculated is a lot of maths, but you can probably imagine it. There’s also code to generate a colour table as 4-byte rgb(a) tuples8.

Although, a critical reading of this code would be interesting:

  • Why have I used so many while loops?
  • Why isn’t bytes_from more idiomatic and iterator based?

I don’t know… sigh …and we’re not going to find out.

BMP Demo

text: BMP Demo

To save me from critical self reflection on that code – let’s have a demo and see what our data looks like!

Demo time: BMP (pt. 1)

irb(main):007:0> b = Stegosaurus.bumps
=> #<Stegosaurus::Bumps:0x0000000106443de0 @colour_depth=8>
irb(main):008:0> f = b.make_from './README'
=> "artefacts/README.bmp"
irb(main):009:0> Stegosaurus.open_bump f
=> ""

So, as before, we’re going to grab a generator object, this time called bumps.

We will do the same thing again and generate something from the README, and I’ll open that in an image editing app with another convenience method.

The stegosaurus README as a BMP, albeit scaled up somewhat.9

It’s tiny, but let’s look inside. The interesting thing you can see here is at the top, the colour stops halfway through. Those are our null bytes, they’re all at the top because… BMP files are upside down? That’s wild! Thanks BMP people!10

But, like WAV, you don’t get much from a short README file. Or do you?

Demo time: BMP (pt. 2)

irb(main):010:0> Stegosaurus.open_bump Stegosaurus.bumps(1).make_from './README'
=> ""

One of the arbitrary choices we can make is the bit count for the colour depth. Why don’t we generate a 1-bit version? That is maybe more interesting:

The stegosaurus README as a 1-bit BMP.11

If we zoom in on it you can see there’s a bit more structure here. You could probably learn to read this, just like that person in the first Matrix movie and then this could be how you operate computers. That would be fun!

Demo time: BMP (pt. 3)

irb(main):011:0> Stegosaurus.open_bump b.make_from RbConfig.ruby
=> ""

For completeness though, let’s see what Ruby looks like as a BMP and if we can learn anything about the structure of the interpreter from the visuals.12

The ruby 3.2.2 interpreter as a BMP, albeit scaled down somewhat13.

Here’s ruby the BMP – what can we learn from this?

Well, um, parts of it are a pinkish hue, which is cool because it’s ruby, so that’s good. A surprising amount of it is this iridescent green which, as we all know there was some Perl influence on Ruby as Matz shared this morning, so that’s probably what that is. Then there’s this sort of scary dark bit in the top third, and I guess that’s maybe where the exceptions live.

I mean, okay, I can’t interpret this, but we can look at this and we can make up stories to explain what we don’t understand, just like our ancestors! That’s fun!

Are WAVs and BMPs the only way?

text: WAV was annoying to listen to though

We’ve now looked at some BMPs and seen some things, but that was sort of a diversion from my original plan: I wanted to hear files. While WAVs worked, even the most hardened glitchcore music fan probably wouldn’t enjoy listening to them for even a second.

Luckily, there’s another file format to play around with…

The MIDI file specification

Screenshot of a part of the webpage describing the MIDI file format specification; url: http://www.music.mcgill.ca/~ich/classes/mumt306/StandardMIDIfileformat.html

The MIDI file specification

MIDI!

MIDI stands for Musical Instrument Digital Interface and the file format is part of a standard for communicating with actual hardware instruments like synthesisers and such like. The part that’s interesting to me is that as a file format instead of storing the actual recorded sound data like WAV, it’s more akin to sheet music, and, as a format it’s header + data based so we can work our magic with it, too.

Exploring the MIDI file format

A box representing a MIDI file called

So, what does a MIDI file look like?

Exploring the MIDI file format – header & data

A box representing a MIDI file called

It’s the header and data we know and love.

Exploring the MIDI file format – the header part

A box representing a MIDI file called

The header contains two parts:

  1. an identifier that says “I’m a MIDI file” and explains some details about what type of MIDI file it is. Things like time signatures, etc…
  2. a track header that explains the size of the track data to come.

The track data however isn’t like WAV, we can’t just smoosh the source file under the header and we’re done.

Nor is it like BMP where we have to interpret and interleave some padding.

It’s much more structured and is made up of a stream of MIDI Events.
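As a rough sketch of those two header parts, here is how they could be built with Array#pack. This assumes the chunk layout from the Standard MIDI File specification: an “MThd” chunk for the file, then an “MTrk” chunk header for the track. The method names and the default values (one track, 96 ticks per quarter note) are made up for illustration and are not what stegosaurus uses:

```ruby
# Hypothetical helpers, not taken from stegosaurus.

# "MThd", a 4-byte big-endian length (always 6), then three 2-byte
# big-endian numbers: format, number of tracks, and the timing division.
def midi_file_header(format: 0, tracks: 1, division: 96)
  ["MThd", 6, format, tracks, division].pack("a4Nnnn")
end

# "MTrk", then a 4-byte big-endian count of the track data bytes to come.
def midi_track_header(track_data_size)
  ["MTrk", track_data_size].pack("a4N")
end

header = midi_file_header + midi_track_header(1024)
puts header.bytesize # 22: 14 for the file header plus 8 for the track header
```

The awkward part is that the track header needs the size of the track data up front, so the events have to be generated (or at least counted) before the header can be written.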

Exploring the MIDI file format – a MIDI event

A box representing a MIDI event; text: A MIDI event

A MIDI event is structured…

Exploring the MIDI file format – a MIDI event – time & data

A box representing a MIDI event. It has been split in two: the delta time part at the top, the event data part at the bottom; text: A MIDI event; Delta time part; Event data part

It’s made up of a time and some data:

  1. delta time – this says when the event should occur. It’s known as the delta-time because it’s the time since the previous event rather than a fixed “at this point in the song” value. This can be between 1 and 4 bytes.
  2. Event data part – this says what the event actually is and can also be broken into two parts.

Exploring the MIDI file format – a MIDI event – event type & data

A box representing a MIDI event. It has been split in two: the delta time part at the top, the event data part at the bottom; The event data part has also been split in two: the event type at the top, the event data at the bottom; text: A MIDI event; Delta time part; Event type; Event data

The event data part is made up of a type and some data:

  1. Type – there’s a list of event types, for example play a note, set the tempo, do something to the hardware, etc…. This is always 1 byte.
  2. Data – this contains extra data depending on the event type. For example, a “play a note” event needs data on which note to play. This is 1 or 2 bytes depending on the type of the event.

Let’s look at these 3 parts in more detail.

MIDI Event structure: 1. Delta time

text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value

Delta time is stored as a variable length value of between 1 and 4 bytes. It’s a space saving technique. A piece of music could have, oh I don’t know, as many as 200 notes in it? If you are always storing 4 bytes for the time that’d be 800 bytes. Ostentatious! For most notes you probably don’t need a 4 byte value to say “play this note really soon after the last one”, so if you could store that value in 1 byte numbers, you could save 600 bytes! That’s useful!

How this is implemented is that a single byte is split into two parts: 7 bits are used to store the value, and the remaining 1 bit is used to say if the next byte contains more data for the value or not.

  • 0 means no more data, you’ve got all the information you need
  • 1 means read the next byte and find out if you need more

Values less than 128 can be stored in 1 byte, values of 128 or more are stored in multiple bytes. In theory this allows for infinitely large values – we can keep setting the “more” bit to 1 – but in practice, the MIDI spec says 4 bytes is the maximum. This gives us 28 bits to store a value, allowing us to store values from 0 to 268,435,455. This should be enough for even the longest of long songs.
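Here is a minimal sketch of that encoding in ruby. The method name is made up for illustration and this is not the code from stegosaurus, just the scheme described above: 7 bits of value per byte, with the first bit set to 1 on every byte except the last.

```ruby
# A hypothetical helper, not the implementation stegosaurus uses.
def vlq_encode(value)
  # 4 bytes of 7 value bits each means the value must fit in 28 bits.
  raise ArgumentError, "value must fit in 28 bits" unless (0...(1 << 28)).cover?(value)

  # The last byte holds the lowest 7 bits and keeps its status bit at 0.
  bytes = [value & 0x7F]
  value >>= 7
  # Every earlier byte holds the next 7 bits with its status bit set to 1.
  while value > 0
    bytes.unshift((value & 0x7F) | 0x80)
    value >>= 7
  end
  bytes
end

[0, 127, 128].each do |n|
  puts "#{n}: #{vlq_encode(n).map { |b| format('%08b', b) }.join(' ')}"
end
```

Building the bytes from the end backwards means the “no more data” byte is always written first and the “keep reading” bytes are stacked in front of it.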

MIDI Event structure: 1. Delta time – 127

text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; Value

Here’s a worked example:

A value like 127…

MIDI Event structure: 1. Delta time – 127 – bit encoding

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; Value; Bit encoding

…is encoded as bits in 1 byte as a 0 followed by seven 1s.

MIDI Event structure: 1. Delta time – 127 – VLQ encoding

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1 and in VLQ encoding using 8 bits: 0 1 1 1 1 1 1 1 – the first 0 is highlighted to indicate it is different between the two encodings; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; 01111111; Value; Bit encoding; VLQ encoding

To encode in VLQ, it’s the same! But that first 0 isn’t a padding 0 we don’t need, it’s actually a status bit saying:

this value is complete, you don’t need to read another byte.

MIDI Event structure: 1. Delta time – 128

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1 and in VLQ encoding using 8 bits: 0 1 1 1 1 1 1 1 – the first 0 is highlighted to indicate it is different between the two encodings.  The value 128 is shown; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; 01111111; 128; Value; Bit encoding; VLQ encoding

A value like 128…

MIDI Event structure: 1. Delta time – 128 – bit encoding

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1 and in VLQ encoding using 8 bits: 0 1 1 1 1 1 1 1 – the first 0 is highlighted to indicate it is different between the two encodings.  The value 128 is shown in standard encoding using 8 bits: 1 0 0 0 0 0 0 0; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; 01111111; 128; 10000000; Value; Bit encoding; VLQ encoding

…is encoded as bits in 1 byte as a 1 followed by seven 0s.

MIDI Event structure: 1. Delta time – 128 – VLQ encoding

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1 and in VLQ encoding using 8 bits: 0 1 1 1 1 1 1 1 – the first 0 is highlighted to indicate it is different between the two encodings.  The value 128 is shown in standard encoding using 8 bits: 1 0 0 0 0 0 0 0 and in VLQ encoding using 16 bits: 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 – the first bit is highlighted to indicate it is status, not data; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; 01111111; 128; 10000000; 10000001 00000000; Value; Bit encoding; VLQ encoding

To encode in VLQ we need two bytes. The first byte is a 1, followed by six 0s then a 1. That first 1 says:

read another byte.

That second byte is eight 0s. The first 0 of which is the status bit saying:

the value is complete and you don’t have to read another byte.

So how does this turn back into a value of 128?

MIDI Event structure: 1. Delta time – 128 – VLQ decoding – no status bits

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1 and in VLQ encoding using 8 bits: 0 1 1 1 1 1 1 1 – the first 0 is highlighted to indicate it is different between the two encodings.  The value 128 is shown in standard encoding using 8 bits: 1 0 0 0 0 0 0 0 and in VLQ encoding using 16 bits: 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 – the first bit is highlighted to indicate it is status, not data.  The non status bits of the VLQ encoding of 128 are collected together: 0 0 0 0 0 0 1 0 0 0 0 0 0 0; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; 01111111; 128; 10000000; 10000001 00000000; 00000010000000; Value; Bit encoding; VLQ encoding

Well, we can drop the two status bits as they’re uninteresting. That gives us six 0s, a 1, then seven 0s.

MIDI Event structure: 1. Delta time – 128 – VLQ decoding – no leading zeros

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1 and in VLQ encoding using 8 bits: 0 1 1 1 1 1 1 1 – the first 0 is highlighted to indicate it is different between the two encodings.  The value 128 is shown in standard encoding using 8 bits: 1 0 0 0 0 0 0 0 and in VLQ encoding using 16 bits: 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 – the first bit is highlighted to indicate it is status, not data.  The non status bits of the VLQ encoding of 128 are collected together without leading zeros: 1 0 0 0 0 0 0 0; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; 01111111; 128; 10000000; 10000001 00000000; 10000000; Value; Bit encoding; VLQ encoding

Those leading six 0s are uninteresting to us as well. And that gives us eight bits, a 1 followed by seven 0s.

MIDI Event structure: 1. Delta time – 128 – VLQ decoding – done

The value 127 is shown in standard encoding using 8 bits: 0 1 1 1 1 1 1 1 and in VLQ encoding using 8 bits: 0 1 1 1 1 1 1 1 – the first 0 is highlighted to indicate it is different between the two encodings.  The value 128 is shown in standard encoding using 8 bits: 1 0 0 0 0 0 0 0 and in VLQ encoding using 16 bits: 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 – the first bit is highlighted to indicate it is status, not data.  The non status bits of the VLQ encoding of 128 are collected together without leading zeros and moved under the bit encoding to show it is the same: 1 0 0 0 0 0 0 0; text: 1. Delta time; Variable Length Quantity (VLQ) Encoding; 1 byte = 8 bits = 1 bit status + 7 bits value; 127; 01111111; 01111111; 128; 10000000; 10000001 00000000; 10000000; Value; Bit encoding; VLQ encoding

Where have we seen that before? Oh yeah, the standard bit encoding of 128.

Hurrah! So that’s Variable Length Quantity encoding.
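The decoding steps we just walked through can be sketched in a few lines of ruby. The method name is made up for illustration; it simply drops each status bit and glues the remaining 7-bit groups together:

```ruby
# A hypothetical helper, not the implementation stegosaurus uses.
def vlq_decode(bytes)
  value = 0
  bytes.each do |byte|
    # Make room for 7 more bits, then add this byte's value bits.
    value = (value << 7) | (byte & 0x7F)
    # A status bit of 0 means this was the last byte of the value.
    break if (byte & 0x80).zero?
  end
  value
end

puts vlq_decode([0b01111111])             # 127
puts vlq_decode([0b10000001, 0b00000000]) # 128
```

We never have to strip the leading zeros ourselves; as a number, six 0s, a 1, then seven 0s is already just 128.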

Aside: UTF-8

text: Aside: Yes, this is how UTF-8 works, how astute of you to notice!

As an aside: you may recognise this kind of encoding if you’ve ever dealt with UTF-8. It’s also a variable length quantity style encoding where a character is stored as 1, 2, 3 or 4 bytes. Not this exact encoding, but still, those MIDI spec designers were on to something!
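To see the family resemblance, here is a small sketch that prints the bits of a few UTF-8 encoded characters. In UTF-8 the leading bits of the first byte say how many bytes the character uses, and every following byte starts with 10. The helper name is made up for illustration:

```ruby
# Hypothetical helper: show a character's UTF-8 bytes as bits.
def utf8_bits(char)
  char.encode("UTF-8").bytes.map { |byte| format("%08b", byte) }.join(" ")
end

puts utf8_bits("A")      # 01000001
puts utf8_bits("\u00E9") # 11000011 10101001 (e with an acute accent)
puts utf8_bits("\u20AC") # 11100010 10000010 10101100 (the euro sign)
```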

MIDI Event structure: 2. Event type

text: 2. Event type; 1 byte = 8 bits = 1 bit status (1) + 7 bits value; 1xxxxxxx = 7 bits to encode the type value

MIDI event types are stored in a single byte, but we only have 7 bits to store the type value, because the first bit must be a 1.

There are lots of event types which can be broken into 4 categories:

  1. MIDI things I do understand, but don’t care about like lyric or copyright information,
  2. MIDI things I don’t understand like ports and resetting,
  3. music things I don’t understand like tempo or pitch bending,
  4. music things I do understand and care about.

MIDI Event structure: 2. Event type – note on & off events

text: 2. Event type; 1 byte = 8 bits = 1 bit status (1) + 7 bits value; 1xxxxxxx = 7 bits to encode the type value; 1000xxxx = note off; 1001xxxx = note on; = 1 bit status + 3 bit type + 4 bit channel

There are exactly two of these, “turn this note on” and “turn this note off”:

  1. 1000xxxx for “turn this note off”
  2. 1001xxxx for “turn this note on”

The remaining 4 bits of the type byte contain the channel number in which to play this note or stop it. MIDI has 16 channels to play sound on and a value between 0 and 15 neatly fits into a 4-bit number so that’s convenient. A channel is kind of like an instrument; it’s not really as simple as that, but for our purposes it can be.
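As a sketch, building that type byte is a shift and an or. The constant and method names here are made up for illustration:

```ruby
# The top 4 bits of the type byte: the status bit (1) plus a 3 bit type.
NOTE_OFF = 0b1000
NOTE_ON  = 0b1001

# Hypothetical helper: the type goes in the top 4 bits and the channel
# (0 to 15) goes in the bottom 4 bits.
def note_type_byte(on:, channel:)
  raise ArgumentError, "channel must be between 0 and 15" unless (0..15).cover?(channel)

  ((on ? NOTE_ON : NOTE_OFF) << 4) | channel
end

puts format("%08b", note_type_byte(on: true, channel: 0))   # 10010000
puts format("%08b", note_type_byte(on: false, channel: 15)) # 10001111
```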

MIDI Event structure: 3. Event data

text: 3. Event data; 1 byte = 8 bits = 1 bit status (0) + 7 bits value

MIDI event data is also stored in a single byte, and we also only have 7 bits to store the value, because the first bit must be a 0.

Yes! This means we can easily tell the difference between a type byte and a data byte. This is probably useful for reading files and bailing immediately on corrupt data.

Our note events both take 2 bytes of data; one for the key, and one for the velocity.

Key is literally which note to play. For example middle C, a popular note I believe, is note number 60 (not quite the middle between 0 and 127, but whatever music nerds).

Velocity is how hard the note is played. Sometimes known as attack. Think of it like a numerical value for how softly or hard you press the key on a piano, or take your hands off it.

Other events take different data that mean different things, but all stored in one or two bytes.

Note on/off MIDI event structure

If we put it all together, what do we need to store one of our note on or off MIDI events?

We need:

  • a delta time,
  • a type (on or off),
  • a key,
  • a velocity.

That looks like this:

  • a byte that starts 0,
  • a byte that starts 1 0 0,
  • and then two bytes that start 0.

Or… because delta time can be two bytes, it could be a byte that starts 1, a byte that starts 0, a byte that starts 1 0 0 and two bytes that start 0.

Or… three.

Or… four.

There is no five because delta time is 1 to 4 bytes.

Hopefully you can see the problem we have to solve. It’s verrrrry unlikely that our source file is going to have its bytes arranged so that the first bits magically adhere to this structure.

Our solution is to use our source data to fill in the orange parts of these diagrams, and statically fill in the blue parts.

Generating MIDI events from source data solution

To do this we have to deal with the bits within the source file, not the bytes. We need 27 bits of data from the source file to make one MIDI event:

  • 8 bits can be used to extract the delta time. Using VLQ encoding of the value this’ll be turned into either 1 or 2 bytes. I could use 28 bits as that’s the maximum allowed in a 4-byte VLQ value, but it might also mean pausing for hours between notes and that won’t be fun to listen to. I could use 7-bits so it always fits into 1 byte, but I didn’t learn about VLQ to not need to use it! So 8 bits seems like an arbitrary, but good, choice.
  • 1 bit to decide between a “turn this note on” and “turn this note off” status event,
  • 4 bits to say which channel the note is on,
  • 7 bits for which key the note is,
  • 7 bits for how hard/fast/soft the note is played/stopped.

This way we can use all the data from our source file and be sure that it’s going to be arranged correctly for making valid MIDI data.
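A quick sketch of that 27 bit budget, splitting one chunk of bits into its named fields. The constant and method names are made up for illustration; only the field widths come from the list above:

```ruby
# Field widths as described above; the names are illustrative.
EVENT_FIELDS = { delta_time: 8, on_off: 1, channel: 4, key: 7, velocity: 7 }.freeze

# Hypothetical helper: split one 27 character string of 1s and 0s
# into a hash of named bit strings.
def split_event_bits(bits)
  total = EVENT_FIELDS.values.sum
  raise ArgumentError, "expected #{total} bits" unless bits.length == total

  offset = 0
  EVENT_FIELDS.to_h do |name, width|
    field = bits[offset, width]
    offset += width
    [name, field]
  end
end

puts EVENT_FIELDS.values.sum # 27
p split_event_bits("01000000" + "1" + "0011" + "0111100" + "1100100")
```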

Writing bit-scale MIDI events from source data bytes

Snippet of code needed to read bytes from the source file and turn them into the bits needed for MIDI events; source: https://github.com/h-lame/stegosaurus/blob/fc41db8be711b5649b01834c3254ca07bb73626e/lib/stegosaurus/midriffs.rb#L128-L147

Source for code in slide

Here’s the code to do just that.

Writing bit-scale MIDI events from source data bytes: reading enough bytes

Snippet of code needed to read bytes from the source file and turn them into the bits needed for MIDI events, highlighting the code for reading enough bytes from the source data; source: https://github.com/h-lame/stegosaurus/blob/fc41db8be711b5649b01834c3254ca07bb73626e/lib/stegosaurus/midriffs.rb#L129-L130

Source for code in slide

First we read 27 bytes from the source file and we pad it with zeros if there aren’t 27 bytes available (e.g. at the end of the file).

Why 27 bytes? Murray, you said we need 27 bits.

Well the problem is file reading APIs are byte-scale not bit-scale.

27 bytes is 216 bits, and it turns out there’s no smaller common multiple of 27 (the number of bits we want per event) and 8 (the smallest number of bits we can read at a time). So we read 27 bytes and that lets us create 8 MIDI events using 27 bits at a time. I told you earlier I didn’t like messing with factoring.
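If you don’t fancy doing that arithmetic by hand either, ruby can check it. 216 is the lowest common multiple of 27 and 8, which Integer#lcm will confirm (the variable names are just for illustration):

```ruby
bits_per_event = 27
bits_per_byte  = 8

bits_to_read = bits_per_event.lcm(bits_per_byte)

puts bits_to_read                  # 216
puts bits_to_read / bits_per_byte  # 27 bytes to read
puts bits_to_read / bits_per_event # 8 events to write
```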

Writing bit-scale MIDI events from source data bytes: converting bytes to bits

Snippet of code needed to read bytes from the source file and turn them into the bits needed for MIDI events, highlighting the code to convert source data bytes into bits prior to manipulation; source: https://github.com/h-lame/stegosaurus/blob/fc41db8be711b5649b01834c3254ca07bb73626e/lib/stegosaurus/midriffs.rb#L131

Source for code in slide

We turn those bytes into a string of their binary representation using sprintf14. The “%08b” format string says:

turn a number into a 0 padded binary number at least 8 characters long

We use sprintf because although we can get the binary representation with to_s(2), it won’t give us the leading 0s we need to make sure the string is eight chars long. Then we join them all together as one long 216 character string of 1s and 0s. Finally, we turn that into an array.
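A minimal sketch of that step, using Kernel#format (the same thing as sprintf). The helper name is made up and this is not the exact line from stegosaurus:

```ruby
# Hypothetical helper mirroring the steps described above: each byte
# becomes 8 characters of 1s and 0s, joined up, then split into an array.
def bytes_to_bits(bytes)
  bytes.map { |byte| format("%08b", byte) }.join.chars
end

puts 5.to_s(2)         # 101, no leading zeros
puts format("%08b", 5) # 00000101

bits = bytes_to_bits("Hi".bytes)
puts bits.join   # 0100100001101001
puts bits.length # 16
```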

Writing bit-scale MIDI events from source data bytes: extracting a single event

Snippet of code needed to read bytes from the source file and turn them into the bits needed for MIDI events, highlighting the code for extracting the bits needed to create a single event; source: https://github.com/h-lame/stegosaurus/blob/fc41db8be711b5649b01834c3254ca07bb73626e/lib/stegosaurus/midriffs.rb#L134-L138

Source for code in slide

We can then loop 8 times to pull out chunks of 27 bits as we outlined above:

  1. 8 bits for the delta time,
  2. 1 bit for the on / off flag,
  3. 4 bits for the channel,
  4. 7 bits for the key,
  5. 7 bits for the velocity.
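Sketched in ruby, that loop could lean on each_slice to hand us 27 bits at a time. This is an illustration with made-up names rather than the code on the slide:

```ruby
# Hypothetical helper: cut a 216 element array of "0"/"1" strings into
# 8 events of 27 bits each, then cut each event into its five fields.
def events_from_bits(bits)
  bits.each_slice(27).map do |event_bits|
    event = event_bits.join
    {
      delta_time: event[0, 8],
      on_off:     event[8, 1],
      channel:    event[9, 4],
      key:        event[13, 7],
      velocity:   event[20, 7]
    }
  end
end

# The same 27 bits repeated 8 times stands in for real source data.
bits = ("01000000" + "1" + "0011" + "0111100" + "1100100").chars * 8
events = events_from_bits(bits)

puts events.length # 8
p events.first
```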

Writing bit-scale MIDI events from source data bytes: converting bits back into bytes

Snippet of code needed to read bytes from the source file and turn them into the bits needed for MIDI events, highlighting the code for converting the bits back into the structured bytes our events need; source: https://github.com/h-lame/stegosaurus/blob/fc41db8be711b5649b01834c3254ca07bb73626e/lib/stegosaurus/midriffs.rb#L140-L143

Source for code in slide

Then, we turn all those bits into the valid bytes we need, with VLQ encoding for the time and the static 100s and 0s added at the front for the event type and data bytes. The to_i(2) says interpret this string as a binary number and convert it into a “real” number again (binary numbers are real, Murray, what are you talking about?)15.

We put all these numbers into an array…
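Here is a sketch of that conversion for a single event. Both helpers are made up for illustration (the VLQ one is a compact version of the encoding described earlier) and are not the code from stegosaurus:

```ruby
# Hypothetical helper: VLQ encode a number into 1 or more bytes.
def vlq_bytes(value)
  bytes = [value & 0x7F]
  value >>= 7
  while value > 0
    bytes.unshift((value & 0x7F) | 0x80)
    value >>= 7
  end
  bytes
end

# Hypothetical helper: turn the five bit strings into the event's bytes.
def event_bytes(delta_time:, on_off:, channel:, key:, velocity:)
  vlq_bytes(delta_time.to_i(2)) + [
    ("100" + on_off + channel).to_i(2), # static 1 0 0, then on/off and channel
    ("0" + key).to_i(2),                # static 0, then 7 bits of key
    ("0" + velocity).to_i(2)            # static 0, then 7 bits of velocity
  ]
end

p event_bytes(delta_time: "10000000", on_off: "1", channel: "0011",
              key: "0111100", velocity: "1100100")
# [129, 0, 147, 60, 100]
```

A delta time of 128 needs two VLQ bytes, so this event is 5 bytes long; a delta time under 128 would give a 4 byte event.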

Writing bit-scale MIDI events from source data bytes: using Array#pack again?!

Snippet of code needed to read bytes from the source file and turn them into the bits needed for MIDI events, highlighting the final `Array#pack` call to write the bytes; source: https://github.com/h-lame/stegosaurus/blob/fc41db8be711b5649b01834c3254ca07bb73626e/lib/stegosaurus/midriffs.rb#L147

Source for code in slide

…Oh, it’s you again pack. Didn’t I say we were done?

Anyway, then we pack the whole array as 1-byte character data. And we can write that to our file.
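As a sketch, and assuming the “C*” directive (8-bit unsigned integers, as many as there are), the packing step looks like this. The helper name is made up for illustration:

```ruby
# Hypothetical helper: turn an array of numbers between 0 and 255
# into a binary string with one byte per number.
def pack_event(bytes)
  bytes.pack("C*")
end

data = pack_event([129, 0, 147, 60, 100])

puts data.bytesize  # 5
p data.unpack("C*") # [129, 0, 147, 60, 100]
```

The resulting string is what gets written to the file, one byte per number.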

So now we have unsophisticated, but valid, MIDI data generated from arbitrary data.

MIDI Demo

text: MIDI Demo

Let’s do another demo!

Demo time: MIDI (pt. 1)

irb(main):012:0> m = Stegosaurus.midriffs
=> #<Stegosaurus::Midriffs:0x000000010fb1c140 @buffer_size=216, @frames_per_second=25, @ticks_per_frame=120>
irb(main):013:0> f = m.make_from './README'
=> "artefacts/README.mid"
irb(main):014:0> Stegosaurus.open_midriff f
=> ""

You know the routine by now:

We get a generator object from stegosaurus, it’s called midriffs this time. We call make_from with the README file and we pass it to the helper function.

Demo time: MIDI (pt. 2)

A screenshot of the MIDITrail app visualising the MIDI file generated from the README.  It does this as a keyboard floating in space travelling along a path of multicoloured rectangles that represent the notes.  The path of notes is quite short, and does not have many notes on it.

I’m opening this in an app called MIDITrail that will let me play the MIDI file, but will also show it as a keyboard travelling along a road of notes in space. Because, why not?

So, let’s listen to the orchestral score of my README:

The stegosaurus README, as a MIDI file16.

That was pretty fun wasn’t it? Bit short.

Demo time: MIDI (pt. 3)

irb(main):015:0> f = m.make_from RbConfig.ruby
=> "artefacts/ruby.mid"
irb(main):016:0> Stegosaurus.open_midriff f
=> ""

But I know what you all want. You want Ruby the Orchestral Score!

I hope you are ready for this. It takes a while because… well… it’s not very efficient – all that bit and byte manipulation, turning them into strings and arrays takes a while…

Demo time: MIDI (pt. 4)

A screenshot of the MIDITrail app visualising the MIDI file generated from the ruby 3.2.2 interpreter.  It does this as a keyboard floating in space travelling along a path of multicoloured rectangles that represent the notes.  The path of notes is full of rectangles and is very long; it trails off into the distance.

Here we go, it’s quite a lot longer than the README. Let’s pan around, and see, that’s uh, quite the space road of notes.

I am excited to present to you – Ruby 3.2.2 the orchestral score – I assume we’ll want this as the soundtrack to some parties later. Shall we listen to it?

The ruby 3.2.2 interpreter, as a MIDI file17.

[Rapturous applause]

Oh… it’s like an orchestra being kicked down the stairs! I mean, it’s better than the WAV version, right? It also goes on for like an hour or so.

Why?

text: Why?

Ok.

I get it.

You’re probably thinking why?

Why Ruby?

text: Why Ruby?

Ok, let’s start with the easy answer; why ruby?

We don’t normally do this kind of thing in a language like ruby. Bit and byte manipulation is pretty unwieldy, shouldn’t you use C?

Probably, but ruby is the language I know best, and I was able to get fast feedback by using ruby (although that didn’t stop me breaking it this morning apparently). That feels important to me when playing about with these kinds of toys and silly ideas.

That’s really my point here: pick a tool you know and are comfortable with in order to explore and learn. If, like me, you’re learning all about bit and byte manipulation and file formats, best not to also be learning about a new programming language at the same time. Of course, the flip side is that if you are learning a new programming language, you should re-implement a problem you’ve already solved and learn how to do it in the idioms and libraries of your new language.

I’ll let you into a little secret, that’s exactly what I did with this code. The first version of the WAV file generator was written in 2004 when I was a Python programmer. In 2007 I became a ruby programmer and I decided to port my little thing over to ruby to try and learn some more idioms. Although it’s a good job we glided over the BMP generator because it’s clear I didn’t learn very many ruby idioms when I did it.

Why this?

text: Why This?

But, more existentially, why have I made this actual thing? It is, let’s face it, a pointless little toy, and why have I eaten 44 minutes of your life telling you about it?

I made it because I was curious and I thought it would be fun. We don’t often get to combine those two traits at work and that’s what I want to encourage by sharing my story.

Day-to-day coding at work can be… boring? Same-y? A friend boiled a lot of what we do down to:

putting strings in a database and taking them out again18.

I’m not saying work can’t be exciting, or challenging, or mentally stimulating – it often is! But an important part of a project like this for me is the freedom and fun of it all. I’m not making tradeoffs about user value and tech debt. I’m not following a road map or worrying about best practices. I’m just exploring something that’s interesting to me and making choices of what to build based on the whim of it.

Maybe you’re lucky and the itches you want to scratch are exactly the ones you get to scratch at work every day, but programming computers is amazing – we can make them do anything with just a few lines of letters, numbers, and too much punctuation. I refuse to believe that you have nothing outside the limited scope of your job you’d want to make a computer do! I encourage you to embrace that and go explore some idea that’s fun to you.

I’m categorically not saying you have to “hAvE a PaSsIoN fOr CoDiNg” and fill your evenings and weekends with extra coding. You can do this in your 9-5 – think about how you could embrace curiosity and fun in your day-to-day work. What are the opportunities to follow your whims? Look for those, or make the opportunities yourself. I shared the rubocop-magic_numbers gem, not just to shoe-horn a work reference in to justify them paying for the trip (thank you Cleo!) but because it was built by a colleague who wanted to play with rubocop’s AST stuff and managed to fit that into their day-to-day work in a way that would be useful to the rest of us.

So, maybe you’ll learn something useful for work;

  • I learned about space saving encoding techniques, and bit and byte manipulation, which could come in handy if I ever do embedded systems work.

Maybe your side-projects can spin out into a livelihood;

  • I will happily come DJ at your wedding with a custom orchestral score generated from your personal data files

Maybe your side-projects will even be useful at work;

  • like a new rubocop gem to encourage best practices

But, those shouldn’t be the only reason to do something.

This isn’t about work, it’s about play.

Source Code

url: https://github.com/h-lame/stegosaurus

https://github.com/h-lame/stegosaurus

All the code lives here if you want to play with it. It won’t hold up to a critical reading, and every time I come back to it I find some more bugs (and put more in apparently).

I said earlier I chose this name because I thought it might be a steganography tool to allow you to hide data inside other formats. But it’s not, because I wasn’t interested in writing the reconstruction routines that BMP and MIDI would require because… well, this was for fun and that didn’t interest me.

Thanks for listening, bye!

Thanks for listening, bye!; Murray Steele; Cleo; @hlame@ruby.social; RubyConf San Diego ’23

Thanks for listening, bye!19