Drums, Rust, WASM and a Bevy

My Rusty journey has been off to a slow start this year. Predictable, somewhat, but the combination of my 8-month-old’s sleep regression and my wife’s busyness at work (she’s in compensation, and it’s end-of-year review time) has hit my hacking capacity quite hard. Couple this with an on-call rotation for work and… I know what you’re thinking: “Great, he’s starting a blog post off with excuses”. I am! Thankfully, there’s plenty more to write about than 2021.

In actual fact, I managed to start the year off with fairly good forays into a couple of Rust pursuits. This post will go into both of them a little, but also cover my prelude: how I found myself learning Rust.

Note, on the Side

I’ve spent a good chunk of this week at work writing unit/integration tests (we use Go at work). I uncovered some bugs in some porting work and really enjoyed leaning on Go interfaces to employ dependency mocking. It is really, really simple to swap out explicit types for an interface in your Go structs. In Rust I suppose you’d change a regular struct member into a trait object? My mind’s a bit cloudy on what such a move might look like, but I don’t think it would be difficult, nor limiting. In any case, inspired by how easy it was to use the Rust debugger, I also broke my IDE groove a little, not only running tests directly from my editor (gasp) but also using the debugger to break through some hairy patches (double gasp). I’ll thank learning Rust for reminding me of the simple things in life.
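
For the record, here’s a rough sketch of what I imagine the Rust equivalent looks like - swapping a concrete struct member for a boxed trait object so a test can inject a fake. All the names here are made up for illustration:

trait Store {
    fn get(&self, key: &str) -> Option<String>;
}

struct RealDb;
impl Store for RealDb {
    fn get(&self, _key: &str) -> Option<String> {
        None // imagine an actual database lookup here
    }
}

struct FakeStore;
impl Store for FakeStore {
    fn get(&self, _key: &str) -> Option<String> {
        Some(String::from("canned test value"))
    }
}

// Production code holds the dependency behind the trait...
struct Service {
    store: Box<dyn Store>, // was: store: RealDb
}

// ...so a test can hand it the fake instead.
fn main() {
    let svc = Service { store: Box::new(FakeStore) };
    assert_eq!(svc.store.get("k"), Some(String::from("canned test value")));
}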

Web Assembly

After putting down The Book at the end of 2020, my first step was straight into Another Book. Mercifully, this was a smaller one. It teaches you how to use Rust with WebAssembly, basically enabling you to run Rust code directly in your browser.

While this is pure awesomeness, this was also the end of a personal journey for me. The lesson shattered an infantile quest I had laid out in my most feeble of minds. Let me begin…

Great Expectations and A Drummer Buddy

Rust and I did not start forging our journey together out of spontaneous desires for memory safety or convenient, low-level APIs. Instead, the journey was born from about three or four layers of misdirection. You see, it all started with an idea I had. An idea for a Machine Learning concept I wanted to toy with. The concept was simple: put a drummer into a guitar pedal.

I wanted to be able to start playing my guitar, or bass, or whatever, and then after a couple of measures or so -bitibitibitibitboomboom!!! Our drummer friend enters with a hot fill and lays into a perfectly synchronized, genre-fitting, mood-paired series of drum patterns - just like a really cool and competent drummer would in real life, right? Even better than real life: this drummer doesn’t drink all your beer or smell of sweat and cigarettes!

The idea is really simple, so I looked around to see if I could just buy it off a shelf… Lo and behold, there are pedals and virtual plugins that claim to do it! Cautious, and not wanting to throw away more money on pedals I never use, I dove deep into the reviews. As it turns out, the reviews are pretty bad, particularly for the scenario I envisioned above.

Could I Do It?

Gently stroking some stubble on my chin, I thought about my Master’s degree, specializing in Deep Learning™ - surely my training must count for something, right?? Well yes, but mainly no. In my master’s program I had already done some model training on audio samples. I knew that you can get pretty good recognition results from fairly “dumb” network designs. I also figured some of the higher-level features, such as genre detection, could come later. It’s amazing how far a simple drum beat will get you, so long as the. drums. are. on. time…

With this I started looking into tempo detection research…

Hacking My Way to the Summit

Not long after the hunt began, I drifted across Tempo-CNN. It’s an open-source implementation of tempo-estimation deep learning networks that the author has published articles about over recent years. I was exhilarated and started to read the research.

Having gained a rudimentary understanding of how the input was massaged and the network architected, I figured a fun exercise would be to port the code from TensorFlow to PyTorch. I never liked TensorFlow and had been meaning to learn PyTorch, so why not? Much fun was had doing this, and I may or may not have bought one or more new GPUs to aid said fun.

Anyway, that leg of the story ended with my PyTorch replica reaching about 50% accuracy detecting tempo on naked guitar samples I provided. I had created a large testing dataset. It was augmented, or synthetic - typically about a hundred thousand 30-second audio samples - seeded from around a dozen guitar recordings I made. I could go into this a lot more, but I’ll save that for when I have some cool demos to share alongside.

Must, Beat, Better!

So there I was, around 9 months ago. Sitting alone with a drummer who could detect my tempo 50% of the time, but who also couldn’t drum yet (a minor implementation detail). I looked into the cases my drummer was getting wrong, and it turned out a lot of them were simply off by a factor of two - labelled at half time or double time. This is actually a really good thing, and it strikes on a theme I see in music notation very frequently: what one person may label as a 3/4 beat, another may see as 6/8. From a programming perspective this can be “fixed” quite easily too, using heuristics to coerce a sane tempo, as sketched below. You could even dedicate foot switches to halving or doubling the current tempo.
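
Here’s the kind of heuristic I mean, sketched in Rust (the 70-140 BPM window is a number I’ve plucked out of the air for illustration):

fn coerce_tempo(mut bpm: f32) -> f32 {
    assert!(bpm > 0.0);
    // Fold half-time/double-time detections back into a plausible window -
    // exactly what dedicated halve/double foot switches would do by hand.
    while bpm < 70.0 {
        bpm *= 2.0;
    }
    while bpm > 140.0 {
        bpm /= 2.0;
    }
    bpm
}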

Without a ton of tweaking I had some semi-passable results! Sadly, they were still nowhere near what I wanted. With a mountain of other work still ahead of me for a product, I stubbornly would not move forward until I had the most epic tempo detector I could get.

Now how does one improve a neural network model?

Data

My training datasets were fairly large, on the order of 1e5 - 1e6 spectrograms, but simply being large is not enough. The main issue was that they were generated from such a small seed sample set. I used SoX to apply many fancy effects such as distortion, reverb and time dilation (roughly as sketched below), but being super lazy, I simply refused to record any more seed tracks (hey, it’s a tedious process!).
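
Each augmentation pass looked roughly like this, shelling out to SoX per sample (sketched here in Rust for fun - the effects are real SoX effects, but the parameter values are invented for illustration):

use std::process::Command;

fn augment(input: &str, output: &str) -> std::io::Result<()> {
    // Chain a few SoX effects: distortion, then reverb, then time dilation.
    // Note: "tempo 1.1" also scales the ground-truth BPM label by 1.1!
    let status = Command::new("sox")
        .args(&[input, output, "overdrive", "10", "reverb", "50", "tempo", "1.1"])
        .status()?;
    if !status.success() {
        eprintln!("sox failed on {}", input);
    }
    Ok(())
}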

So there the tempo detector sat, for probably 6 months. I don’t even remember what I did during that time. I guess I played a lot of Monster Hunter with my friends and drank beers for a while…

What followed was absolute consumption by the frantic storm that came with the birth of my second son, in a world oppressed by COVID, in a country run by a uranium-faced (gue?)gorilla, more than ten thousand kilometers from my wife’s family and mine. Drummer Buddy went into deep hibernation.

Phoenix, or Turkey?

When summer finally ended and winter arrived, I was getting more and more frustrated with the California Shelter-in-Place order. Not because I’m an idiot who wants to spread COVID, but simply because I wanted to jam out and make some music with people!

Somewhere in the dome of my mind, Drummer Buddy called out from the depths. “Of course!”, I yelped as excited memories returned. “Why did I put this down??”… Oh yeah, I needed to record more seeds… So I started to think about ways I could get more recording samples without actually needing to do all of the work myself…

To be fair, I had in mind a seed dataset of probably at least a thousand or two samples. I looked into Mechanical Turk and realised I could probably crowd-source them for not too much cost. The issue is that I would need to supply an interface for “workers” to record into, as this would be a “complex” task for which MTurk couldn’t supply the web interface.

Such an interface would be akin to a DAW, but much simpler. Really it just needed to play a drum track and expose simple recording functionality. The MTurk workers would need their own recording equipment, ideally an audio interface with a preamp for their guitar. This seemed like a plan.

But how would I make something like that, I asked.

Enter, Critical Mistake

“Oh I definitely wouldn’t want to do all of that in Javascript. It’s an event loop based language and only has Float64s - it will be terrible for anything audio related… You know what I need for this? WebAssembly! And what would I write the audio processing parts in… Ruuuuuuuuuuuusssssstt”

With precisely that chain of thoughts, I had given myself complete license to turn 180 degrees away from Drummer Buddy and focus on learning Rust. Why in the name of anything I decided to also write a blog series about it is beyond me. Alas, here I am, happily pecking away at my keyboard with a ridiculous grin smeared across my face.

Prelude Fin.

Fast-forward to this month. There I was, sitting in front of the WASM Rust tutorial, writing the Game of Life. Having fun, even.

For some reason, I never considered that you would need to bind Rust code to Javascript with WASM. Nor did I anticipate that such a binding would be quite so manual (example). You really do have to annotate every function you want to call from the JS side. I also felt a bit dumb writing plain Javascript against the bindgen code; I’m sure you could make it generate Typescript, but I didn’t dig into it.
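
The annotation itself is lightweight, at least. A toy example of the flavor (not my actual code):

use wasm_bindgen::prelude::*;

// Every function you want callable from the JS side gets marked explicitly.
#[wasm_bindgen]
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

On the JS side you then import the generated glue module like any other - something like import { add } from "./pkg", if you build with wasm-pack.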

Overall, my main takeaway from learning RustWASM is that you can use it to run fast, low-level Rust code triggered directly from the browser. Now, how does that help me build a simple audio-recording frontend for the browser, I asked myself…

I next looked for audio examples of RustWASM and found a few examples of WebAudio use, including this pretty cool talk/demo/project (youtube link). The only thing is, all of these examples were basically simple sound synthesizers or sequencers. Their focus was on doing the DSP work in Rust… which doesn’t mesh well with my needs.

It does occur to me that there could be some good Rust libraries for generating spectrograms of the sound samples. We could do this to reduce the upload sizes for the workers, though I don’t think 30-second sound clips are worth optimizing upload sizes for just yet.

(The Tempo-CNN networks work on spectrograms directly, as do most audio ML applications.)
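
If I ever do go down that route, I’d guess at something like the following using the rustfft crate - a minimal STFT sketch with invented window/hop sizes, not anything lifted from Tempo-CNN:

use rustfft::{num_complex::Complex, FftPlanner};

// Naive short-time Fourier transform: Hann-windowed frames in,
// magnitude spectra out.
fn spectrogram(samples: &[f32], win: usize, hop: usize) -> Vec<Vec<f32>> {
    let mut planner = FftPlanner::<f32>::new();
    let fft = planner.plan_fft_forward(win);
    let hann: Vec<f32> = (0..win)
        .map(|n| 0.5 - 0.5 * (2.0 * std::f32::consts::PI * n as f32 / win as f32).cos())
        .collect();
    samples
        .windows(win)
        .step_by(hop)
        .map(|frame| {
            let mut buf: Vec<Complex<f32>> = frame
                .iter()
                .zip(&hann)
                .map(|(s, w)| Complex::new(s * w, 0.0))
                .collect();
            fft.process(&mut buf);
            // Keep only the non-redundant half of the spectrum.
            buf[..win / 2 + 1].iter().map(|c| c.norm()).collect()
        })
        .collect()
}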

WebAssembly - too much power

At this stage it just didn’t seem like WASM was really that great for my project. Feeling a little awkward about it, I pivoted into something else completely. Being a fair bit of a gamer nerd, I had been checking out a Rust 3D game engine (rg3d) and hanging out in its Discord server.

One day somebody came in asking a bunch of questions about the engine. It was another kid determined to make the next Street Kombat Virtual Bushido Fighter. I entertained his questions long enough for his energy to wear thin, but along the way he mentioned something that caught my attention.

He claimed he would just use “Bevy” if it weren’t ECS. If you’re not familiar with this game design pattern, ECS claims to offer cache-friendly memory layout while also letting you code in a fairly lazy way (via dependency injection). I have another story arc which leads to learning Rust, based on ECS too - but I’ll spare you the pain of that today. Suffice to say, I went ahead and checked out this newfangled “ECS” engine for myself.

Bevy

Bevy is a quite new, fairly lightweight ECS game engine written in Rust. Having previously started writing an ECS network backend in Go, this looked like a match made in heaven for me - I could write the backend and client in Rust, with a whole juicy framework to make my life awesome. I dropped all thoughts of my Drummer Buddy like it was 1963 and leaped into Bevy.

…With WebAssembly!

Bevy currently lacks extensive documentation, but there are quite a few example projects alongside a very active Discord. One feature that excited me was WASM support. Seeing as I had gone through the trouble of learning WASM for Rust, I decided to kick Bevy’s tires with this first. I took the example “Breakout” build and tweaked it…

You can even try playing the game in your browser here

Cool, that wasn’t hard to tweak, and it was a pretty fun experience just making it work. I’ll admit there were some major frustrations around WebAssembly though, such as having to use particular rand libraries for compatibility, and the Bevy examples not using wasm_bindgen annotations. Instead they opt to use the CLI tool directly (which is deprecated and caused all sorts of headaches for me trying to figure stuff out - especially as a novice).

Beginning to Fly

After getting a feel for Bevy, I decided to mess about with “native” Bevy. I was well sick of JS hooks and browsers by now. I loaded up a simple static 3D scene example and added some dynamics to it.

The source for my little thing is on Github. Overall it’s pretty simple stuff, though I am utterly convinced I have the math backwards somewhere in my code. The axes don’t make sense to me at all.

How does it work? Seriously - help me

In ECS, the Systems you write are basically functions. The functions act upon Entities. An Entity is simply a unique ID which has Components. Components are your classic data-container structs. Bevy makes adding Systems to your game really easy by letting you turn literally any function into a System via a .system() method… ?!?

Here’s an example directly from the bevy book

use bevy::prelude::*;

fn hello_world() {
    println!("hello world!");
}

fn main() {
    App::build()
        .add_system(hello_world.system())
        .run();
}
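
And, for flavor, a hypothetical System that actually touches some Components (same 0.4-era style; the component types are ones I’ve made up):

// Components are plain structs...
struct Velocity(f32);
struct Position(f32);

// ...and a System asks Bevy to inject a Query over every Entity
// that has both. Registered with .add_system(movement.system()),
// just like hello_world above.
fn movement(mut query: Query<(&Velocity, &mut Position)>) {
    for (velocity, mut position) in query.iter_mut() {
        position.0 += velocity.0;
    }
}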

This is some crazy black magic shit to me. How - what? When?

Digging into it, I think it has something to do with this trait, which is defined in Bevy:

pub trait IntoSystem<Params, SystemType: System> {
    fn system(self) -> SystemType;
}

But I got lost in the macro that seems highly related to it, impl_into_system. Lost, fast… Then, like a gigantic tuna breaking my neck, my head almost spun itself right off when I read this code after the macro.

impl_into_system!();
impl_into_system!(A);
impl_into_system!(A, B);
impl_into_system!(A, B, C);
impl_into_system!(A, B, C, D);
impl_into_system!(A, B, C, D, E);
impl_into_system!(A, B, C, D, E, F);
impl_into_system!(A, B, C, D, E, F, G);
impl_into_system!(A, B, C, D, E, F, G, H);
impl_into_system!(A, B, C, D, E, F, G, H, I);
impl_into_system!(A, B, C, D, E, F, G, H, I, J);
impl_into_system!(A, B, C, D, E, F, G, H, I, J, K);
impl_into_system!(A, B, C, D, E, F, G, H, I, J, K, L);
impl_into_system!(A, B, C, D, E, F, G, H, I, J, K, L, M);
impl_into_system!(A, B, C, D, E, F, G, H, I, J, K, L, M, N);
impl_into_system!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O);
impl_into_system!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P);

As far as I know, Rust itself doesn’t cap the number of args you can pass to a func; 16 just seems to be where the Bevy devs decided to stop generating impls. In any case it was about here that I decided not to look inside the Bevy source anymore and to remain a meagre, mortal, peasant user. (Joking aside, I did go back into the bevy prelude a few times to figure a few things out; it isn’t all so bad…)

Is there a simple explanation for how .system() appears on all my user-defined functions? praying intensifies
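
My best guess at the mechanism, in miniature: each impl_into_system! invocation above generates a blanket impl of IntoSystem for function types with that many parameters, and a blanket trait impl is enough to hang a method on any plain function. A toy version of the trick (my own illustration, not Bevy code):

trait Describe {
    fn describe(self) -> &'static str;
}

// Blanket impl: every zero-argument function now has a .describe() method.
impl<F: Fn()> Describe for F {
    fn describe(self) -> &'static str {
        "a zero-arg function"
    }
}

fn hello_world() {}

fn main() {
    // No explicit opt-in needed - method resolution finds the blanket impl.
    println!("{}", hello_world.describe());
}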

It was at about this stage that my journey with Bevy took a nosedive. Life ramped back up for 2021. I think Bevy is a really neat framework, and I’d encourage anyone interested in ECS or gamedev to play with it if they haven’t already. I sure hope to do something meaningful with it eventually!

What’s Next in ’21

Honestly, I don’t even know right now. Hopefully my parenting life can find a new cruising altitude and my nights will return to some regular stretches of peace. Pecking away at a laptop in the dark next to an infant only has so much “charm” in it. Still, I’m grateful to even have this option.

As for Rust, I don’t particularly know what I want to do. Maybe I could revisit my ideas for emulators/CLI tools. Maybe something DSP related? Maybe add a network backend to a game in Bevy. Or maybe I should step back to my Drummer Buddy idea and flesh out that web UI it needs (a good time to learn Svelte?). Maybe I’ll spend my spare moments writing some new music. I’m not too sure right now. The only thing I know is that I’ll write about it in February.

Until then, cya!
