Notes on Learning Rust

Published on November 14, 2022

I’ve recently been learning Rust and I just finished The Rust Programming Language. As I went through it, I chased down the many questions I had using other resources like The Rustonomicon and various Google searches. It’s mostly these detours I want to write about.

This post is going to be more questions than answers since for a lot of these things I genuinely don’t know, or my research just led me to more questions.

Rust is not C

An example type I’ll be using a bit is ZOT [1], which I’m using as a marginally more complicated Option.

enum ZOT<T> { Zero, One(T), Two(T, T) }

The name stands for Zero One Two.

Let’s say I have a function that removes the rightmost element unless it’s Zero.

fn remove_last<T>(zot: &mut ZOT<T>) {
    match zot {
        ZOT::Zero =>      (),
        ZOT::One(_) =>    { *zot = ZOT::Zero; },
        ZOT::Two(x, _) => { *zot = ZOT::One(x); }
    }
}

Rust doesn’t love this code because I use the ZOT::One constructor with a borrowed object (matching through zot makes x a &mut T, not an owned T), and I really have no right to do so. That’s fair enough, but from my perspective it still looks correct. If I’m overwriting whatever’s stored at zot, surely zot no longer owns x?

If I replace the last case with panic!, everything works.

fn main() {
    let mut zot = ZOT::One("x");
    println!("{:?}", zot); // One("x")
    remove_last(&mut zot);
    println!("{:?}", zot); // Zero
}

If I compile this, the *zot = ZOT::Zero line looks like,

lea     rsi, [rsp + 16]
mov     edx, 40
call    memcpy@PLT
jmp     .LBB74_2

which makes sense. With optimizations off it seems like we’re constructing ZOT::Zero on the stack and then we need to copy it over.

There’s nothing to indicate "x" is being dropped, but dropping it wouldn’t do anything anyway since it’s a string literal. What if we change that to "x".to_string()?

mov     qword ptr [rsp + 16], 0
mov     rax, qword ptr [rip + core::ptr::drop_in_place<example::ZOT<alloc::string::String>>@GOTPCREL]
call    rax
jmp     .LBB128_7
.LBB128_7:
mov     rcx, qword ptr [rsp + 64]
mov     qword ptr [rax + 48], rcx
movups  xmm0, xmmword ptr [rsp + 16]
movups  xmm1, xmmword ptr [rsp + 32]
movups  xmm2, xmmword ptr [rsp + 48]
movups  xmmword ptr [rax + 32], xmm2
movups  xmmword ptr [rax + 16], xmm1
movups  xmmword ptr [rax], xmm0

So, when we overwrite the value behind a mutable reference, we drop what’s there? That makes sense. I don’t know why memcpy was replaced with movups though.

The difficulty here suggests to me that this is the wrong pattern, but it turns out there’s a function that does what I want.
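A small way to check the "assignment drops the old value" reading without looking at assembly is to count drops. This is only a sketch: Noisy, clear and drops_after_clear are hypothetical names made up for the test, not anything from the book or the standard library.

```rust
use std::cell::Cell;

// Hypothetical helper type: bumps a shared counter each time a value is dropped.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

#[allow(dead_code)]
enum ZOT<T> { Zero, One(T), Two(T, T) }

// Overwrite whatever is stored behind the mutable reference.
fn clear<T>(zot: &mut ZOT<T>) {
    // The old value is dropped as part of this assignment.
    *zot = ZOT::Zero;
}

// Returns how many Noisy values have been dropped right after clear ran.
fn drops_after_clear() -> u32 {
    let drops = Cell::new(0);
    let mut zot = ZOT::Two(Noisy { drops: &drops }, Noisy { drops: &drops });
    clear(&mut zot);
    drops.get()
}
```

Here drops_after_clear() returns 2: both payloads of the old ZOT::Two are dropped by the assignment itself, not later when zot goes out of scope.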

ZOT::Two(_, _) => {
    if let ZOT::Two(x, y) = std::mem::replace(zot, ZOT::Zero) {
        *zot = ZOT::One(x);
    }
},

This works, but it’s incredibly awkward. It’s easy to imagine a type where every variant carried a value of a generic type, so there would be no cheap placeholder to hand to replace, and replace would no longer be helpful.

I don’t know if there’s a better way to do this than replace, but the sense I get from how difficult this is, is that it’s a bad pattern in Rust, and that the Rust language designers would much rather have less mutability.
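For what it’s worth, the replace version can be made a bit less awkward by moving the whole old value out once and matching on it by value. This is a sketch that still assumes a payload-free variant like ZOT::Zero is available as the placeholder; the derives are only there so the result can be compared.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
enum ZOT<T> { Zero, One(T), Two(T, T) }

fn remove_last<T>(zot: &mut ZOT<T>) {
    // Move the old value out (leaving Zero behind as a placeholder),
    // then build the new value from the owned pieces.
    *zot = match mem::replace(zot, ZOT::Zero) {
        ZOT::Zero | ZOT::One(_) => ZOT::Zero,
        ZOT::Two(x, _) => ZOT::One(x),
    };
}
```

Calling remove_last on ZOT::Two(1, 2) leaves ZOT::One(1), and calling it again leaves ZOT::Zero. If every variant carried a T, the placeholder would have to come from somewhere else, for example a Default bound on T, which is the same limitation described above.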

Rust is not Haskell

One of the reasons I think Haskell gets a reputation for being tough is that it makes it very difficult to do things the non-Haskell way. If you want to write procedural code that works like a while-statement you can use a state monad. But if you’re at the point where you know what a state monad is, you know enough to be wary about using procedural structures.

That sounds like a similar lesson to what I just learned with Rust, but Rust has traits and traits are very similar to typeclasses, so now I’m going to try and treat Rust traits like typeclasses.

Here’s our good friend ZOT.

data ZOT a = Zero | One a | Two a a

Since it’s a list-like structure we can define Foldable for it.

instance Foldable ZOT where
    foldr f z Zero      = z
    foldr f z (One x)   = f x z
    foldr f z (Two x y) = f x (f y z)

Foldable isn’t the same as Rust’s Iterator, and there seem to be attempts to bring Foldable into Rust, because googling around led to some Foldable traits.
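Without any trait at all, the Haskell foldr above translates fairly directly into an inherent method. This is a sketch of my own, not one of the Foldable crates; the method name foldr and the argument order are assumptions chosen to mirror the Haskell instance.

```rust
#[allow(dead_code)]
enum ZOT<T> { Zero, One(T), Two(T, T) }

impl<T> ZOT<T> {
    // Right fold: combines elements starting from the rightmost one,
    // mirroring the three equations of the Haskell instance.
    fn foldr<B>(self, f: impl Fn(T, B) -> B, z: B) -> B {
        match self {
            ZOT::Zero => z,
            ZOT::One(x) => f(x, z),
            ZOT::Two(x, y) => f(x, f(y, z)),
        }
    }
}
```

For example, ZOT::Two(1, 2).foldr(|x, acc| x - acc, 0) evaluates 1 - (2 - 0) and gives -1, which shows the right-to-left association.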

Instead, I’m going to focus on functors, since it’s a bit simpler and is far more fundamental for a language to have if it wants to be Haskell.

instance Functor ZOT where
    fmap f Zero      = Zero
    fmap f (One x)   = One (f x)
    fmap f (Two x y) = Two (f x) (f y)

We can take inspiration from Option, which has this map function defined.

#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_const_unstable(feature = "const_option_ext", issue = "91930")]
pub const fn map<U, F>(self, f: F) -> Option<U>
where
    F: ~const FnOnce(T) -> U,
    F: ~const Destruct,
{
    match self {
        Some(x) => Some(f(x)),
        None => None,
    }
}

A few things stand out to me. First, this function is stable, but using it as a const fn is unstable, and I don’t quite understand why even after looking at the related issue. It’s related to ~const which I don’t understand and links to this write-up about ~const which I understand even less.

Anyway, here’s my attempt at repurposing to ZOT.

pub const fn map<U, F>(self, f: F) -> ZOT<U>
    where
        F: ~const Fn(T) -> U,
    {
        match self {
            ZOT::Zero     => ZOT::Zero,
            ZOT::One(x)   => ZOT::One(f(x)),
            ZOT::Two(x,y) => ZOT::Two(f(x), f(y)),
        }
    }

Rust doesn’t like this because ~const is experimental, and it doesn’t like the code without ~const either, because a const fn can’t call f unless the bound on f is ~const.
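If const evaluation isn’t needed, dropping const sidesteps the whole ~const question. The following is a sketch on stable Rust; I’ve used FnMut rather than Option’s FnOnce as an assumption of mine, because the Two case has to call f twice.

```rust
#[derive(Debug, PartialEq)]
enum ZOT<T> { Zero, One(T), Two(T, T) }

impl<T> ZOT<T> {
    // A plain (non-const) map. FnMut lets f be called more than once
    // and lets it mutate captured state between calls.
    pub fn map<U, F>(self, mut f: F) -> ZOT<U>
    where
        F: FnMut(T) -> U,
    {
        match self {
            ZOT::Zero => ZOT::Zero,
            ZOT::One(x) => ZOT::One(f(x)),
            ZOT::Two(x, y) => ZOT::Two(f(x), f(y)),
        }
    }
}
```

With this, ZOT::Two(1, 2).map(|x| x * 10) gives ZOT::Two(10, 20).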

This doesn’t spell good news for creating a trait and the straightforward implementation doesn’t work.

trait Functor {
    fn map<A,B>(&self, f: |&A| -> B) -> Self<B>;
}

It says, “type arguments are not allowed on self type”. Why? I’m not sure. There’s actually a lot of discussion online about why there’s not a Functor trait, along with attempts to create one. But my conclusion here is that it’s not really possible to create one, without some sacrifices. There’s even more discussion with respect to monads.

Just like I learned Rust isn’t C, Rust also isn’t Haskell, in this case specifically because Rust doesn’t have higher-kinded types. I’m still working out why that’s the case, and I’m not sure if it’s related to the ability to output code. Haskell has spoiled (ruined?) my thinking about types in that I’m comfortable thinking in terms of kinds, but now I know I’ll need to learn a different way of doing this.
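One of the approximations that comes up uses generic associated types, which were stabilized in Rust 1.65. The sketch below is an assumption-laden illustration, not an established trait: the names Functor, Inner, Mapped and fmap are mine, and the "sacrifice" is visible in that nothing forces Mapped<B> to be the same type constructor as Self.

```rust
#[derive(Debug, PartialEq)]
enum ZOT<T> { Zero, One(T), Two(T, T) }

// Hypothetical Functor-like trait built on a generic associated type.
trait Functor {
    // The element type currently held.
    type Inner;
    // "Self, but holding B". The compiler does not check that this
    // really is the same container, which is where this falls short
    // of Haskell's Functor.
    type Mapped<B>;

    fn fmap<B, F>(self, f: F) -> Self::Mapped<B>
    where
        F: FnMut(Self::Inner) -> B;
}

impl<A> Functor for ZOT<A> {
    type Inner = A;
    type Mapped<B> = ZOT<B>;

    fn fmap<B, F>(self, mut f: F) -> Self::Mapped<B>
    where
        F: FnMut(A) -> B,
    {
        match self {
            ZOT::Zero => ZOT::Zero,
            ZOT::One(x) => ZOT::One(f(x)),
            ZOT::Two(x, y) => ZOT::Two(f(x), f(y)),
        }
    }
}
```

With the trait in scope, ZOT::Two(1, 2).fmap(|x| x + 1) gives ZOT::Two(2, 3).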

Closures and Dynamic Dispatch

The last thing that I kept wondering about throughout the book was how closures and dynamic dispatch work, and the answer I found is implicit structs and function pointers.

An example of something I used a lot while reading the book was unwrap_or_else. Initially I suspected functions that took a function were always inlined or had to follow a specific set of rules. I also noticed that bewildering ~const.

#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_const_unstable(feature = "const_option_ext", issue = "91930")]
pub const fn unwrap_or_else<F>(self, f: F) -> T
where
    F: ~const FnOnce() -> T,
    F: ~const Destruct,
{
    match self {
        Some(x) => x,
        None => f(),
    }
}

There’s a StackOverflow answer that does a good job explaining things. You put all the enclosed variables in a struct, and give that struct and the function to the function that will call the closure. That function then has to pass the struct along whenever it invokes the function. This feels similar to C functions that take function pointers: often the function pointer takes an extra void * parameter, which the caller supplies up front and which gets passed back to the function.
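To make that concrete, here is the same closure written twice: once normally, and once by hand in the struct-plus-function style. This is a sketch of the idea, not what the compiler literally generates; AddY, apply, with_closure and by_hand are hypothetical names.

```rust
// Hand-written stand-in for the environment of `move |x| x + y`.
struct AddY {
    y: i32,
}

impl AddY {
    // The closure body, taking its environment as an explicit argument.
    fn call(&self, x: i32) -> i32 {
        x + self.y
    }
}

// C-style higher-order function: a function pointer plus the
// environment it needs, much like a callback with a void * argument.
fn apply(f: fn(&AddY, i32) -> i32, env: &AddY, x: i32) -> i32 {
    f(env, x)
}

// The ordinary closure version.
fn with_closure(x: i32) -> i32 {
    let y = 100;
    let add = move |n: i32| n + y;
    add(x)
}

// The same computation with the environment spelled out.
fn by_hand(x: i32) -> i32 {
    let env = AddY { y: 100 };
    apply(AddY::call, &env, x)
}
```

Both with_closure(1) and by_hand(1) return 101.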

I can create a small test by looking at the assembly of a function that returns a closure.

fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
    let y = 100;
    Box::new(move |x| x + y)
}

I’ll note here Rust generates some really verbose assembly if you leave optimizations turned off. For this example I intentionally set opt-level=3, which saved roughly half the instructions.

example::returns_closure:
        push    rax
        mov     edi, 4
        mov     esi, 4
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        test    rax, rax
        je      .LBB6_1
        mov     dword ptr [rax], 100
        lea     rdx, [rip + .L__unnamed_1]
        pop     rcx
        ret
.LBB6_1:
        mov     edi, 4
        mov     esi, 4
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
        ud2

.L__unnamed_1:
        .quad   core::ptr::drop_in_place<example::returns_closure::{{closure}}>
        .asciz  "\004\000\000\000\000\000\000\000\004\000\000\000\000\000\000"
        .quad   core::ops::function::FnOnce::call_once{{vtable.shim}}
        .quad   example::returns_closure::{{closure}}
        .quad   example::returns_closure::{{closure}}

There’s a bit of a difference in that it looks like it needs more than just the function pointer (for instance, how to drop), but close enough.

But, unlike C, I can do things like |x| move |y| x + y, which is a curried function. Anything similar in C wouldn’t be nearly as terse.
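As a quick check that the curried form behaves as expected, here is a minimal sketch; curried_sum is a hypothetical wrapper that exists only so the closure can be exercised.

```rust
// Builds the curried closure and applies it one argument at a time.
fn curried_sum(a: i32, b: i32) -> i32 {
    // The outer closure takes x and returns an inner closure that
    // owns a copy of x (hence `move`) and takes y.
    let add = |x: i32| move |y: i32| x + y;
    let add_a = add(a);
    add_a(b)
}
```

Here curried_sum(2, 3) returns 5.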

My questions about closures were similar to my questions about dynamic dispatch, and the answer is vtables, just like in the assembly above. It seems that unless you explicitly have dyn somewhere in your code you can assume vtables won’t be used and expect monomorphization instead, which requires that the compiler know at compile time the exact function that will be used.
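The two styles look like this side by side. This is a sketch with hypothetical function names (apply_static, apply_dyn); the comments describe the usual compilation strategy rather than a guarantee about what the optimizer does.

```rust
// Generic version: a separate copy of this function is compiled for
// each concrete closure type F it is used with (monomorphization).
fn apply_static<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

// Trait-object version: one compiled function; the call goes through
// the vtable carried alongside the data pointer in &dyn Fn.
fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}
```

Both accept the same closure: with let y = 100, apply_static(|x| x + y, 1) and apply_dyn(&|x| x + y, 1) each return 101.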

That last part raises another question: how does this work with shared libraries? For instance if I’m packaging ripgrep, and ripgrep is one of many packages that use the regex crate, I’ll want to package regex as a shared library and make that a ripgrep dependency. And if regex (for sake of discussion) provides a lot of generic trait implementations or functions, does that mean my packaged regex needs to include every possible monomorphized function from all combinations of types and functions it provides?

There doesn’t seem to be a practical answer, since neither Debian nor Arch appears to attempt to split up ripgrep into its dependencies. Considering Arch goes to great lengths to split Haskell packages up (xmonad is dependent on haskell-x11, haskell-setlocale and haskell-data-default-class), I’d assume they’d be doing the same for Rust if possible [2].

Closing Comments

When I started writing this I had a bunch of things I wanted to note, but as I wrote, more questions came up and I kept thinking, "how do I demonstrate that I actually understand this?", so I spent longer on some things and skipped a lot. This article is purely a byproduct of my learning, hence the unanswered questions and errors that I’m sure I’ve made. I’m a long way from having learned Rust, and from having a project in mind to actually do something with it.

Although Rust isn’t C or Haskell, I can see the mix of influences and am interested in the recent addition of GATs to the language, although I’m still far from encountering the situations where I’ll need them.


  1. This isn’t original, see https://crates.io/crates/zot

  2. Arch’s packages guidelines state, “While some Rust projects have external dependencies, most just use Rust ecosystem libraries that are statically embedded in the final binary”.