Surprises with Rust's `as` (and Python division)

16 min read July 08, 2024 updated: July 14, 2024 #rust #python #programming

I've updated this post to clarify some parts and add some more context and examples. Thanks to everyone who gave me feedback on earlier versions!

Coming to Rust mostly from higher-level languages, I've experienced surprising behavior when using as for type casting, and started to avoid it where I could.

In this post, I give quite a bit of background information, which I hope explains why someone might want to use `as` in the first place, why I recommend that newcomers to Rust avoid `as`, and what I prefer to use to convert between types instead. If you want to skip that background and go straight to the point, skip ahead to "Using as to cast types".

§Some background

Over the first week of my RC batch, I met a lot of people who want to learn Rust. A fair number of people coming to Rust (at RC and beyond) have backgrounds primarily in higher-level languages like Python or JavaScript. That's how I came to Rust as well — though I had written C in college, and Java before, I promptly moved on to Python because it was just plain more convenient for everything I wanted to do.1

I really enjoy coding in Rust now, but it took a while for me to get there, and I abandoned learning it multiple times. Picking up Rust can be particularly frustrating for programmers coming from higher-level languages. On top of the things that everyone has to grok, like ownership or aliasing, non-systems-y people like me also immediately get hit with things like Rust having 14 built-in types to represent numbers. Meanwhile, Python has three (and most of us probably only ever deal with two; sorry, complex!), and JS has one (or two if you count BigInt).

This kind of lower-level complexity is why I really appreciate people like Nicole creating resources like YARR! It specifically targets people who know how to program in high-level languages but maybe not lower-level ones, and who want to get started with Rust without immediately getting overwhelmed. And one thing that can overwhelm newcomers to Rust even more is the number of resources telling them what not to do, just as they're figuring out what they can do in the first place.2

I know that advice is probably coming from a good place, and has merit — we don't want newcomers to learn unidiomatic constructs and form bad habits that could hurt them in the long term. But often, the "don't do <x>" advice doesn't give clear direction on what to do instead,3 which can make newcomers feel more confused and stuck, or, at worst, make them give up altogether.

So I want to be cautious with this post, and first explain where I'm coming from with my own "don't do <x>" advice about type casting with as.

§Gotchas with types (more background)

Not only does Rust have all those numeric types, Rust is also much stricter about types than other languages people might be used to.

§Being flexible with types is convenient!

Take this Python function:

def get_half(number):
    return number / 2

We can give it a float or an int, and it does what most people would intuitively expect it to do:

>>> get_half(2)
1.0
>>> get_half(1.5)
0.75

No matter whether we give get_half an int or a float, it can work with either, and always gives us back a float. Makes sense. Great.

Now, let's say we want to get half of something that is only ever an integer. For example, we might want to distribute some computationally expensive workload among multiple CPU cores, but use no more than half of them, to save the rest for something else. We expect our input to always be a whole number, and we need our output to always be a whole number.4 We never want to get more than half of the input, so we choose floor division, and write this function instead:

def get_half_floor(number):
    return number // 2

We can give it an int, and it does what we think it should.

>>> get_half_floor(6)
3
>>> get_half_floor(7)
3

We get back an int, as we should, and we never get more than half, which is how floor division by 2 should work. Great.

§...except maybe also confusing and risky

Someone quietly changes the code we use to get the number of CPU cores, and it starts giving us a float instead.5 What would get_half_floor do?

>>> get_half_floor(6.0)
3.0

No matter whether we give get_half_floor an int or a float, it can work with either, and gives us an int if we passed it an int, and a float if we passed a float. Makes sense. Great.

Wait.

>>> get_half(2)
1.0
>>> get_half(1.5)
0.75
>>> get_half_floor(6)
3
>>> get_half_floor(6.0)
3.0

get_half, which uses "regular" division, can take an int or a float, and always returns a float. get_half_floor, which uses floor division, can take an int or a float, and always returns the same numeric type that we gave it. Huh.

While both behaviors might make sense individually, this inconsistency between the two has confused some people.6

Worse, because our get_half_floor function happily takes and returns both ints and floats, if we start getting a float for the number of available CPU cores, we might not realize something is wrong until much later — for example, after our program suddenly crashes with a TypeError when we try to use multiprocessing.pool.Pool or concurrent.futures.ProcessPoolExecutor (both of which only accept an int for the number of processes). So, while the flexibility that comes with dynamic typing or type coercion is often convenient, it can also cause subtle bugs or unexpected behavior, and tracking that down in a complex codebase can be difficult and frustrating.

Which is likely why Rust never implicitly converts between numeric types.

§The challenge with Rust's strict type system

Note: If you want a refresher on Rust's types I will be using in this section, please check the Data Types section of The Rust Programming Language (a.k.a. The Book).

Let's say we want to port the above get_half function to Rust.

First, we have to pick which of those many numeric types we want our function to take as an argument. Maybe we choose f64, because that's the equivalent of Python's float,7 and is Rust's default floating-point type, and write the following function:

fn get_half(number: f64) -> f64 {
    number / 2.0
}

and then use it with another floating-point input:

fn main() {
    let number = 2.0;
    let result = get_half(number);
    dbg!(result);
}

We compile and run that code and get:

[src/main.rs:8:5] result = 1.0

Great.

But what if we change number to be an integer?

fn main() {
    let number = 2;
    let result = get_half(number);
    dbg!(result);
}

Rust doesn't allow that.

error[E0308]: mismatched types
 --> src/main.rs:7:27
  |
7 |     let result = get_half(number);
  |                  -------- ^^^^^^ expected `f64`, found integer
  |                  |
  |                  arguments to this function are incorrect
  |
note: function defined here
 --> src/main.rs:1:4
  |
1 | fn get_half(number: f64) -> f64 {
  |    ^^^^^^^^ -----------

For more information about this error, try `rustc --explain E0308`.

Rust doesn't even allow us to use f32 as-is, either:

fn main() {
    // We have to specify the type here explicitly
    // because otherwise Rust's compiler will infer
    // f64, as that's the type that matches the signature of `get_half`.
    // When left unspecified, the compiler analyzes how we use the value
    // and (usually) infers the most helpful type.
    let number: f32 = 2.0;
    let result = get_half(number);
    dbg!(result);
}

error[E0308]: mismatched types
 --> src/main.rs:7:27
  |
7 |     let result = get_half(number);
  |                  -------- ^^^^^^ expected `f64`, found `f32`
  |                  |
  |                  arguments to this function are incorrect
  |
note: function defined here
 --> src/main.rs:1:4
  |
1 | fn get_half(number: f64) -> f64 {
  |    ^^^^^^^^ -----------
help: you can convert an `f32` to an `f64`
  |
7 |     let result = get_half(number.into());
  |                                 +++++++

For more information about this error, try `rustc --explain E0308`.

(When we try to call get_half with an f32, Rust does at least suggest we can use into, which is helpful and makes this code work.)
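As a rough sketch of what that suggestion relies on: `into` (and its counterpart `f64::from`) is only available for conversions that can't lose information. The values below are made up for illustration, and `get_half` is the same function as above.

```rust
/// Same function as in the post: it only accepts an `f64`.
fn get_half(number: f64) -> f64 {
    number / 2.0
}

fn main() {
    let small_float: f32 = 2.0;
    let small_int: u8 = 7;
    let signed_int: i32 = -3;

    // `From`/`Into` are implemented only where every possible input value
    // has an exact representation in the target type.
    println!("{}", get_half(small_float.into()));    // f32 -> f64
    println!("{}", get_half(f64::from(small_int)));  // u8  -> f64
    println!("{}", get_half(f64::from(signed_int))); // i32 -> f64

    // There is no `From<i64> for f64`, because not every i64 fits exactly
    // into an f64, so the following line would not compile:
    // get_half(f64::from(1_i64));
}
```

So `into` covers the "widening" cases, but not every pair of numeric types.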

Our Rust function takes an f64, so we can only give it an f64, and nothing else will compile. This could save us from the kind of subtle bugs and inconsistencies we saw in our Python code.

But what do we do if our input has one of the other numeric types?

§Using as to cast types

§It's convenient!

As YARR! points out, we can use as to cast our number to f64, whatever its type.

fn main() {
    let number = 2;
    let result = get_half(number as f64);
    dbg!(result);
}

It compiles, and runs, and we get what we want:

[src/main.rs:8:5] result = 1.0

All is well. We just learned a convenient way to work around Rust's strictness with types by using as.

§...except maybe also confusing and risky

YARR! also says this:

You do have to be careful to ensure that if you cast to a smaller size value, that you won't overflow anything. The behavior you get is well-defined but may be unexpected.

What's this unexpected behavior?

Well, remember the get_half_floor function for getting no more than half of available CPU cores?

We port that to Rust, too:

fn get_half_floor(number: u8) -> u8 {
    number / 2
}

fn main() {
    let number = 6;
    let result = get_half_floor(number);
    dbg!(result);
    
    let another_result = get_half_floor(7);
    dbg!(another_result);
}

We use u8 for get_half_floor's input type, which maxes out at 255, and 255 CPU cores ought to be enough for anybody. We compile and run our code.

[src/main.rs:8:5] result = 3
[src/main.rs:11:5] another_result = 3

Great.
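A side note on this port, as a small sketch rather than anything definitive: Rust has no separate floor-division operator. With integer operands, `/` drops the fractional part (it truncates toward zero), which for unsigned types like u8 gives the same result as Python's `//`. For negative signed integers the two differ, and `div_euclid` matches floor division as long as the divisor is positive. The signed values below are made up for illustration.

```rust
fn get_half_floor(number: u8) -> u8 {
    number / 2
}

fn main() {
    // Unsigned integers: truncating and flooring give the same answer.
    assert_eq!(get_half_floor(7), 3);

    // Signed integers: `/` truncates toward zero, so -7 / 2 is -3,
    // whereas Python's -7 // 2 is -4.
    assert_eq!(-7_i32 / 2, -3);

    // `div_euclid` rounds down when the divisor is positive,
    // which matches Python's floor division here.
    assert_eq!((-7_i32).div_euclid(2), -4);

    // Floats keep the fractional part.
    assert_eq!(7.0_f64 / 2.0, 3.5);

    println!("all division checks passed");
}
```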

But now someone gives us a machine with a Sierra Forest CPU, which has 288 cores. Our get_half_floor function still takes a u8 though, so I guess we'll be limited to half of 255 until we can update our code. That should do for now, though.

We run our code:

fn main() {
    let number = 288;
    let result = get_half_floor(number as u8);
    dbg!(result);
}

[src/main.rs:8:5] result = 16

Wait. We didn't get half of 288, which we knew we wouldn't, but we didn't even get half of 255. That's odd. Why?

fn main() {
    let number = 288;
    dbg!(number as u8);
}

[src/main.rs:7:5] number as u8 = 32

That's what that whole "be careful not to overflow" thing with "unexpected behavior" was about. Rust By Example gives some helpful, uh, examples, and explains that when we use as to cast a number to a u8, and our number can't fit into a u8:

the first 8 least significant bits (LSB) are kept, while the rest towards the most significant bit (MSB) get truncated.

This makes sense if we look at the binary representation:

fn main() {
    let number = 288;
    println!("288 in binary is {number:b}");
    println!("288 as u8 in binary is {:b}", number as u8);
}

288 in binary is 100100000
288 as u8 in binary is 100000

288 doesn't fit into u8, so when we cast it to u8 with as, Rust indeed took only the 8 least significant bits (going from right to left), and discarded the rest. This gave us 100000 in binary, which is 32 in decimal,8 and explains why get_half_floor gave us 16 when we called it with 288 as u8.
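Another way to look at "keep the lowest 8 bits" is as a remainder: for integers, casting to u8 with as gives the same result as wrapping the value around modulo 256. Here is a minimal sketch of that equivalence. The helper names `cast_with_as`, `wrap_manually`, and `float_to_u8` are made up for illustration, and the last part shows that float-to-integer casts follow a different rule (they saturate instead of wrapping).

```rust
/// What `as` does with an integer that may not fit: keep only the low 8 bits.
fn cast_with_as(number: i32) -> u8 {
    number as u8
}

/// The same result spelled out as arithmetic: the remainder modulo 256 (2^8).
fn wrap_manually(number: i32) -> u8 {
    // `rem_euclid(256)` is always in 0..=255, so this conversion cannot fail.
    u8::try_from(number.rem_euclid(256)).expect("remainder is always in 0..=255")
}

/// Float-to-integer casts clamp to the target range instead of wrapping.
fn float_to_u8(number: f64) -> u8 {
    number as u8
}

fn main() {
    for number in [32, 255, 256, 288, -1] {
        assert_eq!(cast_with_as(number), wrap_manually(number));
        println!("{number} as u8 = {}", cast_with_as(number));
    }

    // Out-of-range floats saturate at the bounds, and NaN becomes 0.
    assert_eq!(float_to_u8(300.7), 255);
    assert_eq!(float_to_u8(-5.5), 0);
    assert_eq!(float_to_u8(f64::NAN), 0);
}
```

So the same keyword wraps in one case and saturates in another, which is one more thing to keep in mind when reading code that uses as.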

So, similarly to our earlier Python functions surprising us by giving us a float when we might've expected an int, using as could lead to subtle bugs and confusing behavior that we might not notice, or might find hard to track down, in a larger codebase. Systems-y people know about this, but programmers coming from higher-level languages might find it surprising.

§So what's the alternative to as?

I have gotten in the habit of using TryFrom when I need to convert between types and avoid surprising (to me) results. As its docs say:

the TryFrom trait informs the programmer when a type conversion could go bad and lets them decide how to handle it.

With that, we could now change our code to

fn get_half_floor(number: u8) -> u8 {
    number / 2
}

fn main() {
    let number: u16 = 32;
    let result = match u8::try_from(number) {
        Ok(number) => get_half_floor(number),
        Err(error) => panic!("Couldn't convert {number} to u8: {error}"),
    };
    dbg!(result);
}

and get

[src/main.rs:11:5] result = 16

And if we try a value that doesn't fit, like here:

// --snip--

fn main() {
    let number = 288;
    // --snip--
    dbg!(result);
}

we get

thread 'main' panicked at src/main.rs:9:23:
Couldn't convert 288 to u8: out of range integral type conversion attempted

which tells us something went wrong (and what), instead of surprising us.

In real code, we might want to handle the error some other way instead of just panicking, like falling back to a default value or returning the error to the caller.

Or maybe something else! Our strategy for handling the error would depend on what we're ultimately trying to do, and what would be appropriate for our library/application and the user. But we would know something unexpected happened, and we'd explicitly decide how to deal with it.
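To make that a bit more concrete, here is a minimal sketch of two ways to handle a failed conversion without panicking. The function names `half_of_cores` and `half_of_cores_capped` are hypothetical, and choosing u32 for the input and u8::MAX as the fallback are assumptions for illustration, not recommendations.

```rust
use std::num::TryFromIntError;

fn get_half_floor(number: u8) -> u8 {
    number / 2
}

/// Option 1: hand the conversion error back to the caller with `?`.
fn half_of_cores(cores: u32) -> Result<u8, TryFromIntError> {
    let cores = u8::try_from(cores)?;
    Ok(get_half_floor(cores))
}

/// Option 2: explicitly fall back to the largest value a u8 can hold.
fn half_of_cores_capped(cores: u32) -> u8 {
    let cores = u8::try_from(cores).unwrap_or(u8::MAX);
    get_half_floor(cores)
}

fn main() {
    for cores in [6, 288] {
        match half_of_cores(cores) {
            Ok(half) => println!("{cores} cores: using {half}"),
            Err(error) => println!("{cores} cores: {error}"),
        }
        println!("{cores} cores, capped: using {}", half_of_cores_capped(cores));
    }
}
```

With the capped version, 288 cores gives us half of 255 (that is, 127), which is what we had hoped as would do in the first place. The difference is that the fallback is now a decision we wrote down, not a side effect of the cast.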

Using TryFrom is definitely more verbose, and requires more boilerplate, which makes it less convenient than as. But as others have pointed out, it's one of a few relatively uncommon cases where going against Rust's "default" way of doing something may actually be a better pattern.

Thanks to David Glivar and Nicole Tietz-Sokolskaya for giving me feedback about earlier drafts of this post, Charles Eckman for digging up useful context about cgroups and a proposal to include support for detecting fractional CPUs in Python's standard library, and swan for asking a question that led to an addendum to this post!


§Footnotes

1

I now know that convenience came with a significant cost — aside from just performance — but that wasn't clear to me then. It's also potentially a topic for a whole separate post.

2

I don't want to specifically call anyone out on doing this, but I imagine most people who have spent long enough learning Rust have seen things like "don't use clone", "don't use Rc", "don't use dyn Trait", and more of the like.

3

This is hard, because overusing unidiomatic constructs can indicate deeper problems with design, and solving design problems is hard and hugely depends on the specific context.

4

Our program may be running in an environment with fractional CPU cores (like a container or a cloud VM). However, Python has no built-in way to detect that, so if you are using os.cpu_count or len(os.sched_getaffinity(0)), you will always get an int.

5

Maybe someone updates the CPU-core-getting code to parse cgroup and find the fractional number of CPU cores available to our program, but doesn't call int before returning the result.

6

This was not supposed to be a deep dive into the gotchas of Python's division operators, and I am not trying to pick on Python here, because language design is hard and full of never-ending trade-offs. But also, this seems like a bit of a footgun. Even the spec itself is not super easy to follow:

Division of integers yields a float, while floor division of integers results in an integer; the result is that of mathematical division with the ‘floor’ function applied to the result.

This says nothing explicitly about floor division of floats. On top of that, that bit at the end about the 'floor' function applied to the result can be misleading with floats specifically, because math.floor always returns an int. This inconsistency has confused people, too.

7

Python's float is "usually" a C double, which on "most systems" is a 64-bit floating-point type (although C does not guarantee that), which is an f64 in Rust (and Rust does guarantee that).

8

See here if you'd like a refresher on representing integers in binary.

If you have feedback, I would love to hear from you! Please use one of the links below to get in touch.