Surprises with Rust's `as` (and Python division)
16 min read July 08, 2024 updated: July 14, 2024 #rust #python #programming
I've updated this post to clarify some parts and add some more context and examples. Thanks to everyone who gave me feedback on earlier versions!
Coming to Rust mostly from higher-level languages, I've experienced surprising behavior when using `as` for type casting, and started to avoid it where I could.
In this post, I give quite a bit of background information, which I hope explains why someone might want to use `as` in the first place, why I recommend that newcomers to Rust not use `as`, and what I prefer to use to convert between types instead. If you want to skip that background and go straight to the point, press §here.
§Some background
Over the first week of my RC batch, I met a lot of people who want to learn Rust. A fair number of people coming to Rust (at RC and beyond) have backgrounds primarily in higher-level languages like Python or JavaScript. That's how I came to Rust as well — though I had written C in college, and Java before, I promptly moved on to Python because it was just plain more convenient for everything I wanted to do.1
I really enjoy coding in Rust now, but it took a while for me to get there, and I abandoned learning it multiple times.
Picking up Rust can be particularly frustrating for programmers coming from higher-level languages. On top of the things that everyone has to grok, like ownership or aliasing, non-systems-y people like me also immediately get hit with things like Rust having 14 built-in types to represent numbers. Meanwhile, Python has three (and most of us probably only ever deal with two; sorry, `complex`!), and JS has one (or two if you count `BigInt`).
This kind of lower-level complexity is why I really appreciate people like Nicole creating resources like YARR! It targets specifically people who know how to program in high-level but maybe not lower-level languages, and who want to get started with Rust without immediately getting overwhelmed. And one thing that can cause newcomers to Rust extra overwhelm is the number of resources telling them what not to do, just as they're figuring out what they can do in the first place.2
I know that advice is probably coming from a good place, and has merit — we don't want newcomers to learn unidiomatic constructs and form bad habits that could hurt them in the long term. But often, the "don't do <x>" advice doesn't give clear direction on what to do instead,3 which can make newcomers feel more confused and stuck, or, at worst, make them give up altogether.
So I want to be cautious with this post, and first explain where I'm coming from with my own "don't do <x>" advice about type casting with `as`.
§Gotchas with types (more background)
Not only does Rust have all those numeric types, Rust is also much stricter about types than other languages people might be used to.
§Being flexible with types is convenient!
Take this Python function:
def get_half(number):
    return number / 2
We can give it a `float` or an `int`, and it does what most people would intuitively expect it to do:
>>>
1.0
>>>
0.75
No matter whether we give `get_half` an `int` or a `float`, it can work with either, and always gives us back a `float`. Makes sense. Great.
Now, let's say we want to get half of something that is only ever an integer. For example, we might want to distribute some computationally expensive workload among multiple CPU cores, but we don't want to use more than half of our CPU cores to save the rest for something else. We expect our input to always be a whole number, we need our output to always be a whole number,4 we never want to get more than ½ of the output, so we choose to use floor division, and write this function instead:
def get_half_floor(number):
    return number // 2
We can give it an `int`, and it does what we think it should.
>>>
3
>>>
3
We get back an `int`, as we should, and we never get more than half, which is how floor division by 2 should work. Great.
§...except maybe also confusing and risky
Someone quietly changes the code we use to get the number of CPU cores, and it starts giving us a `float` instead.5 What would `get_half_floor` do?
>>>
3.0
No matter whether we give `get_half_floor` an `int` or a `float`, it can work with either, and gives us an `int` if we passed it an `int`, and a `float` if we passed a `float`. Makes sense. Great.
Wait.
>>>
1.0
>>>
0.75
>>>
3
>>>
3.0
`get_half`, which uses "regular" division, can take an `int` or a `float`, and always returns a `float`.
`get_half_floor`, which uses floor division, can take an `int` or a `float`, and always returns the same numeric type that we gave it. Huh.
While both behaviors might make sense individually, this inconsistency between the two has confused some people.6
Worse, because our `get_half_floor` function happily takes and returns both `int`s and `float`s, if we start getting a `float` for the number of available CPU cores, we might not realize something is wrong until much later. For example, our program might suddenly crash with a `TypeError` when we try to use `multiprocessing.pool.Pool` or `concurrent.futures.ProcessPoolExecutor` (both of which only accept an `int` for the number of processes).
So, while the flexibility that comes with dynamic typing or type coercion is often convenient, it can also cause subtle bugs or unexpected behavior, and tracking that down in a complex codebase can be difficult and frustrating.
Which is likely why Rust does not implicitly convert between numeric types at all.
§The challenge with Rust's strict type system
Note: If you want a refresher on Rust's types I will be using in this section, please check the Data Types section of The Rust Programming Language (a.k.a. The Book).
Let's say we want to port the above `get_half` function to Rust.
First, we have to pick which of those many numeric types we want our function to take as an argument.
Maybe we choose `f64`, because that's the equivalent of Python's `float`,7 and is Rust's default floating-point type, and write the following function:
and then use it with another floating-point input:
We compile and run that code and get:
[src/main.rs:8:5] result = 1.0
Great.
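A minimal sketch of that program, for reference. The signature and the `get_half(number)` call match the compiler output quoted later in this post; the function body and the input value `2.0` are my assumptions:

```rust
// Takes an f64 and returns half of it, also as an f64.
fn get_half(number: f64) -> f64 {
    number / 2.0
}

fn main() {
    let number = 2.0; // an unsuffixed float literal defaults to f64
    let result = get_half(number);
    dbg!(result);
}
```

`dbg!` prints the source location, the expression, and its value to stderr, which is where the `[src/main.rs:8:5]` prefix comes from.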
But what if we change `number` to be an integer?
Rust doesn't allow that.
error[E0308]: mismatched types
--> src/main.rs:7:27
|
7 | let result = get_half(number);
| -------- ^^^^^^ expected `f64`, found integer
| |
| arguments to this function are incorrect
|
note: function defined here
--> src/main.rs:1:4
|
1 | fn get_half(number: f64) -> f64 {
| ^^^^^^^^ -----------
For more information about this error, try `rustc --explain E0308`.
Rust doesn't even allow us to use `f32` as-is, either:
error[E0308]: mismatched types
--> src/main.rs:7:27
|
7 | let result = get_half(number);
| -------- ^^^^^^ expected `f64`, found `f32`
| |
| arguments to this function are incorrect
|
note: function defined here
--> src/main.rs:1:4
|
1 | fn get_half(number: f64) -> f64 {
| ^^^^^^^^ -----------
help: you can convert an `f32` to an `f64`
|
7 | let result = get_half(number.into());
| +++++++
For more information about this error, try `rustc --explain E0308`.
(When we try to call `get_half` with an `f32`, Rust does at least suggest we can use `into`, which is helpful and makes this code work.)
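Here is a small sketch of why that suggestion is safe, assuming the same `get_half` signature as in the error message. `into` (and its mirror image, `f64::from`) only exists for conversions that cannot lose information, such as `f32` to `f64` or `i32` to `f64`. The variable names are just for illustration:

```rust
fn get_half(number: f64) -> f64 {
    number / 2.0
}

fn main() {
    let number: f32 = 2.0;

    // f32 -> f64 is lossless, so the standard library implements From/Into for it.
    let result = get_half(number.into());
    dbg!(result);

    // The same conversion, spelled with From instead of Into.
    assert_eq!(result, get_half(f64::from(number)));

    // i32 -> f64 is also lossless, so this compiles too.
    let whole: i32 = 2;
    dbg!(get_half(f64::from(whole)));
}
```

There is no such `From` implementation for conversions that could lose information (for example `u64` to `f64`), so trying `into` there is a compile error rather than a silent change of value.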
Our Rust function takes an `f64`, so we can only give it an `f64`, and nothing else will compile.
This could save us from the kind of subtle bugs and inconsistencies
we saw in our Python code.
But what do we do if our input has one of the other numeric types?
§Using `as` to cast types
§It's convenient!
As YARR! points out, we can use `as` to cast our number to `f64`, whatever its type.
It compiles, and runs, and we get what we want:
[src/main.rs:8:5] result = 1.0
All is well. We just learned a convenient way to work around Rust's strictness with types by using `as`.
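A minimal sketch of that cast, assuming the `get_half` function from earlier and an integer input of `2` (the exact input value is my assumption):

```rust
fn get_half(number: f64) -> f64 {
    number / 2.0
}

fn main() {
    let number = 2; // an unsuffixed integer literal defaults to i32
    let result = get_half(number as f64);
    dbg!(result);
}
```

The cast compiles for any built-in numeric type on the left-hand side, which is exactly what makes it so convenient.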
§...except maybe also confusing and risky
YARR! also says this:
You do have to be careful to ensure that if you cast to a smaller size value, that you won't overflow anything. The behavior you get is well-defined but may be unexpected.
What's this unexpected behavior?
Well, remember the `get_half_floor` function for getting no more than half of available CPU cores? We port that to Rust, too:
We use `u8` for `get_half_floor`'s input type, which maxes out at 255, and 255 CPU cores ought to be enough for anybody.
We compile and run our code.
[src/main.rs:8:5] result = 3
[src/main.rs:11:5] another_result = 3
Great.
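For reference, a sketch of what that port could look like. The `u8` input type and the names `result` and `another_result` come from this post; the return type, the function body, and the inputs `6` and `7` are my assumptions, picked so that both calls give `3`:

```rust
// Integer division on unsigned integers discards the remainder,
// which for non-negative numbers is the same as floor division.
fn get_half_floor(number: u8) -> u8 {
    number / 2
}

fn main() {
    let number = 6;
    let result = get_half_floor(number);
    dbg!(result);

    let another_result = get_half_floor(7);
    dbg!(another_result);
}
```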
But now someone gives us a machine with a Sierra Forest CPU, which has 288 cores. Our `get_half_floor` function still takes a `u8` though, so I guess we'll be limited to half of 255 until we can update our code. That should do for now, though.
We run our code:
[src/main.rs:8:5] result = 16
Wait. We didn't get half of 288, which we knew we wouldn't, but we didn't even get half of 255. That's odd. Why?
[src/main.rs:7:5] number as u8 = 32
That's what that whole "be careful not to overflow" thing with "unexpected behavior" was about.
Rust By Example gives some helpful, uh, examples, and explains that when we use `as` to cast a number to a `u8`, and our number can't fit into a `u8`, "the first 8 least significant bits (LSB) are kept, while the rest towards the most significant bit (MSB) get truncated."
This makes sense if we look at the binary representation:
288 in binary is 100100000
288 as u8 in binary is 100000
288 doesn't fit into `u8`, so when we cast it to `u8` with `as`, Rust indeed took only the 8 least significant bits (going from right to left), and discarded the rest. This gave us 100000 in binary, which is 32 in decimal,8 and explains why `get_half_floor` gave us 16 when we called it with `288 as u8`.
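The truncation can be checked directly. This is a small sketch; `truncate_to_u8` is a hypothetical helper I'm using for illustration, and I picked `u16` as the source type only because it is the smallest unsigned type that can hold 288:

```rust
// Hypothetical helper: cast a u16 to a u8 with `as`, keeping only the low 8 bits.
fn truncate_to_u8(number: u16) -> u8 {
    number as u8
}

fn main() {
    let number: u16 = 288;
    let truncated = truncate_to_u8(number);

    // 288 = 256 + 32, and the 256 lives in the ninth bit, which gets dropped.
    assert_eq!(truncated, 32);

    // Equivalent ways to describe what `as` did here:
    assert_eq!(u16::from(truncated), number & 0xFF); // mask off everything above 8 bits
    assert_eq!(u16::from(truncated), number % 256);  // wrap around modulo 2^8

    // Print both values in binary, padded to 9 and 8 digits respectively.
    assert_eq!(format!("{:09b}", number), "100100000");
    assert_eq!(format!("{:08b}", truncated), "00100000");
    println!("{number:>3} in binary is {number:09b}");
    println!("{truncated:>3} in binary is {truncated:08b}");
}
```

Values that do fit, like 255 or 6, come out of the cast unchanged, which is part of why this kind of bug can go unnoticed for a long time.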
So, similarly to our earlier Python functions surprising us by giving us `float`s when we might've expected an `int`, using `as` could lead to subtle bugs and confusing behavior that we might not notice, or might find hard to track down, in a larger codebase. Systems-y people know about this, but programmers coming from higher-level languages might find it surprising.
§So what's the alternative to `as`?
I have gotten in the habit of using `TryFrom` when I need to convert between types and avoid surprising (to me) results. As its docs say:
the TryFrom trait informs the programmer when a type conversion could go bad and lets them decide how to handle it.
With that, we could now change our code to
and get
[src/main.rs:11:5] result = 16
And if we try a value that doesn't fit, like here:
// --snip--
we get
thread 'main' panicked at src/main.rs:9:23:
Couldn't convert 288 to u8: out of range integral type conversion attempted
which tells us something went wrong (and what), instead of surprising us.
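Here is a minimal sketch of the `TryFrom` approach, assuming a `u8`-based `get_half_floor` like the one described above. `checked_half_floor` is a hypothetical wrapper of my own, and unlike the panicking version whose output is shown above, this sketch prints the error so that it runs to completion:

```rust
// TryFrom is in the prelude since the 2021 edition; the import is only needed before that.
use std::convert::TryFrom;
use std::num::TryFromIntError;

fn get_half_floor(number: u8) -> u8 {
    number / 2
}

// Hypothetical wrapper: convert first, and only compute if the value fits into a u8.
fn checked_half_floor(number: u32) -> Result<u8, TryFromIntError> {
    let small = u8::try_from(number)?;
    Ok(get_half_floor(small))
}

fn main() {
    // 32 fits into a u8, so we get our result.
    assert_eq!(checked_half_floor(32), Ok(16));

    // 288 does not fit, so we get an error instead of a silently truncated value.
    let number = 288;
    match checked_half_floor(number) {
        Ok(half) => println!("half of {number} is {half}"),
        Err(error) => println!("Couldn't convert {number} to u8: {error}"),
    }
}
```

Calling `.expect("...")` on the `Result` instead of matching on it would turn the error into a panic with that message.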
In real code, we might want to handle the error some other way instead of just panicking, like
- Logging a warning that our value is out of range of `u8` (perhaps with tracing)
- And/or passing `get_half_floor` a `u8::MAX` instead, if we got something larger than that
- And/or ignoring the input as invalid
- Or using `anyhow` to add context and propagate the error to the caller (example), or defining a custom error type with thiserror
- And/or using `as` anyway, but at least with a log message that would make it clearer to us (or someone else) that something unexpected happened, and where to look
Or maybe something else! Our strategy for handling the error would depend on what we're ultimately trying to do, and what would be appropriate for our library/application and the user. But we would know something unexpected happened, and we'd explicitly decide how to deal with it.
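Two of the strategies from the list can be sketched with nothing but the standard library. Both helper names are hypothetical, and `eprintln!` stands in for a real logging setup such as tracing:

```rust
// TryFrom is in the prelude since the 2021 edition; the import is only needed before that.
use std::convert::TryFrom;

fn get_half_floor(number: u8) -> u8 {
    number / 2
}

// Hypothetical helper: warn, then fall back to u8::MAX if the input is too large.
fn half_floor_clamped(number: u32) -> u8 {
    let small = u8::try_from(number).unwrap_or_else(|error| {
        eprintln!("warning: couldn't convert {number} to u8 ({error}); using {}", u8::MAX);
        u8::MAX
    });
    get_half_floor(small)
}

// Hypothetical helper: treat out-of-range input as invalid and return None.
fn half_floor_or_skip(number: u32) -> Option<u8> {
    u8::try_from(number).ok().map(get_half_floor)
}

fn main() {
    assert_eq!(half_floor_clamped(6), 3);
    assert_eq!(half_floor_clamped(288), 127); // half of u8::MAX, with a warning on stderr
    assert_eq!(half_floor_or_skip(7), Some(3));
    assert_eq!(half_floor_or_skip(288), None);
}
```

Either way, the out-of-range case is handled in one visible place rather than being hidden inside a cast.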
Using `TryFrom` is definitely more verbose, and requires more boilerplate, which makes it less convenient than `as`. But like others have pointed out, it's one of a few relatively uncommon cases where going against Rust's "default" way of doing something may actually be a better pattern.
Thanks to David Glivar and Nicole Tietz-Sokolskaya for giving me feedback about earlier drafts of this post, Charles Eckman for digging up useful context about cgroups and a proposal to include support for detecting fractional CPUs in Python's standard library, and swan for asking a question that led to an addendum to this post!
§Footnotes
I now know that convenience came with a significant cost — aside from just performance — but that wasn't clear to me then. It's also potentially a topic for a whole separate post.
I don't want to specifically call anyone out on doing this, but I imagine most people who have spent long enough learning Rust have seen things like "don't use `clone`", "don't use `Rc`", "don't use `dyn Trait`", and more of the like.
This is hard, because overusing unidiomatic constructs can indicate deeper problems with design, and solving design problems is hard and hugely depends on the specific context.
Our program may be running in an environment with fractional CPU cores (like a container or a cloud VM). However, Python has no built-in way to detect that, so if you are using `os.cpu_count` or `len(os.sched_getaffinity(0))`, you will always get an `int`.
Maybe someone updates the CPU-core-getting code to parse cgroup and find the fractional number of CPU cores available to our program, but doesn't call `int` before returning the result.
This was not supposed to be a deep dive into the gotchas of Python's division operators, and I am not trying to pick on Python here, because language design is hard and full of never-ending trade-offs. But also, this seems like a bit of a footgun. Even the spec itself is not super easy to follow:
Division of integers yields a float, while floor division of integers results in an integer; the result is that of mathematical division with the ‘floor’ function applied to the result.
This says nothing explicitly about floor division of floats. On top of that, that bit at the end about the 'floor' function applied to the result can be misleading with floats specifically, because `math.floor` always returns an `int`. This inconsistency has confused people, too.
Python's `float` is "usually" a C `double`, which on "most systems" is a 64-bit floating-point type (although C does not guarantee that), which is an `f64` in Rust (and Rust does guarantee that).
See here if you'd like a refresher on representing integers in binary.