Why does Rust disallow mutable aliasing?
How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
I believe that although you trigger 'undefined behavior' by doing this, technically the noalias flag is not used by the Rust compiler for &mut references, so practically speaking, right now, you probably can't actually trigger undefined behavior this way. What you're triggering is 'implementation specific behavior', which is 'behaves like C++ according to LLVM'.
See "Why does the Rust compiler not optimize code assuming that two mutable references cannot alias?" for more information.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
Have a read of this series of blog articles about undefined behavior
In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single threaded environment you can avoid that sort of problem if you're careful. This is no different to creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.
To understand the issue here, consider that when compiling in release mode, the compiler may reorder statements as part of its optimizations; that means that although your code is written as the linear sequence:
a; b; c;
There is no guarantee that the compiled program will execute them in that sequence if, according to what the compiler knows, there is no logical reason that the statements must be performed in that specific order. Part 3 of the blog I've linked to above demonstrates how this can cause undefined behavior.
tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to continue to make your program behave in a deterministic fashion if and only if your program does not trigger undefined behavior.
As far as I'm aware the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. It is not a 'breaking change' to introduce new compiler optimizations.
So... it's actually probably quite unlikely you'll be able to trigger actual undefined behavior just via mutable aliasing right now; but the restriction allows the possibility of future performance optimizations.
Pertinent quote:
The C FAQ defines “undefined behavior” like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
The simplest example I know of is trying to push into a Vec that's borrowed:
let mut v = vec!['a'];
let c = &v[0];
v.push('b');
dbg!(c);
This is a compiler error:
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
--> src/main.rs:4:5
|
3 | let c = &v[0];
| - immutable borrow occurs here
4 | v.push('b');
| ^^^^^^^^^^^ mutable borrow occurs here
5 | dbg!(c);
| - immutable borrow later used here
It's good that this is a compiler error, because otherwise it would be a use-after-free. push can reallocate the Vec's heap storage, which would invalidate our c reference. Rust doesn't actually know what push does; all Rust knows is that push takes &mut self, and here that violates the aliasing rule.
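A minimal sketch of one way to satisfy the borrow checker here, assuming you only need the element's value rather than a reference to it. The helper name first_then_push is made up for illustration:

```rust
// Hypothetical helper (not from the original answer): read the first
// element by value, then push. Panics if `v` is empty, like `v[0]` does.
fn first_then_push(v: &mut Vec<char>, extra: char) -> char {
    // `char` is `Copy`, so this copies the value out and the temporary
    // borrow of `v` ends on this line.
    let c = v[0];
    // No reference into `v` is alive any more, so the mutable borrow
    // taken by `push` does not conflict with anything.
    v.push(extra);
    c
}
```

Calling it on vec!['a'] with 'b' returns 'a' and leaves the Vec as ['a', 'b']. If you really need a reference, take it after the push instead of before.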
Many other single-threaded examples of undefined behavior involve destroying objects on the heap like this. But if we play around a bit with references and enums, we can express something similar without heap allocation:
enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}
let my_int = 42;
let mut my_enum = MyEnum::Ptr(&my_int);
let my_int_ptr_ptr: &&i32 = match &my_enum {
    MyEnum::Ptr(i) => i,
    MyEnum::Usize(_) => unreachable!(),
};
my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
dbg!(**my_int_ptr_ptr);
Here we've taken a pointer to my_int, stored that pointer in my_enum, and made my_int_ptr_ptr point into my_enum. If we could then reassign my_enum, we could trash the bits that my_int_ptr_ptr was pointing to. A double dereference of my_int_ptr_ptr would be a wild pointer read, which would probably segfault. Luckily, this is another violation of the aliasing rule, and it won't compile:
error[E0506]: cannot assign to `my_enum` because it is borrowed
--> src/main.rs:12:1
|
8 | let my_int_ptr_ptr: &&i32 = match &my_enum {
| -------- borrow of `my_enum` occurs here
...
12 | my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `my_enum` occurs here
13 | dbg!(**my_int_ptr_ptr);
| ---------------- borrow later used here
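For contrast, here is a sketch of a variant that the borrow checker should accept, assuming it is enough to copy the inner &i32 out of the enum instead of keeping a pointer into the enum itself. The function name read_then_overwrite and the smaller constant are my own choices for illustration:

```rust
enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}

// Hypothetical variant of the example above: copy the inner reference out
// of the enum so that nothing points into `my_enum` when it is overwritten.
fn read_then_overwrite(my_int: &i32) -> (i32, usize) {
    let mut my_enum = MyEnum::Ptr(my_int);
    // `&i32` is `Copy`; `*i` copies the reference, so the borrow of
    // `my_enum` ends when the match expression ends.
    let my_int_ptr: &i32 = match &my_enum {
        MyEnum::Ptr(i) => *i,
        MyEnum::Usize(_) => unreachable!(),
    };
    // Overwriting the enum is fine now: `my_int_ptr` points at the
    // caller's integer, not into `my_enum`.
    my_enum = MyEnum::Usize(0xdead_beef);
    let raw = match my_enum {
        MyEnum::Usize(n) => n,
        MyEnum::Ptr(_) => unreachable!(),
    };
    (*my_int_ptr, raw)
}
```

The difference from the rejected version is only where the pointer points: a &i32 copied out of the enum stays valid when the enum is overwritten, while a &&i32 into the enum would not.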
Author's Note: The following answer was originally written for How do intertwined scopes create a "data race"?
The compiler is allowed to optimize &mut pointers under the assumption that they are exclusive (not aliased). Your code breaks this assumption.
The example in the question is a little too trivial to exhibit any kind of interesting wrong behavior, but consider passing ref_to_i_1 and ref_to_i_2 to a function that modifies both and then does something with them:
fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
    foo(ref_to_i_1, ref_to_i_2);
}
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    *r2 = 2;
    println!("{}", r1);
    println!("{}", r2);
}
The compiler may (or may not) decide to de-interleave the accesses to r1 and r2, because they are not allowed to alias:
// The following is an illustration of how the compiler might rearrange
// side effects in a function to optimize it. Optimization passes in the
// compiler actually work on (MIR and) LLVM IR, not on raw Rust code.
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    println!("{}", r1);
    *r2 = 2;
    println!("{}", r2);
}
It might even realize that the println!s always print the same value and take advantage of that fact to further rearrange foo:
fn foo(r1: &mut i32, r2: &mut i32) {
    println!("{}", 1);
    println!("{}", 2);
    *r1 = 1;
    *r2 = 2;
}
It's good that a compiler can do this optimization! (Even if Rust's compiler currently doesn't, as Doug's answer mentions.) Optimizing compilers are great because they can use transformations like those above to make code run faster (for instance, by better pipelining the code through the CPU, or by enabling the compiler to do more aggressive optimizations in a later pass). All else being equal, everybody likes their code to run fast, right?
You might say "Well, that's an invalid optimization because it doesn't do the same thing." But you'd be wrong: the whole point of &mut references is that they do not alias. There is no way to make r1 and r2 alias without breaking the rules†, which is what makes this optimization valid to do.
You might also think that this is a problem that only appears in more complicated code, and the compiler should therefore allow the simple examples. But bear in mind that these transformations are part of a long multi-step optimization process. It's important to uphold the properties of &mut references everywhere, so that the compiler can make minor optimizations to one section of code without needing to understand all the code.
One more thing to consider: it is your job as the programmer to choose and apply the appropriate types for your problem; asking the compiler for occasional exceptions to the &mut aliasing rule is basically asking it to do your job for you.
If you want shared mutability and to forgo those optimizations, it's simple: don't use &mut. In the example, you can use &Cell<i32> instead of &mut i32, as the comments mentioned:
use std::cell::Cell;

fn main() {
    // No `mut` needed: Cell allows mutation through a shared reference.
    let i = Cell::new(42);
    let ref_to_i_1 = &i;
    let ref_to_i_2 = &i;
    foo(ref_to_i_1, ref_to_i_2);
}

fn foo(r1: &Cell<i32>, r2: &Cell<i32>) {
    r1.set(1);
    r2.set(2);
    println!("{}", r1.get()); // prints 2, guaranteed
    println!("{}", r2.get()); // also prints 2
}
The types in std::cell provide interior mutability, which is jargon for "disallow certain optimizations because & references may mutate things". They aren't always quite as convenient as using &mut, but that's because using them gives you more flexibility to write code like the above.
Also read
- The Problem With Single-threaded Shared Mutability describes how having multiple mutable references can cause soundness issues even in the absence of multiple threads and data races.
- Dan Hulme's answer illustrates how aliased mutation of more complex data can also cause undefined behavior (even before compiler optimizations).
† Be aware that using unsafe by itself does not count as "breaking the rules". &mut references cannot be aliased, even when using unsafe, in order for your code to have defined behavior.
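To illustrate the footnote, here is a rough sketch of the same function written against raw pointers, which (unlike &mut references) are allowed to alias. The names foo_raw and aliased_raw are invented for this example, and the safety contract in the comments is an assumption of the sketch:

```rust
/// Hypothetical raw-pointer version of `foo`.
///
/// # Safety
/// Both pointers must be valid for reads and writes and properly aligned.
/// They are allowed to point to the same `i32`.
unsafe fn foo_raw(p1: *mut i32, p2: *mut i32) -> (i32, i32) {
    unsafe {
        *p1 = 1;
        *p2 = 2;
        // Raw pointers carry no exclusivity promise, so the compiler
        // has to reload both values after the writes.
        (*p1, *p2)
    }
}

fn aliased_raw() -> (i32, i32) {
    let mut i = 42;
    // One raw pointer, copied twice; no second `&mut` is ever created.
    let p: *mut i32 = &mut i;
    // SAFETY: `p` points to the live local `i` for the whole call.
    unsafe { foo_raw(p, p) }
}
```

Here aliased_raw returns (2, 2), the same result the Cell version prints. The point is that unsafe lets you use pointer types with weaker guarantees; it does not weaken the guarantees of &mut itself.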
A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:
for (it: collection) {
if (predicate(*it)) {
collection.remove(it);
}
}
For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. In most Java standard collections, it's also undefined behaviour according to the language specification, but the collections tend to throw ConcurrentModificationException, a check which causes a runtime cost even when your code is correct. Neither language can detect the error during compilation.
This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator has an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
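As a sketch of how the same task is usually written in safe Rust, assuming the collection is a Vec<i32>: Vec::retain does the removal inside a single mutable borrow, so no iterator is alive while the collection changes. The wrapper name remove_matching is hypothetical:

```rust
// Hypothetical wrapper: remove every element for which `predicate` holds.
fn remove_matching(collection: &mut Vec<i32>, predicate: impl Fn(&i32) -> bool) {
    // `retain` keeps the elements for which the closure returns true,
    // so the predicate is negated here.
    collection.retain(|item| !predicate(item));
}
```

For example, calling it on vec![1, 2, 3, 4, 5, 6] with a predicate that matches even numbers leaves [1, 3, 5]. Writing the loop by hand, with collection.remove(..) inside a for loop over collection.iter(), is rejected at compile time with a borrow error.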
An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
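A small sketch of what this looks like with safe standard library calls, assuming a slice of i32. The helper names swap_ends and shift_right_by_one are made up; split_at_mut, std::mem::swap and copy_within are the real standard library functions:

```rust
// Swap the first and last elements. `split_at_mut` hands out two
// non-overlapping `&mut` halves, so `swap` can rely on its two
// arguments never aliasing.
fn swap_ends(values: &mut [i32]) {
    if values.len() < 2 {
        return;
    }
    let (head, tail) = values.split_at_mut(1);
    let last = tail.len() - 1;
    std::mem::swap(&mut head[0], &mut tail[last]);
}

// Overlapping copy within one slice. `copy_within` is documented to
// handle overlapping ranges, which makes it the safe memmove-style call.
fn shift_right_by_one(values: &mut [i32]) {
    let n = values.len();
    if n > 1 {
        values.copy_within(0..n - 1, 1);
    }
}
```

With these, [1, 2, 3] becomes [3, 2, 1] after swap_ends, and [1, 2, 3, 4] becomes [1, 1, 2, 3] after shift_right_by_one.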
A more contrived example of reference aliasing is like this:
int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined
You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
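For comparison, a sketch of what I'd take to be a rough Rust equivalent of f. Because the two parameters are &mut, the function body can assume they refer to different integers:

```rust
// Rough Rust equivalent of the C function above: returns the sum of the
// old values and increments both integers.
fn f(x: &mut i32, y: &mut i32) -> i32 {
    let sum = *x + *y;
    *x += 1;
    *y += 1;
    sum
}

// Calling `f(&mut i, &mut i)` is rejected at compile time with
// error[E0499]: cannot borrow `i` as mutable more than once at a time.
```

Called with two separate variables that both hold 3, it returns 6 and leaves both at 4; there is no call in safe Rust that makes the result depend on evaluation order.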