How to enable Rust Ownership paradigm in C++

I think one could add a degree of compile-time introspection and custom sanitisation by introducing custom wrapper classes that track ownership and borrowing.

The code below is a hypothetical sketch, and not a production solution which would need a lot more tooling, e.g. #def out the checks when not sanitising. It uses a very naive lifetime checker to 'count' borrow errors in ints, in this instance during compilation. static_asserts are not possible as the ints are not constexpr, but the values are there and can be interrogated before runtime. I believe this answers your 3 constraints, regardless of whether these are heap allocations, so I'm using a simple int type to demo the idea, rather than a smart pointer.

Try uncommenting the use cases in main() below (run in compiler explorer with -O3 to see boilerplate optimise away), and you'll see the warning counters change.

https://godbolt.org/z/Pj4WMr

// Hypothetical Rust-like owner / borrow wrappers in C++
// This wraps types with data which is compiled away in release
// It is not possible to static_assert, so this uses static ints to count errors.
#include <utility>

// Statics to track errors. Ideally these would be static_asserts
// but they depen on Owner::has_been_moved which changes during compilation.
static int owner_already_moved = 0;
static int owner_use_after_move = 0;
static int owner_already_borrowed = 0;

// This method exists to ensure static errors are reported in compiler explorer
int get_fault_count() {
    return owner_already_moved + owner_use_after_move + owner_already_borrowed;
}

// Storage for ownership of a type T.
// Equivalent to mut usage in Rust
// Disallows move by value, instead ownership must be explicitly moved.
template <typename T>
struct Owner {
    Owner(T v) : value(v) {}
    Owner(Owner<T>& ov) = delete;
    Owner(Owner<T>&& ov) {
        if (ov.has_been_moved) {
            owner_already_moved++;
        }
        value = std::move(ov.value);
        ov.has_been_moved = true;
    }

    T& operator*() {
        if (has_been_moved) {
            owner_use_after_move++;
        }
        return value;
    }

    T value;
    bool has_been_moved{false};
};

// Safely borrow a value of type T
// Implicit constuction from Owner of same type to check borrow is safe
template <typename T>
struct Borrower {
    Borrower(Owner<T>& v) : value(v.value) {
        if (v.has_been_moved) {
            owner_already_borrowed++;
        }
    }

    const T& operator*() const {
        return value;
    }

    T value;
};

// Example of function borrowing a value, can only read const ref
static void use(Borrower<int> v) {
    (void)*v;
}

// Example of function taking ownership of value, can mutate via owner ref
static void use_mut(Owner<int> v) {
    *v = 5;
}

int main() {
    // Rather than just 'int', Owner<int> tracks the lifetime of the value
    Owner<int> x{3};

    // Borrowing value before mutating causes no problems
    use(x);

    // Mutating value passes ownership, has_been_moved set on original x
    use_mut(std::move(x));

    // Uncomment for owner_already_borrowed = 1
    //use(x);

    // Uncomment for owner_already_moved = 1
    //use_mut(std::move(x));

    // Uncomment for another owner_already_borrowed++
    //Borrower<int> y = x;

    // Uncomment for owner_use_after_move = 1;
    //return *x;
}

The use of static counters is obviously not desirable, but it is not possible to use static_assert as owner_already_moved is non-const. The idea is these statics give hints to errors appearing, and in final production code they could be #defed out.

You can't do this with compile-time checks at all. The C++ type system is lacking any way to reason about when an object goes out of scope, is moved, or is destroyed — much less turn this into a type constraint.

What you could do is have a variant of unique_ptr that keeps a counter of how many "borrows" are active at run time. Instead of get() returning a raw pointer, it would return a smart pointer that increments this counter on construction and decrements it on destruction. If the unique_ptr is destroyed while the count is non-zero, at least you know someone somewhere did something wrong.

However, this is not a fool-proof solution. Regardless of how hard you try to prevent it, there will always be ways to get a raw pointer to the underlying object, and then it's game over, since that raw pointer can easily outlive the smart pointer and the unique_ptr. It will even sometimes be necessary to get a raw pointer, to interact with an API that requires raw pointers.

Moreover, ownership is not about pointers. Box/unique_ptr allows you to heap allocate an object, but it changes nothing about ownership, life time, etc. compared to putting the same object on the stack (or inside another object, or anywhere else really). To get the same mileage out of such a system in C++, you'd have to make such "borrow counting" wrappers for all objects everywhere, not just for unique_ptrs. And that is pretty impractical.

So let's revisit the compile time option. The C++ compiler can't help us, but maybe lints can? Theoretically, if you implement the whole life time part of the type system and add annotations to all APIs you use (in addition to your own code), that may work.

But it requires annotations for all functions used in the whole program. Including private helper function of third party libraries. And those for which no source code is available. And for those whose implementation that are too complicated for the linter to understand (from Rust experience, sometimes the reason something is safe are too subtle to express in the static model of lifetimes and it has to be written slightly differently to help the compiler). For the last two, the linter can't verify that the annotation is indeed correct, so you're back to trusting the programmer. Additionally, some APIs (or rather, the conditions for when they are safe) can't really be expressed very well in the lifetime system as Rust uses it.

In other words, a complete and practically useful linter for this this would be substantial original research with the associated risk of failure.

Maybe there is a middle ground that gets 80% of the benefits with 20% of the cost, but since you want a hard guarantee (and honestly, I'd like that too), tough luck. Existing "good practices" in C++ already go a long way to minimizing the risks, by essentially thinking (and documenting) the way a Rust programmer does, just without compiler aid. I'm not sure if there is much improvement over that to be had considering the state of C++ and its ecosystem.

tl;dr Just use Rust ;-)

What follows are some examples of ways people have tried to emulate parts of Rust's ownership paradigm in C++, with limited success:

Lifetime safety: Preventing common dangling. The most thorough and rigorous approach, involving several additions to the language to support the necessary annotations. If the effort is still alive (last commit was in 2019), getting this analysis added to a mainstream compiler is probably the most likely route to "borrow checked" C++. Discussed on IRLO.
Borrowing Trouble: The Difficulties Of A C++ Borrow-Checker
Is it possible to achieve Rust's ownership model with a generic C++ wrapper?
C++Now 2017: Jonathan Müller “Emulating Rust's borrow checker in C++" (video) and associated code, about which the author says, "You're not actually supposed to use that, if you need such a feature, you should use Rust."
Emulating the Rust borrow checker with C++ move-only types and part II (which is actually more like emulating RefCell than the borrow checker, per se)

I believe you can get some of the benefits of Rust by enforcing some strict coding conventions (which is after all what you'd have to do anyway, since there's no way with "template magic" to tell the compiler not to compile code that doesn't use said "magic"). Off the top of my head, the following could get you...well...kind of close, but only for single-threaded applications:

Never use new directly; instead, use make_unique. This goes partway toward ensuring that heap-allocated objects are "owned" in a Rust-like manner.
"Borrowing" should always be represented via reference parameters to function calls. Functions that take a reference should never create any sort of pointer to the refered-to object. (It may in some cases be necessary to use a raw pointer as a paramter instead of a reference, but the same rule should apply.)
- Note that this works for objects on the stack or on the heap; the function shouldn't care.
Transfer of ownership is, of course, represented via R-value references (&&) and/or R-value references to unique_ptrs.

Unfortunately, I can't think of any way to enforce Rust's rule that mutable references can only exist anywhere in the system when there are no other extant references.

Also, for any kind of parallelism, you would need to start dealing with lifetimes, and the only way I can think of to permit cross-thread lifetime management (or cross-process lifetime management using shared memory) would be to implement your own "ptr-with-lifetime" wrapper. This could be implemented using shared_ptr, because here, reference-counting would actually be important; it's still a bit of unnecessary overhead, though, because reference-count blocks actually have two reference counters (one for all the shared_ptrs pointing to the object, another for all the weak_ptrs). It's also a little... odd, because in a shared_ptr scenario, everybody with a shared_ptr has "equal" ownership, whereas in a "borrowing with lifetime" scenario, only one thread/process should actually "own" the memory.

How to enable Rust Ownership paradigm in C++

Tags:

C++

Smart Pointers

Rust

Related

Recent Posts