diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index c580d59e..d97ac321 100644 --- a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -24,7 +24,7 @@ jobs: mdbook build mdbook test - name: Upload Artifact - uses: actions/upload-pages-artifact@v1.0.8 + uses: actions/upload-pages-artifact@v3 with: path: reference/book @@ -45,4 +45,4 @@ jobs: runs-on: ubuntu-latest steps: - id: deployment - uses: actions/deploy-pages@v2.0.0 + uses: actions/deploy-pages@v4 diff --git a/README.md b/README.md index 1d4ecb7e..eb5ce45a 100644 --- a/README.md +++ b/README.md @@ -6,8 +6,7 @@ The purpose of this repository is to collect and discuss all sorts of questions It is primarily used by the [opsem team](https://github.com/rust-lang/opsem-team/) to track open questions around the operational semantics, but we also track some "non-opsem" questions that fall into T-lang or T-type's purview, if they are highly relevant to unsafe code authors. The [Unsafe Code Guidelines Reference "book"][ucg_book] is a past effort to systematize a consensus on some of these questions. -It is not actively maintained any more, but can still be a good source of information and references. -Note however that unless stated otherwise, the information in the guide is mostly a "recommendation" and still subject to change. +Most of it has been archived, but the [glossary](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html) is still a useful resource. Current consensus is documented in [t-opsem FCPs](https://github.com/rust-lang/opsem-team/blob/main/fcps.md) and the [Rust Language Reference]. 
diff --git a/reference/src/layout/arrays-and-slices.md b/reference/src/layout/arrays-and-slices.md index e6c55f16..42b85a2f 100644 --- a/reference/src/layout/arrays-and-slices.md +++ b/reference/src/layout/arrays-and-slices.md @@ -1,43 +1,7 @@ # Layout of Rust array types and slices -## Layout of Rust array types +**This page has been archived** -Array types, `[T; N]`, store `N` values of type `T` with a _stride_ that is -equal to the size of `T`. Here, _stride_ is the distance between each pair of -consecutive values within the array. +It did not actually reflect current layout guarantees and caused frequent confusion. -The _offset_ of the first array element is `0`, that is, a pointer to the array -and a pointer to its first element both point to the same memory address. - -The _alignment_ of array types is greater or equal to the alignment of its -element type. If the element type is `repr(C)` the layout of the array is -guaranteed to be the same as the layout of a C array with the same element type. - -> **Note**: the type of array arguments in C function signatures, e.g., `void -> foo(T x[N])`, decays to a pointer. That is, these functions do not take arrays -> as an arguments, they take a pointer to the first element of the array -> instead. Array types are therefore _improper C types_ (not C FFI safe) in Rust -> foreign function declarations, e.g., `extern { fn foo(x: [T; N]) -> [U; M]; -> }`. Pointers to arrays are fine: `extern { fn foo(x: *const [T; N]) -> *const -> [U; M]; }`, and `struct`s and `union`s containing arrays are also fine. - -### Arrays of zero-size - -Arrays `[T; N]` have zero size if and only if their count `N` is zero or their -element type `T` is zero-sized. - -### Layout compatibility with packed SIMD vectors - -The [layout of packed SIMD vector types][Vector] [^2] requires the _size_ and -_alignment_ of the vector elements to match. 
That is, types with [packed SIMD -vector][Vector] layout are layout compatible with arrays having the same element -type and the same number of elements as the vector. - -[^2]: The [packed SIMD vector][Vector] layout is the layout of `repr(simd)` types like [`__m128`]. - -[Vector]: packed-simd-vectors.md -[`__m128`]: https://doc.rust-lang.org/core/arch/x86_64/struct.__m128.html - -## Layout of Rust slices - -The layout of a slice `[T]` of length `N` is the same as that of a `[T; N]` array. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/arrays-and-slices.md). diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 26267bbc..c455f1d2 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -1,411 +1,7 @@ # Layout of Rust `enum` types -**Disclaimer:** Some parts of this section were decided in RFCs, but -others represent the consensus from issue [#10]. The text will attempt -to clarify which parts are "guaranteed" (owing to the RFC decision) -and which parts are still in a "preliminary" state, at least until we -start to open RFCs ratifying parts of the Unsafe Code Guidelines -effort. +**This page has been archived** -**Note:** This document has not yet been updated to -[RFC 2645](https://github.com/rust-lang/rfcs/blob/master/text/2645-transparent-unions.md). +It did not actually reflect current layout guarantees and caused frequent confusion. -[#10]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10 - -## Categories of enums - -**Empty enums.** Enums with no variants can never be instantiated and -are equivalent to the `!` type. They do not accept any `#[repr]` -annotations. 
- -**Fieldless enums.** The simplest form of enum is one where none of -the variants have any fields: - -```rust -enum SomeEnum { - Variant1, - Variant2, - Variant3, -} -``` - -Such enums correspond quite closely with enums in the C language -(though there are important differences as well). Presuming that they -have more than one variant, these sorts of enums are always -represented as a simple integer, though the size will vary. - -Fieldless enums may also specify the value of their discriminants -explicitly: - -```rust -enum SomeEnum { - Variant22 = 22, - Variant44 = 44, - Variant45, -} -``` - -As in C, discriminant values that are not specified are defined as -either 0 (for the first variant) or as one more than the prior -variant. - -**Data-carrying enums.** Enums with at least one variant with fields are called -"data-carrying" enums. Note that for the purposes of this definition, it is not -relevant whether the variant fields are zero-sized. Therefore this enum is -considered "data-carrying": - -```rust -enum Foo { - Bar(()), - Baz, -} -``` - -## repr annotations accepted on enums - -In general, enums may be annotated using the following `#[repr]` tags: - -- A specific integer type (called `Int` as a shorthand below): - - `#[repr(u8)]` - - `#[repr(u16)]` - - `#[repr(u32)]` - - `#[repr(u64)]` - - `#[repr(i8)]` - - `#[repr(i16)]` - - `#[repr(i32)]` - - `#[repr(i64)]` -- C-compatible layout: - - `#[repr(C)]` -- C-compatible layout with a specified discriminant size: - - `#[repr(C, u8)]` - - `#[repr(C, u16)]` - - etc - -Note that manually specifying the alignment using `#[repr(align)]` is -not permitted on an enum. - -The set of repr annotations accepted by an enum depends on its category, -as defined above: - -- Empty enums: no repr annotations are permitted. -- Fieldless enums: `#[repr(Int)]`-style and `#[repr(C)]` annotations are permitted, but `#[repr(C, Int)]` annotations are not. -- Data-carrying enums: all repr annotations are permitted. 
- -## Enum layout rules - -The rules for enum layout vary depending on the category. - -### Layout of an empty enum - -An **empty enum** is an enum with no variants; empty enums can never -be instantiated and are logically equivalent to the "never type" -`!`. `#[repr]` annotations are not accepted on empty enums. Empty -enums are guaranteed to have the same layout as `!` (zero size and -alignment 1). - -### Layout of a fieldless enum - -If there is no `#[repr]` attached to a fieldless enum, the compiler -will represent it using an integer of sufficient size to store the -discriminants for all possible variants -- note that if there is only -one variant, then 0 bits are required, so it is possible that the enum -may have zero size. In the absence of a `#[repr]` annotation, the -number of bits used by the compiler are not defined and are subject to -change. - -When a `#[repr(Int)]`-style annotation is attached to a fieldless enum -(one without any data for its variants), it will cause the enum to be -represented as a simple integer of the specified size `Int`. This must -be sufficient to store all the required discriminant values. - -The `#[repr(C)]` annotation is equivalent, but it selects the same -size as the C compiler would use for the given target for an -equivalent C-enum declaration. - -Combining a `C` and `Int` `repr` (e.g., `#[repr(C, u8)]`) is -not permitted on a fieldless enum. - -The values used for the discriminant will match up with what is -specified (or automatically assigned) in the enum definition. For -example, the following enum defines the discriminants for its variants -as 22 and 23 respectively: - -```rust -enum Foo { - // Specificy discriminant of this variant as 22: - Variant22 = 22, - - // Default discriminant is one more than the previous, - // so 23 will be assigned. 
- Variant23 -} -``` - -**Note:** some C compilers offer flags (e.g., `-fshort-enums`) that -change the layout of enums from the default settings that are standard -for the platform. The integer size selected by `#[repr(C)]` is defined -to match the **default** settings for a given target, when no such -flags are supplied. If interop with code that uses other flags is -desired, then one should either specify the sizes of enums manually or -else use an alternate target definition that is tailored to the -compiler flags in use. - -### Layout of a data-carrying enums with an explicit repr annotation - -This section concerns data-carrying enums **with an explicit repr -annotation of some form**. The memory layout of such cases was -specified in [RFC 2195][] and is therefore normative. - -[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html - -The layout of data-carrying enums that do **not** have an explicit -repr annotation is generally undefined, but with certain specific -exceptions: see the next section for details. - -#### Explicit repr annotation without C compatibility - -When an enum is tagged with `#[repr(Int)]` for some integral type -`Int` (e.g., `#[repr(u8)]`), it will be represented as a C-union of a -series of `#[repr(C)]` structs, one per variant. Each of these structs -begins with an integral field containing the **discriminant**, which -specifies which variant is active. They then contain the remaining -fields associated with that variant. 
- -**Example.** The following enum uses an `repr(u8)` annotation: - -```rust -#[repr(u8)] -enum TwoCases { - A(u8, u16), - B(u16), -} -``` - -This will be laid out equivalently to the following more -complex Rust types: - -```rust -#[repr(C)] -union TwoCasesRepr { - A: TwoCasesVariantA, - B: TwoCasesVariantB, -} - -# #[derive(Copy, Clone)] -#[repr(u8)] -enum TwoCasesTag { A, B } - -# #[derive(Copy, Clone)] -#[repr(C)] -struct TwoCasesVariantA(TwoCasesTag, u8, u16); - -# #[derive(Copy, Clone)] -#[repr(C)] -struct TwoCasesVariantB(TwoCasesTag, u16); -``` - -Note that the `TwoCasesVariantA` and `TwoCasesVariantB` structs are -`#[repr(C)]`; this is needed to ensure that the `TwoCasesTag` value -appears at offset 0 in both cases, so that we can read it to determine -the current variant. - -#### Explicit repr annotation with C compatibility - -When the `#[repr]` tag includes `C`, e.g., `#[repr(C)]` or `#[repr(C, -u8)]`, the layout of enums is changed to better match C++ enums. In -this mode, the data is laid out as a tuple of `(discriminant, union)`, -where `union` represents a C union of all the possible variants. The -type of the discriminant will be the integral type specified (`u8`, -etc) -- if no type is specified, then the compiler will select one -based on what a size a fieldless enum would have with the same number of -variants. - -This layout, while more compatible and arguably more obvious, is also -less efficient than the non-C compatible layout in some cases in terms -of total size. For example, the `TwoCases` example given in the -previous section only occupies 4 bytes with `#[repr(u8)]`, but would -occupy 6 bytes with `#[repr(C, u8)]`, as more padding is required. 
- -**Example.** The following enum: - -```rust,ignore -#[repr(C, Int)] -enum MyEnum { - A(u32), - B(f32, u64), - C { x: u32, y: u8 }, - D, -} -``` - -is equivalent to the following Rust definition: - -```rust,ignore -#[repr(C)] -struct MyEnumRepr { - tag: MyEnumTag, - payload: MyEnumPayload, -} - -#[repr(Int)] -enum MyEnumTag { A, B, C, D } - -#[repr(C)] -union MyEnumPayload { - A: u32, - B: MyEnumPayloadB, - C: MyEnumPayloadC, - D: (), -} - -#[repr(C)] -struct MyEnumPayloadB(f32, u64); - -#[repr(C)] -struct MyEnumPayloadC { x: u32, y: u8 } -``` - -This enum can also be represented in C++ as follows: - -```c++ -#include - -enum class MyEnumTag: CppEquivalentOfInt { A, B, C, D }; -struct MyEnumPayloadB { float _0; uint64_t _1; }; -struct MyEnumPayloadC { uint32_t x; uint8_t y; }; - -union MyEnumPayload { - uint32_t A; - MyEnumPayloadB B; - MyEnumPayloadC C; -}; - -struct MyEnum { - MyEnumTag tag; - MyEnumPayload payload; -}; -``` - -### Layout of a data-carrying enums without a repr annotation - -If no explicit `#[repr]` attribute is used, then the layout of a -data-carrying enum is typically **not specified**. However, in certain -select cases, there are **guaranteed layout optimizations** that may -apply, as described below. - -#### Discriminant elision on Option-like enums - -(Meta-note: The content in this section is not fully described by any RFC and is -therefore "non-normative". Parts of it were specified in -[rust-lang/rust#60300]). - -[rust-lang/rust#60300]: https://github.com/rust-lang/rust/pull/60300 - -**Definition.** An **option-like enum** is a 2-variant `enum` where: - -- the `enum` has no explicit `#[repr(...)]`, and -- one variant has a single field, and -- the other variant has no fields (the "unit variant"). - -The simplest example is `Option` itself, where the `Some` variant -has a single field (of type `T`), and the `None` variant has no -fields. But other enums that fit that same template fit. 
- -**Definition.** The **payload** of an option-like enum is the single -field which it contains; in the case of `Option`, the payload has -type `T`. - -**Definition.** In some cases, the payload type may contain illegal -values, which are called **[niches][niche]**. For example, a value of type `&T` -may never be `NULL`, and hence defines a [niche] consisting of the -bitstring `0`. Similarly, the standard library types [`NonZeroU8`] -and friends may never be zero, and hence also define the value of `0` -as a [niche]. - -[`NonZeroU8`]: https://doc.rust-lang.org/std/num/struct.NonZeroU8.html - -The [niche] values must be disjoint from the values allowed by the validity -invariant. The validity invariant is, as of this writing, the current active -discussion topic in the unsafe code guidelines process. [rust-lang/rust#60300] -specifies that the following types have at least one [niche] (the all-zeros -bit-pattern): - -* `&T` -* `&mut T` -* `extern "C" fn` -* `core::num::NonZero*` -* `core::ptr::NonNull` -* `#[repr(transparent)] struct` around one of the types in this list. - -**Option-like enums where the payload defines at least one [niche] value -are guaranteed to be represented using the same memory layout as their -payload.** This is called **discriminant elision**, as there is no -explicit discriminant value stored anywhere. Instead, [niche] values are -used to represent the unit variant. - -The most common example is that `Option<&u8>` can be represented as an -nullable `&u8` reference -- the `None` variant is then represented -using the [niche] value zero. This is because a valid `&u8` value can -never be zero, so if we see a zero value, we know that this must be -`None` variant. - -**Example.** The type `Option<&u32>` will be represented at runtime as -a nullable pointer. FFI interop often depends on this property. 
- -**Example.** As `fn` types are non-nullable, the type `Option` will be represented at runtime as a nullable function -pointer (which is therefore equivalent to a C function pointer) . FFI -interop often depends on this property. - -**Example.** The following enum definition is **not** option-like, -as it has two unit variants: - -```rust -enum Enum1 { - Present(T), - Absent1, - Absent2, -} -``` - -**Example.** The following enum definition is **not** option-like, -as it has an explicit `repr` attribute. - -```rust -#[repr(u8)] -enum Enum2 { - Present(T), - Absent1, -} -``` - -[niche]: ../glossary.md#niche - -### Layout of enums with a single variant - -> **NOTE**: the guarantees in this section have not been approved by an RFC process. - -**Data-carrying** enums with a single variant without a `repr()` annotation have -the same layout as the variant field. **Fieldless** enums with a single variant -have the same layout as a unit struct. - -For example, here: - -```rust -struct UnitStruct; -enum FieldlessSingleVariant { FieldlessVariant } - -struct SomeStruct { x: u32 } -enum DataCarryingSingleVariant { - DataCarryingVariant(SomeStruct), -} -``` - -* `FieldSingleVariant` has the same layout as `UnitStruct`, -* `DataCarryingSingleVariant` has the same layout as `SomeStruct`. - -## Unresolved questions - -See [Issue #79.](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/79): - -* Layout of multi-variant enums where only one variant is inhabited. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/enums.md). 
diff --git a/reference/src/layout/function-pointers.md b/reference/src/layout/function-pointers.md index 08fd549c..64b4d704 100644 --- a/reference/src/layout/function-pointers.md +++ b/reference/src/layout/function-pointers.md @@ -1,129 +1,7 @@ # Representation of Function Pointers -### Terminology +**This page has been archived** -In Rust, a function pointer type, is either `fn(Args...) -> Ret`, -`extern "ABI" fn(Args...) -> Ret`, `unsafe fn(Args...) -> Ret`, or -`unsafe extern "ABI" fn(Args...) -> Ret`. -A function pointer is the address of a function, -and has function pointer type. -The pointer is implicit in the `fn` type, -and they have no lifetime of their own; -therefore, function pointers are assumed to point to -a block of code with static lifetime. -This is not necessarily always true, -since, for example, you can unload a dynamic library. -Therefore, this is _only_ a safety invariant, -not a validity invariant; -as long as one doesn't call a function pointer which points to freed memory, -it is not undefined behavior. +It did not actually reflect current layout guarantees and caused frequent confusion. - -In C, a function pointer type is `Ret (*)(Args...)`, or `Ret ABI (*)(Args...)`, -and values of function pointer type are either a null pointer value, -or the address of a function. - -### Representation - -The ABI and layout of `(unsafe)? (extern "ABI")? fn(Args...) -> Ret` -is exactly that of the corresponding C type -- -the lack of a null value does not change this. -On common platforms, this means that `*const ()` and `fn(Args...) -> Ret` have -the same ABI and layout. This is, in fact, guaranteed by POSIX and Windows. 
-This means that for the vast majority of platforms, - -```rust -fn go_through_pointer(x: fn()) -> fn() { - let ptr = x as *const (); - unsafe { std::mem::transmute::<*const (), fn()>(ptr) } -} -``` - -is both perfectly safe, and, in fact, required for some APIs -- notably, -`GetProcAddress` on Windows requires you to convert from `void (*)()` to -`void*`, to get the address of a variable; -and the opposite is true of `dlsym`, which requires you to convert from -`void*` to `void (*)()` in order to get the address of functions. -This conversion is _not_ guaranteed by Rust itself, however; -simply the implementation. If the underlying platform allows this conversion, -so will Rust. - -However, null values are not supported by the Rust function pointer types -- -just like references, the expectation is that you use `Option` to create -nullable pointers. `Option<fn(Args...) -> Ret>` will have the exact same ABI -as `fn(Args...) -> Ret`, but additionally allows null pointer values. - - -### Use - -Function pointers are mostly useful for talking to C -- in Rust, you would -mostly use `T: Fn()` instead of `fn()`. If talking to a C API, -the same caveats as apply to other FFI code should be followed. 
-As an example, we shall implement the following C interface in Rust: - -```c -struct Cons { - int data; - struct Cons *next; -}; - -struct Cons *cons(struct Cons *self, int data); - -/* - notes: - - func must be non-null - - thunk may be null, and shall be passed unchanged to func - - self may be null, in which case no iteration is done -*/ - -void iterate(struct Cons const *self, void (*func)(int, void *), void *thunk); -bool for_all(struct Cons const *self, bool (*func)(int, void *), void *thunk); -``` - -```rust -# use std::{ -# ffi::c_void, -# os::raw::c_int, -# }; -# - -#[repr(C)] -pub struct Cons { - data: c_int, - next: Option<Box<Cons>>, -} - -#[no_mangle] -pub extern "C" fn cons(node: Option<Box<Cons>>, data: c_int) -> Box<Cons> { - Box::new(Cons { data, next: node }) -} - -#[no_mangle] -pub unsafe extern "C" fn iterate( - node: Option<&Cons>, - func: unsafe extern "C" fn(i32, *mut c_void), // note - non-nullable - thunk: *mut c_void, // note - this is a thunk, so it's just passed raw -) { - let mut it = node; - while let Some(node) = it { - func(node.data, thunk); - it = node.next.as_ref().map(|x| &**x); - } -} - -#[no_mangle] -pub unsafe extern "C" fn for_all( - node: Option<&Cons>, - func: unsafe extern "C" fn(i32, *mut c_void) -> bool, - thunk: *mut c_void, -) -> bool { - let mut it = node; - while let Some(node) = node { - if !func(node.data, thunk) { - return false; - } - it = node.next.as_ref().map(|x| &**x); - } - true -} -``` +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/function-pointers.md). diff --git a/reference/src/layout/packed-simd-vectors.md b/reference/src/layout/packed-simd-vectors.md index ed7b871b..755f24a1 100644 --- a/reference/src/layout/packed-simd-vectors.md +++ b/reference/src/layout/packed-simd-vectors.md @@ -1,98 +1,7 @@ # Layout of packed SIMD vectors -**Disclaimer:** This chapter represents the consensus from issue -[#38]. 
The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -[#38]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/38 +It did not actually reflect current layout guarantees and caused frequent confusion. -Rust currently exposes packed[^1] SIMD vector types like `__m128` to users, but it -does not expose a way for users to construct their own vector types. - -The set of currently-exposed packed SIMD vector types is -_implementation-defined_ and it is currently different for each architecture. - -[^1]: _packed_ denotes that these SIMD vectors have a compile-time fixed size, - distinguishing these from SIMD vector types whose size is only known at - run-time. Rust currently only supports _packed_ SIMD vector types. This is - elaborated further in [RFC2366]. - -[RFC2366]: https://github.com/gnzlbg/rfcs/blob/ppv/text/0000-ppv.md#interaction-with-cray-vectors - -## Packed SIMD vector types - -Packed SIMD vector types are `repr(simd)` homogeneous tuple-structs containing -`N` elements of type `T` where `N` is a power-of-two and the size and alignment -requirements of `T` are equal: - -```rust,ignore -#[repr(simd)] -struct Vector(T_0, ..., T_(N - 1)); -``` - -The set of supported values of `T` and `N` is _implementation-defined_. - -The size of `Vector` is `N * size_of::()` and its alignment is an -_implementation-defined_ function of `T` and `N` greater than or equal to -`align_of::()`. That is: - -```rust,ignore -assert_eq!(size_of::>(), size_of::() * N); -assert!(align_of::>() >= align_of::()); -``` - -That is, two distinct `repr(simd)` vector types that have the same `T` and the -same `N` have the same size and alignment. 
- -Vector elements are laid out in source field order, enabling random access to -vector elements by reinterpreting the vector as an array: - -```rust,ignore -union U { - vec: Vector, - arr: [T; N] -} - -assert_eq!(size_of::>(), size_of::<[T; N]>()); -assert!(align_of::>() >= align_of::<[T; N]>()); - -unsafe { - let u = U { vec: Vector(t_0, ..., t_(N - 1)) }; - - assert_eq!(u.vec.0, u.arr[0]); - // ... - assert_eq!(u.vec.(N - 1), u.arr[N - 1]); -} -``` - -### Unresolved questions - -* **Blocked**: Should the layout of packed SIMD vectors be the same as that of - homogeneous tuples ? Such that: - - ```rust,ignore - union U { - vec: Vector, - tup: (T_0, ..., T_(N-1)), - } - - assert_eq!(size_of::>(), size_of::<(T_0, ..., T_(N-1))>()); - assert!(align_of::>() >= align_of::<(T_0, ..., T_(N-1))>()); - - unsafe { - let u = U { vec: Vector(t_0, ..., t_(N - 1)) }; - - assert_eq!(u.vec.0, u.tup.0); - // ... - assert_eq!(u.vec.(N - 1), u.tup.(N - 1)); - } - ``` - - This is blocked on the resolution of issue [#36] about the layout of - homogeneous structs and tuples. - - [#36]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/36 - -* `MaybeUninit` does not have the same `repr` as `T`, so - `MaybeUninit>` are not `repr(simd)`, which has performance - consequences and means that `MaybeUninit>` is not C-FFI safe. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/packed-simd-vectors.md). diff --git a/reference/src/layout/pointers.md b/reference/src/layout/pointers.md index b8324ad1..c2b0c02d 100644 --- a/reference/src/layout/pointers.md +++ b/reference/src/layout/pointers.md @@ -1,74 +1,7 @@ # Layout of reference and pointer types -**Disclaimer:** Everything this section says about pointers to dynamically sized -types represents the consensus from issue [#16], but has not been stabilized -through an RFC. As such, this is preliminary information. 
+**This page has been archived** -[#16]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/16 +It did not actually reflect current layout guarantees and caused frequent confusion. -### Terminology - -Reference types are types of the form `&T`, `&mut T`. - -Raw pointer types are types of the form `*const T` or `*mut T`. - -### Representation - -The alignment of `&T`, `&mut T`, `*const T` and `*mut T` are the same, -and are at least the word size. - -* If `T` is a sized type then the alignment of `&T` is the word size. -* The alignment of `&dyn Trait` is the word size. -* The alignment of `&[T]` is the word size. -* The alignment of `&str` is the word size. -* Alignment in other cases may be more than the word size (e.g., for other dynamically sized types). - -The sizes of `&T`, `&mut T`, `*const T` and `*mut T` are the same, -and are at least one word. - -* If `T` is a sized type then the size of `&T` is one word. -* The size of `&dyn Trait` is two words. -* The size of `&[T]` is two words. -* The size of `&str` is two words. -* Size in other cases may be more than one word (e.g., for other dynamically sized types). - -### Notes - -The layouts of `&T`, `&mut T`, `*const T` and `*mut T` are the same. - -If `T` is sized, references and pointers to `T` have a size and alignment of one -word and have therefore the same layout as C pointers. - -> **warning**: while the layout of references and pointers is compatible with -> the layout of C pointers, references come with a _validity_ invariant that -> does not allow them to be used when they could be `NULL`, unaligned, dangling, -> or, in the case of `&mut T`, aliasing. - -We do not make any guarantees about the layout of -multi-trait objects `&(dyn Trait1 + Trait2)` or references to other dynamically sized types, -other than that they are at least word-aligned, and have size at least one word. 
- -The layout of `&dyn Trait` when `Trait` is a trait is the same as that of: -```rust -#[repr(C)] -struct DynObject { - data: *const u8, - vtable: *const u8, -} -``` - -> **note**: In the layout of `&mut dyn Trait` the field `data` is of the type `*mut u8`. - -The layout of `&[T]` is the same as that of: -```rust -#[repr(C)] -struct Slice { - ptr: *const T, - len: usize, -} -``` - -> **note**: In the layout of `&mut [T]` the field `ptr` is of the type `*mut T`. - -The layout of `&str` is the same as that of `&[u8]`, and the layout of `&mut str` is -the same as that of `&mut [u8]`. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/pointers.md). diff --git a/reference/src/layout/scalars.md b/reference/src/layout/scalars.md index babb4f46..95e34c54 100644 --- a/reference/src/layout/scalars.md +++ b/reference/src/layout/scalars.md @@ -1,129 +1,7 @@ # Layout of scalar types -**Disclaimer:** This chapter represents the consensus from issue -[#9]. The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -This documents the memory layout and considerations for `bool`, `char`, floating -point types (`f{32, 64}`), and integral types (`{i,u}{8,16,32,64,128,size}`). -These types are all scalar types, representing a single value, and have no -layout `#[repr()]` flags. +It did not actually reflect current layout guarantees and caused frequent confusion. -[#9]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9 - -## `bool` - -Rust's `bool` has the same layout as C17's` _Bool`, that is, its size -and alignment are [implementation-defined][data-layout]. Any `bool` can be -cast into an integer, taking on the values 1 (`true`) or 0 (`false`). 
- -> **Note**: on all platforms that Rust's currently supports, its size and -> alignment are 1, and its ABI class is `INTEGER` - see [Rust Layout and ABIs]. - -[Rust Layout and ABIs]: https://gankro.github.io/blah/rust-layouts-and-abis/#the-layoutsabis-of-builtins - -## `char` - -Rust char is 32-bit wide and represents an [unicode scalar value]. The alignment -of `char` is [implementation-defined][data-layout]. - -[unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value - -> **Note**: Rust `char` type is not layout compatible with C / C++ `char` types. -> The C / C++ `char` types correspond to either Rust's `i8` or `u8` types on all -> currently supported platforms, depending on their signedness. Rust does not -> support C platforms in which C `char` is not 8-bit wide. - -## `isize` and `usize` - -The `isize` and `usize` types are pointer-sized signed and unsigned integers. -They have the same layout as the [pointer types] for which the pointee is -`Sized`, and are layout compatible with C's `uintptr_t` and `intptr_t` types. - -> **Note**: C99 [7.18.2.4](https://port70.net/~nsz/c/c99/n1256.html#7.18.2.4) -> requires `uintptr_t` and `intptr_t` to be at least 16-bit wide. All -> platforms we currently support have a C platform, and as a consequence, -> `isize`/`usize` are at least 16-bit wide for all of them. - -> **Note**: Rust's `usize` and C's `unsigned` types are **not** equivalent. C's -> `unsigned` is at least as large as a short, allowed to have padding bits, etc. -> but it is not necessarily pointer-sized. - -> **Note**: in the current Rust implementation, the layouts of `isize` and -> `usize` determine the following: -> -> * the maximum size of Rust _allocations_ is limited to `isize::MAX`. -> The LLVM `getelementptr` instruction uses signed-integer field offsets. 
Rust -> calls `getelementptr` with the `inbounds` flag which assumes that field -> offsets do not overflow, -> -> * the maximum number of elements in an array is `usize::MAX` (`[T; N: usize]`). -> Only ZST arrays can probably be this large in practice, non-ZST arrays -> are bound by the maximum size of Rust values, -> -> * the maximum value in bytes by which a pointer can be offseted using -> `ptr.add` or `ptr.offset` is `isize::MAX`. -> -> These limits have not gone through the RFC process and are not guaranteed to -> hold. - -[pointer types]: ./pointers.md - -## Fixed-width integer types - -For all Rust's fixed-width integer types `{i,u}{8,16,32,64,128}` it holds that: - -* these types have no padding bits, -* their size exactly matches their bit-width, -* negative values of signed integer types are represented using 2's complement. - -Furthermore, Rust's signed and unsigned fixed-width integer types -`{i,u}{8,16,32,64}` have the same layout as the C fixed-width integer types from -the `` header `{u,}int{8,16,32,64}_t`. These fixed-width integer types -are therefore safe to use directly in C FFI where the corresponding C -fixed-width integer types are expected. - -The alignment of Rust's `{i,u}128` is _unspecified_ and allowed to change. - -> **Note**: While the C standard does not define fixed-width 128-bit wide -> integer types, many C compilers provide non-standard `__int128` types as a -> language extension. The layout of `{i,u}128` in the current Rust -> implementation does **not** match that of these C types, see -> [rust-lang/#54341](https://github.com/rust-lang/rust/issues/54341). - -### Layout compatibility with C native integer types - -The specification of native C integer types, `char`, `short`, `int`, `long`, -... as well as their `unsigned` variants, guarantees a lower bound on their size, -e.g., `short` is _at least_ 16-bit wide and _at least_ as wide as `char`. - -Their exact sizes are _implementation-defined_. 
- -Libraries like `libc` use knowledge of this _implementation-defined_ behavior on -each platform to select a layout-compatible Rust fixed-width integer type when -interfacing with native C integer types (e.g. `libc::c_int`). - -> **Note**: Rust does not support C platforms on which the C native integer type -> are not compatible with any of Rust's fixed-width integer type (e.g. because -> of padding-bits, lack of 2's complement, etc.). - -## Fixed-width floating point types - -Rust's `f32` and `f64` single (32-bit) and double (64-bit) precision -floating-point types have [IEEE-754] `binary32` and `binary64` floating-point -layouts, respectively. - -When the platforms' `"math.h"` header defines the `__STDC_IEC_559__` macro, -Rust's floating-point types are safe to use directly in C FFI where the -appropriate C types are expected (`f32` for `float`, `f64` for `double`). - -If the C platform's `"math.h"` header does not define the `__STDC_IEC_559__` -macro, whether using `f32` and `f64` in C FFI is safe or not for which C type is -_implementation-defined_. - -> **Note**: the `libc` crate uses knowledge of each platform's -> _implementation-defined_ behavior to provide portable `libc::c_float` and -> `libc::c_double` types that can be used to safely interface with C via FFI. - -[IEEE-754]: https://en.wikipedia.org/wiki/IEEE_754 -[data-layout]: https://doc.rust-lang.org/nightly/reference/type-layout.html#primitive-data-layout +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/scalars.md). diff --git a/reference/src/layout/structs-and-tuples.md b/reference/src/layout/structs-and-tuples.md index e94d2c6e..bfaf7356 100644 --- a/reference/src/layout/structs-and-tuples.md +++ b/reference/src/layout/structs-and-tuples.md @@ -1,452 +1,7 @@ # Layout of structs and tuples -**Disclaimer:** This chapter represents the consensus from issues -[#11] and [#12]. 
The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -[#11]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11 -[#12]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12 +It did not actually reflect current layout guarantees and caused frequent confusion. -## Tuple types - -In general, an anonymous tuple type `(T1..Tn)` of arity N is laid out -"as if" there were a corresponding tuple struct declared in libcore: - -```rust,ignore -#[repr(Rust)] -struct TupleN(P1..Pn); -``` - -In this case, `(T1..Tn)` would be compatible with `TupleN`. -As discussed below, this generally means that the compiler is **free -to re-order field layout** as it wishes. Thus, if you would like a -guaranteed layout from a tuple, you are generally advised to create a -named struct with a `#[repr(C)]` annotation (see [the section on -structs for more details](#structs)). - -Note that the final element of a tuple (`Pn`) is marked as `?Sized` to -permit unsized tuple coercion -- this is implemented on nightly but is -currently unstable ([tracking issue][#42877]). In the future, we may -extend unsizing to other elements of tuples as well. - -[#42877]: https://github.com/rust-lang/rust/issues/42877 - -### Other notes on tuples - -Some related discussion: - -- [RFC #1582](https://github.com/rust-lang/rfcs/pull/1582) proposed - that tuple structs should have a "nested layout", where - e.g. `(T1, T2, T3)` would in fact be laid out as `(T1, (T2, - T3))`. The purpose of this was to permit variadic matching and so - forth against some suffix of the struct. This RFC was not accepted, - however. This layout requires extra padding and seems somewhat - surprising: it means that the layout of tuples and tuple structs - would diverge significantly from structs with named fields. 
- - - -## Struct types - -Structs come in two principle varieties: - -```rust,ignore -// Structs with named fields -struct Foo { f1: T1, .., fn: Tn } - -// Tuple structs -struct Foo(T1, .., Tn); -``` - -In terms of their layout, tuple structs can be understood as -equivalent to a named struct with fields named `0..n-1`: - -```rust,ignore -struct Foo { - 0: T1, - ... - n-1: Tn -} -``` - -(In fact, one may use such field names in patterns or in accessor -expressions like `foo.0`.) - -The degrees of freedom the compiler has when computing the layout of an -*inhabited* struct or tuple is to determine the order of the fields, and the -"gaps" (often called *padding*) before, between, and after the fields. The -layout of these fields themselves is already entirely determined by their types, -and since we intend to allow creating references to fields (`&s.f1`), structs do -not have any wiggle-room there. - -This can be visualized as follows: -```text -[ <--> [field 3] <-----> [field 1] <-> [ field 2 ] <--> ] -``` -**Figure 1** (struct-field layout): The `<-...->` and `[ ... ]` denote the differently-sized gaps and fields, respectively. - -Here, the individual fields are blocks of fixed size (determined by the field's -layout). The compiler freely picks an order for the fields to be in (this does -not have to be the order of declaration in the source), and it picks the gaps -between the fields (under some constraints, such as alignment). - -For *uninhabited* structs or tuples like `(i32, !)` that do not have a valid -inhabitant, the compiler has more freedom. After all, no references to fields -can ever be taken. For example, such structs might be zero-sized. - -How exactly the compiler picks order and gaps, as well as other aspects of -layout beyond size and field offset, can be controlled by a `#[repr]` attribute: - -- `#[repr(Rust)]` -- the default. 
-- `#[repr(C)]` -- request C compatibility -- `#[repr(align(N))]` -- specify the alignment -- `#[repr(packed)]` -- request packed layout where fields are not internally aligned -- `#[repr(transparent)]` -- request that a "wrapper struct" be treated - "as if" it were an instance of its field type when passed as an - argument - -### Default layout ("repr rust") - -With the exception of the guarantees provided below, **the default layout of -structs is not specified.** - -As of this writing, we have not reached a full consensus on what limitations -should exist on possible field struct layouts, so effectively one must assume -that the compiler can select any layout it likes for each struct on each -compilation, and it is not required to select the same layout across two -compilations. This implies that (among other things) two structs with the same -field types may not be laid out in the same way (for example, the hypothetical -struct representing tuples may be laid out differently from user-declared -structs). - -Known things that can influence layout (non-exhaustive): - -- the type of the struct fields and the layout of those types -- compiler settings, including esoteric choices like optimization fuel - -**A note on determinism.** The definition above does not guarantee -determinism between executions of the compiler -- two executions may -select different layouts, even if all inputs are identical. Naturally, -in practice, the compiler aims to produce deterministic output for a -given set of inputs. However, it is difficult to produce a -comprehensive summary of the various factors that may affect the -layout of structs, and so for the time being we have opted for a -conservative definition. - -**Compiler's current behavior.** As of the time of this writing, the -compiler will reorder struct fields to minimize the overall size of -the struct (and in particular to eliminate padding due to alignment -restrictions). 
- -Layout is presently defined not in terms of a "fully monomorphized" -struct definition but rather in terms of its generic definition along -with a set of substitutions (values for each type parameter; lifetime -parameters do not affect layout). This distinction is important -because of *unsizing* -- if the final field has generic type, the -compiler will not reorder it, to allow for the possibility of -unsizing. E.g., `struct Foo { x: u16, y: u32 }` and `struct Foo<T> { -x: u16, y: T }` where `T = u32` are not guaranteed to be identical. - -#### Zero-sized structs -[zero-sized structs]: #zero-sized-structs - -For `repr(Rust)`, `repr(packed(N))`, `repr(align(N))`, and `repr(C)` structs: if -all fields of a struct have size 0, then the struct has size 0. - -For example, all these types are zero-sized: - -```rust -# use std::mem::size_of; -#[repr(align(32))] struct Zst0; -#[repr(C)] struct Zst1(Zst0); -struct Zst2(Zst1, Zst0); -# fn main() { -# assert_eq!(size_of::<Zst0>(), 0); -# assert_eq!(size_of::<Zst1>(), 0); -# assert_eq!(size_of::<Zst2>(), 0); -# } -``` - -In particular, a struct with no fields is a ZST, and if it has no repr attribute -it is moreover a 1-ZST as it also has no alignment requirements. - -#### Single-field structs -[single-field structs]: #single-field-structs - -A struct with only one field has the same layout as that field. - -#### Structs with 1-ZST fields - -For the purposes of struct layout [1-ZST] fields are ignored. - -In particular, if all but one field are 1-ZST, then the struct is equivalent to -a [single-field struct][single-field structs]. In other words, if all but one -field is a 1-ZST, then the entire struct has the same layout as that one field. - -Similarly, if all fields are 1-ZST, then the struct has the same layout as a -[struct with no fields][zero-sized structs], and is itself a 1-ZST.
- -For example: - -```rust -type Zst1 = (); -struct S1(i32, Zst1); // same layout as i32 - -type Zst2 = [u16; 0]; -struct S2(Zst2, Zst1); // same layout as Zst2 - -struct S3(Zst1); // same layout as Zst1 -``` - -#### Unresolved questions - -During the course of the discussion in [#11] and [#12], various -suggestions arose to limit the compiler's flexibility. These questions -are currently considering **unresolved** and -- for each of them -- an -issue has been opened for further discussion on the repository. This -section documents the questions and gives a few light details, but the -reader is referred to the issues for further discussion. - -**Homogeneous structs ([#36]).** If you have homogeneous structs, where all -the `N` fields are of a single type `T`, can we guarantee a mapping to -the memory layout of `[T; N]`? How do we map between the field names -and the indices? What about zero-sized types? - -[#36]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/36 - -**Deterministic layout ([#35]).** Can we say that layout is some deterministic -function of a certain, fixed set of inputs? This would allow you to be -sure that if you do not alter those inputs, your struct layout would -not change, even if it meant that you can't predict precisely what it -will be. For example, we might say that struct layout is a function of -the struct's generic types and its substitutions, full stop -- this -would imply that any two structs with the same definition are laid out -the same. This might interfere with our ability to do profile-guided -layout or to analyze how a struct is used and optimize based on -that. Some would call that a feature. - -[#35]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/35 - -### C-compatible layout ("repr C") - -For structs tagged `#[repr(C)]`, the compiler will apply a C-like -layout scheme. 
See section 6.7.2.1 of the [C17 specification][C17] for -a detailed write-up of what such rules entail (as well as the relevant -specs for your platform). For most platforms, however, this means the -following: - -[C17]: https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf - -- Field order is preserved. -- The first field begins at offset 0. -- Assuming the struct is not packed, each field's offset is aligned[^aligned] to - the ABI-mandated alignment for that field's type, possibly creating - unused padding bits. -- The total size of the struct is rounded up to its overall alignment. - -[^aligned]: Aligning an offset O to an alignment A means to round up the offset O until it is a multiple of the alignment A. - -The intention is that if one has a set of C struct declarations and a -corresponding set of Rust struct declarations, all of which are tagged -with `#[repr(C)]`, then the layout of those structs will all be -identical. Note that this setup implies that none of the structs in -question can contain any `#[repr(Rust)]` structs (or Rust tuples), as -those would have no corresponding C struct declaration -- as -`#[repr(Rust)]` types have undefined layout, you cannot safely declare -their layout in a C program. - -See also the notes on [ABI compatibility](#fnabi) under the section on `#[repr(transparent)]`. - -**Structs with no fields.** One area where Rust layout can deviate -from C/C++ -- even with `#[repr(C)]` -- comes about with "empty -structs" that have no fields. In C, an empty struct declaration like -`struct Foo { }` is illegal. However, both gcc and clang support -options to enable such structs, and [assign them size -zero](https://godbolt.org/z/AS2gdC). Rust behaves the same way -- -empty structs have size 0 and alignment 1 (unless an explicit -`#[repr(align)]` is present). 
C++, in contrast, gives empty structs a -size of 1, unless they are inherited from or they are fields that have -the `[[no_unique_address]]` attribute, in which case they do not -increase the overall size of the struct. - -**Structs of zero-size.** It is also possible to have structs that -have fields but still have zero size. In this case, the size of the -struct would be zero, but its alignment may be greater. For example, -`#[repr(C)] struct Foo { x: [u16; 0] }` would have an alignment of 2 -bytes by default. ([This matches the behavior in gcc and -clang](https://godbolt.org/z/5w0gkq).) - -**Structs with fields of zero-size.** If a `#[repr(C)]` struct -containing a field of zero-size, that field does not occupy space in -the struct; it can affect the offsets of subsequent fields if it -induces padding due to the alignment on its type. ([This matches the -behavior in gcc and clang](https://godbolt.org/z/5w0gkq).) - -**C++ compatibility hazard.** As noted above when discussing structs -with no fields, C++ treats empty structs like `struct Foo { }` -differently from C and Rust. This can introduce subtle compatibility -hazards. If you have an empty struct in your C++ code and you make the -"naive" translation into Rust, even tagging with `#[repr(C)]` will not -produce layout- or ABI-compatible results. - -### Fixed alignment - -The `#[repr(align(N))]` attribute may be used to raise the alignment -of a struct, as described in [The Rust Reference][TRR-align]. - -[TRR-align]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-align-representation - -### Packed layout - -The `#[repr(packed(N))]` attribute may be used to impose a maximum -limit on the alignments for individual fields. It is most commonly -used with an alignment of 1, which makes the struct as small as -possible. 
For example, in a `#[repr(packed(2))]` struct, a `u8` or -`u16` would be aligned at 1- or 2-bytes respectively (as normal), but -a `u32` would be aligned at only 2 bytes instead of 4. In the absence -of an explicit `#[repr(align)]` directive, `#[repr(packed(N))]` also -sets the alignment for the struct as a whole to N bytes. - -The resulting fields may not fall at properly aligned boundaries in -memory. This makes it unsafe to create a Rust reference (`&T` or `&mut -T`) to those fields, as the compiler requires that all reference -values must always be aligned (so that it can use more efficient -load/store instructions at runtime). See the [Rust reference for more -details][TRR-packed]. - -[TRR-packed]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-packed-representation - - - -### Function call ABI compatibility - -In general, when invoking functions that use the C ABI, `#[repr(C)]` -structs are guaranteed to be passed in the same way as their -corresponding C counterpart (presuming one exists). `#[repr(Rust)]` -structs have no such guarantee. This means that if you have an `extern -"C"` function, you cannot pass a `#[repr(Rust)]` struct as one of its -arguments. Instead, one would typically pass `#[repr(C)]` structs (or -possibly pointers to Rust-structs, if those structs are opaque on the -other side, or the callee is defined in Rust). - -However, there is a subtle point about C ABIs: in some C ABIs, passing -a struct with one field of type `T` as an argument is **not** -equivalent to just passing a value of type `T`. So e.g. if you have a -C function that is defined to take a `uint32_t`: - -```C -void some_function(uint32_t value) { .. } -``` - -It is **incorrect** to pass in a struct as that value, even if that -struct is `#[repr(C)`] and has only one field: - -```rust,ignore -#[repr(C)] -struct Foo { x: u32 } - -extern "C" some_function(Foo); - -some_function(Foo { x: 22 }); // Bad! 
-``` - -Instead, you should declare the struct with `#[repr(transparent)]`, -which specifies that `Foo` should use the ABI rules for its field -type, `u32`. This is useful when using "wrapper structs" in Rust to -give stronger typing guarantees. - -`#[repr(transparent)]` can only be applied to structs with a single -field whose type `T` has non-zero size, along with some number of -other fields whose types are all zero-sized (typically -`std::marker::PhantomData` fields). The struct then takes on the "ABI -behavior" of the type `T` that has non-zero size. - -(Note further that the Rust ABI is undefined and theoretically may -vary from compiler revision to compiler revision.) - -## Unresolved question: Guaranteeing compatible layouts? - -One key unresolved question was whether we would want to guarantee -that two `#[repr(Rust)]` structs whose fields have the same types are -laid out in a "compatible" way, such that one could be transmuted to -the other. @rkruppe laid out a [number of -examples](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-419956939) -where this might be a reasonable thing to expect. As currently -written, and in an effort to be conservative, we make no such -guarantee, though we do not firmly rule out doing such a thing in the future. - -It seems like it may well be desirable to -- at minimum -- guarantee -that `#[repr(Rust)]` layout is "some deterministic function of the -struct declaration and the monomorphized types of its fields". Note -that it is not sufficient to consider the monomorphized type of a -struct's fields: due to unsizing coercions, it matters whether the -struct is declared in a generic way or not, since the "unsized" field -must presently be [laid out last in the -structure](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-417843595). (Note -that tuples are always coercible (see [#42877] for more information), -and are always declared as generics.) 
This implies that our -"deterministic function" also takes as input the form in which the -fields are declared in the struct. - -However, that rule is not true today. For example, the compiler -includes an option (called "optimization fuel") that will enable us to -alter the layout of only the "first N" structs declared in the -source. When one is accidentally relying on the layout of a structure, -this can be used to track down the struct that is causing the problem. - -[#42877]: https://github.com/rust-lang/rust/issues/42877 -[pg-unsized-tuple]: https://play.rust-lang.org/?gist=46399bb68ac685f23beffefc014203ce&version=nightly&mode=debug&edition=2015 - -There are also benefits to having fewer guarantees. For example: - -- Code hardening tools can be used to randomize the layout of individual structs. -- Profile-guided optimization might analyze how instances of a -particular struct are used and tweak the layout (e.g., to insert -padding and reduce false sharing). - - However, there aren't many tools that do this sort of thing -([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420650851), -[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420681763)). Moreover, -it would probably be better for the tools to merely recommend -annotations that could be added -([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105), -[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105)), -such that the knowledge of the improved layouts can be recorded in the -source. - -As a more declarative alternative, @alercah [proposed a possible -extension](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-420165155) -that would permit one to declare that the layout of two structs or -types are compatible (e.g., `#[repr(as(Foo))] struct Bar { .. }`), -thus permitting safe transmutes (and also ABI compatibility). 
One -might also use some weaker form of `#[repr(C)]` to specify a "more -deterministic" layout. These areas need future exploration. - -## Counteropinions and other notes - -@joshtripplet [argued against reordering struct -fields](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-417953576), -suggesting instead it would be better if users reordering fields -themselves. However, there are a number of downsides to such a -proposal (and -- further -- it does not match our existing behavior): - -- In a generic struct, the [best ordering of fields may not be known - ahead of - time](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420659840), - so the user cannot do it manually. -- If layout is defined, and a library exposes a struct with all public - fields, then clients may be more likely to assume that the layout of - that struct is stable. If they were to write unsafe code that relied - on this assumption, that would break if fields were reordered. But - libraries may well expect the freedom to reorder fields. This case - is weakened because of the requirement to write unsafe code (after - all, one can always write unsafe code that relies on virtually any - implementation detail); if we were to permit **safe** casts that - rely on the layout, then reordering fields would clearly be a - breaking change (see also [this - comment](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420117856) - and [this - thread](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/31#discussion_r224955817)). -- Many people would prefer the name ordering to be chosen for - "readability" and not optimal layout. - -[1-ZST]: ../glossary.md#zero-sized-type--zst +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/structs-and-tuples.md). 
diff --git a/reference/src/layout/unions.md b/reference/src/layout/unions.md index 42d55ee7..f4d74150 100644 --- a/reference/src/layout/unions.md +++ b/reference/src/layout/unions.md @@ -1,178 +1,7 @@ # Layout of unions -**Disclaimer:** This chapter represents the consensus from issue -[#13]. The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -**Note:** This document has not yet been updated to -[RFC 2645](https://github.com/rust-lang/rfcs/blob/master/text/2645-transparent-unions.md). +It did not actually reflect current layout guarantees and caused frequent confusion. -[#13]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13 - -### Layout of individual union fields - -A union consists of several variants, one for each field. All variants have the -same size and start at the same memory address, such that in memory the variants -overlap. This can be visualized as follows: - -```text -[ <--> [field0_ty] <----> ] -[ <----> [field1_ty] <--> ] -[ <---> [field2_ty] <---> ] -``` -**Figure 1** (union-field layout): Each row in the picture shows the layout of -the union for each of its variants. The `<-...->` and `[ ... ]` denote the -differently-sized gaps and fields, respectively. - -The individual fields (`[field{i}_ty_]`) are blocks of fixed size determined by -the field's [layout]. Since we allow creating references to union fields -(`&u.i`), the only degrees of freedom the compiler has when computing the layout -of a union are the size of the union, which can be larger than the size of its -largest field, and the offset of each union field within its variant. How these -are picked depends on certain constraints like, for example, the alignment -requirements of the fields, the `#[repr]` attribute of the `union`, etc. 
- -[padding]: ../glossary.md#padding -[layout]: ../glossary.md#layout - -### Unions with default layout ("`repr(Rust)`") - -Except for the guarantees provided below for some specific cases, the default -layout of Rust unions is, _in general_, **unspecified**. - -That is, there are no _general_ guarantees about the offset of the fields, -whether all fields have the same offset, what the call ABI of the union is, etc. - -
Rationale - -As of this writing, we want to keep the option of using non-zero offsets open -for the future; whether this is useful depends on what exactly the -compiler-assumed invariants about union contents are. This might become clearer -after the [validity of unions][#73] is settled. - -Even if the offsets happen to be all 0, there might still be differences in the -function call ABI. If you need to pass unions by-value across an FFI boundary, -you have to use `#[repr(C)]`. - -[#73]: https://github.com/rust-lang/unsafe-code-guidelines/issues/73 - -
- -#### Layout of unions with a single non-zero-sized field - -The layout of unions with a single non-[1-ZST]-field" is the same as the -layout of that field if it has no [padding] bytes. - -For example, here: - -```rust -# use std::mem::{size_of, align_of}; -# #[derive(Copy, Clone)] -#[repr(transparent)] -struct SomeStruct(i32); -# #[derive(Copy, Clone)] -struct Zst; -union U0 { - f0: SomeStruct, - f1: Zst, -} -# fn main() { -# assert_eq!(size_of::(), size_of::()); -# assert_eq!(align_of::(), align_of::()); -# } -``` - -the union `U0` has the same layout as `SomeStruct`, because `SomeStruct` has no -padding bits - it is equivalent to an `i32` due to `repr(transparent)` - and -because `Zst` is a [1-ZST]. - -On the other hand, here: - -```rust -# use std::mem::{size_of, align_of}; -# #[derive(Copy, Clone)] -struct SomeOtherStruct(i32); -# #[derive(Copy, Clone)] -#[repr(align(16))] struct Zst2; -union U1 { - f0: SomeOtherStruct, - f1: Zst2, -} -# fn main() { -# assert_eq!(size_of::(), align_of::()); -# assert_eq!(align_of::(), align_of::()); -assert_eq!(align_of::(), 16); -# } -``` - -the layout of `U1` is **unspecified** because: - -* `Zst2` is not a [1-ZST], and -* `SomeOtherStruct` has an unspecified layout and could contain padding bytes. - -### C-compatible layout ("repr C") - -The layout of `repr(C)` unions follows the C layout scheme. Per sections -[6.5.8.5] and [6.7.2.1.16] of the C11 specification, this means that the offset -of every field is 0. Unsafe code can cast a pointer to the union to a field type -to obtain a pointer to any field, and vice versa. - -[6.5.8.5]: http://port70.net/~nsz/c/c11/n1570.html#6.5.8p5 -[6.7.2.1.16]: http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p16 - -#### Padding - -Since all fields are at offset 0, `repr(C)` unions do not have padding before -their fields. They can, however, have padding in each union variant *after* the -field, to make all variants have the same size. 
- -Moreover, the entire union can have trailing padding, to make sure the size is a -multiple of the alignment: - -```rust -# use std::mem::{size_of, align_of}; -#[repr(C, align(2))] -union U { x: u8 } -# fn main() { -// The repr(align) attribute raises the alignment requirement of U to 2 -assert_eq!(align_of::<U>(), 2); -// This introduces trailing padding, raising the union size to 2 -assert_eq!(size_of::<U>(), 2); -# } -``` - -> **Note**: Fields are overlapped instead of laid out sequentially, so -> unlike structs there is no "between the fields" that could be filled -> with padding. - -#### Zero-sized fields - -`repr(C)` union fields of zero-size are handled in the same way as in struct -fields, matching the behavior of GCC and Clang for unions in C when zero-sized -types are allowed via their language extensions. - -That is, these fields occupy zero-size and participate in the layout computation -of the union as usual: - -```rust -# use std::mem::{size_of, align_of}; -#[repr(C)] -union U { - x: u8, - y: [u16; 0], -} -# fn main() { -// The zero-sized type [u16; 0] raises the alignment requirement to 2 -assert_eq!(align_of::<U>(), 2); -// This in turn introduces trailing padding, raising the union size to 2 -assert_eq!(size_of::<U>(), 2); -# } -``` - -**C++ compatibility hazard**: C++ does, in general, give a size of 1 to types -with no fields. When such types are used as a union field in C++, a "naive" -translation of that code into Rust will not produce a compatible result. Refer -to the [struct chapter](structs-and-tuples.md#c-compatible-layout-repr-c) for -further details. - -[1-ZST]: ../glossary.md#zero-sized-type--zst +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/unions.md).
diff --git a/reference/src/optimizations/return_value_optimization.md b/reference/src/optimizations/return_value_optimization.md index 4f5b11ec..de3e0562 100644 --- a/reference/src/optimizations/return_value_optimization.md +++ b/reference/src/optimizations/return_value_optimization.md @@ -1,20 +1,5 @@ -We should turn +**This page has been archived** -```rust,ignore -// y unused -let mut x = f(); -g(&mut x); -y = x; -// x unused -``` +It did not actually reflect current language guarantees and caused frequent confusion. -into - -```rust,ignore -y = f(); -g(&mut y); -``` - -to avoid a copy. - -The potential issue here is `g` storing the pointer it got as an argument elsewhere. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/optimizations/return_value_optimization.md). diff --git a/reference/src/validity/function-pointers.md b/reference/src/validity/function-pointers.md index 11389b15..a156249c 100644 --- a/reference/src/validity/function-pointers.md +++ b/reference/src/validity/function-pointers.md @@ -1,25 +1,7 @@ # Validity of function pointers -**Disclaimer**: This chapter is a work-in-progress. What's contained here -represents the consensus from issue [#72]. The statements in here are not (yet) -"guaranteed" not to change until an RFC ratifies them. +**This page has been archived** -A function pointer is "valid" (in the sense that it can be produced without causing immediate UB) if and only if it is non-null. +It did not actually reflect current language guarantees and caused frequent confusion. -That makes this code UB: - -```rust -fn bad() { - let x: fn() = unsafe { std::mem::transmute(0usize) }; // This is UB! -} -``` - -However, any integer value other than NULL is allowed for function pointers: - -```rust -fn good() { - let x: fn() = unsafe { std::mem::transmute(1usize) }; // This is not UB. 
-} -``` - -[#72]: https://github.com/rust-lang/unsafe-code-guidelines/issues/72 +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/validity/function-pointers.md). diff --git a/reference/src/validity/unions.md b/reference/src/validity/unions.md index bc732ff0..ec610c74 100644 --- a/reference/src/validity/unions.md +++ b/reference/src/validity/unions.md @@ -1,13 +1,7 @@ # Validity of unions -**Disclaimer**: This chapter is a work-in-progress. What's contained here -represents the consensus from issue [#73]. The statements in here are not (yet) -"guaranteed" not to change until an RFC ratifies them. +**This page has been archived** -## Validity of unions with zero-sized fields +It did not actually reflect current language guarantees and caused frequent confusion. -A union containing a zero-sized field can contain any bit pattern. An example of such -a union is [`MaybeUninit`]. - -[#73]: https://github.com/rust-lang/unsafe-code-guidelines/issues/73 -[`MaybeUninit`]: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/validity/unions.md). diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md index 1127bdca..158c96cd 100644 --- a/resources/deliberate-ub.md +++ b/resources/deliberate-ub.md @@ -18,7 +18,10 @@ We should evaluate whether there truly is some use-case here that is not current It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes, see the [discussion here](https://github.com/rust-lang/unsafe-code-guidelines/issues/449).
The alternative is to not use the "fast path" for problematic types (and fall back to the SeqLock), but that requires some way to query at `const`-time whether the type contains padding (or provenance). - (Or of course one can use inline assembly, but it would be better if that was not required.) + (Or of course one can use inline assembly, but it would be better if that was not required. This is in fact what crossbeam now does, via [atomic-maybe-uninit](https://github.com/taiki-e/atomic-maybe-uninit).) +* crossbeam's deque uses [volatile accesses that really should be atomic instead](https://github.com/crossbeam-rs/crossbeam/blob/5a154def002304814d50f3c7658bd30eb46b2fad/crossbeam-deque/src/deque.rs#L70-L88). + They cannot use atomic accesses as those are not possible for arbitrary `T`. + This would be resolved by bytewise atomic memcpy. ### Cases related to aliasing @@ -34,6 +37,12 @@ We should evaluate whether there truly is some use-case here that is not current There is a bunch of code out there that violates these rules one way or another. All of these are resolved by [Tree Borrows](https://perso.crans.org/vanille/treebor/), though [some subtleties around `as_mut_ptr` do remain](https://github.com/rust-lang/unsafe-code-guidelines/issues/450). +### Other cases + +* `gxhash` wants to do a vector-sized load that may go out-of-bounds, and didn't find a better solution than causing UB with an OOB load and then masking off the extra bytes. + See [here](https://github.com/ogxd/gxhash/issues/82) for some discussion and [here](https://github.com/ogxd/gxhash/blob/9eb19b021ff94a7b37beb5f479880d07e029b933/src/gxhash/platform/mod.rs#L18) for the relevant code. + The same [also happens in `compiler-builtins`](https://github.com/rust-lang/compiler-builtins/issues/559). 
+ ## Former cases of deliberate UB that have at least a work-in-progress solution to them * Various `offset_of` implementations caused UB by using `mem::uninitialized()`, or they used `&(*base).field` or `addr_of!((*base).field)` to project a dummy pointer to the field which is UB due to out-of-bounds pointer arithmetic. diff --git a/resources/llvm-assumptions.md b/resources/llvm-assumptions.md new file mode 100644 index 00000000..ab974d49 --- /dev/null +++ b/resources/llvm-assumptions.md @@ -0,0 +1,14 @@ +Some of the things we want people to do with Rust can currently not be expressed in LLVM in a way that is fully backed by the LLVM LangRef. +Let's collect a list of those cases here. + +## List of LLVM assumptions not backed by the spec + +- To implement `ptr.addr()`, we assume that a pointer-to-int transmute yields the address. + The LangRef is quiet about this (as it is about almost everything related to provenance). + Alive says that this yields poison. +- To implement the desired semantics for `MaybeUninit<$int>` we need a type of arbitrary size that can hold arbitrary data -- including provenance. + LLVM currently has no such type, the only type that is fully guaranteed to support provenance is `ptr` and that has a fixed size. + [LLVM issue](https://github.com/llvm/llvm-project/issues/142141) +- This one is not about current Rust but about possible future extensions: + when LLVM returns `poison` for some operation, we can *not* say that this corresponds to `uninit` in Rust. We *must* declare this immediate UB. + The reason for this is that LLVM does not really support `poison` being stored in memory; Rust's `uninit` can therefore only correspond to LLVM's `undef`. diff --git a/wip/stacked-borrows.md b/wip/stacked-borrows.md index 34b8f1e9..6af17afc 100644 --- a/wip/stacked-borrows.md +++ b/wip/stacked-borrows.md @@ -373,3 +373,28 @@ libstd needed/needs some patches to comply with this model. 
These provide a good * [`LinkedList` creating overlapping mutable references](https://github.com/rust-lang/rust/pull/60072) * [`VecDeque` invalidates a protected shared reference](https://github.com/rust-lang/rust/issues/60076) +## Biggest conceptual issues + +The two biggest conceptual issues with this model are the following: + +- Raw pointer casts generate fresh tags. + This is a problem because it means we need to detect the *transitions* from references to raw pointers, which is not always easy. + (In contrast, for all other retags we can just retag whenever we see a reference, no matter where it comes from.) + It is also frequently surprising to programmers, e.g. when `addr_of_mut!(local)` is invalidated by direct writes to `local`. + Finally it leads to the raw pointer type at the moment of transition being significant, which again defies the usual intuition and general goal of raw pointers that their type is not semantically relevant. +- On reads we do not follow a proper stack discipline. + Instead, we just disable all `Unique` above the item that granted the read access. + This is obviously ugly, but more importantly it means that the first issue cannot be easily fixed: + if raw pointer casts just retained the original tag, then a raw pointer derived from `&mut` would become invalidated when the `&mut` becomes invalidated, and that just breaks way too much code. + (Currently, the raw pointer instead becomes a `SharedReadWrite` on top of the `Unique`, so the `Unique` can be invalidated while the raw pointer remains usable.) + +The best known alternative for the second point is to go the Tree Borrows route of *freezing* all `Unique` (and their children) above a read-granting item. +This basically means we would be popping such `Unique` (and everything above them) *only for writes* but not for reads---much nicer than the current situation. 
+However, this breaks *tons* of code that looks like this:
+```rust
+ptr::copy_nonoverlapping(src.as_ptr(), dest.as_mut_ptr(), dest.len());
+```
+Here `dest` is a slice.
+The issue is that calling `dest.len()` *after* `dest.as_mut_ptr()` does a shared-read-only reborrow of `dest`, which freezes the raw pointer returned by `as_mut_ptr` and thus makes later writes to it UB.
+
+It's not clear how this could be fixed without going all the way to [trees](https://perso.crans.org/vanille/treebor/).
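
Note on the last hunk: the ordering problem with `copy_nonoverlapping` can be made concrete with a small sketch. The helper name `fill_from` and the `u8` element type are assumptions for illustration and are not part of the patch. Rust evaluates call arguments left to right, so in the one-line form `dest.as_mut_ptr()` runs before `dest.len()`. Reading the length first means no shared reborrow of `dest` happens between creating the raw pointer and writing through it, which should sidestep the freezing described above. This only shows how an individual call site could be reordered; it does not address the existing code that the patch says would break.

```rust
use std::ptr;

/// Hypothetical helper (not from the patch): fills all of `dest`
/// from the first `dest.len()` bytes of `src`.
fn fill_from(src: &[u8], dest: &mut [u8]) {
    assert!(src.len() >= dest.len(), "src must cover all of dest");

    // Read the length *before* deriving the raw pointer, so that the
    // shared reborrow done by `len()` cannot come after `as_mut_ptr()`.
    let len = dest.len();
    let dest_ptr = dest.as_mut_ptr();

    // SAFETY: `src` is valid for reads of `len` bytes (checked above),
    // `dest_ptr` is valid for writes of `len` bytes, and the two ranges
    // cannot overlap because `dest` is a unique borrow.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dest_ptr, len) };
}
```

Calling `fill_from(&[1, 2, 3, 4], &mut buf)` with a three-byte `buf` leaves `buf` holding the first three source bytes.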