Nova docu update by dasteihn · Pull Request #45 · Rust-for-Linux/rust-for-linux.com · GitHub
Nova docu update #45

Open
wants to merge 2 commits into
base: main
114 changes: 79 additions & 35 deletions src/Nova-GPU-Driver.md
@@ -1,62 +1,106 @@
# Nova GPU Driver

Nova is a driver for GSP-based Nvidia GPUs that is currently under development
and is being written in Rust.
Nova is a driver for GSP (GPU system processor) based Nvidia GPUs. It is
intended to become the successor of Nouveau as the mainline driver for Nvidia
(GSP) GPUs in Linux.

Currently, the objective is to upstream Rust abstractions for the relevant
subsystems as a prerequisite for the actual driver. Hence, the first mainline
version of Nova will be a stub driver which helps to establish the necessary
infrastructure in other subsystems (notably PCI and DRM).
It will support all Nvidia GPUs from the GeForce RTX 20 series (Turing family)
onward.

## Contact

To contact the team and / or participate in development, please use the mailing
list: nouveau@lists.freedesktop.org
Available communication channels are:

- The mailing list: nouveau@lists.freedesktop.org
- IRC: #nouveau on OFTC
- [Zulip Chat](https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova)


## Resources

- [Official Source Tree](https://gitlab.freedesktop.org/drm/nova)
- [Announcement E-Mail](https://lore.kernel.org/dri-devel/Zfsj0_tb-0-tNrJy@cassiopeiae/)
The parts that are already in mainline Linux can be found in
`drivers/gpu/nova-core/` and `drivers/gpu/drm/nova/`.

The development repository for the in-tree driver is located on
[Freedesktop](https://gitlab.freedesktop.org/drm/nova).


## Background

### Why a new driver?

Nouveau was, for the most part, designed for pre-GSP hardware. The driver has
existed since ~2009, and its authors had to reverse engineer a lot about the
hardware's internals back then, resulting in a codebase that is relatively
difficult to maintain.

Moreover, Nouveau's maintainers concluded that a new driver, exclusively for
GSP hardware, would allow for a significantly simpler driver design: most of
the hardware internals that Nouveau had to reverse engineer reside in the GSP
firmware. The GSP thus takes on the role of a hardware abstraction layer which
communicates with the host kernel through IPC, moving a lot of the stack's
complexity from the GPU driver into the GSP firmware.

This, in turn, enables better maintainability. A new driver is also an
opportunity to attract active community participation from the very beginning.


In the source tree, the driver lives in `drivers/gpu/drm/nova`.
### Why write it in Rust?

Rust's most desired feature is its guarantee of memory safety, notably the
elimination of use-after-free errors. Those are errors GPU drivers suffer from
Member

It's basically included in the class of problems you describe, but I would also mention that Rust's powerful type system allows us to encode a lot of logic that subsequently can be evaluated at compile time rather than runtime.

A prominent example is lifetime rules, which can largely be evaluated at compile time, whereas DRM drivers in C have to enforce them by convention, which, given the high complexity of DRM drivers, often leads to (memory) bugs.

Contributor Author

I don't think that we have to list all of Rust's features here, do we?

As I'm not sure what precisely you're talking about here when you're referencing the complicated ("powerful") type system, I'd ask you to provide a sentence that you see fit.

Member

Besides Rust's built-in ownership and lifetime model, its powerful type system
allows us to avoid a large portion of a whole class of bugs (i.e. memory safety
bugs).

Additionally, the same features allow us to model APIs in such a way that
certain logic errors can also be caught at compile time.

Member

I think you forgot to update this section, can you please put what I came up with at the beginning of "### Why write it in Rust?"?

significantly, because GPUs are, by definition, asynchronous with respect to the
CPU and can handle a great many jobs (i.e., memory objects) simultaneously.
A job's status can be changed in different places in the code base at different
points in time (through work items, interrupt handlers, userspace calls, ...).

## Status
In short, GPU drivers were expected to profit the most from the promised memory
safety.
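As a rough userspace illustration of that problem (not kernel code, and not taken from Nova), the sketch below shares one job between two threads that stand in for a work item and an interrupt handler. All names (`Job`, `JobStatus`, `run_job`) are hypothetical. Because the job is reference counted with `Arc`, it cannot be freed while either party still holds a handle, which is the property that rules out use-after-free here.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical job states, for illustration only.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum JobStatus {
    Queued,
    Running,
    Done,
}

/// Hypothetical job object shared between several code paths.
pub struct Job {
    pub status: JobStatus,
}

/// Lets two threads update the same job and returns the final status.
pub fn run_job() -> JobStatus {
    let job = Arc::new(Mutex::new(Job { status: JobStatus::Queued }));

    // Stand-in for a work item that starts the job.
    let for_worker = Arc::clone(&job);
    let worker = thread::spawn(move || {
        for_worker.lock().unwrap().status = JobStatus::Running;
    });
    worker.join().unwrap();

    // Stand-in for an interrupt handler that completes the job.
    let for_irq = Arc::clone(&job);
    let irq = thread::spawn(move || {
        for_irq.lock().unwrap().status = JobStatus::Done;
    });
    irq.join().unwrap();

    // The job is only freed once the last `Arc` handle is dropped.
    let status = job.lock().unwrap().status;
    status
}
```

Handing a plain reference to the spawned threads instead of an `Arc` clone would be rejected by the compiler, since the borrow could outlive the job.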

Currently, Nova is just a stub driver intended to lift the bindings necessary
for a real GPU driver into the (mainline) kernel.
Since Nova is written from scratch, it is an opportunity to leverage the
advantages of Rust and obtain a more reliable, maintainable driver.

Currently, those efforts are mostly focused on getting bindings for PCI, DRM
and the Device (driver) model upstream.
Besides Rust's built-in ownership and lifetime model, its powerful type system
allows us to avoid a large portion of a whole class of bugs (i.e. memory safety
bugs).

It can be expected that, as the driver continues to grow, various other abstractions
will be needed.
Additionally, the same features allow us to model APIs in such a way that
certain logic errors can also be caught at compile time.
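One common way to catch such logic errors at compile time is the typestate pattern. The following is a minimal sketch under stated assumptions: `Firmware`, `Unbooted`, `Booted`, `boot` and `send_command` are hypothetical names chosen for illustration and do not reflect Nova's actual API.

```rust
use std::marker::PhantomData;

/// Marker types for the two states (hypothetical, for illustration only).
pub struct Unbooted;
pub struct Booted;

/// Stand-in for a firmware handle; not Nova's real type.
pub struct Firmware<State> {
    commands_sent: u32,
    _state: PhantomData<State>,
}

impl Firmware<Unbooted> {
    /// Creates a handle in the unbooted state.
    pub fn new() -> Self {
        Firmware { commands_sent: 0, _state: PhantomData }
    }

    /// Consumes the unbooted handle and returns a booted one.
    pub fn boot(self) -> Firmware<Booted> {
        Firmware { commands_sent: self.commands_sent, _state: PhantomData }
    }
}

impl Firmware<Booted> {
    /// Only exists on the booted state; returns the number of commands sent so far.
    pub fn send_command(&mut self, _opcode: u8) -> u32 {
        self.commands_sent += 1;
        self.commands_sent
    }
}
```

With this shape, calling `send_command` on a `Firmware<Unbooted>` does not compile, because the method is only defined for `Firmware<Booted>`. In C, the equivalent ordering rule would typically be enforced by convention or a runtime check.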

## Architecture

## Utilized Common Rust Infrastructure
![Nova Architecture with vGPUs](./nova-core-vm.png)

Nova depends on the Rust for Linux `staging/*` [branches](Branches.md).
The overall GPU driver is split into two parts:

1. "Nova-Core", living in `drivers/gpu/nova-core/`. Nova-Core implements
the fundamental interaction with the hardware (through PCI etc.) and,
notably, boots up the GSP and interacts with it through a command queue.
2. "Nova-DRM" (the official name is actually just "Nova", but to avoid
confusion developers usually call it "Nova-DRM"), living in
`drivers/gpu/drm/nova/`. This is the actual graphics driver,
implementing all the typical DRM interfaces for userspace.
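The layering described above can be sketched in plain Rust as follows. This is only a conceptual model: `Core`, `Drm`, `Command` and all methods are hypothetical stand-ins, and the real drivers communicate with hardware and with each other through kernel interfaces that are not shown here.

```rust
use std::collections::VecDeque;

/// Hypothetical commands a core driver might queue for the firmware.
#[derive(Debug, PartialEq)]
pub enum Command {
    AllocMemory { bytes: u64 },
    SubmitJob { id: u32 },
}

/// Stand-in for the lower layer: owns the command queue to the firmware.
pub struct Core {
    queue: VecDeque<Command>,
}

impl Core {
    pub fn new() -> Self {
        Core { queue: VecDeque::new() }
    }

    /// Appends a command to the end of the queue.
    pub fn enqueue(&mut self, cmd: Command) {
        self.queue.push_back(cmd);
    }

    /// Pretends the firmware consumes the oldest pending command.
    pub fn firmware_pop(&mut self) -> Option<Command> {
        self.queue.pop_front()
    }

    /// Number of commands not yet consumed.
    pub fn pending(&self) -> usize {
        self.queue.len()
    }
}

/// Stand-in for the upper layer: turns userspace-style requests into core commands.
pub struct Drm<'a> {
    core: &'a mut Core,
    next_job: u32,
}

impl<'a> Drm<'a> {
    pub fn new(core: &'a mut Core) -> Self {
        Drm { core, next_job: 0 }
    }

    pub fn create_buffer(&mut self, bytes: u64) {
        self.core.enqueue(Command::AllocMemory { bytes });
    }

    /// Queues a job and returns its id.
    pub fn submit(&mut self) -> u32 {
        let id = self.next_job;
        self.next_job += 1;
        self.core.enqueue(Command::SubmitJob { id });
        id
    }
}
```

The upper layer never touches the queue directly; it only goes through the lower layer's methods, mirroring how the graphics driver sits on top of the core driver.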

## Contributing
This split architecture allows for virtualizing GPUs: Nova-Core can be used to
instruct the GPU's firmware to spawn a new PCI virtual function (through
[SR-IOV](https://docs.kernel.org/PCI/pci-iov-howto.html)), which can then be
passed to a virtual machine. The virtual machine can then, for example, run
another Linux with its own Nova-Core bound to the virtual GPU. On top of that,
Nova-DRM can be used as a conventional GPU driver for the vGPU.

Member

I think it would also be good to point out that the split into nova-core and nova-drm allows us to run a much smaller (and hence potentially less error-prone) driver on the host side.

Contributor Author

Why is that relevant? The amount of software running is the same, or, actually, even bigger with vGPU because you have nova-core N times.

I suppose you think it's good because it exposes a smaller attack surface for security issues and the like, but I'm not sure.

As with every real open source program, help and participation are highly welcome!
Of course, it is also possible to use Nova-Core + Nova-DRM on one physical
machine, then directly using the GPU through Mesa in the host's userspace.

As the driver is very young, however, it is currently difficult to assign tasks
to people. Many things still have to settle before a steadily paced workflow
produces self-contained work items that a newcomer can pick up.
For more details about vGPUs, take a look at
[Zhi's announcement email](https://lore.kernel.org/nouveau/20240922124951.1946072-1-zhiw@nvidia.com/).

If you really want to jump in immediately regardless, here are a few things you
can consider:

- Most work to do right now is with more bindings for Rust. Notably, this
includes the device driver model, DRM and PCI. If you have expertise there,
have a look at the existing code in the [topic branches](Branches.md) and see
if there's something you can add or improve.
- Feel free to go over Nova's code base and make suggestions or send patches,
for example for improved comments, grammar fixes, improving code readability
etc.
## Status and Contributing

The necessary Rust infrastructure has progressed considerably, and current work
focuses more on the actual driver. In case you want to contribute, take a look
at the
[NOVA TODO List](https://docs.kernel.org/gpu/nova/core/todo.html).

Happy hacking!
Don't hesitate to reach out on the aforementioned community channels.
Binary file added src/nova-core-vm.png