Nova docu update #45
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,62 +1,106 @@ | ||
# Nova GPU Driver | ||
|
||
Nova is a driver for GSP-based Nvidia GPUs that is currently under development | ||
and is being written in Rust. | ||
Nova is a driver for GSP (GPU system processor) based Nvidia GPUs. It is | ||
intended to become the successor of Nouveau as the mainline driver for Nvidia | ||
(GSP) GPUs in Linux. | ||
|
||
Currently, the objective is to upstream Rust abstractions for the relevant | ||
subsystems as a prerequisite for the actual driver. Hence, the first mainline | |
version of Nova will be a stub driver which helps establish the necessary | |
infrastructure in other subsystems (notably PCI and DRM). | ||
It will support all Nvidia GPUs from the GeForce RTX 20 series (Turing family) | |
onward. | |
|
||
## Contact | ||
|
||
To contact the team and / or participate in development, please use the mailing | ||
list: nouveau@lists.freedesktop.org | ||
Available communication channels are: | ||
|
||
- The mailing list: nouveau@lists.freedesktop.org | ||
- IRC: #nouveau on OFTC | ||
- [Zulip Chat](https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova) | ||
|
||
|
||
## Resources | ||
|
||
- [Official Source Tree](https://gitlab.freedesktop.org/drm/nova) | ||
- [Announcement E-Mail](https://lore.kernel.org/dri-devel/Zfsj0_tb-0-tNrJy@cassiopeiae/) | ||
The parts that are already in mainline Linux can be found in | ||
`drivers/gpu/nova-core/` and `drivers/gpu/drm/nova/` | ||
|
||
The development repository for the in-tree driver is located on | ||
[Freedesktop](https://gitlab.freedesktop.org/drm/nova). | ||
|
||
|
||
## Background | ||
|
||
### Why a new driver? | ||
|
||
Nouveau was, for the most part, designed for pre-GSP hardware. The driver has | |
existed since ~2009, and its authors had to reverse engineer much of the | |
hardware's internals back then, resulting in a codebase that is relatively | |
difficult to maintain. | |
|
||
Moreover, Nouveau's maintainers concluded that a new driver, exclusively for | ||
GSP hardware, would allow for significantly simplifying the driver design: Most | ||
of the hardware internals that Nouveau had to reverse engineer reside in the | ||
GSP firmware. The GSP thus takes on the role of a hardware abstraction layer | |
which communicates with the host kernel through IPC. As a result, a lot of the | |
stack's complexity is moved from the GPU driver into the GSP firmware. | |
|
||
This, in turn, enables better maintainability. A new driver is also an | |
opportunity to attract active community participation from the very beginning. | |
|
||
|
||
In the source tree, the driver lives in `drivers/gpu/drm/nova`. | ||
### Why write it in Rust? | ||
|
||
Rust's most sought-after feature is its memory safety guarantees, notably the | |
elimination of use-after-free errors. GPU drivers suffer from these errors | |
significantly, because GPUs are inherently asynchronous with respect to the | |
CPU and can handle a great many jobs (i.e., memory objects) simultaneously. | |
A job's status can be changed in different places in the code base at different | |
points in time (through work items, interrupt handlers, userspace calls, ...). | |
|
||
## Status | ||
In short, GPU drivers were expected to profit the most from the promised memory | ||
safety. | ||
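To illustrate the use-after-free point, here is a minimal userspace sketch. It uses plain `std` Rust, not the kernel crate and not Nova's actual code. The names `Job`, `JobStatus`, `complete` and `run_job` are hypothetical. A job is shared between a submitter and a completion path (standing in for an interrupt handler or work item) through reference counting, so the job's memory cannot be freed while either side still holds a reference.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical job status, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobStatus {
    Pending,
    Done,
}

/// Hypothetical job object shared between several code paths.
struct Job {
    status: Mutex<JobStatus>,
}

/// Stands in for a completion path (e.g. an interrupt handler).
/// It owns its own reference, so the job stays alive while it runs.
fn complete(job: Arc<Job>) {
    *job.status.lock().unwrap() = JobStatus::Done;
}

/// Submits a job, lets the completion path run, and reads the status back.
fn run_job() -> JobStatus {
    let job = Arc::new(Job {
        status: Mutex::new(JobStatus::Pending),
    });

    // Hand a second reference to the completion path.
    let for_handler = Arc::clone(&job);
    let handler = thread::spawn(move || complete(for_handler));
    handler.join().unwrap();

    // `job` is still valid here: the memory is released only when the
    // last `Arc` is dropped, and accessing it after that does not compile.
    let status = *job.status.lock().unwrap();
    status
}

fn main() {
    println!("job finished with status {:?}", run_job());
}
```

The compiler rejects any attempt to touch `job` after ownership has been given away, which is the kind of mistake that has to be prevented by convention in C.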
|
||
Currently, Nova is just a stub driver intended to lift the bindings necessary | ||
for a real GPU driver into the (mainline) kernel. | ||
Since Nova is written from scratch, it was an opportunity to try to | |
leverage the advantages of Rust and obtain a more reliable, maintainable driver. | |
|
||
Currently, those efforts are mostly focused on getting bindings for PCI, DRM | ||
and the Device (driver) model upstream. | ||
Besides Rust's built-in ownership and lifetime model, its powerful type system | ||
allows us to avoid a large portion of a whole class of bugs (i.e. memory safety | ||
bugs). | ||
|
||
It can be expected that, as the driver continues to grow, various other abstractions | ||
will be needed. | ||
Additionally, the same features allow us to model APIs in such a way that | |
certain logic errors can also be caught at compile time. | |
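One common way to catch logic errors at compile time is the typestate pattern. The sketch below is plain `std` Rust and only an illustration of the idea. The names `Gsp`, `Unbooted`, `Booted`, `boot` and `send_command` are hypothetical and are not taken from Nova's code. Sending a command is only possible on a value whose type says the firmware has been booted.

```rust
use std::marker::PhantomData;

/// Marker types for the two hypothetical states.
struct Unbooted;
struct Booted;

/// Hypothetical handle whose type parameter tracks the boot state.
struct Gsp<State> {
    commands_sent: u32,
    _state: PhantomData<State>,
}

impl Gsp<Unbooted> {
    fn new() -> Self {
        Gsp {
            commands_sent: 0,
            _state: PhantomData,
        }
    }

    /// Consumes the unbooted handle and returns a booted one.
    fn boot(self) -> Gsp<Booted> {
        Gsp {
            commands_sent: self.commands_sent,
            _state: PhantomData,
        }
    }
}

impl Gsp<Booted> {
    /// Only exists for booted handles; returns the running command count.
    fn send_command(&mut self) -> u32 {
        self.commands_sent += 1;
        self.commands_sent
    }
}

fn main() {
    let gsp = Gsp::new();
    // gsp.send_command(); // would not compile: no such method on Gsp<Unbooted>
    let mut gsp = gsp.boot();
    gsp.send_command();
    println!("commands sent: {}", gsp.send_command());
}
```

Because `boot` takes `self` by value, the unbooted handle cannot be used again after booting, so the "command before boot" mistake is rejected by the compiler rather than found at runtime.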
|
||
## Architecture | ||
|
||
## Utilized Common Rust Infrastructure | ||
 | ||
dasteihn marked this conversation as resolved.
|
||
|
||
Nova depends on the Rust for Linux `staging/*` [branches](Branches.md). | ||
The overall GPU driver is split into two parts: | ||
|
||
1. "Nova-Core", living in `drivers/gpu/nova-core/`. Nova-Core implements | ||
the fundamental interaction with the hardware (through PCI etc.) and, | ||
notably, boots up the GSP and interacts with it through a command queue. | ||
2. "Nova-DRM" (the official name is actually just "Nova", but to avoid | ||
confusion developers usually call it "Nova-DRM"), living in | ||
`drivers/gpu/drm/nova/`. This is the actual graphics driver, | ||
implementing all the typical DRM interfaces for userspace. | ||
|
||
## Contributing | ||
This split architecture allows for virtualizing GPUs: Nova-Core can be used to | |
instruct the GPU's firmware to create new PCI virtual functions (through | |
[SR-IOV](https://docs.kernel.org/PCI/pci-iov-howto.html)), which can then be | |
passed to a virtual machine. The virtual machine can then, for example, run | |
another Linux instance with another Nova-Core bound to the virtual GPU. On top | |
of that, Nova-DRM can be used as a conventional GPU driver to use the vGPU. | |
|
||
Review comment: I think it would also be good to point out that the split in nova-core and nova-drm allows us to run a much smaller (and hence a potentially less error prone) driver on the host side.
Review comment: Why is that relevant? The amount of software running is the same, or, actually, even bigger with vGPU because you have nova-core N times. I suppose you think it's good because it's a smaller broad side exposed for security stuff and the like, but am not sure.
||
As with every real open source program, help and participation is highly welcome! | ||
Of course, it is also possible to use Nova-Core + Nova-DRM on one physical | |
machine, using the GPU directly through Mesa in the host's userspace. | |
|
||
As the driver is very young, however, it is currently difficult to assign tasks | |
to people. Many things still have to settle before a steadily paced workflow | |
produces self-contained work items a newcomer can pick up. | |
For more details about vGPUs, take a look at | ||
[Zhi's announcement email](https://lore.kernel.org/nouveau/20240922124951.1946072-1-zhiw@nvidia.com/). | ||
|
||
If you really want to jump in immediately regardless, here are a few things you | ||
can consider: | ||
|
||
- Most work to do right now is with more bindings for Rust. Notably, this | ||
includes the device driver model, DRM and PCI. If you have expertise there, | ||
have a look at the existing code in the [topic branches](Branches.md) and see | ||
if there's something you can add or improve. | ||
- Feel free to go over Nova's code base and make suggestions or send patches, | ||
for example for improved comments, grammar fixes, improving code readability | ||
etc. | ||
## Status and Contributing | ||
|
||
The necessary Rust infrastructure has been progressing a lot. Current work now | ||
focuses more on the actual driver. In case you want to contribute, take a look | ||
at the | ||
[NOVA TODO List](https://docs.kernel.org/gpu/nova/core/todo.html). | ||
|
||
Happy hacking! | ||
Don't hesitate to reach out on the aforementioned community channels. |
Review comment:
It's basically included in the class of problems you describe, but I would also mention that Rust's powerful type system allows us to encode a lot of logic that subsequently can be evaluated at compile time rather than runtime.
A prominent example are lifetime rules, which can be greatly evaluated at compile time, where DRM drivers in C have to enforce them by convention, which given the high complexity of DRM drivers, often leads to (memory) bugs.
Review comment:
I don't think that we have to list all of Rust's features here, do we.
As I'm not sure what precisely you're talking about here when you're referencing the complicated ("powerful") type system, I'd ask you to provide a sentence that you see fit.
Review comment:
I think you forgot to update this section, can you please put what I came up with at the beginning of "### Why write it in Rust?"?