Update dynamic loading report reference (#7321) · dotnet/machinelearning@5d0dafb · GitHub

Commit 5d0dafb

Update dynamic loading report reference (#7321)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
1 parent cfc1fb5 commit 5d0dafb


1 file changed: +2 -2 lines changed

docs/gen-ai/README.md

Lines changed: 2 additions & 2 deletions
@@ -11,6 +11,6 @@ This folder contains the design doc for GenAI Model package
 - [Tokenizer](./Tokenizer.md)
 
 ### Need further investigation
-- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](../DynamicLoadingReport.md)
+- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](./DynamicLoadingReport.md)
 - Improve loading speed: I notice that the model loading speed from disk to memory is slower in torchsharp than what it is in huggingface. Need to investigate the reason and improve the loading speed
-- Quantization: quantize the model to reduce the model size and improve the inference speed
+- Quantization: quantize the model to reduce the model size and improve the inference speed
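
The link change above only swaps `../` for `./`, so it may help to see how each target resolves relative to `docs/gen-ai/README.md`. The following is a minimal sketch, not part of the commit: `resolve_links` and `LINK_RE` are hypothetical helpers written for illustration, and the sketch only computes paths; it does not check that any file actually exists in the repository.

```python
import posixpath
import re

# Hypothetical helper: matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def resolve_links(markdown_path, text):
    """Resolve relative link targets against the folder containing markdown_path."""
    base = posixpath.dirname(markdown_path)
    return [
        posixpath.normpath(posixpath.join(base, target))
        for target in LINK_RE.findall(text)
        if "://" not in target  # skip absolute URLs
    ]


readme = "docs/gen-ai/README.md"
old_link = "[this report](../DynamicLoadingReport.md)"
new_link = "[this report](./DynamicLoadingReport.md)"

print(resolve_links(readme, old_link))  # ['docs/DynamicLoadingReport.md']
print(resolve_links(readme, new_link))  # ['docs/gen-ai/DynamicLoadingReport.md']
```

Under these assumptions, the old target points one folder up into `docs/`, while the new target stays inside `docs/gen-ai/`, next to the README, which is presumably where the report lives given the direction of the fix.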

0 commit comments
