Update dynamic loading report reference (#7321)

emmanuel-ferdman · web-flow · commit 5d0dafbf9c4c · 2024-11-20T10:29:19.000-07:00
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
diff --git a/docs/gen-ai/README.md b/docs/gen-ai/README.md
@@ -11,6 +11,6 @@ This folder contains the design doc for GenAI Model package
 - [Tokenizer](./Tokenizer.md)
 
 ### Need further investigation
-- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](../DynamicLoadingReport.md)
+- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](./DynamicLoadingReport.md)
 - Improve loading speed: I notice that the model loading speed from disk to memory is slower in torchsharp than what it is in huggingface. Need to investigate the reason and improve the loading speed
-- Quantization: quantize the model to reduce the model size and improve the inference speed
+- Quantization: quantize the model to reduce the model size and improve the inference speed