docs/gen-ai/README.md: 2 additions & 2 deletions
@@ -11,6 +11,6 @@ This folder contains the design doc for GenAI Model package
 - [Tokenizer](./Tokenizer.md)
 
 ### Need further investigation
-- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](../DynamicLoadingReport.md)
+- [Dynamic loading](./DynamicLoading.md): load only part of model to GPU when gpu memory is limited. We explore the result w/o dynamic loading in [this report](./DynamicLoadingReport.md)
 - Improve loading speed: I notice that the model loading speed from disk to memory is slower in torchsharp than what it is in huggingface. Need to investigate the reason and improve the loading speed
-- Quantization: quantize the model to reduce the model size and improve the inference speed
+- Quantization: quantize the model to reduce the model size and improve the inference speed