Update base for Update on "[inductor][cpp] bf16/fp16 gemm template computed with fp32 w/o epilogue fusion" · pytorch/pytorch@160d8d2 · GitHub

Commit 160d8d2

Author: Jiong Gong (committed)
Update base for Update on "[inductor][cpp] bf16/fp16 gemm template computed with fp32 w/o epilogue fusion"
As part of #125683, this PR adds the initial bf16/fp16 gemm template support, with the micro-gemm implemented via fused type casting and fp32 computation. It doesn't yet provide epilogue fusion support; that will be added in the next PR.

cc voznesenskym penguinwu EikanWang Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

Differential Revision: [D58017580](https://our.internmc.facebook.com/intern/diff/D58017580)

[ghstack-poisoned]
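As a rough illustration of the numeric strategy described in the commit message (not the actual generated template code), the micro-gemm loads bf16/fp16 operands, computes and accumulates in fp32, and casts the result back to the low-precision dtype. A minimal PyTorch sketch, with hypothetical shapes:

import torch

# bf16 inputs (the same idea applies to fp16)
a = torch.randn(64, 32, dtype=torch.bfloat16)
b = torch.randn(32, 48, dtype=torch.bfloat16)

# Upcast to fp32, accumulate the matmul in fp32, then cast back to bf16.
acc = a.to(torch.float32) @ b.to(torch.float32)
out = acc.to(torch.bfloat16)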
2 parents e72d25c + 4644def commit 160d8d2

2 files changed (+18, −4 lines)

torch/csrc/profiler/kineto_shim.cpp

Lines changed: 16 additions & 3 deletions
@@ -47,6 +47,7 @@ const std::set<libkineto::ActivityType> kXpuTypes = {
 const std::set<libkineto::ActivityType> kMtiaTypes = {
     libkineto::ActivityType::MTIA_CCP_EVENTS,
     libkineto::ActivityType::MTIA_RUNTIME,
+    libkineto::ActivityType::MTIA_WORKLOADD,
 };
 const std::set<libkineto::ActivityType> kPrivateUse1Types = {
     libkineto::ActivityType::GPU_MEMCPY,
@@ -344,9 +345,7 @@ c10::DeviceType deviceTypeFromActivity(libkineto::ActivityType activity_type) {
     case libkineto::ActivityType::CONCURRENT_KERNEL:
     case libkineto::ActivityType::CUDA_SYNC:
     case libkineto::ActivityType::GPU_USER_ANNOTATION:
-    case libkineto::ActivityType::CUDA_PROFILER_RANGE:
-    // TODO: T151322015
-    case libkineto::ActivityType::MTIA_CCP_EVENTS: {
+    case libkineto::ActivityType::CUDA_PROFILER_RANGE: {
       // PrivateUse1 kineto backend reuse above ActivityTypes,
       // If PrivateUse1 backend enabled, this should return
       // c10::DeviceType::PrivateUse1.
@@ -358,6 +357,20 @@ c10::DeviceType deviceTypeFromActivity(libkineto::ActivityType activity_type) {
       }();
       return device_type;
     }
+    // TODO: T151322015
+    case libkineto::ActivityType::MTIA_CCP_EVENTS:
+    case libkineto::ActivityType::MTIA_WORKLOADD: {
+      // PrivateUse1 kineto backend reuse above ActivityTypes,
+      // If PrivateUse1 backend enabled, this should return
+      // c10::DeviceType::PrivateUse1.
+      c10::DeviceType device_type = []() {
+        if (c10::get_privateuse1_backend() != "privateuseone") {
+          return c10::DeviceType::PrivateUse1;
+        }
+        return c10::DeviceType::MTIA;
+      }();
+      return device_type;
+    }
     case libkineto::ActivityType::CPU_OP:
     case libkineto::ActivityType::USER_ANNOTATION:
     case libkineto::ActivityType::EXTERNAL_CORRELATION:

torch/serialization.py

Lines changed: 2 additions & 1 deletion
@@ -921,7 +921,8 @@ def load(
         pickle_module: module used for unpickling metadata and objects (has to
             match the :attr:`pickle_module` used to serialize file)
         weights_only: Indicates whether unpickler should be restricted to
-            loading only tensors, primitive types and dictionaries
+            loading only tensors, tensor subclasses, primitive types, dictionaries
+            and any types added via :func:`torch.serialization.add_safe_globals`.
         mmap: Indicates whether the file should be mmaped rather than loading all the storages into memory.
             Typically, tensor storages in the file will first be moved from disk to CPU memory, after which they
             are moved to the location that they were tagged with when saving, or specified by ``map_location``. This
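The updated docstring points at torch.serialization.add_safe_globals for allowlisting extra types under weights_only loading. A minimal usage sketch (the subclass name and checkpoint path are placeholders):

import torch
from torch.serialization import add_safe_globals

class MyTensorSubclass(torch.Tensor):
    pass

# Allowlist the subclass so the restricted weights_only unpickler may reconstruct it.
add_safe_globals([MyTensorSubclass])
state = torch.load("checkpoint.pt", weights_only=True)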
