F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems
Description
F-DATA is a novel workload dataset containing data on around 24 million jobs executed on Supercomputer Fugaku over its three years of public system usage (March 2021 to April 2024). Each job record contains an extensive set of features, such as exit code, duration, power consumption and performance metrics (e.g. #flops, memory bandwidth, operational intensity and a memory/compute-bound label), which support the prediction of a wide range of job characteristics. The full list of features can be found in the file feature_list.csv.
Sensitive data appears in both anonymized and encoded versions. The encoding is based on a Natural Language Processing model and retains sensitive but useful job information for prediction purposes without violating data privacy. The scripts used to generate the dataset are available in the F-DATA GitHub repository, along with a series of plots and instructions on how to load the data.
F-DATA is composed of 38 files, with each YY_MM.parquet file containing the data of the jobs submitted in month MM of year YY.
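The naming scheme above can be turned into a list of expected file names. The following is a minimal sketch, assuming zero-padded two-digit years and months and exactly one file per month of the stated period (March 2021 to April 2024); the helper `monthly_file_names` is hypothetical and not part of the dataset tooling, and the resulting names should be checked against the actual file listing.

```python
from datetime import date


def monthly_file_names(start: date, end: date) -> list:
    """Return one YY_MM.parquet name per month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{year % 100:02d}_{month:02d}.parquet")
        # Advance by one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# Assumed coverage: March 2021 through April 2024, as stated in the description.
FDATA_FILES = monthly_file_names(date(2021, 3, 1), date(2024, 4, 1))
print(len(FDATA_FILES), FDATA_FILES[0], FDATA_FILES[-1])
```

Under these assumptions the list has 38 entries, which agrees with the number of files stated above.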
The data is stored in .parquet files. These can be loaded as dataframes through the pandas API after installing pyarrow (pip install pyarrow). A single file can be read with the following Python instructions:
# Import the pandas library
import pandas as pd
# Read the 21_01.parquet file into a DataFrame
df = pd.read_parquet("21_01.parquet")
df.head()
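Reading a single file extends naturally to several months at once. The sketch below is one possible approach, not the authors' loading code: the helper `load_months` and its `read` parameter are hypothetical, and the file names in the usage comment are placeholders that follow the YY_MM.parquet pattern. Passing `columns` restricts the read to the named features, which helps keep memory usage down given the overall size of the dataset.

```python
import pandas as pd


def load_months(paths, columns=None, read=pd.read_parquet):
    """Read several monthly files and stack them into one DataFrame.

    columns: optional list of column names to load (None loads all columns).
    read: reader callable; defaults to pandas.read_parquet.
    """
    frames = [read(path, columns=columns) for path in paths]
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Hypothetical usage; replace with file names present in the download:
# df = load_months(["21_04.parquet", "21_05.parquet"])
# print(df.shape)
```

The column names to pass in `columns` can be taken from feature_list.csv.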
Files (27.2 GB)
Checksum | Size |
---|---|
md5:012dd8d987585d898e5083416e563988 | 458.0 MB |
md5:3178b7481f639511a74b2be85f5079ba | 150.5 MB |
md5:41287430b7e8f7fa6e3e19275b897d1d | 433.3 MB |
md5:d057f2cea79f6442c7a628ffb81bb647 | 505.8 MB |
md5:3bebee043638772a88dc915746069d2c | 282.8 MB |
md5:3e1a8b3cf0fdd20bac8cf68a296ccb83 | 336.4 MB |
md5:47d05827003c3d25a24353dc1561e90f | 331.3 MB |
md5:4e9d8ae90fe577ab17158419431e6a92 | 838.7 MB |
md5:0b89e404f21d719febd23b04c2ad881b | 699.4 MB |
md5:fca13b423e5c38139f92deab4f69c8e7 | 931.0 MB |
md5:dd8de06f73d2c0e3fd8919342b4018ae | 1.3 GB |
md5:0cc8dc1b070ae5decde22eea5a9a7bcd | 817.8 MB |
md5:9b65c2a6f7ebe98498da585d83ce517c | 746.0 MB |
md5:d56caa57f21d4d3b2602bf321f362419 | 629.1 MB |
md5:9ca6e67fdee336a856f62e3587736e1a | 536.7 MB |
md5:648f177badf349946e4a173b6ce97c21 | 586.7 MB |
md5:7ca28a1e147b375407c4a10ae6bdb38f | 575.9 MB |
md5:6b264abaafd9018553a4db2b808e684f | 634.0 MB |
md5:6098b25b014424314fdd7c67986e3c47 | 841.3 MB |
md5:510c146e33c81760b48d39d1e71d02f6 | 663.5 MB |
md5:9d5b8c3d8e24d8dc1bb2f20122fdeb68 | 795.3 MB |
md5:2f2be33534bdbdca59cbd1cee3094097 | 855.7 MB |
md5:681603aba504a37da6c19d0cc173980a | 981.9 MB |
md5:4bd51ee8439d918fc64c5747dc093cdb | 762.1 MB |
md5:17be3481816aa263d7867fa97ffd9828 | 728.2 MB |
md5:261dce6e2bdba46820186229d727b600 | 575.4 MB |
md5:194e3078ed873869a992c788ad94ce78 | 635.2 MB |
md5:612ce0739c46481fb4d005f533b4e3a5 | 1.3 GB |
md5:9cd9e0aa7ebc418e0e5c8c2d977fe969 | 968.2 MB |
md5:974e643024e0811c24354bafee5c988e | 1.2 GB |
md5:edf1560ec5b7cf81dd18662edf89d33e | 455.7 MB |
md5:eb1d96b17237f15f29371ddf40ad9e38 | 885.6 MB |
md5:6e4e59472c186a73626d551620c206db | 944.8 MB |
md5:2eb439357d2f23e5dafa9299005470d7 | 909.9 MB |
md5:337879f82a77a75c781a8b19a9bdbeee | 1.1 GB |
md5:ee9fa11d2341a8d92ce9ebdfa2b54220 | 800.0 MB |
md5:e722928391ffa01c691b24499d0cb99a | 529.4 MB |
md5:ce7bd7f0d06daf9d7a93844850f4d172 | 568.1 MB |
md5:1571c432d6c579acef637dce6b8db796 | 2.2 kB |
Additional details
Software
- Repository URL: https://github.com/francescoantici/F-DATA