
Published June 5, 2024 | Version 1.0
Dataset | Open access

F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems

  • 1. University of Bologna
  • 2. RIKEN Center for Computational Science

Description

F-DATA is a novel workload dataset containing data on about 24 million jobs executed on the supercomputer Fugaku during its first three years of public operation (March 2021 to April 2024). Each job record includes an extensive set of features, such as exit code, duration, power consumption and performance metrics (e.g., number of floating-point operations, memory bandwidth, operational intensity and a memory-bound/compute-bound label), which makes it possible to predict many different job characteristics. The full list of features is given in the file feature_list.csv.
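Operational intensity and the memory-bound/compute-bound distinction are roofline-model concepts: intensity is the number of floating-point operations per byte moved to or from memory, and a job whose intensity falls below the machine's ridge point (peak FLOP rate divided by peak memory bandwidth) is limited by memory bandwidth rather than by compute. The sketch below illustrates that generic rule only. It is not necessarily how F-DATA derives its label, and the function names and the peak values passed in are illustrative assumptions, not Fugaku specifications.

```python
def operational_intensity(flops, bytes_moved):
    """Floating-point operations per byte of memory traffic.

    flops and bytes_moved must use the same time base (both totals, or both rates).
    """
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def roofline_label(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Classify a job with a simple roofline rule (illustrative, not F-DATA's method).

    The ridge point peak_flops / peak_bandwidth is the intensity at which the
    sloped memory roof meets the flat compute roof.
    """
    ridge = peak_flops / peak_bandwidth
    if operational_intensity(flops, bytes_moved) >= ridge:
        return "compute-bound"
    return "memory-bound"
```

Keeping the hardware peaks as parameters avoids hard-coding machine numbers; supply the values appropriate to the system you are modelling.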

Sensitive fields are provided in both anonymized and encoded versions. The encoding, based on a Natural Language Processing (NLP) model, preserves job information that is sensitive but useful for prediction, without violating data privacy. The scripts used to generate the dataset are available in the F-DATA GitHub repository, together with a series of plots and instructions on how to load the data.

The job data of F-DATA is split into 38 files: each YY_MM.parquet file contains the jobs submitted in month MM of year YY (for example, 22_01.parquet covers January 2022).
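Given the YY_MM naming and the March 2021 to April 2024 coverage stated above, the expected file names can be generated rather than typed by hand, for instance to check that a download is complete. A minimal sketch; monthly_files is a hypothetical helper, not part of the F-DATA scripts, and its default range simply mirrors the dates in this description.

```python
def monthly_files(start=(2021, 3), end=(2024, 4)):
    """List YY_MM.parquet names for every month from start to end, inclusive.

    start and end are (year, month) tuples; tuple comparison orders them chronologically.
    """
    year, month = start
    names = []
    while (year, month) <= end:
        # Two-digit year and zero-padded month, e.g. (2021, 3) -> "21_03.parquet"
        names.append(f"{year % 100:02d}_{month:02d}.parquet")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

With the defaults this yields 38 names, from 21_03.parquet to 24_04.parquet, matching the number of monthly files.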

The F-DATA files are stored in Parquet format and can be loaded as pandas DataFrames once pyarrow is installed (pip install pyarrow). A single file can be read with the following Python instructions:

# Import pandas (reading Parquet files also requires pyarrow, as noted above)
import pandas as pd

# Read 21_03.parquet (jobs submitted in March 2021) into a DataFrame
df = pd.read_parquet("21_03.parquet")
print(df.head())
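Analyses spanning several months need to combine the per-month files. A minimal sketch, assuming pandas with pyarrow as above; load_months and the added source_file column are hypothetical names, not part of F-DATA. The reader parameter defaults to pandas' real pd.read_parquet and exists only so the function can be exercised without the actual files.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Sequence

import pandas as pd


def load_months(
    files: Iterable[str],
    columns: Optional[Sequence[str]] = None,
    reader: Callable[..., pd.DataFrame] = pd.read_parquet,
) -> pd.DataFrame:
    """Read several monthly F-DATA files and stack them into one DataFrame."""
    frames = []
    for path in files:
        # columns=None reads every column; naming only the features you need
        # reduces both I/O and memory.
        frame = reader(path, columns=list(columns) if columns is not None else None)
        # Hypothetical bookkeeping column: which monthly file each row came from.
        frame["source_file"] = Path(path).name
        frames.append(frame)
    if not frames:
        raise ValueError("no files given")
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Since the full dataset is about 27 GB on disk, and typically larger once decompressed in memory, selecting columns or loading a subset of months is usually advisable.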

Files (27.2 GB)

MD5 checksum                        Size
012dd8d987585d898e5083416e563988    458.0 MB
3178b7481f639511a74b2be85f5079ba    150.5 MB
41287430b7e8f7fa6e3e19275b897d1d    433.3 MB
d057f2cea79f6442c7a628ffb81bb647    505.8 MB
3bebee043638772a88dc915746069d2c    282.8 MB
3e1a8b3cf0fdd20bac8cf68a296ccb83    336.4 MB
47d05827003c3d25a24353dc1561e90f    331.3 MB
4e9d8ae90fe577ab17158419431e6a92    838.7 MB
0b89e404f21d719febd23b04c2ad881b    699.4 MB
fca13b423e5c38139f92deab4f69c8e7    931.0 MB
dd8de06f73d2c0e3fd8919342b4018ae    1.3 GB
0cc8dc1b070ae5decde22eea5a9a7bcd    817.8 MB
9b65c2a6f7ebe98498da585d83ce517c    746.0 MB
d56caa57f21d4d3b2602bf321f362419    629.1 MB
9ca6e67fdee336a856f62e3587736e1a    536.7 MB
648f177badf349946e4a173b6ce97c21    586.7 MB
7ca28a1e147b375407c4a10ae6bdb38f    575.9 MB
6b264abaafd9018553a4db2b808e684f    634.0 MB
6098b25b014424314fdd7c67986e3c47    841.3 MB
510c146e33c81760b48d39d1e71d02f6    663.5 MB
9d5b8c3d8e24d8dc1bb2f20122fdeb68    795.3 MB
2f2be33534bdbdca59cbd1cee3094097    855.7 MB
681603aba504a37da6c19d0cc173980a    981.9 MB
4bd51ee8439d918fc64c5747dc093cdb    762.1 MB
17be3481816aa263d7867fa97ffd9828    728.2 MB
261dce6e2bdba46820186229d727b600    575.4 MB
194e3078ed873869a992c788ad94ce78    635.2 MB
612ce0739c46481fb4d005f533b4e3a5    1.3 GB
9cd9e0aa7ebc418e0e5c8c2d977fe969    968.2 MB
974e643024e0811c24354bafee5c988e    1.2 GB
edf1560ec5b7cf81dd18662edf89d33e    455.7 MB
eb1d96b17237f15f29371ddf40ad9e38    885.6 MB
6e4e59472c186a73626d551620c206db    944.8 MB
2eb439357d2f23e5dafa9299005470d7    909.9 MB
337879f82a77a75c781a8b19a9bdbeee    1.1 GB
ee9fa11d2341a8d92ce9ebdfa2b54220    800.0 MB
e722928391ffa01c691b24499d0cb99a    529.4 MB
ce7bd7f0d06daf9d7a93844850f4d172    568.1 MB
1571c432d6c579acef637dce6b8db796    2.2 kB
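The MD5 checksums listed above can be used to confirm that a download completed intact. A minimal sketch using Python's standard hashlib module; md5_of_file and verify are hypothetical helper names, not part of the F-DATA scripts.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Reading fixed-size chunks (1 MiB by default) keeps memory use flat,
        # even for the multi-GB monthly files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 matches `expected` (an "md5:" prefix is allowed)."""
    expected = expected.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

Call verify with the path of a downloaded file and the checksum shown next to it; a False result means the file should be downloaded again. str.removeprefix requires Python 3.9 or later.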

Additional details