37th IPDPS 2023: St. Petersburg, FL, USA
- IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023, St. Petersburg, FL, USA, May 15-19, 2023. IEEE 2023, ISBN 979-8-3503-3766-2
- Junhyeok Jang, Miryeong Kwon, Donghyun Gouk, Hanyeoreum Bae, Myoungsoo Jung:
GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets. 2-12
- Haiheng He, Dan Chen, Long Zheng, Yu Huang, Haifeng Liu, Chaoqiang Liu, Xiaofei Liao, Hai Jin:
GraphMetaP: Efficient MetaPath Generation for Dynamic Heterogeneous Graph Models. 13-24
- Prasun Gera, Hyesoon Kim:
Traversing Large Compressed Graphs on GPUs. 25-35
- Isuru Ranawaka, Md. Khaledur Rahman, Ariful Azad:
Distributed Sparse Random Projection Trees for Constructing K-Nearest Neighbor Graphs. 36-46
- Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr.:
Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots. 47-57
- Sudipta Saha Shubha, Shohaib Mahmud, Haiying Shen, Geoffrey C. Fox, Madhav V. Marathe:
Accurate and Efficient Distributed COVID-19 Spread Prediction based on a Large-Scale Time-Varying People Mobility Graph. 58-68
- Zeyu Luan, Qing Li, Yi Wang, Yong Jiang:
H-Cache: Traffic-Aware Hybrid Rule-Caching in Software-Defined Networks. 69-78
- Jiaxin Lei, Manish Munikar, Hui Lu, Jia Rao:
Accelerating Packet Processing in Container Overlay Networks via Packet-level Parallelism. 79-89
- Haodi Lu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Fubing Mao, Yu Zhang, Hai Jin:
Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-Based PM Datastores. 90-101
- Mohamed W. Hassan, Adel Dabah, Hatem Ltaief, Suhaib A. Fahmy:
Signal Detection for Large MIMO Systems Using Sphere Decoding on FPGAs. 102-111
- Ajay Singh, Trevor Brown, Michael Spear:
Efficient Hardware Primitives for Immediate Memory Reclamation in Optimistic Data Structures. 112-122
- Kaushik Kandadi Suresh, Benjamin Michalowicz, Bharath Ramesh, Nicholas Contini, Jinghan Yao, Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs. 123-133
- Qinghua Zhou, Quentin Anthony, Lang Xu, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication. 134-144
- Sonia Rani Gupta, Nikela Papadopoulou, Miquel Pericàs:
Accelerating CNN inference on long vector architectures via co-design. 145-155
- Jianjin Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, Fengwei Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, Depei Qian:
Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU. 156-166
- Zheng Zhang, Donglin Yang, Yaqi Xia, Liang Ding, Dacheng Tao, Xiaobo Zhou, Dazhao Cheng:
MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism. 167-177
- Hariharan Devarajan, Kathryn M. Mohror:
Mimir: Extending I/O Interfaces to Express User Intent for Complex Workloads in HPC. 178-188
- Di Zhang, Chris Egersdoerfer, Tabassum Mahmud, Mai Zheng, Dong Dai:
Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis. 189-199
- Saisha Kamat, Abdullah Al Raqibul Islam, Mai Zheng, Dong Dai:
FaultyRank: A Graph-based Parallel File System Checker. 200-210
- John Ravi, Suren Byna, Quincey Koziol, Houjun Tang, Michela Becchi:
Evaluating Asynchronous Parallel I/O on HPC Systems. 211-221
- Qifan Xu, Yang You:
An Efficient 2D Method for Training Super-Large Deep Learning Models. 222-232
- Bingyi Zhang, Viktor K. Prasanna:
Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation. 233-244
- Siddharth Singh, Abhinav Bhatele:
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training. 245-255
- Daning Cheng, Shigang Li, Yunquan Zhang:
Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner. 256-267
- Danlin Jia, Yiming Xie, Li Wang, Xiaoqian Zhang, Allen Yang, Xuebin Yao, Mahsa Bayati, Pradeep Subedi, Bo Sheng, Ningfang Mi:
SRC: Mitigate I/O Throughput Degradation in Network Congestion Control of Disaggregated Storage Systems. 268-278
- Qi Yu, Lin Wang, Yuchong Hu, Yumeng Xu, Dan Feng, Jie Fu, Xia Zhu, Zhen Yao, Wenjia Wei:
Boosting Multi-Block Repair in Cloud Storage Systems with Wide-Stripe Erasure Coding. 279-289
- Michael J. Brim, Adam T. Moody, Seung-Hwan Lim, Ross G. Miller, Swen Boehm, Cameron Stanavige, Kathryn M. Mohror, Sarp Oral:
UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage. 290-300
- Kyu-Jin Cho, Injae Kang, Jin-Soo Kim:
ArkFS: A Distributed File System on Object Storage for Archiving Data in HPC Environment. 301-311
- Rory Hector, Ramachandran Vaidyanathan, Gokarna Sharma, Jerry L. Trahan:
On Doorway Egress by Autonomous Robots. 312-321
- Wissam M. Sid-Lakhdar, Sébastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Piotr Luszczek, Mark Gates, Stanimire Tomov, Hans Johansen, David B. Williams-Young, Timothy A. Davis, Jack J. Dongarra, Hartwig Anzt:
PAQR: Pivoting Avoiding QR factorization. 322-332
- Junqi Yin, Feiyi Wang, Mallikarjun Arjun Shankar:
DeepThermo: Deep Learning Accelerated Parallel Monte Carlo Sampling for Thermodynamics Evaluation of High Entropy Alloys. 333-343
- Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu:
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs. 344-355
- Emmanuel Agullo, Alfredo Buttari, Olivier Coulaud, Lionel Eyraud-Dubois, Mathieu Faverge, Alain Franc, Abdou Guermouche, Antoine Jego, Romain Peressoni, Florent Pruvost:
On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM). 357-367
- João Nuno Ferreira Alves, Luís M. S. Russo, Alexandre P. Francisco, Siegfried Benkner:
A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices. 368-378
- Yichen Zhang, Shengguo Li, Fan Yuan, Dezun Dong, Xiaojian Yang, Tiejun Li, Zheng Wang:
Memory-aware Optimization for Sequences of Sparse Matrix-Vector Multiplications. 379-389
- Olivier Beaumont, Jean-Alexandre Collin, Lionel Eyraud-Dubois, Mathieu Vérité:
Data Distribution Schemes for Dense Linear Algebra Factorizations on Any Number of Nodes. 390-401
- Yongseok Soh, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, Jee W. Choi:
Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams. 402-412
- Max A. Deppert, Klaus Jansen, Marten Maack, Simon Pukrop, Malin Rau:
Scheduling with Many Shared Resources. 413-423
- Laurent Schares, Asser N. Tantawi, Pavlos Maniotis, Ming-Hung Chen, Claudia Misale, Seetharami Seelam, Hao Yu:
Chic-sched: a HPC Placement-Group Scheduler on Hierarchical Topologies with Constraints. 424-434
- Lanshun Nie, Yuqi Qiu, Fei Meng, Mo Yu, Jing Li:
Generalizable Reinforcement Learning-Based Coarsening Model for Resource Allocation over Large and Diverse Stream Processing Graphs. 435-445
- Bo Wang, Anara Kozhokanova, Christian Terboven, Matthias S. Müller:
RLP: Power Management Based on a Latency-Aware Roofline Model. 446-456
- Ke Liu, Kan Wu, Hua Wang, Ke Zhou, Ji Zhang, Cong Li:
SLAP: An Adaptive, Learned Admission Policy for Content Delivery Network Caching. 457-467
- Zahra Najafabadi Samani, Narges Mehran, Dragi Kimovski, Radu Prodan:
Proactive SLA-aware Application Placement in the Computing Continuum. 468-479
- Chuyao Ye, Hao Zheng, Zhigang Hu, Meiguang Zheng:
PFedSA: Personalized Federated Multi-Task Learning via Similarity Awareness. 480-488
- Jingjing Xue, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang:
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout. 489-500
- Ruibo Fan, Wei Wang, Xiaowen Chu:
Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks. 501-511
- Süreyya Emre Kurt, Jinghua Yan, Aravind Sukumaran-Rajam, Prashant Pandey, P. Sadayappan:
Communication Optimization for Distributed Execution of Graph Neural Networks. 512-523
- Yufan Xia, Marco De La Pierre, Amanda S. Barnard, Giuseppe M. J. Barca:
A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication. 524-534
- Akash Dutta, Jee Choi, Ali Jannesari:
Power Constrained Autotuning using Graph Neural Networks. 535-545
- Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan G. Thakkar, Sayed Ahmad Salehi, Jeffrey Todd Hastings:
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs. 546-556
- Yi-Chien Lin, Viktor K. Prasanna:
HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture. 557-567
- William Ladd, Christopher Jensen, Madhurima Vardhan, Jeff Ames, Jeff R. Hammond, Erik W. Draeger, Amanda Randles:
Optimizing Cloud Computing Resource Usage for Hemodynamic Simulation. 568-578
- Archie Powell, Gihan R. Mudalige:
Predictive Analysis of Code Optimisations on Large-Scale Coupled CFD-Combustion Simulations using the CPX Mini-App. 579-589
- Kumar Saurabh, Masado Ishii, Makrand A. Khanwale, Hari Sundar, Baskar Ganapathysubramanian:
Scalable adaptive algorithms for next-generation multiphase flow simulations. 590-601
- Joshua Hoke Davis, Justin Shafner, Daniel Nichols, Nathan Grube, Pino Martin, Abhinav Bhatele:
Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms. 602-612
- Ignacio Gavier, Joshua Russell, Devdhar Patel, Edward A. Rietman, Hava T. Siegelmann:
Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits. 613-623
- Olivia Grimes, Jacob Nelson-Slivon, Ahmed Hassan, Roberto Palmieri:
Opportunities and Limitations of Hardware Timestamps in Concurrent Data Structures. 624-634
- Younghyun Cho, James Weldon Demmel, Jacob King, Xiaoye S. Li, Yang Liu, Hengrui Luo:
Harnessing the Crowd for Autotuning High-Performance Computing Applications. 635-645
- Kawthar Shafie Khorassani, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc*. 646-656
- Yi Zhao, Juepeng Zheng, Haohuan Fu, Wenzhao Wu, Jie Gao, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Runmin Dong, Zhenrong Du, Sha Liu, Xin Liu, Shaoqing Zhang, Le Yu:
SW-LCM: A Scalable and Weakly-supervised Land Cover Mapping Method on a New Sunway Supercomputer. 657-667
- Panagiotis Mpakos, Dimitrios Galanopoulos, Petros Anastasiadis, Nikela Papadopoulou, Nectarios Koziris, Georgios I. Goumas:
Feature-based SpMV Performance Analysis on Contemporary Devices. 668-679
- Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam:
An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. 680-689
- Peter Sanders, Matthias Schimek:
Engineering Massively Parallel MST Algorithms. 691-701
- Peter Sanders, Tim Niklas Uhl:
Engineering a Distributed-Memory Triangle Counting Algorithm. 702-712
- Jue Wang, Fumihiko Ino, Jing Ke:
PRF: A Fast Parallel Relaxed Flooding Algorithm for Voronoi Diagram Generation on GPU. 713-723
- Christian Hellwig, Fabian Czappa, Martin Michel, Reinhold Bertrand, Felix Wolf:
Satellite Collision Detection using Spatial Data Structures. 724-735
- Michael Kenzel, Stefan Lemme, Richard Membarth, Matthias Kurtenacker, Hugo Devillers, Markus Steinberger, Philipp Slusallek:
AnyQ: An Evaluation Framework for Massively-Parallel Queue Algorithms. 736-745
- Tsung-Wei Huang:
qTask: Task-parallel Quantum Circuit Simulation with Incrementality. 746-756
- Milan Shah, Xiaodong Yu, Sheng Di, Danylo Lykov, Yuri Alexeev, Michela Becchi, Franck Cappello:
GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations. 757-767
- Fei Li, Arul Rhik Mazumder:
An Adaptive Hybrid Quantum Algorithm for the Metric Traveling Salesman Problem. 768-778
- Bradley H. Theilman, Yipu Wang, Ojas Parekh, William Severa, J. Darby Smith, James B. Aimone:
Stochastic Neuromorphic Circuits for Solving MAXCUT. 779-787
- Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, Yin Tan:
TurboHE: Accelerating Fully Homomorphic Encryption Using FPGA Clusters. 788-797
- Guang Fan, Fangyu Zheng, Lipeng Wan, Lili Gao, Yuan Zhao, Jiankuo Dong, Yixuan Song, Yuewu Wang, Jingqiang Lin:
Towards Faster Fully Homomorphic Encryption Implementation with Integer and Floating-point Computing Power of GPUs. 798-808
- Xujing Li, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang:
FedTrip: A Resource-Efficient Federated Learning Method with Triplet Regularization. 809-819
- Pierre-François Dutot, Yeu-Shin Fu, Nikhil Prasad, Oliver Sinnen:
A Guaranteed Approximation Algorithm for Scheduling Fork-Joins with Communication Delay. 820-830
- Yifeng Tang, Cho-Li Wang:
SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors. 831-841
- Zhangchen Xu, Yuetai Li, Chenglin Feng, Lei Zhang:
Exact Fault-Tolerant Consensus with Voting Validity. 842-852
- Mark de Berg, Leyla Biabani, Morteza Monemizadeh:
k-Center Clustering with Outliers in the MPC and Streaming Model. 853-863
- Lu Zhang, Chao Li, Xinkai Wang, Weiqi Feng, Zheng Yu, Quan Chen, Jingwen Leng, Minyi Guo, Pu Yang, Shang Yue:
FIRST: Exploiting the Multi-Dimensional Attributes of Functions for Power-Aware Serverless Computing. 864-874
- Zhuo Huang, Hao Fan, Chaoyi Cheng, Song Wu, Hai Jin:
Duo: Improving Data Sharing of Stateful Serverless Applications by Efficiently Caching Multi-Read Data. 875-885
- Hao Wu, Junxiao Deng, Hao Fan, Shadi Ibrahim, Song Wu, Hai Jin:
QoS-Aware and Cost-Efficient Dynamic Resource Allocation for Serverless ML Workflows. 886-896
- Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, Torsten Hoefler:
rFaaS: Enabling High Performance Serverless with RDMA and Leases. 897-907
- Tianyao Shi, Yingxuan Yang, Yunlong Cheng, Xiaofeng Gao, Zhen Fang, Yongqiang Yang:
Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud. 908-917
- Ming Zhao, Kritshekhar Jha, Sungho Hong:
GPU-enabled Function-as-a-Service for Machine Learning Inference. 918-928
- Pouriya Zarbafian, Vincent Gramoli:
Lyra: Fast and Scalable Resilience to Reordering Attacks in Blockchains. 929-939
- Deepal Tennakoon, Yiding Hua, Vincent Gramoli:
Smart Redbelly Blockchain: Reducing Congestion for Web3. 940-950
- Weicong Chen, Hao Qi, Xiaoyi Lu, Curtis Tatsuoka:
SBGT: Scaling Bayesian-based Group Testing for Disease Surveillance. 951-962
- Vani Nagarajan, Milind Kulkarni:
RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware. 963-973
- Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, Ramu Anandakrishnan, Feiyi Wang:
Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations. 974-984
- Tom Peterka, Dmitriy Morozov, Arnur Nigmetov, Orcun Yildiz, Bogdan Nicolae, Philip E. Davis:
LowFive: In Situ Data Transport for High-Performance Workflows. 985-995
- Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. 996-1006
- Shaomeng Li, Peter Lindstrom, John P. Clyne:
Lossy Scientific Data Compression With SPERR. 1007-1017
- Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, Vassil Vassilev:
Fast And Automatic Floating Point Error Analysis With CHEF-FP. 1018-1028
- Nicolau Manubens, Tiago Quintino, Simon D. Smart, Emanuele Danovaro, Adrian Jackson:
DAOS as HPC Storage: a View From Numerical Weather Prediction. 1029-1040
- Bing Lu, Yida Li, Junqi Wang, Huizhang Luo, Kenli Li:
ZFP-X: Efficient Embedded Coding for Accelerating Lossy Floating Point Compression. 1041-1050