Tags: Par4All/par4all
Par4All 1.3.1

From the press release: HPC Project introduces version 1.3.1 of the Par4All open parallelization platform, with automatic code generation of OpenCL, the open standard for parallel programming of heterogeneous systems. This version also brings significant additions to the automatic generation of CUDA code, resulting in better execution times.

The changelog:

* Check for CUDA memory allocation errors
* Restore PIPS properties after a PyPS exception
* Better management of pointers to 1D arrays on the GPU
* Fix loop bound generation in the presence of unsigned index types
* Fix p4a --report
* Fix regressions in examples caused by PIPS validation compatibility
* Stars-PM example can work in OpenCL and OpenGL
* New --no-pointer-aliasing option in p4a
* --fine option is now --fine-grain. Coarse grain is again the default parallelization.
* Remove parasitic kernel launchers from GPU code with static functions
* Put libraries at the end of the link options, since other placements choke some linkers
* Fix bugs in loop fusion
* Fix bugs in PIPS linear around overflow handling
* Improve array linearization to deal with pointers to arrays of structs, VLAs, arrays with static size and others, skip local arrays...
* Fix bugs in statement isolation, accept structs, deal better with partial arrays...
* Fix bugs in outliner
* Fix bugs in GPU-ify
* Better describe the versions in scripts and validation
* Update package list for recent Ubuntu
* Update installation guide
* p4a_scpp is now more user friendly
* Better linear constraints, improve normalization
* Improve redundancy reduction in Linear
* Improve ipyps portability
* Improve the OpenMP reduction pragma when several reductions occur at the same time
* Fix bug in localize_declaration
* Improve simplify control
* Improve p4a_git integration script
* Improve organization of the Par4All download server directories
* Fix typo in stub broker
* Fix P4A Accel runtime to deal with constants in OpenCL kernel invocation
* Differentiate the OpenMP and GPU compilation flows further
* Restructure Par4All Accel Runtime
* Avoid launching kernels with no iteration
* Accept multiple source files with the same base name
* More resilient to spaces in path names
* Improve effects for structures
* Keep more comments
* More tolerant of __asm(...)
* Improve transformers
* Improve conflict testing in dependence graph

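One changelog entry above concerns loops that carry several reductions at once. As a rough illustration of that code shape, here is a minimal sketch in C. The function name `sum_and_sumsq` and the pragma are hand-written assumptions for this example; they are not Par4All output and do not show what the tool actually emits.

```c
/* Hypothetical example, not generated by Par4All: a single loop that
   accumulates two independent reductions (a sum and a sum of squares). */
void sum_and_sumsq(int n, const double a[], double *sum, double *sumsq)
{
    double s = 0.0;   /* first reduction variable */
    double q = 0.0;   /* second reduction variable */

    /* Hand-written pragma naming both reductions. A compiler without
       OpenMP enabled ignores it and runs the loop sequentially. */
#pragma omp parallel for reduction(+:s) reduction(+:q)
    for (int i = 0; i < n; i++) {
        s += a[i];
        q += a[i] * a[i];
    }

    *sum = s;
    *sumsq = q;
}
```

Both accumulators are listed in reduction clauses so that each thread works on private copies that are combined at the end of the loop; leaving either one out would introduce a data race when OpenMP is enabled.
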
Par4All 1.3

Par4All is an open-source environment for source-to-source transformations on C and Fortran programs, parallelizing and optimizing them for various targets, from embedded multiprocessor systems-on-chip with hardware accelerators up to high-performance computers and GPUs.

Par4All 1.3 is a major release introducing OpenCL code generation from sequential C programs. It also improves CUDA code generation by producing up to 3D kernels and providing better host-GPU communication optimization. More debugging and timing information is now available, and the thread layout can be changed more easily. The generated code is often more efficient than before because of the general use of loop fusion to improve memory locality. This version is also a component of the HPC Project Scilab compiler and automatic parallelizer.

Considerable work has gone into reducing the parallelization time. Some benchmarks are now provided with Par4All to measure performance, and many bugs have been fixed since the previous version.

More information can be found at http://par4all.org

For more complex transformations, parallelizations and optimizations, contact HPC Project for professional support by sending an email to support at par4all.org

2011-07-07 Ronan Keryell <Ronan.Keryell@hpc-project.com>

* Par4All 1.2
* This version mainly targets the Wild Cruncher, a parallelizing environment from HPC Project for Scilab programs. Par4All is used to parallelize the output of the Scilab-to-C compiler from HPC Project
* Improved support for CUDA atomic update for reductions
* Better handling of scalars in GPU parallelization
* Improved memory effect analysis
* Fixed outlining for kernel generation with scalar parameters
* Improved loop fusion, which now deals with local variable declarations
* Improved array scalarization
* Made package publication more resilient to network failures
* Fixed GPU code generation for non-rectangular iteration spaces
* Fixed communication optimization between GPU and CPU
* Added support for CEA SCMP embedded system
* The installation directory can now also be changed after a first installation
* Use the broker concept to deal with stubs, in order to manage non-parallelized or already parallelized libraries
* Now install LICENSE.txt
* Updated to new PyPS interface
* GPU kernels can be outlined into separate source files on demand, for OpenCL or for use with a separate non-C99 compiler (CUDA nvcc), at kernel, launcher, wrapper grain...
* Fixed compilation flags in PIPS/linear to keep recompilation from failing when an API changes too much

2011-04-12 Ronan Keryell <Ronan.Keryell@hpc-project.com>

* Par4All 1.1.1
* Added support for CEA SCMP task dataflow machine (European project ARTEMIS SCALOPES)
* Improved GPU kernel generation for loop nests with complex declarations. Bug reported by Richard Membarth from Universität Erlangen.
* Added new options to apply PIPS transformations in the Par4All compilation transit (--apply-before-parallelization...)
* Added a programming guide describing best practices to get better performance with Par4All

Par4All 1.1

2011-03-01 Ronan Keryell <Ronan.Keryell@hpc-project.com>

* Par4All 1.1
* C99 declarations anywhere in a block and in C99 for-loops are now supported.
* Fixed code generation for C99 declarations.
* New --apply-before-parallelization option to apply phases before parallelization.
* Improved compilation speed.
* No longer rely on Python 3.x, since there were issues on some systems in coping with both 2.y and 3.x versions.
* Fixed encoding issues.

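The C99 constructs named in the 1.1 entry look like the following in source code. This is a minimal sketch; the function `scale_shift` is a hypothetical illustration and is not taken from the Par4All examples directory.

```c
#include <stddef.h>

/* Hypothetical input function using the two C99 features mentioned:
   a declaration placed after a statement inside a block, and a
   declaration in the for-loop header. */
void scale_shift(size_t n, const float in[], float out[], float a)
{
    if (n == 0)
        return;                       /* a statement comes first... */

    float b = in[0];                  /* ...then a mid-block declaration (C99) */

    for (size_t i = 0; i < n; i++) {  /* loop index declared in the header (C99) */
        float x = in[i];              /* declaration local to the loop body */
        out[i] = a * x + b;
    }
}
```

In C89 the declarations of `b` and `i` would have to be moved to the top of the enclosing block, so a tool limited to C89-style input would not accept this function as written.
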
Par4All 1.0.5

2011-02-03 Ronan Keryell <Ronan.Keryell@hpc-project.com>

* Par4All 1.0.5 fixes a bug that occurred when code to be kernelized used global variables. Thanks to Sarnath Kannan for this bug report. It should now work in common cases.
* Prototype of lazy CUDA communication optimizations to remove redundant host-accelerator communications.
* Fixed an iteration space transposition bug in accelerator mode that was killing performance. For now, though, better results are obtained with 2D kernels.
* C99 for(int i;...;...) loops are now accepted.
* Can generate kernels from less perfectly nested loops.
* Updated examples directory to new options and communication optimizations
* Better error and warning messages.
* Script cleaning by using Python module names everywhere.

Par4All 1.0.4

2010-11-22 Ronan Keryell <Ronan.Keryell@hpc-project.com>

* Par4All 1.0.4 introduces a new P4A Accel runtime for OpenMP and CUDA.
* In previous months, PIPS and PyPS have evolved a lot, especially in code generation for various accelerators. This version tries to cope with these evolutions.
* Added the Stars-pm cosmological N-body simulation program as an example
* The runtime can now deal with subarray transfers between the host and the accelerator, up to 4D arrays. For now, however, the phases chosen in PIPS do not use them yet.
* The code generation for non-C99 CUDA is more robust.