Tags: Par4All/par4all

p4a-1.4.5

Merge branch 'p4a-own' into p4a

* p4a-own:
  SIMILAN: several fixes
  SIMILAN: rename .p4a.c into .c and fix function renaming for Cuda and OpenCl.


Former-commit-id: bc9ad46c00b7b94b757eae12c049e83dd2adcf79

p4a-1.4.4

Par4All 1.4.4

Update the include recovery for gcc-4.8 and recent Debian & Ubuntu

Improve documentation

Improve compilation documentation for OpenSuse

Fix const-ness issues in Par4All Accel back-end

Update to latest PIPS version

p4a-1.4.2

Merge branch 'p4a-own' into p4a

Former-commit-id: f44ce8ff199eb4d653d084adf2fd3761d9e75ef4

p4a-1.3.1

Par4All 1.3.1

From the press release:

HPC Project introduces the 1.3.1 version of Par4All open parallelization
platform with automatic code generation of OpenCL, the open standard for
parallel programming of heterogeneous systems. This version also brings
significant improvements to the automatic generation of CUDA code, resulting
in better execution times.

The changelog:

   * Check for CUDA memory allocation errors

   * Restore PIPS properties after a PyPS exception

   * Better management of pointers to 1D arrays on GPU

   * Fix loop bound generation in the presence of unsigned index types

   * Fix p4a --report

   * Fix regressions in examples because of PIPS validation
     compatibility.

   * Stars-PM example can work in OpenCL and OpenGL

   * New --no-pointer-aliasing option in p4a

   * --fine option is now --fine-grain. Coarse grain is again the
     default parallelization.

   * Remove parasitic kernel launchers from GPU code with static
     functions

   * Put libraries at the end of link options, since other placements
     choke some linkers.

   * Fix bugs in loop fusion

   * Fix bugs in PIPS linear around overflow handling

   * Improve array linearization to deal with pointers to arrays of
     structs, VLA, arrays with static size and others, skip local arrays...

   * Fix bugs in statement isolation, accept structs, deal better
     with partial arrays...

   * Fix bugs in outliner

   * Fix bugs in GPU-ify

   * Better describe the versions in scripts and validation

   * Update package list for recent Ubuntu

   * Update installation guide

   * p4a_scpp is now more user friendly

   * Better linear constraints, improve normalization

   * Improve redundancy reduction in Linear

   * Improve ipyps portability

   * Improve OpenMP reduction pragma generation when several reductions
     occur at the same time

   * Fix bug in localize_declaration

   * Improve simplify control

   * Improve p4a_git integration script

   * Improve organization of the Par4All download server directories

   * Fix typo in stub broker

   * Fix P4A Accel runtime to deal with constants in OpenCL kernel
     invocation

   * Differentiate more OpenMP and GPU compilation flows

   * Restructure Par4All Accel Runtime

   * Avoid launching kernels with no iteration

   * Accept multiple source files with same base name

   * More resilient with spaces in path names

   * Improve effects for structures

   * Keep more comments

   * More tolerant with __asm(...)

   * Improve transformers

   * Improve conflict testing in dependence graph
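
Two items in the changelog above, loop bound generation with unsigned index types and kernels with no iteration, touch a well-known C pitfall: with unsigned bounds, an empty range makes a naive iteration count wrap around to a huge value. The sketch below is a hand-written illustration of that pitfall, not code taken from Par4All; the function names are hypothetical, and `fill_range` only stands in for a kernel launch.

```c
#include <stddef.h>

/* Naive iteration count of: for (i = lo; i <= hi; i++).
   With unsigned operands, hi - lo + 1 wraps around when hi < lo. */
unsigned trip_count_naive(unsigned lo, unsigned hi)
{
    return hi - lo + 1u;
}

/* Checked variant: an empty range yields 0 iterations.
   Limitation of this sketch: the full range 0..UINT_MAX would need
   a wider type for the count and is not handled here. */
unsigned trip_count_checked(unsigned lo, unsigned hi)
{
    return (hi < lo) ? 0u : hi - lo + 1u;
}

/* Hypothetical stand-in for a kernel launch: writes value into
   out[lo..hi] and returns 1, or returns 0 without touching out
   when there is no iteration to run. */
int fill_range(int *out, unsigned lo, unsigned hi, int value)
{
    unsigned n = trip_count_checked(lo, hi);
    if (n == 0u)
        return 0;               /* nothing to do: skip the "launch" */
    for (unsigned k = 0; k < n; k++)
        out[lo + k] = value;
    return 1;
}
```

Calling `fill_range(buf, 5u, 2u, 7)` leaves `buf` untouched, whereas sizing the work from `trip_count_naive(5u, 2u)` would request billions of iterations.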

p4a-1.3

Par4All 1.3

Par4All is an open-source environment for source-to-source
transformations on C and Fortran programs, parallelizing and optimizing them
for various targets, from embedded multiprocessor systems-on-chip with
hardware accelerators up to high-performance computers and GPUs.

Par4All 1.3 is a major version release introducing OpenCL code generation
from sequential C programs.

It also improves the CUDA code generation by producing up to 3D kernels
and providing better host-GPU communication optimization. More debugging
and timing information is now available, and the thread layout can be
changed more easily.

The generated code is often more efficient than previously because of the
general use of loop fusion to improve memory locality.
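
As a hand-written illustration of what loop fusion means here (not actual Par4All output; the function names are hypothetical), the two loops in the first function below can be merged into the single loop of the second, so each element is read and written once while it is still in cache and the temporary array disappears:

```c
#include <stddef.h>

/* Unfused: two passes over memory with a temporary array in between. */
void scale_then_offset_unfused(const float *in, float *tmp, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = 2.0f * in[i];
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[i] + 1.0f;
}

/* Fused: a single pass; the temporary array becomes a scalar. */
void scale_then_offset_fused(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float t = 2.0f * in[i];
        out[i] = t + 1.0f;
    }
}
```

Both functions compute the same `out` array; the fused version simply touches memory less often.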

This version is also a component of the HPC Project Scilab compiler and
automatic parallelizer. Considerable work has gone into reducing the
parallelization time.

Some benchmarks are now provided with Par4All to measure performance, and
many bugs have been fixed since the previous version.

More information can be found on http://par4all.org

For more complex transformations, parallelizations and optimizations,
contact HPC Project for professional support by sending an email to
support at par4all.org

p4a-1.2

2011-07-07 Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.2

  * This version targets mainly the Wild Cruncher, a parallelizing
    environment from HPC Project for Scilab programs. Par4All is used
    to parallelize the output of the Scilab-to-C compiler from HPC
    Project

  * Improved support for CUDA atomic update for reductions

  * Better handling of scalars in GPU parallelization

  * Improved memory effect analysis

  * Fixed outlining for kernel generation with scalar parameters

  * Improved loop fusion, deal with local variable declarations

  * Improved array scalarization

  * Make package publication more resilient to network failures

  * Fixed GPU code generation for non rectangular iteration spaces

  * Fixed communication optimization between GPU and CPU

  * Added support for CEA SCMP embedded system

  * Installation directory can now also be changed after a first
    installation

  * Use the broker concept to deal with stubs, to manage non-parallelized
    or already parallelized libraries

  * Now install LICENSE.txt

  * Updated to new PyPS interface

  * GPU kernels can be outlined into separate source files on demand,
    for OpenCL or to use a separate non-C99 compiler (CUDA nvcc), at
    kernel, launcher, wrapper grain...

  * Fixed compilation flags in PIPS/linear to prevent recompilation from
    failing when an API changes too much
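
To illustrate what a non-rectangular iteration space is (see the GPU code generation item above), the first function below uses a triangular loop nest whose inner bound depends on the outer index. One common way to map such a nest onto a rectangular grid of GPU threads is to iterate over the bounding box and guard the body, as in the second function. This is a plain C sketch with hypothetical names; whether Par4All generates code of this shape is an assumption.

```c
/* Triangular nest: the inner bound depends on i.
   a is an n*n matrix stored in row-major order. */
long triangular_sum(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j <= i; j++)
            s += a[i * n + j];
    return s;
}

/* Same computation over the rectangular bounding box, with a guard
   that discards the points outside the triangle. */
long triangular_sum_guarded(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (j <= i)
                s += a[i * n + j];
    return s;
}
```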

p4a-1.1.1

2011-04-12 Ronan Keryell <Ronan.Keryell@hpc-project.com>

	    * Par4All 1.1.1

		* Added support for CEA SCMP task dataflow machine (European
		  project ARTEMIS SCALOPES)

		* Improved GPU kernel generation for loop nests with complex
		  declarations.

		  Bug reported by Richard Membarth from Universität Erlangen.

		* Added new options to apply PIPS transformations in the Par4All
		  compilation transit (--apply-before-parallelization...)

		* Added a programming guide describing best practices to get
		  better performance with Par4All

p4a-1.1

Par4All 1.1

2011-03-01  Ronan Keryell <Ronan.Keryell@hpc-project.com>

        * Par4All 1.1

        * C99 declarations anywhere in a block and in C99 for-loops are now
          supported.

        * Fixed code generation for C99 declarations.

        * New --apply-before-parallelization option to apply phases before
          parallelization.

        * Improved compilation speed.

        * No longer rely on Python 3.x, since there were issues on some
          systems in coping with both 2.y and 3.x versions.

        * Fixed encoding issues.
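
For reference, the C99 declaration styles mentioned in this changelog look like the following in source code. This is a minimal sketch with a hypothetical function name, showing only the language features, not any Par4All-specific input or output.

```c
/* Sum of the squares of the strictly positive entries of v[0..n-1]. */
long sum_positive_squares(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {        /* C99: index declared in the for-loop */
        if (v[i] <= 0)
            continue;
        long sq = (long)v[i] * v[i];     /* C99: declaration after a statement */
        total += sq;
    }
    return total;
}
```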

p4a-1.0.5

Par4All 1.0.5

2011-02-03  Ronan Keryell <Ronan.Keryell@hpc-project.com>

		* Par4All 1.0.5 fixes a bug when code to be kernelized uses
		  global variables.

		  Thanks to Sarnath Kannan for this bug report. It should work now
		  in common cases.

		* Prototype of lazy CUDA communication optimizations to remove
		  redundant host-accelerator communications.

		* Fixed an iteration space transposition bug in accelerator mode
		  that was killing performance. But right now, better results are
		  obtained with 2D kernels.

		* C99 for(int i;...;...) loops are now accepted.

		* Can generate kernels with less perfectly nested loops.

		* Updated examples directory to new options and communication
		  optimizations

		* Better error and warning messages.

		* Script cleaning by using Python module names everywhere.

p4a-1.0.4

Par4All 1.0.4

2010-11-22  Ronan Keryell <Ronan.Keryell@hpc-project.com>

		* Par4All 1.0.4 introduces a new P4A Accel runtime for OpenMP and CUDA.

		* In previous months, PIPS and PyPS have evolved a lot, especially
		  in the code generation for various accelerators. This version
		  tries to cope with these evolutions.

		* Added the Stars-pm cosmological N-body simulation program as an
		  example

		* Now the runtime can deal with subarray transfers between the
		  host and the accelerator, up to 4D arrays. However, the phases
		  currently chosen in PIPS do not use them yet.

		* The code generation for non-C99 CUDA is more robust.