pvlib.iam.marion_integrate uses too much memory for vector inputs · Issue #1402 · pvlib/pvlib-python · GitHub
Open
@kandersolar

Description


pvlib.iam.marion_integrate (which is mostly relevant as a helper for pvlib.iam.marion_diffuse) needs quite a bit of memory when passed vector inputs. An input of length 1000 allocates around 2GB of memory on my machine, so naively passing in a standard 8760 would use roughly 17-18 GB. Unfortunately I was very much focused on fixed tilt simulations when I wrote pvlib's implementation and never tried it out on large vector inputs, so this problem went unnoticed until @spaneja pointed it out to me.
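For reference, the 8760 estimate is just a linear extrapolation of the 1000-element measurement. A minimal sketch of that arithmetic, assuming memory use scales linearly with input length and taking the measured peak as roughly 2 GB:

```python
# Back-of-the-envelope scaling, assuming memory grows linearly with input length
measured_gb = 2.0      # approximate peak reported for the 1000-element input
measured_len = 1000
target_len = 8760      # hourly values for one year

estimated_gb = measured_gb * target_len / measured_len
print(f"{estimated_gb:.1f} GB")  # 17.5 GB
```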

I think any vectorized implementation of this algorithm is going to be rather memory-heavy, so I'm skeptical that achieving even a factor of 10 reduction in memory usage is possible here without completely changing the approach (and likely shifting the burden from memory to CPU). However, here are two low-hanging fruits worth considering:

  1. The current implementation has a handful of large 2-D arrays local to the function that only get released when the function returns. Some of them are only used near the beginning of the function but still take up memory for the entire function duration. Using the del statement to instruct python that those arrays are no longer needed allows python to reclaim that memory immediately and recycle it for subsequent allocations. This is probably a simplification of what actually happens, but it seems consistent with the below observations.
  2. np.float32 cuts memory usage in half compared with np.float64 and (probably) doesn't meaningfully change the result. It's not like surface_tilt has more than a few sig figs anyway.
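To make the two ideas concrete, here is a minimal sketch on a toy function. It is not the actual marion_integrate code: `integrate_naive`, `integrate_with_del`, and the math inside them are hypothetical stand-ins for "a function that builds several large 2-D temporaries".

```python
import numpy as np

def integrate_naive(x):
    # Hypothetical stand-in: every 2-D array below has shape (len(x), 500)
    grid = np.linspace(0.0, 1.0, 500)
    a = np.cos(x[:, np.newaxis] * grid)
    b = np.sin(x[:, np.newaxis] * grid)
    c = a * b          # a and b are no longer needed, but stay alive until return
    d = c * c
    return d.sum(axis=1)

def integrate_with_del(x, dtype=np.float64):
    # Same math, but large temporaries are dropped as soon as they are unused,
    # and the working precision is selectable
    x = np.asarray(x, dtype=dtype)
    grid = np.linspace(0.0, 1.0, 500, dtype=dtype)
    a = np.cos(x[:, np.newaxis] * grid)
    b = np.sin(x[:, np.newaxis] * grid)
    c = a * b
    del a, b           # last references gone, so the buffers can be reused
    d = c * c
    del c
    return d.sum(axis=1)

x = np.radians(np.linspace(0, 90, 1000))
full = integrate_with_del(x)                      # float64
half = integrate_with_del(x, dtype=np.float32)    # half the bytes per element
print(np.abs(full - half).max())                  # size of the float32 discrepancy
```

The last line is the kind of check that would be needed before switching precision for real: it prints how far the float32 result drifts from the float64 one for this toy function.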

Here is a rough memory and timing comparison (using memory_profiler, very handy). pvlib is the current implementation; the two del variants use a strategic sprinkling of del but are otherwise not much different from pvlib. This is for an input of length 1000. The traces here are memory usage sampled at short intervals across a single function invocation; for example the blue pvlib trace shows that the function call took 1.4 seconds to complete and had a peak memory usage slightly higher than 2GB.
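For anyone who wants to reproduce this kind of comparison without installing memory_profiler, a rough equivalent can be sketched with the standard library's tracemalloc, which also sees NumPy's array allocations. This only reports the peak rather than a full trace over time, and `chain_naive` / `chain_del` are hypothetical toy functions, not the pvlib code:

```python
import tracemalloc
import numpy as np

def chain_naive(n, m):
    a = np.ones((n, m))
    b = a * 2
    c = b + 1
    d = c * 3          # a, b, c and d are all alive at this point
    return d.sum()

def chain_del(n, m):
    a = np.ones((n, m))
    b = a * 2
    del a
    c = b + 1
    del b
    d = c * 3          # only c and d are alive at this point
    return d.sum()

def peak_bytes(func, *args):
    # Peak traced allocation (in bytes) during a single call to func
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

peak_naive = peak_bytes(chain_naive, 1000, 500)
peak_del = peak_bytes(chain_del, 1000, 500)
print(f"naive: {peak_naive / 1e6:.1f} MB, with del: {peak_del / 1e6:.1f} MB")
```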

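[Figure: memory usage over time for the current pvlib implementation and the two del variants, input of length 1000]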

So using a few dels cuts peak memory usage roughly in half. Dropping down to np.float32 cuts it roughly in half again (and gives a nontrivial speedup too). It's possible that further improvements can be had with other tricks (e.g. using the out parameter that some numpy functions provide) but I've not yet explored them.
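As an illustration of the out idea (untested against marion_integrate itself), a chain of elementwise operations can reuse a single working buffer instead of allocating a fresh array at every step. The function names below are hypothetical:

```python
import numpy as np

def scaled_cos_temporaries(x, k):
    # Each step allocates a new array the same size as x
    y = x * k
    z = np.cos(y)
    return z + 1.0

def scaled_cos_inplace(x, k):
    # One working buffer, overwritten at each step via out=
    buf = np.multiply(x, k)       # the only allocation; x itself is untouched
    np.cos(buf, out=buf)
    np.add(buf, 1.0, out=buf)
    return buf

x = np.linspace(0.0, 1.0, 1_000_000)
assert np.array_equal(scaled_cos_temporaries(x, 2.0), scaled_cos_inplace(x, 2.0))
```

The trade-off is readability: the in-place version is harder to follow, so it would need the same kind of explanatory comments as the del approach.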

My main question: are we open to using these two strategies in pvlib? Despite being built into python itself, del still seems unpythonic to me for some reason. Switching away from float64 is objectionable to the extent that it's the standard in scientific computing and is therefore baked into the models by assumption. I think I'm cautiously open to both of the above approaches, iff they are accompanied by good explanatory comments and switching to float32 can be reasonably shown to not introduce a meaningful difference in output.


Remark: even ignoring this memory bloat, I tend to think that applying marion_integrate directly to an 8760 is a bit strange. In simulations with time series surface_tilts, a better approach IMHO is to calculate the IAM values only for np.linspace(0, 90, 91) (one value per degree of tilt) or similar and use pvlib.iam.interp to generate the 8760 IAM series. If nothing else, we might suggest that in the docs.
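A minimal sketch of that lookup-then-interpolate idea, using plain NumPy so it runs on its own. Here `expensive_iam` is a hypothetical smooth stand-in for the marion_integrate call, np.interp stands in for pvlib.iam.interp, and the tilt series is random data rather than real tracker output:

```python
import numpy as np

def expensive_iam(surface_tilt):
    # Hypothetical placeholder for the memory-hungry calculation;
    # any smooth function of tilt (in degrees) works for the illustration
    return 1.0 - 0.05 * (1.0 - np.cos(np.radians(surface_tilt)))

# Evaluate the expensive function once per degree of tilt (91 points)
tilt_grid = np.linspace(0, 90, 91)
iam_grid = expensive_iam(tilt_grid)

# Made-up hourly tilt series for one year
rng = np.random.default_rng(0)
surface_tilt = rng.uniform(0, 90, 8760)

# Cheap interpolation instead of 8760 expensive evaluations
iam_series = np.interp(surface_tilt, tilt_grid, iam_grid)
print(np.abs(iam_series - expensive_iam(surface_tilt)).max())  # interpolation error
```

The expensive function only ever sees 91 inputs, so its memory footprint no longer depends on the length of the time series.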
