Variable deletion consumes a lot of memory · Issue #17092 · pandas-dev/pandas · GitHub

Variable deletion consumes a lot of memory #17092
@ivallesp

Description

Hi team,

I have been having issues with pandas memory management. Specifically, there is a (for me, at least) unavoidable memory peak that occurs when I try to remove variables from a data set. It should be (almost) free! I am getting rid of part of the data, yet pandas still needs to allocate a large amount of memory, which produces MemoryErrors.

Just to give you a bit of context: I am working with a DataFrame that contains 33M rows and 500 columns (a big one!), almost all of them numeric, on a machine with 360 GB of RAM. The whole data set fits in memory, and I can successfully apply some transformations to the variables. The problem comes when I need to drop 10% of the columns in the table: the operation produces a big memory peak that leads to a MemoryError, even though more than 80 GB of memory are available beforehand!
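To put rough numbers on this, here is a back-of-the-envelope sketch. It rests on two assumptions that are not confirmed anywhere in this report: that every column is stored as float64 (8 bytes per value), and that removing columns materializes a copy of the retained columns instead of releasing memory in place. Under those assumptions, the temporary copy alone would be larger than the 80 GB that are free. The variable names are only illustrative, and GB here means 10^9 bytes.

```python
# Rough memory estimate for the DataFrame described above.
# ASSUMPTION: all 500 columns are float64 (8 bytes per value).
# ASSUMPTION: dropping columns copies the retained columns once.
n_rows = 33_000_000
n_cols = 500
bytes_per_value = 8

total_bytes = n_rows * n_cols * bytes_per_value

# Dropping 10% of the columns leaves 450 of them.
n_kept = n_cols - n_cols // 10
copy_bytes = n_rows * n_kept * bytes_per_value

free_bytes = 80 * 10**9  # "more than 80 GB available" from the report

print(f"full frame:     {total_bytes / 1e9:.1f} GB")
print(f"retained copy:  {copy_bytes / 1e9:.1f} GB")
print(f"copy fits in free memory: {copy_bytes <= free_bytes}")
```

If the real dtypes are narrower, or if the installed pandas version can drop columns without copying, the estimate changes accordingly.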

I tried the following methods for removing the columns, and all of them failed:

  • drop() with or without inplace parameter
  • pop()
  • reindex()
  • reindex_axis()
  • del df[column] in a loop over the columns to be removed
  • __delitem__(column) in a loop over the columns to be removed
  • pop() and drop() in a loop over the columns to be removed.
  • I also tried to reassign the columns, overwriting the data frame using indexing with loc and iloc, but it did not help.
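For reference, the approaches listed above look roughly like this on a small toy frame. This is only a sketch of the calls, not a reproduction of the memory peak: the helper `make_frame`, the column names, and the sizes are made up for illustration. `reindex_axis()` is left out because it was removed in later pandas releases, and `__delitem__` is what `del df[column]` calls under the hood.

```python
import numpy as np
import pandas as pd


def make_frame(n_rows=1000, n_cols=20):
    """Build a small numeric frame (hypothetical stand-in for the real data)."""
    data = np.random.default_rng(0).random((n_rows, n_cols))
    return pd.DataFrame(data, columns=[f"c{i}" for i in range(n_cols)])


to_drop = ["c0", "c1"]
keep = [c for c in make_frame().columns if c not in to_drop]

results = {}

# drop() without inplace: returns a new DataFrame.
df = make_frame()
results["drop"] = df.drop(columns=to_drop)

# drop() with inplace=True: modifies df and returns None.
df = make_frame()
df.drop(columns=to_drop, inplace=True)
results["drop_inplace"] = df

# pop() in a loop over the columns to be removed.
df = make_frame()
for col in to_drop:
    df.pop(col)
results["pop"] = df

# del df[column] in a loop (equivalent to df.__delitem__(column)).
df = make_frame()
for col in to_drop:
    del df[col]
results["del"] = df

# reindex() on the columns to keep.
df = make_frame()
results["reindex"] = df.reindex(columns=keep)

# Reassigning through label-based indexing with loc.
df = make_frame()
df = df.loc[:, keep]
results["loc"] = df

for name, frame in results.items():
    print(name, frame.shape)
```

All variants end up with the same set of columns; the difference that matters here is how much temporary memory each one needs along the way.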

I found that the drop method with inplace is the most efficient one, but it still generates a huge peak.

I would like to discuss whether there is any way of implementing (or whether it is already implemented, by any chance) a method for removing variables more efficiently, without generating additional memory consumption.

Thank you
Iván
