Redesign DataFrame structure by akharche · Pull Request #815 · IntelPython/sdc · GitHub

This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Redesign DataFrame structure #815

Closed
wants to merge 1 commit into from

akharche (Contributor) commented Apr 21, 2020

Extension for #801

  • Implementation of new DataFrame structure based on lists instead of tuples
  • Improved df.count() codegen for testing

Example:

df = pd.DataFrame({'A': [1,2,3], 'B': [.5, .6, .7], 'C': [4, 5, 6], 'D': ['a', 'b', 'c']})

(['A', 'B', 'C', 'D'],)
([array([1, 2, 3], dtype=int64), array([4, 5, 6], dtype=int64)], [array([0.5, 0.6, 0.7])], [array(['a', 'b', 'c'], dtype=object)])
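
The layout printed above (one list of column arrays per type, plus a per-column lookup) can be sketched in plain Python. This is only an illustration under stated assumptions: `group_columns_by_type` is a hypothetical helper, the Python type of the first element stands in for the real column dtype, and the `(type_id, col_id)` pairs mirror the `df_structure` entries discussed in the review below rather than the actual SDC implementation.

```python
def group_columns_by_type(columns):
    """Group columns into one list per element type, in first-seen order.

    Hypothetical helper for illustration only, not part of SDC.
    """
    data = []          # one list of columns per distinct type
    type_ids = {}      # type -> position of its list in `data`
    df_structure = {}  # column name -> (type_id, index within that list)

    for name, values in columns.items():
        col_type = type(values[0])  # assumption: stand-in for the column dtype
        if col_type not in type_ids:
            type_ids[col_type] = len(data)
            data.append([])
        type_id = type_ids[col_type]
        # The first column of each type gets index 0, the next one 1, and so on
        df_structure[name] = (type_id, len(data[type_id]))
        data[type_id].append(list(values))

    return tuple(data), df_structure


data, df_structure = group_columns_by_type(
    {'A': [1, 2, 3], 'B': [.5, .6, .7], 'C': [4, 5, 6], 'D': ['a', 'b', 'c']}
)
```

Here `data[0]` holds both integer columns (`A` and `C`), matching the grouping shown in the printed `_data` above, and `df_structure['C']` points at the second entry of that list.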

Reproduce:

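# Requires Intel SDC to be installed alongside Numba and pandas
import pandas as pd
from numba import njit

@njit
def run_df():
    df = pd.DataFrame({'A': [1,2,3], 'B': [.5, .6, .7], 'C': [4, 5, 6], 'D': ['a', 'b', 'c']})

    # Inspect the internal layout introduced by this PR
    print(df._columns)
    print(df._data)

    return df.count()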

if col_typ not in data_typs_map:
    data_typs_map[col_typ] = (type_id, [col_id])
    # The first column in each type always has 0 index
    df_structure[col_name] = (type_id, 0)
Collaborator

Perhaps we could use a named tuple here?
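
A minimal sketch of what this suggestion might look like, assuming the `(type_id, col_id)` pairs stored in `df_structure` were replaced by a named tuple. The name `ColumnLoc` and its field names are hypothetical and do not come from the PR.

```python
from collections import namedtuple

# Hypothetical name; the PR itself stores plain (type_id, col_id) tuples.
ColumnLoc = namedtuple("ColumnLoc", ["type_id", "col_id"])

# Same mapping as in the PR description example: A and C share a type.
df_structure = {
    "A": ColumnLoc(type_id=0, col_id=0),
    "B": ColumnLoc(type_id=1, col_id=0),
    "C": ColumnLoc(type_id=0, col_id=1),
    "D": ColumnLoc(type_id=2, col_id=0),
}

loc = df_structure["C"]
# Fields are readable by name, yet the value still compares equal to a plain tuple
same_as_tuple = loc == (0, 1)
```

Named fields avoid positional indexing such as `loc[0]` and `loc[1]` in later codegen, while code that unpacks the pair as a tuple keeps working.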

  super(DataFrameType, self).__init__(
-     name="dataframe({}, {}, {}, {})".format(data, index, columns, has_parent))
+     name="dataframe({}, {}, {}, {}, {})".format(data, index, columns, has_parent, df_structure))
Collaborator

Do we really want the structure to be part of the type name?

  ('index', fe_type.index),
- ('columns', types.UniTuple(string_type, n_cols)),
+ ('columns', types.UniTuple(types.List(string_type), 1)),
Collaborator

Why not just a list?

  ('parent', types.pyobject),
+ ('df_structure', types.pyobject),
Collaborator

Why do we need it here?

akharche (Contributor, Author)

Duplicate of #817

@akharche akharche closed this Apr 23, 2020
@akharche akharche deleted the change_df_structure branch April 23, 2020 14:45