Doc: Adds example of exploding lists into columns instead of storing in dataframe cells by mgautam98 · Pull Request #23041 · pandas-dev/pandas · GitHub
Closed
mgautam98 wants to merge 6 commits
mgautam98 committed Oct 8, 2018
commit 4952597f4a94a4e21a5ec72fd8d5fef2dfa81a5f
36 changes: 8 additions & 28 deletions doc/source/gotchas.rst
@@ -346,10 +346,10 @@ Example of exploding nested lists into a DataFrame:

.. ipython:: python

-dframe = pd.DataFrame({'name': ['A.J. Price'] * 3,
+df = pd.DataFrame({'name': ['A.J. Price'] * 3,
Member

I believe this is copied from the following SO article:

https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows

We need to be careful about copying and pasting items from SO into the code base. We would have to get express permission from the author to use this.

Contributor
TomAugspurger, Oct 8, 2018

I think SO code snippets are CC BY-SA, so as long as we link back to the source (which we should be doing anyway), we're good.

https://stackoverflow.com/help/licensing

'opponent': ['76ers', 'blazers', 'bobcats']},
-columns=['name','opponent'])
-dframe
+columns=['name','opponent'])
+df

nearest_neighbors = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3
nearest_neighbors
@@ -358,7 +358,7 @@ Create an index with the "parent" columns to be included in the final Dataframe

.. ipython:: python

-df = pd.concat([dframe[['name','opponent']], pd.DataFrame(nearest_neighbors)], axis=1)
+df = pd.concat([df[['name','opponent']], pd.DataFrame(nearest_neighbors)], axis=1)
df

Transform the column with lists into series, which become columns in a new Dataframe.
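
Editorial note: the lines between these two hunks are not shown on this page. As a minimal sketch of what the sentence above describes, assuming a column whose cells hold equal-length lists (the data is copied from the diff; the name `wide` is made up for illustration):

```python
import pandas as pd

# Sample data taken from the diff; `wide` is a hypothetical name.
df = pd.DataFrame({
    'name': ['A.J. Price'] * 3,
    'opponent': ['76ers', 'blazers', 'bobcats'],
    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin',
                           'Nate Robinson', 'Isaia']] * 3,
})

# Applying pd.Series to each cell turns every list into a row of a new
# DataFrame; the list positions become the integer column labels.
wide = df['nearest_neighbors'].apply(pd.Series)
print(wide)
```

Each list position ends up in its own column, which is what lets a later `.stack()` fold those columns back into rows.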
@@ -385,32 +385,22 @@ Note that at this point we have a Series, not a Dataframe
df = ser.to_frame('nearest_neighbors')
df

-All steps in one stack
-
-.. ipython:: python
-
-df = (dframe.concat([df[['name','opponent']], pd.DataFrame(nearest_neighbors)], axis=1)
-.set_index(['name', 'opponent'])
-.stack()
-.reset_index(level=2, drop=True)
-.to_frame('nearest_neighbors'))
-df
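
Editorial note: the "All steps in one stack" block above calls `dframe.concat(...)`. As far as I can tell, `concat` is a top-level pandas function and not a `DataFrame` method, so that line would raise an `AttributeError` as written. A hedged sketch of the same chain built on `pd.concat`, reusing the sample data from the diff (the name `result` is an assumption for illustration):

```python
import pandas as pd

dframe = pd.DataFrame({'name': ['A.J. Price'] * 3,
                       'opponent': ['76ers', 'blazers', 'bobcats']},
                      columns=['name', 'opponent'])
nearest_neighbors = [['Zach LaVine', 'Jeremy Lin',
                      'Nate Robinson', 'Isaia']] * 3

# pd.concat is called as a function; the nested list becomes columns 0..3.
result = (pd.concat([dframe[['name', 'opponent']],
                     pd.DataFrame(nearest_neighbors)], axis=1)
            .set_index(['name', 'opponent'])   # keep the "parent" columns
            .stack()                           # fold columns 0..3 into rows
            .reset_index(level=2, drop=True)   # drop the old column labels
            .to_frame('nearest_neighbors'))
print(result)
```

This yields one row per (name, opponent, neighbor) combination, with `name` and `opponent` kept in the index.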

Example of exploding a list embedded in a dataframe:

.. ipython:: python

-dframe = pd.DataFrame({'name': ['A.J. Price'] * 3,
+df = pd.DataFrame({'name': ['A.J. Price'] * 3,
'opponent': ['76ers', 'blazers', 'bobcats'],
'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3},
-columns=['name','opponent','nearest_neighbors'])
-dframe
+columns=['name','opponent','nearest_neighbors'])
+df

Create an index with the "parent" columns to be included in the final Dataframe

.. ipython:: python

-df = dframe.set_index(['name', 'opponent'])
+df = df.set_index(['name', 'opponent'])
df

Transform the column with lists into series, which become columns in a new Dataframe.
@@ -437,13 +427,3 @@ Note that at this point we have a Series, not a Dataframe
df = ser.to_frame('nearest_neighbors')
df

-All steps in one stack
-
-.. ipython:: python
-
-df = (dframe.set_index(['name', 'opponent'])
-.nearest_neighbors.apply(pd.Series)
-.stack()
-.reset_index(level=2, drop=True)
-.to_frame('nearest_neighbors'))
-df
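
Editorial note: a self-contained sketch of the chain above, with the sample data from the diff. For comparison it also uses `DataFrame.explode`, which is not part of this PR; it was added in pandas 0.25, after this PR was opened, and is shown here only as an alternative under that version assumption. The names `chained` and `exploded` are hypothetical.

```python
import pandas as pd

dframe = pd.DataFrame({
    'name': ['A.J. Price'] * 3,
    'opponent': ['76ers', 'blazers', 'bobcats'],
    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin',
                           'Nate Robinson', 'Isaia']] * 3,
}, columns=['name', 'opponent', 'nearest_neighbors'])

# The chain from the diff: index by the parent columns, expand each list
# into columns, stack them into rows, then drop the helper index level.
chained = (dframe.set_index(['name', 'opponent'])['nearest_neighbors']
                 .apply(pd.Series)
                 .stack()
                 .reset_index(level=2, drop=True)
                 .to_frame('nearest_neighbors'))

# Alternative, assuming pandas >= 0.25: explode the list column directly.
exploded = (dframe.explode('nearest_neighbors')
                  .set_index(['name', 'opponent']))

print(chained)
print(exploded)
```

For this data both approaches give one row per list element. On inputs with empty lists or missing values the two may differ, so the comparison is only claimed for the sample shown.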