read_html: Handle colspan and rowspan by adamhooper · Pull Request #21487 · pandas-dev/pandas · GitHub

Merged
merged 14 commits into from
Jul 5, 2018
fixup whatsnew
jreback committed Jul 5, 2018
commit 5fd863bb3611093aefcda7e0f16573d77a3190d4
80 changes: 41 additions & 39 deletions doc/source/whatsnew/v0.24.0.txt
@@ -10,7 +10,7 @@ New features

- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)

-.. _whatsnew_0240.enhancements.extension_array_operators
+.. _whatsnew_0240.enhancements.extension_array_operators:

``ExtensionArray`` operator support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -26,6 +26,46 @@ See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.

+.. _whatsnew_0240.enhancements.read_html:
+
+``read_html`` Enhancements
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`read_html` previously ignored ``colspan`` and ``rowspan`` attributes.
+Now it understands them, treating them as sequences of cells with the same
+value. (:issue:`17054`)
+
+.. ipython:: python
+
+    result = pd.read_html("""
+      <table>
+        <thead>
+          <tr>
+            <th>A</th><th>B</th><th>C</th>
+          </tr>
+        </thead>
+        <tbody>
+          <tr>
+            <td colspan="2">1</td><td>2</td>
+          </tr>
+        </tbody>
+      </table>""")
+
+Previous Behavior:
+
+.. code-block:: ipython
+
+    In [13]: result
+    Out [13]:
+    [   A  B   C
+     0  1  2 NaN]
+
+Current Behavior:
+
+.. ipython:: python
+
+    result
+
.. _whatsnew_0240.enhancements.other:

Other Enhancements
@@ -174,44 +214,6 @@ Current Behavior:
...
OverflowError: Trying to coerce negative values to unsigned integers

-read_html Enhancements
-^^^^^^^^^^^^^^^^^^^^^^
-
-:func:`read_html` previously ignored ``colspan`` and ``rowspan`` attributes.
-Now it understands them, treating them as sequences of cells with the same
-value. (:issue:`17054`)
-
-.. ipython:: python
-
-    result = pd.read_html("""
-      <table>
-        <thead>
-          <tr>
-            <th>A</th><th>B</th><th>C</th>
-          </tr>
-        </thead>
-        <tbody>
-          <tr>
-            <td colspan="2">1</td><td>2</td>
-          </tr>
-        </tbody>
-      </table>""")
-
-Previous Behavior:
-
-.. code-block:: ipython
-
-    In [13]: result
-    Out [13]:
-    [   A  B   C
-     0  1  2 NaN]
-
-Current Behavior:
-
-.. ipython:: python
-
-    result
-
Datetimelike API Changes
^^^^^^^^^^^^^^^^^^^^^^^^

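
The whatsnew entry in this diff describes ``read_html`` as treating a spanned cell as a sequence of cells with the same value. The following is a minimal, standard-library-only sketch of that idea, not the parser code from this PR. The name ``expand_spans`` and the ``(text, colspan, rowspan)`` tuple input are hypothetical, chosen only for illustration.

```python
def expand_spans(rows):
    """Expand colspan/rowspan cells into plain rows of repeated values.

    Hypothetical helper for illustration only; this is not pandas' code.
    Each input cell is a (text, colspan, rowspan) tuple.
    """
    grid = []
    pending = {}  # column index -> (text, rows still to fill) from rowspans above
    for row in rows:
        out = []
        carried = {}  # rowspans that continue into the next row
        cells = iter(row)
        cell = next(cells, None)
        col = 0
        while cell is not None or pending:
            if col in pending:
                # A cell from an earlier row spans down into this position.
                text, left = pending.pop(col)
                out.append(text)
                if left > 1:
                    carried[col] = (text, left - 1)
                col += 1
            elif cell is not None:
                text, colspan, rowspan = cell
                for _ in range(colspan):
                    # Repeat the value once per spanned column.
                    pending.pop(col, None)
                    out.append(text)
                    if rowspan > 1:
                        carried[col] = (text, rowspan - 1)
                    col += 1
                cell = next(cells, None)
            else:
                # Gap before a carried-down cell further right: leave it empty.
                out.append(None)
                col += 1
        grid.append(out)
        pending = carried
    return grid


# The table from the whatsnew example, written as (text, colspan, rowspan) cells.
header = [("A", 1, 1), ("B", 1, 1), ("C", 1, 1)]
body = [("1", 2, 1), ("2", 1, 1)]
print(expand_spans([header, body]))
# [['A', 'B', 'C'], ['1', '1', '2']]
```

Under this reading, the body row of the example table would fill columns A, B, C with 1, 1, 2, rather than the 1, 2, NaN shown under "Previous Behavior".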