html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, which all major web browsers implement.
Simple usage follows this pattern:
import html5lib
with open("mydocument.html", "rb") as f:
    document = html5lib.parse(f)
or:
import html5lib
document = html5lib.parse("<p>Hello World!")
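To see what the parser builds from such a fragment, here is a minimal sketch. It assumes the default tree builder, which returns an `xml.etree.ElementTree` element for the root `<html>` node, and it passes `namespaceHTMLElements=False` so that tag names are plain strings instead of being prefixed with the XHTML namespace. The variable names are illustrative only.

```python
import html5lib

# Parse a fragment that has no <html>, <head>, or <body> tags of its own.
# namespaceHTMLElements=False keeps tag names plain ("p" instead of
# "{http://www.w3.org/1999/xhtml}p"), which makes lookups shorter.
document = html5lib.parse("<p>Hello World!", namespaceHTMLElements=False)

# The parser supplies the missing structure, as a browser would,
# so the paragraph ends up under html > body.
root_tag = document.tag
paragraph = document.find("body/p")
paragraph_text = paragraph.text

print(root_tag)
print(paragraph_text)
```

Without `namespaceHTMLElements=False`, the same lookup needs the namespaced tag names, so the flag is mainly a convenience for short scripts.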