html5lib
Python library for accurate HTML parsing, with error recovery and support for multiple tree formats.

What is html5lib?

html5lib is a pure-Python library that parses HTML documents according to the WHATWG HTML specification, so it builds the same document tree from a given input that major web browsers do. This includes the specification's error-recovery rules, which makes it well suited to malformed HTML such as unclosed or misnested tags. The parsed result can be built in several tree formats, including xml.etree.ElementTree, xml.dom.minidom, and lxml.etree. html5lib prioritizes accuracy over speed, so it is best suited to tasks that need a precise, browser-faithful representation of a document rather than maximum throughput.
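As a minimal sketch of the error recovery described above, the following parses a deliberately malformed fragment into an xml.etree.ElementTree tree. The sample markup and variable names are illustrative assumptions, and the example assumes html5lib is installed (for instance via pip).

```python
import html5lib

# Illustrative malformed input: neither <p> is closed, and there is no
# <html>, <head>, or <body> wrapper.
broken_html = "<title>Demo</title><p>First<p>Second"

# treebuilder="etree" selects xml.etree.ElementTree output.
# namespaceHTMLElements=False keeps tag names plain ("p") instead of
# prefixing them with the XHTML namespace.
root = html5lib.parse(
    broken_html,
    treebuilder="etree",
    namespaceHTMLElements=False,
)

# The parser supplies the missing html/head/body elements and implicitly
# closes the first <p> when the second one starts.
title = root.findtext("head/title")
paragraphs = [p.text for p in root.findall("body/p")]

print(root.tag)
print(title)
print(paragraphs)
```

Passing treebuilder="dom" or treebuilder="lxml" instead should yield an xml.dom.minidom or lxml.etree tree respectively; the lxml option additionally requires lxml to be installed.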