trafilatura
Web text extraction tool for efficient crawling and content extraction.

What is trafilatura?

Trafilatura is a Python package and command-line tool for web text extraction. It covers web crawling, downloading, scraping, and the extraction of main text, metadata, and comments. It is modular and does not require a database, and its output can be converted to formats such as TXT, JSON, and XML. It is widely adopted by companies and institutions and is recognized for its efficiency and accuracy in extracting web content.
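
As a rough illustration of the extraction step described above, the following sketch assumes the `trafilatura` package is installed and calls its `extract` function on an inline HTML string, so no network access is needed. The helper name `extract_main_text` and the sample page are hypothetical and exist only for this example.

```python
from typing import Optional

try:
    import trafilatura
except ImportError:  # package not installed; the helper below then returns None
    trafilatura = None

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Sample article</title></head>
  <body>
    <nav>Home | About | Contact</nav>
    <article>
      <h1>Sample article</h1>
      <p>This paragraph stands in for the main text of a web page. An
      extractor should keep it and drop the navigation and footer.</p>
      <p>A second paragraph gives the extractor more body text to work
      with, which helps on very short documents.</p>
    </article>
    <footer>Copyright notice and unrelated links</footer>
  </body>
</html>
"""


def extract_main_text(html: str, output_format: str = "txt") -> Optional[str]:
    """Return the main content of an HTML document as a string.

    output_format is passed through to trafilatura.extract, for example
    "txt", "json", or "xml". The result is None when trafilatura is not
    installed or when no main content could be extracted.
    """
    if trafilatura is None:
        return None
    return trafilatura.extract(
        html,
        output_format=output_format,
        include_comments=False,  # skip the comment section in this sketch
    )


if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML, "json"))
```

To process a live page instead of an inline string, the HTML can first be downloaded with `trafilatura.fetch_url(url)` and the result passed to the same helper.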