jusText
Heuristic tool for extracting meaningful text from HTML pages.

What is jusText?

jusText is a heuristic tool that removes boilerplate content, such as navigation links, headers, and footers, from HTML pages. It is designed to keep text made up of complete sentences, which makes it well suited to building linguistic resources such as web corpora. An online demo is available for trying it out.
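
To make the idea of heuristic boilerplate removal concrete, here is a minimal sketch in Python using only the standard library. It is not jusText's implementation or API. It is a simplified illustration that assumes three signals (block length, link density, and stopword density), and the thresholds, the tiny stopword list, and the names `BlockCollector`, `is_boilerplate`, and `extract_text` are all hypothetical. The real tool is more elaborate than this.

```python
from html.parser import HTMLParser

# Tiny illustrative stopword list (assumption: real stoplists are far larger).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "was"}

# Tags treated as block boundaries in this sketch (an assumed, partial list).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "td", "nav", "header", "footer"}


class BlockCollector(HTMLParser):
    """Split HTML into text blocks and count the characters inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, link_chars) tuples
        self._parts = []
        self._link_chars = 0
        self._in_link = 0

    def flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len(" ".join(data.split()))


def is_boilerplate(text, link_chars, min_length=40,
                   max_link_density=0.3, min_stopword_density=0.2):
    """Hypothetical thresholds: short, link-heavy, or stopword-poor blocks are dropped."""
    if len(text) < min_length:
        return True
    if link_chars / len(text) > max_link_density:
        return True
    words = text.lower().split()
    stop = sum(w.strip(".,;:!?") in STOPWORDS for w in words)
    return stop / len(words) < min_stopword_density


def extract_text(html):
    """Return the text blocks that the heuristics above do not flag as boilerplate."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()  # pick up any trailing text outside a block tag
    return [text for text, link_chars in collector.blocks
            if not is_boilerplate(text, link_chars)]
```

Calling `extract_text` on a page with a short navigation bar, a full-sentence paragraph, and a copyright footer would keep only the paragraph, because the other two blocks fall under the assumed minimum length. The intuition is that running prose tends to be long, rich in function words, and light on links, while menus and footers are the opposite.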