- Name: boilerpipe
- Version: 1.2.0
- Release: 9.mga6
- Group: Development/Java
- License: ASL 2.0
- Url: https://github.com/kohlschutter/boilerpipe
- Summary: Boilerplate Removal and Fulltext Extraction from HTML pages
- Architecture: noarch
- Size: 138430
- Distribution: Mageia
- Vendor: Mageia.Org
- Packager: neoclust <neoclust>
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.
The library already provides specific strategies
for common tasks (for example, news article extraction) and
can easily be extended for individual problem settings.
Extracting content is very fast (milliseconds), requires only the
input document (no global or site-level information) and
is usually quite accurate.
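The speed, and the absence of any site-level input, come from judging each text block on cheap local properties. boilerpipe's authors describe the approach as relying on shallow text features such as word count and link density. The Java sketch below is a hypothetical, heavily simplified illustration of that idea. It is not boilerpipe's API or its actual classifier. The `Block` type, the `keepContentBlocks` method and both thresholds are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a simplified shallow-text-feature filter in the spirit of
// boilerpipe's approach. This is NOT boilerpipe's API or its real classifier.
public class ShallowTextFilter {

    // A text block as an (assumed) HTML segmenter might emit it:
    // the block's plain text plus how many of its words sit inside <a> links.
    static final class Block {
        final String text;
        final int linkedWords;

        Block(String text, int linkedWords) {
            this.text = text;
            this.linkedWords = linkedWords;
        }

        // Number of whitespace-separated words in the block.
        int words() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }

        // Fraction of the block's words that are link text (1.0 for empty blocks).
        double linkDensity() {
            int w = words();
            return w == 0 ? 1.0 : (double) linkedWords / w;
        }
    }

    // Illustrative thresholds only, not boilerpipe's actual rule set.
    static final int MIN_WORDS = 10;
    static final double MAX_LINK_DENSITY = 0.33;

    // Keeps blocks that are long enough and not dominated by links.
    // Each decision uses only the block itself, so no site-level data is needed.
    static List<String> keepContentBlocks(List<Block> blocks) {
        List<String> kept = new ArrayList<>();
        for (Block b : blocks) {
            if (b.words() >= MIN_WORDS && b.linkDensity() <= MAX_LINK_DENSITY) {
                kept.add(b.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Block> page = Arrays.asList(
            new Block("Home News Sports Contact", 4),             // navigation: too short
            new Block("The city council approved the new budget on Monday "
                    + "after a long debate about transit funding.", 0), // article text
            new Block("Related: Budget vote delayed Council meets again "
                    + "Mayor responds to critics", 10),           // link list: too link-dense
            new Block("Copyright 2016", 0));                      // footer: too short
        for (String text : keepContentBlocks(page)) {
            System.out.println(text);
        }
    }
}
```

Running `main` prints only the article sentence. The navigation bar and footer fail the word-count test, while the "Related" list is long enough but fails the link-density test. Per-block rules like these need no training data from the rest of the site, which is consistent with the description above. The real library's strategies are more elaborate, for example taking neighbouring blocks into account.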
- Cookie: rabbit.mageia.org 1456910148
- Buildhost: rabbit.mageia.org