webarticle2text | Extracts the main article text from an arbitrary webpage
kandi X-RAY | webarticle2text Summary
webarticle2text is an HTML library. It has no reported bugs, no reported vulnerabilities, a Weak Copyleft license, and low support. You can download it from GitHub.
This project is obsolete and now serves only as a reference. I recommend you use newspaper instead, which is an order of magnitude more accurate than any other article extraction library I've encountered. Please see compare.csv for a performance comparison of several similar tools. This library attempts to locate and extract the largest cluster of text in a webpage. It does this by walking the DOM tree, identifying all text segments and their depth inside the DOM, appending all text at roughly the same depth, and then returning the chunk with the largest total length. This approach usually works well with typical news sites where one news article is displayed per URL. It usually fails with URLs displaying multiple news blurbs (e.g. news aggregators).
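The depth-clustering idea described above can be illustrated with a minimal sketch. This is not the webarticle2text source code: it uses only Python's standard html.parser module, the names DepthTextCollector and largest_text_cluster are hypothetical, and it groups text by exact depth as a simplification of "roughly the same depth".

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Tags whose contents are code or styling rather than readable text.
SKIP_TAGS = {"script", "style"}


class DepthTextCollector(HTMLParser):
    """Hypothetical helper: collects text segments keyed by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.skip = 0  # greater than 0 while inside <script> or <style>
        self.by_depth = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag in SKIP_TAGS:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in SKIP_TAGS and self.skip:
            self.skip -= 1
        self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.skip:
            self.by_depth[self.depth].append(text)


def largest_text_cluster(html):
    """Return the joined text of the depth bucket with the most characters."""
    collector = DepthTextCollector()
    collector.feed(html)
    collector.close()
    if not collector.by_depth:
        return ""
    chunks = (" ".join(parts) for parts in collector.by_depth.values())
    return max(chunks, key=len)
```

Because every text segment at a given depth lands in the same bucket, a page with many short blurbs at one depth can outweigh the real article, which is consistent with the failure mode on news aggregators noted above.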
Support
webarticle2text has a low active ecosystem.
It has 83 stars and 16 forks. There are 8 watchers for this library.
It had no major release in the last 12 months.
There are 0 open issues and 4 have been closed. On average issues are closed in 30 days. There are 12 open pull requests and 0 closed requests.
It has a neutral sentiment in the developer community.
The latest version of webarticle2text is 3.0.2
Quality
webarticle2text has no bugs reported.
Security
webarticle2text has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
webarticle2text is licensed under the LGPL-3.0 License. This license is Weak Copyleft.
Weak Copyleft licenses carry some restrictions, but software released under them can be used in commercial projects.
Reuse
webarticle2text releases are not available. You will need to build from source code and install.
Installation instructions, examples and code snippets are available.
Install webarticle2text
You may need to install the tidylib system package, which is available on Ubuntu 12.04.
Support
To request a feature, make a suggestion, or report a bug, create an issue on GitHub.
If you have questions, search for existing answers or ask on Stack Overflow.