readabilityBUNDLE | A bundle of html content extraction algorithms

by srijiths Java Version: Current License: No License

kandi X-RAY | readabilityBUNDLE Summary

readabilityBUNDLE is a Java library. It has no reported bugs or vulnerabilities, a build file is available, and it has low support. You can download it from GitHub.

Preserves the HTML tags in the extracted content.
Keeps all available images in the content instead of picking a single best image.
Keeps all available videos.
Better extraction of li, ul, and ol tags.
Normalizes the extracted content.
Incorporates three popular extraction algorithms; you can choose one based on your requirements.
Can append the extracted content of subsequent pages to create a consolidated output.
Adds many cleaner and formatter measures.
Makes some core changes to the algorithms.

The main challenge I faced was extracting the main content while keeping all the images, videos, HTML tags, and some related div tags that most of the algorithms use to distinguish content from non-content. readabilityBUNDLE borrows much code and many concepts from [Project Goose], [Snacktory], and [Java-Readability]. My intention was only to fine-tune and modify the algorithms to meet my requirements. Some HTML pages work very well with one algorithm and poorly with another, which is the main reason I put all the available algorithms under one roof. You can choose the algorithm that best suits you. Author citations appear in each Java file.
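
The idea of keeping several extraction algorithms under one roof, choosing one per page, and appending the output of subsequent pages could be sketched roughly as below. This is not readabilityBUNDLE's API: the class `ExtractorRegistry`, its method names, and the representation of an algorithm as a function from raw HTML to extracted HTML are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical registry: maps an algorithm name to a function that
// takes raw HTML and returns the extracted main content.
final class ExtractorRegistry {
    private final Map<String, UnaryOperator<String>> extractors = new LinkedHashMap<>();

    // Register an extraction algorithm under a caller-chosen name.
    void register(String name, UnaryOperator<String> extractor) {
        extractors.put(name, extractor);
    }

    // Run the named algorithm on a single page.
    String extract(String name, String html) {
        UnaryOperator<String> extractor = extractors.get(name);
        if (extractor == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + name);
        }
        return extractor.apply(html);
    }

    // Run the named algorithm on each page in order and append the
    // results into one consolidated output.
    String extractPages(String name, List<String> pagesHtml) {
        StringBuilder consolidated = new StringBuilder();
        for (String page : pagesHtml) {
            consolidated.append(extract(name, page));
        }
        return consolidated.toString();
    }
}
```

With a layout like this, switching algorithms for a page that extracts poorly is a matter of passing a different name, and the caller never depends on any one algorithm's internals.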

Support

readabilityBUNDLE has a low active ecosystem.

It has 119 star(s) with 40 fork(s). There are 19 watchers for this library.

It had no major release in the last 6 months.

There are 0 open issues and 3 have been closed. On average issues are closed in 6 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of readabilityBUNDLE is current.

Quality

readabilityBUNDLE has 0 bugs and 0 code smells.

Security

readabilityBUNDLE has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

readabilityBUNDLE code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

readabilityBUNDLE does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

readabilityBUNDLE releases are not available. You will need to build from source code and install.

Build file is available. You can build the component from source.

Installation instructions, examples and code snippets are available.

Top functions reviewed by kandi - BETA

kandi has reviewed readabilityBUNDLE and discovered the below as its top functions. This is intended to give you an instant insight into readabilityBUNDLE implemented functionality, and help decide if they suit your requirements.

Entry point for testing
Extracts the article
Returns the best node based on clustering
Removes all elements that have a small div or div
Replaces all non-divs of a div with the given tag
Changes the tag of a document element
Converts double brs in a document to a p tag
Clean out spurious headers from an element
Converts the URL to a URL
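
To make one of the listed functions concrete, here is a minimal sketch of converting runs of two or more `<br>` tags into paragraph breaks. It is a regex-based approximation that works on a plain string; it is not the library's code, which is not shown on this page. The class name `DoubleBrToParagraph` and the method `convert` are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative only: treats every run of two or more consecutive <br> tags
// as a paragraph boundary and wraps each resulting chunk in <p>...</p>.
final class DoubleBrToParagraph {
    // Matches <br>, <br/> or <br /> (case-insensitive), repeated at least
    // twice, with optional whitespace between the tags.
    private static final Pattern DOUBLE_BR =
            Pattern.compile("(?i)(?:<br\\s*/?>\\s*){2,}");

    static String convert(String htmlFragment) {
        StringBuilder out = new StringBuilder();
        for (String chunk : DOUBLE_BR.split(htmlFragment.trim())) {
            String text = chunk.trim();
            if (!text.isEmpty()) {
                // A single <br> inside a chunk is left untouched.
                out.append("<p>").append(text).append("</p>");
            }
        }
        return out.toString();
    }
}
```

For example, `DoubleBrToParagraph.convert("First<br><br/>Second")` returns `<p>First</p><p>Second</p>`, while a lone `<br>` stays inside its paragraph. Because it works on raw strings, a sketch like this does not handle `<br>` runs that are nested inside other elements.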

readabilityBUNDLE Key Features

No Key Features are available at this moment for readabilityBUNDLE.

readabilityBUNDLE Examples and Code Snippets

No Code Snippets are available at this moment for readabilityBUNDLE.

Community Discussions

No Community Discussions are available at this moment for readabilityBUNDLE. Refer to the Stack Overflow page for discussions.

Vulnerabilities

No vulnerabilities reported

Install readabilityBUNDLE

Build with Maven: mvn clean package

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
