webarticle2text | main article text from an arbitrary webpage

by chrisspen | HTML | Version: 3.0.2 | License: LGPL-3.0

kandi X-RAY | webarticle2text Summary

webarticle2text is an HTML library. It has no reported bugs or vulnerabilities, a weak copyleft license, and low support. You can download it from GitHub.

This project is obsolete and now serves only as a reference. I recommend you instead use newspaper, which is an order of magnitude more accurate than any other article extraction library I've encountered. Please see compare.csv for a performance comparison of several similar tools.

It attempts to locate and extract the largest cluster of text in a webpage. It does this by walking the DOM tree, identifying all text segments and their depth inside the DOM, appending all text at roughly the same depth, and then returning the chunk with the largest total length.

This approach usually works well with typical news sites, where one news article is displayed per URL. It usually fails with URLs that display multiple news blurbs (e.g. news aggregators).
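
The depth-based extraction described above can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not the project's actual code: the names DepthTextCollector and largest_text_cluster and the tolerance parameter (how far apart two depths may be and still count as "roughly the same") are assumptions made for this example.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is never article content.
SKIP_TAGS = {"script", "style"}


class DepthTextCollector(HTMLParser):
    """Record each non-empty text segment with the tag depth it appears at."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.skip = 0        # greater than 0 while inside <script> or <style>
        self.segments = []   # (depth, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag in SKIP_TAGS:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(0, self.depth - 1)
        if tag in SKIP_TAGS and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.skip:
            self.segments.append((self.depth, text))


def largest_text_cluster(html, tolerance=0):
    """Return the longest text chunk formed by segments at about the same depth.

    `tolerance` is a hypothetical knob: segments whose depth differs from the
    candidate depth by at most this amount are appended to the same chunk.
    """
    collector = DepthTextCollector()
    collector.feed(html)
    collector.close()

    best = ""
    for candidate in {depth for depth, _ in collector.segments}:
        chunk = " ".join(text for depth, text in collector.segments
                         if abs(depth - candidate) <= tolerance)
        if len(chunk) > len(best):
            best = chunk
    return best
```

Calling largest_text_cluster(page_html) on a single-article page should return mostly the article paragraphs, since they tend to sit at one depth and outweigh navigation links. On an aggregator page, many short blurbs share a depth with unrelated text, which is consistent with the failure case noted above.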

            Support

              webarticle2text has a low active ecosystem.
              It has 83 stars and 16 forks. There are 8 watchers for this library.
              It had no major release in the last 12 months.
              There are 0 open issues and 4 closed issues. On average, issues are closed in 30 days. There are 12 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of webarticle2text is 3.0.2.

            Quality

              webarticle2text has no bugs reported.

            Security

              webarticle2text has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              webarticle2text is licensed under the LGPL-3.0 License. This license is Weak Copyleft.
              Weak Copyleft licenses have some restrictions, but you can use them in commercial projects.

            Reuse

              webarticle2text releases are not available. You will need to build from source code and install.
              Installation instructions, examples and code snippets are available.

            Install webarticle2text

            You may need to install the tidylib system package, which is available on Ubuntu 12.04.

            Support

            To request new features, make suggestions, or report bugs, create an issue on GitHub. If you have questions, search or ask on Stack Overflow.

            Install
          • PyPI

            pip install webarticle2text

          • CLONE
          • HTTPS

            https://github.com/chrisspen/webarticle2text.git

          • CLI

            gh repo clone chrisspen/webarticle2text

          • SSH

            git@github.com:chrisspen/webarticle2text.git
