pubMunch | various tools to download, convert and process the full text

by maximilianh | Python | Version: Current | License: No License

kandi X-RAY | pubMunch Summary

pubMunch is a Python library. It has no reported bugs or vulnerabilities, a build file is available, and it has low support. You can download it from GitHub.

NOTE: There is now a Python3 version of this repo, and ongoing development work is happening there. These are the tools that I wrote for the UCSC Genocoding project. They allow you to download fulltext research articles from the internet, convert them to text and run text mining algorithms on them. All tools start with the prefix "pub".

            kandi-support Support

              pubMunch has a low active ecosystem.
              It has 48 star(s) with 21 fork(s). There are 8 watchers for this library.
              It had no major release in the last 6 months.
              There is 1 open issue and 5 have been closed. On average, issues are closed in 315 days. There is 1 open pull request and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of pubMunch is current.

            kandi-Quality Quality

              pubMunch has no bugs reported.

            kandi-Security Security

              pubMunch has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              pubMunch does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              pubMunch releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi has reviewed pubMunch and discovered the below as its top functions. This is intended to give you an instant insight into pubMunch implemented functionality, and help decide if they suit your requirements.
            • Decorator to log the phase of each token.
            • Create a configuration dictionary for highwire publishers.
            • Get stylesheet.
            • Return an Element Builder.
            • Create a DOM builder.
            • Compile regex patterns.
            • Parse Elsevier metadata.
            • Return a dictionary of publication counts.
            • Crawl files via Pubmed.
            • Parse the NLM XML file.

            pubMunch Key Features

            No Key Features are available at this moment for pubMunch.

            pubMunch Examples and Code Snippets

            No Code Snippets are available at this moment for pubMunch.

            Community Discussions

            No Community Discussions are available at this moment for pubMunch. Refer to the Stack Overflow page for discussions.

            Vulnerabilities

            No vulnerabilities reported

            Install pubMunch

            Install these packages on Ubuntu:
            sudo apt-get install catdoc poppler-utils docx2text gnumeric python-lxml
            catdoc contains various converters for Microsoft Office files
            poppler-utils contains one of the pdftotext converters
            docx2text is a Perl script for docx files
            gnumeric includes the ssconvert tool for xlsx Excel files
            python-lxml is a fast XML/HTML parser
            html2text is required; it is used for the HTML -> text conversion (written by Aaron Swartz)
            requests is very useful for pubCrawl2 and highly recommended
            selenium is optional; it is only used to crawl Karger journals
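
Before running the converters, it can help to confirm that the external commands are actually on the PATH. The following is a minimal sketch, not part of pubMunch: the helper name `missing_converters` is hypothetical, and the command names are assumptions (`ssconvert` and `pdftotext` are named in the list above, while `catdoc` is assumed to be the binary shipped by the package of the same name). The docx converter is left out because its command name is not stated here.

```python
import shutil

# Command -> Ubuntu package that is assumed to provide it.
# These pairs are illustrative assumptions, not taken from pubMunch itself.
CONVERTERS = {
    "catdoc": "catdoc",
    "pdftotext": "poppler-utils",
    "ssconvert": "gnumeric",
}


def missing_converters(converters=CONVERTERS, which=shutil.which):
    """Return (command, package) pairs for commands not found on PATH.

    `which` is injectable so the check can be tested without touching
    the real system.
    """
    return [(cmd, pkg) for cmd, pkg in converters.items() if which(cmd) is None]


if __name__ == "__main__":
    for cmd, pkg in missing_converters():
        print(f"missing: {cmd} (try: sudo apt-get install {pkg})")
```

Run as a script, it prints one line per missing command and nothing when every command is found.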

            Support

            fixme:
            • illegal DOI landing page: http://www.nature.com/doifinder/10.1046/j.1523-1747.1998.00092.x
            • URL constructor: http://www.nature.com/nature/journal/v437/n7062/full/4371102a.html for DOI doi:10.1038/4371102a
            • URL construction for supplemental files: http://www.nature.com/bjc/journal/v103/n10/suppinfo/6605908s1.html
            • no access page: http://www.nature.com/nrclinonc/journal/v7/n11/full/nrclinonc.2010.119.html
            • cat /cluster/home/max/projects/pubs/crawlDir/rupress/articleMeta.tab | head -n13658 | tail -n2 > problem.txt
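
The notes above write DOIs both with and without a `doi:` label and show a landing page of the form `http://www.nature.com/doifinder/<DOI>`. The sketch below shows how such a URL might be assembled from either spelling. The function names are hypothetical and this is not pubMunch's own code. A well-formed URL is no guarantee that the page is valid, which is what the first note records.

```python
# Base URL pattern taken from the landing-page example in the notes above.
DOIFINDER_BASE = "http://www.nature.com/doifinder/"


def normalize_doi(doi):
    """Strip surrounding whitespace and an optional leading 'doi:' label."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return doi


def doifinder_url(doi):
    """Build a doifinder landing-page URL for a DOI (hypothetical helper)."""
    return DOIFINDER_BASE + normalize_doi(doi)


def doi_prefix(doi):
    """Return the registrant prefix, i.e. the part before the first slash."""
    return normalize_doi(doi).split("/", 1)[0]
```

The registrant prefix is one cheap thing to log when a landing page turns out to be invalid, since it shows which publisher registered the DOI.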
            CLONE
          • HTTPS

            https://github.com/maximilianh/pubMunch.git

          • CLI

            gh repo clone maximilianh/pubMunch

          • sshUrl

            git@github.com:maximilianh/pubMunch.git
