pubMunch | various tools to download, convert and process the full text
kandi X-RAY | pubMunch Summary
pubMunch is a Python library. pubMunch has no reported bugs or vulnerabilities, has a build file available, and has low support. You can download it from GitHub.
NOTE: There is now a Python 3 version of this repo, and ongoing development work is happening there. These are the tools that I wrote for the UCSC Genocoding project, see They allow you to download full-text research articles from the internet, convert them to text, and run text-mining algorithms on them. All tools start with the prefix "pub".
Support
pubMunch has a low-activity ecosystem.
It has 48 stars and 21 forks. There are 8 watchers for this library.
It had no major release in the last 6 months.
There is 1 open issue and 5 have been closed. On average, issues are closed in 315 days. There is 1 open pull request and 0 closed pull requests.
It has a neutral sentiment in the developer community.
The latest version of pubMunch is current.
Quality
pubMunch has no bugs reported.
Security
pubMunch has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
pubMunch does not have a standard license declared.
Check the repository for any license declaration and review the terms closely.
Without a license, all rights are reserved, and you cannot use the library in your applications.
Reuse
No pubMunch releases are available. You will need to build it from source code and install it.
A build file is available, so you can build the component from source.
Installation instructions, examples and code snippets are available.
Top functions reviewed by kandi - BETA
kandi has reviewed pubMunch and identified the functions below as its top functions. This is intended to give you instant insight into the functionality pubMunch implements and to help you decide whether it suits your requirements.
- Decorator to log the phase of each token.
- Create a configuration dictionary for HighWire publishers.
- Get stylesheet.
- Return an Element Builder.
- Create a DOM builder.
- Compile regex patterns.
- Parse Elsevier metadata.
- Return a dictionary of publication counts.
- Crawl files via PubMed.
- Parse the NLM XML file.
pubMunch Key Features
No Key Features are available at this moment for pubMunch.
pubMunch Examples and Code Snippets
No Code Snippets are available at this moment for pubMunch.
Community Discussions
No Community Discussions are available at this moment for pubMunch. Refer to the Stack Overflow page for discussions.
Vulnerabilities
No vulnerabilities reported
Install pubMunch
Install these packages on Ubuntu:
sudo apt-get install catdoc poppler-utils docx2text gnumeric python-lxml
catdoc contains various converters for Microsoft Office files
poppler-utils contains one of the pdftotext converters
docx2text is a Perl script for docx files
gnumeric includes the ssconvert tool for xlsx Excel files
python-lxml is a fast XML/HTML parser
html2text is required; it is used for the HTML -> text conversion (written by Aaron Swartz)
requests is very useful for pubCrawl2 and is highly recommended
selenium is optional and is only used to crawl Karger journals. It is not required.
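Before running the pub* tools it can help to confirm that the converters and Python modules listed above are actually present. The sketch below is a hypothetical helper, not part of pubMunch. It assumes the executables are named catdoc, pdftotext (from poppler-utils) and ssconvert (from gnumeric), and it leaves out the docx2text script because the notes do not name its command. Python modules are checked with importlib.util.find_spec, so nothing is actually imported.

```python
import importlib.util
import shutil

# Hypothetical helper (not part of pubMunch). Executable names are assumptions
# based on the packages listed in the install notes.
EXTERNAL_TOOLS = {
    "catdoc": "catdoc",            # Microsoft Office files -> text
    "pdftotext": "poppler-utils",  # PDF -> text
    "ssconvert": "gnumeric",       # xlsx Excel files -> text/CSV
}

# Python modules from the install notes: True = required, False = optional.
PYTHON_MODULES = {
    "lxml": True,
    "html2text": True,
    "requests": False,   # highly recommended for pubCrawl2
    "selenium": False,   # only used to crawl Karger journals
}


def check_dependencies():
    """Return {name: found} for every external tool and Python module."""
    report = {}
    for exe in EXTERNAL_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        report[exe] = shutil.which(exe) is not None
    for module in PYTHON_MODULES:
        # find_spec returns None when the top-level module cannot be found
        report[module] = importlib.util.find_spec(module) is not None
    return report


def missing_required(report):
    """List the required tools/modules that the report marks as missing."""
    missing = [exe for exe in EXTERNAL_TOOLS if not report[exe]]
    missing += [m for m, required in PYTHON_MODULES.items()
                if required and not report[m]]
    return missing


if __name__ == "__main__":
    report = check_dependencies()
    for name, found in report.items():
        print(f"{name:10s} {'ok' if found else 'MISSING'}")
    print("missing required:", missing_required(report) or "none")
```

Optional modules (requests, selenium) are reported but never counted as missing, matching the notes that mark them as recommended or not required.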
Support
fixme:
- Illegal DOI landing page: http://www.nature.com/doifinder/10.1046/j.1523-1747.1998.00092.x
- URL constructor: http://www.nature.com/nature/journal/v437/n7062/full/4371102a.html for DOI doi:10.1038/4371102a
- URL construction for supplemental files: http://www.nature.com/bjc/journal/v103/n10/suppinfo/6605908s1.html
- No-access page: http://www.nature.com/nrclinonc/journal/v7/n11/full/nrclinonc.2010.119.html
- cat /cluster/home/max/projects/pubs/crawlDir/rupress/articleMeta.tab | head -n13658 | tail -n2 > problem.txt
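The URL-constructor notes pair a DOI with a nature.com full-text URL and a supplemental-files URL. The sketch below shows what such a constructor could look like. It is a hypothetical helper, not pubMunch's crawler code, and the URL patterns are inferred only from the two example URLs in the notes. Volume and issue are not part of the DOI, so they are assumed to come from separate article metadata.

```python
# Hypothetical helpers, not pubMunch's crawler code. URL patterns are inferred
# only from the example nature.com URLs in the fixme notes.


def doi_suffix(doi):
    """Strip an optional 'doi:' prefix and return the part after '10.xxxx/'."""
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return suffix


def nature_fulltext_url(journal, volume, issue, article_id):
    """Full-text URL in the style .../nature/journal/v437/n7062/full/4371102a.html."""
    return (f"http://www.nature.com/{journal}/journal/"
            f"v{volume}/n{issue}/full/{article_id}.html")


def nature_suppinfo_url(journal, volume, issue, supp_id):
    """Supplemental-files URL in the style .../bjc/journal/v103/n10/suppinfo/6605908s1.html."""
    return (f"http://www.nature.com/{journal}/journal/"
            f"v{volume}/n{issue}/suppinfo/{supp_id}.html")


if __name__ == "__main__":
    # Volume 437, issue 7062 are taken from the example URL, not from the DOI.
    article_id = doi_suffix("doi:10.1038/4371102a")
    print(nature_fulltext_url("nature", 437, 7062, article_id))
```

Running it prints http://www.nature.com/nature/journal/v437/n7062/full/4371102a.html, the URL from the notes. The illegal-landing-page and no-access cases above show why a constructed URL still has to be fetched and checked before it is trusted.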