DataSpider | A spider library for several data sources | Crawler library
kandi X-RAY | DataSpider Summary
A spider framework with several internal spiders.
Top functions reviewed by kandi - BETA
- Try to read a page
- Read a blog
- Return a BeautifulSoup object for the given URL
- Perform a GET request
- Return a list of floor objects
- Parse a user tag
- Parse an author
- Parse a floor
- Get information about a movie
- Get data by ID
- Return a dictionary containing information about a book
- Return a generator of forums
- Create a default requests session
- Download the image
- Return a list of comments
- Return the episode title
- Return all comments
- Serialize to JSON
- Return the size of the video
- Return the response content
- Set proxy information
- Get the download link
- Return thread URLs
- Download the URL
- Set the cookie jar
- Return a list of URLs
DataSpider Key Features
DataSpider Examples and Code Snippets
Community Discussions
Trending Discussions on DataSpider
QUESTION
I'm trying to create a Scrapy script with the intent of gathering information on individual posts on the Medium website. Unfortunately, it requires following three levels of links: each year link, each month within that year, and then each day within that month.
I've managed to get each individual link for every year, every month within that year, and every day. However, I just can't seem to get Scrapy to handle the individual day pages.
I'm not entirely sure whether I'm conflating rules with callback functions for following the links. There isn't much guidance on how to handle this kind of pagination recursively. I've tried using functions and response.follow on their own without being able to get anything to run.
The dictionary in the parse_item function is required because, annoyingly, articles on the individual day pages mark up their titles in several different ways. So I created a function that grabs the title regardless of which XPath is needed to reach it.
The last function, get_tag, is needed because the tags live on each individual article page, so that is where they have to be grabbed; both functions are sketched below.
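For illustration, the two functions look roughly like this (the XPath expressions here are simplified stand-ins, not the exact ones from my spider):

def parse_item(self, response):
    # Each day page lists many articles, and they mark up their
    # titles in several different ways, so try each candidate
    # XPath until one of them matches.
    for post in response.xpath("//div[contains(@class, 'postArticle')]"):
        title = None
        for xpath in (".//h3//text()", ".//h2//text()", ".//h1//text()"):
            title = post.xpath(xpath).get()
            if title:
                break
        href = post.xpath(".//a/@href").get()
        if href:
            # The tags only appear on the article page itself,
            # so follow the link and finish the item there.
            yield response.follow(href, callback=self.get_tag,
                                  cb_kwargs={"title": title})

def get_tag(self, response, title):
    tags = response.xpath("//a[contains(@href, '/tag/')]/text()").getall()
    yield {"title": title, "tags": tags}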
I'd appreciate any insight into how to get this last step working and have the individual day links go through the parse_item function. I should say there are no obvious errors that I can see in the shell.
If any further information is necessary, just let me know.
Thanks!
CODE:
...ANSWER
Answered 2020-Feb-07 at 17:37
Remove the three functions years, months, and days.
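Presumably the point is to let CrawlSpider rules do the recursive link-following instead of hand-written callbacks. A minimal sketch of that approach, assuming Medium's /archive/YYYY/MM/DD URL scheme (the start URL and the patterns are illustrative, not taken from the original spider):

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class MediumArchiveSpider(CrawlSpider):
    name = "medium_archive"
    start_urls = ["https://medium.com/tag/python/archive"]  # hypothetical tag archive

    rules = (
        # Follow year and month archive pages; with no callback,
        # follow defaults to True and the crawl recurses.
        Rule(LinkExtractor(allow=r"/archive/\d{4}(/\d{2})?$")),
        # Send each day page to parse_item (as in the question above).
        Rule(LinkExtractor(allow=r"/archive/\d{4}/\d{2}/\d{2}$"),
             callback="parse_item"),
    )

    def parse_item(self, response):
        # Extract titles and follow article links for tags,
        # as sketched in the question.
        ...

With the defaults, the first rule keeps descending through archive levels while the second fires the callback only on day pages; note that a CrawlSpider must not override parse, since the rules machinery relies on it.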
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install DataSpider
You can use DataSpider like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
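A typical setup might look like the following; the repository URL is a placeholder, since the actual install source is not shown on this page:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
pip install git+https://github.com/<owner>/DataSpider.git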