kandi X-RAY | 51job Summary
Scraper for job listings from 前程无忧 (51job)
Top functions reviewed by kandi - BETA
- Main entry point
- Get data from db xml
- Get list of jobs from url
- Gets HTML response
- Gets the job name
- Set the company name
- Sets the job key
- Sets the job name
- Set the push date
- Set the salary
- Set the work address
Community Discussions
Trending Discussions on 51job
QUESTION
I am trying to build a database with rvest. Since I have a lot of data to download, I tried to write several functions that would allow me to interrupt the scraping process and restart it where I left off. However, while the functions more or less work, whenever I manually interrupt them, I lose the output. Does anyone know a solution that would allow me to stop the function without losing the data frame that the loop is building? I would be glad for any advice!
Some URLs that I am trying to scrape data from:
...
ANSWER
Answered 2020-Feb-19 at 21:54
I come across this problem often in web scraping. The key is to store the intermediate results in an environment where they remain accessible if your function throws an error. The obvious place is the global environment, but this depends on how you are using your function. If it is part of a package, then you don't want to write to the global workspace. In that case you can have a "storage" environment as part of the package.
Perhaps the neatest way to do this is to delete the intermediate object after the loop is complete, so it will only ever be visible / accessible if the loop throws an error.
Here is a function that demonstrates the principle:
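The answer's original R function is not reproduced on this page. As a stand-in, here is a minimal sketch of the same idea in Java, the language of the 51job library, rather than R. Everything in it is an assumption made for illustration: the class `ResumableScrape`, the `partialResults` holder (playing the role of the "storage" environment), and the `fetch` parameter, which stands in for the real download-and-parse step. It also only covers a loop that stops because of an exception; a manual Ctrl+C in Java ends the whole process, unlike an interrupt at the R console.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch: keep partial scraping results reachable if the loop fails midway. */
final class ResumableScrape {

    // Hypothetical "storage" holder. It is null unless a run was cut short.
    private static List<String> partialResults = null;

    /** Results saved by a failed run, or null if the last run completed. */
    static List<String> partialResults() {
        return partialResults;
    }

    /**
     * Applies fetch to each URL in order. If an earlier run failed, the
     * saved results are reused and the loop resumes after them.
     */
    static List<String> scrapeAll(List<String> urls, Function<String, String> fetch) {
        List<String> results = (partialResults != null) ? partialResults : new ArrayList<>();
        partialResults = results; // reachable from outside while the loop runs
        for (String url : urls.subList(results.size(), urls.size())) {
            results.add(fetch.apply(url)); // may throw; earlier items stay stored
        }
        partialResults = null; // loop completed: drop the intermediate object
        return results;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("page-1", "page-2", "page-3"); // placeholder inputs
        try {
            // First run: simulate a failure on the second item.
            scrapeAll(urls, u -> {
                if (u.equals("page-2")) {
                    throw new IllegalStateException("simulated failure");
                }
                return "data from " + u;
            });
        } catch (IllegalStateException e) {
            System.out.println("Stopped early; saved so far: " + partialResults());
        }
        // Second run: resumes after the saved items instead of starting over.
        System.out.println(scrapeAll(urls, u -> "data from " + u));
    }
}
```

Because the holder is cleared only after the loop finishes, it is non-null exactly when a run was cut short, which matches the suggestion above to delete the intermediate object once the loop completes.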
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install 51job
You can use 51job like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the 51job component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.