JobHarvester | Collects job postings from multiple universities in parallel and filters them (wanted and unwanted)
kandi X-RAY | JobHarvester Summary
JobHarvester is a Java library. It has no reported bugs or vulnerabilities, a build file is available, and it has low support. You can download it from GitHub.
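JobHarvester collects job postings from multiple universities in parallel and filters them by wanted and unwanted information. New crawlers can be added by extending an abstract class and implementing a few functions (locating the list tags, and so on). A class loader loads the crawlers for the universities listed in a configuration file. The fields of each job posting are extracted and parsed into a unified bean, and the results are displayed as a simple HTML page.

Technologies used so far:
- Spring for decoupling
- MyBatis for mapping database objects
- A thread pool for concurrent collection
- A class loader for dynamically loading crawlers

System architecture:
There are two program entry points:
- crawlerwriter: crawls and writes to the database
- crawlerread: reads from the database and displays the results
Other important classes:
- crawlerservice: uses a thread pool to collect job postings from different universities concurrently
- mycrawler: the abstract crawler class (template pattern); every new crawler must extend it, and it contains the crawl algorithm for a single crawler
- jobinfoservice: the service through which job postings interact with the database
The class dependencies are shown in a class diagram (not reproduced here).

Adding a crawler takes three steps:
- Extend mycrawler.
- Implement the abstract methods, including locating the rows, the columns, the index of each piece of information, the next page, and so on.
    public class nyucrawler extends mycrawler {
- Configure the class name and the website link in school.properties.
    nju = njucs = sju = nyu =

Initialization:
You only need to configure the class name and URL in the configuration file. The system uses a class loader to load all crawler classes automatically and puts them in a HashMap.

    /**
     * Loads each crawler class with the class loader.
     * Configure the corresponding URL in school.properties.
     * Reads the school links and matches the addresses; naming convention: [school]+[crawer]
     */
    @PostConstruct
    public void init() {
        Properties properties = new Properties();
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        try {
            properties.load(classLoader.getResourceAsStream("school.properties"));
        } catch (IOException e) {
            e.printStackTrace();
        }

The first draft of the single-crawler algorithm is simple and is designed as a template pattern. The steps marked in red in the original figure are abstract functions that each crawler must override.
- Initialize: open the browser, disable images and CSS
- Connect
- Loop over the pages:
  - Get the coarse target (locate the table)
  - Locate the rows
  - Filter wanted and unwanted information
  - Extract the jobinfo model
  - Go to the next page

    /**
     * Template pattern algorithm
     * Coarsely locate the target position, e.g. the table
     * Filter wanted and unwanted information
     * Locate the row positions
     * Extract and wrap the jobinfo information
     * Call the service to process it (this can be asynchronous)
     * Next page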
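The template pattern described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names AbstractCrawler, InMemoryCrawler, locateRows, extractTitle and nextPage are hypothetical, and the sketch reads rows from in-memory lists instead of driving a browser so that it runs without extra dependencies.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateCrawlerSketch {

    /** Hypothetical stand-in for mycrawler: the fixed algorithm lives here. */
    abstract static class AbstractCrawler {
        private final List<String> wanted;
        private final List<String> unwanted;

        AbstractCrawler(List<String> wanted, List<String> unwanted) {
            this.wanted = wanted;
            this.unwanted = unwanted;
        }

        /** Template method: subclasses fill in the steps but cannot reorder them. */
        final List<String> crawl() {
            List<String> results = new ArrayList<>();
            int page = 0;
            while (page >= 0) {
                for (String row : locateRows(page)) {
                    if (accept(row)) {
                        results.add(extractTitle(row));
                    }
                }
                page = nextPage(page);
            }
            return results;
        }

        /** Keeps a row if it matches a wanted keyword and no unwanted keyword. */
        private boolean accept(String row) {
            boolean isWanted = wanted.isEmpty() || wanted.stream().anyMatch(row::contains);
            boolean isUnwanted = unwanted.stream().anyMatch(row::contains);
            return isWanted && !isUnwanted;
        }

        /** Returns the rows found on the given page. */
        protected abstract List<String> locateRows(int page);

        /** Extracts the job title from one row. */
        protected abstract String extractTitle(String row);

        /** Returns the next page index, or -1 when there are no more pages. */
        protected abstract int nextPage(int page);
    }

    /** Hypothetical concrete crawler that walks a list of in-memory pages. */
    static class InMemoryCrawler extends AbstractCrawler {
        private final List<List<String>> pages;

        InMemoryCrawler(List<List<String>> pages, List<String> wanted, List<String> unwanted) {
            super(wanted, unwanted);
            this.pages = pages;
        }

        @Override
        protected List<String> locateRows(int page) {
            return page < pages.size() ? pages.get(page) : List.of();
        }

        @Override
        protected String extractTitle(String row) {
            return row.split("\\|")[0].trim();
        }

        @Override
        protected int nextPage(int page) {
            return page + 1 < pages.size() ? page + 1 : -1;
        }
    }

    /** Runs the sketch on two made-up pages of rows. */
    public static List<String> demo() {
        List<List<String>> pages = List.of(
                List.of("Java Engineer | 2024-03-01", "Sales Intern | 2024-03-02"),
                List.of("Java Intern | 2024-03-03", "Python Engineer | 2024-03-04"));
        return new InMemoryCrawler(pages, List.of("Java"), List.of("Intern")).crawl();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Java Engineer]
    }
}
```

Declaring crawl() as final keeps the page loop and the filtering in one place, so a new university crawler only has to supply the page-specific steps.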
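The initialization and concurrent collection described above might look roughly like the sketch below. It rests on several assumptions that do not come from the project: the Crawler interface, the NjuCrawler and NyuCrawler classes, the classNameFor naming rule and the example URLs are all hypothetical, and the properties are read from a string rather than from school.properties so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CrawlerRegistrySketch {

    /** Hypothetical minimal crawler contract. */
    interface Crawler {
        String crawl(String url);
    }

    static class NjuCrawler implements Crawler {
        public String crawl(String url) { return "nju jobs from " + url; }
    }

    static class NyuCrawler implements Crawler {
        public String crawl(String url) { return "nyu jobs from " + url; }
    }

    /** Assumed convention: school key "nju" maps to the nested class NjuCrawler. */
    static String classNameFor(String school) {
        String capitalized = Character.toUpperCase(school.charAt(0)) + school.substring(1);
        return CrawlerRegistrySketch.class.getName() + "$" + capitalized + "Crawler";
    }

    /** Loads one crawler per configured school by reflection and stores it in a HashMap. */
    static Map<String, Crawler> loadCrawlers(Properties schools) throws ReflectiveOperationException {
        // This class's own loader is used so the nested classes are always visible.
        ClassLoader loader = CrawlerRegistrySketch.class.getClassLoader();
        Map<String, Crawler> crawlers = new HashMap<>();
        for (String school : schools.stringPropertyNames()) {
            Class<?> type = Class.forName(classNameFor(school), true, loader);
            crawlers.put(school, (Crawler) type.getDeclaredConstructor().newInstance());
        }
        return crawlers;
    }

    /** Runs every crawler on a thread pool and gathers the results by school. */
    static Map<String, String> crawlAll(Properties schools) throws Exception {
        Map<String, Crawler> crawlers = loadCrawlers(schools);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, crawlers.size()));
        try {
            Map<String, Future<String>> pending = new HashMap<>();
            for (Map.Entry<String, Crawler> entry : crawlers.entrySet()) {
                String url = schools.getProperty(entry.getKey());
                Crawler crawler = entry.getValue();
                Callable<String> task = () -> crawler.crawl(url);
                pending.put(entry.getKey(), pool.submit(task));
            }
            Map<String, String> results = new TreeMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get()); // waits for the task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Runs the sketch with two made-up schools and placeholder URLs. */
    public static Map<String, String> demo() {
        Properties schools = new Properties();
        try {
            schools.load(new StringReader(
                    "nju = https://nju.example/jobs\nnyu = https://nyu.example/jobs\n"));
            return crawlAll(schools);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the registry is built from the configuration alone, adding a school in this sketch means adding one class and one properties entry, with no change to the loading or scheduling code.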
Support
JobHarvester has a low active ecosystem.
It has 1 star and 0 forks. There is 1 watcher for this library.
It had no major release in the last 12 months.
JobHarvester has no issues reported. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of JobHarvester is v0.1-beta.
Quality
JobHarvester has no bugs reported.
Security
JobHarvester has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
JobHarvester does not have a standard license declared.
Check the repository for any license declaration and review the terms closely.
Without a license, all rights are reserved, and you cannot use the library in your applications.
Reuse
JobHarvester releases are available to install and integrate.
Build file is available. You can build the component from source.
Installation instructions are not available. Examples and code snippets are available.
Top functions reviewed by kandi - BETA
kandi has reviewed JobHarvester and discovered the below as its top functions. This is intended to give you an instant insight into JobHarvester implemented functionality, and help decide if they suit your requirements.
- Get job info
- Initialize CSS
- Consume job info
- Insert job info
- Entry point for the crawler writer
- Sorts by the specified map
- Get all the jobs from the cache
- Get JobInfo by map
- Continue crawling job info
- Produces a job info from crawler
- Extract JobInfo DTO from WebElement
- Extract job info from web element
- Get job infos by date
- Initialize
- Get all the jobs from memory
- Gets the available companies
- Creates the table if it doesn't exist
- Initialize Redis
- Add a cache
- Converts the web element into a JobInfo DTO
- Log execution time
- Get JobInfo DTO
- Checks if the given content is an attention
- Get cached job infos
- Extracts the JobInfo DTO from WebElement
- End timing for a tag
JobHarvester Key Features
No Key Features are available at this moment for JobHarvester.
JobHarvester Examples and Code Snippets
No Code Snippets are available at this moment for JobHarvester.
Community Discussions
No Community Discussions are available at this moment for JobHarvester. Refer to the Stack Overflow page for discussions.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install JobHarvester
You can download it from GitHub.
You can use JobHarvester like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the JobHarvester component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.
Support
For new features, suggestions, and bugs, create an issue on GitHub.
If you have any questions, check and ask on the Stack Overflow community page.