APIfy | Convert data in any static site | REST library
kandi X-RAY | APIfy Summary
APIfy converts data in any existing site into a JSON API by scraping it.
Community Discussions
Trending Discussions on APIfy
QUESTION
Using the apify-shared npm package, when I try to run a project, I get the error message:
...ANSWER
Answered 2021-May-24 at 13:51
Are you using the package standalone? That's not recommended, as stated here: https://www.npmjs.com/package/apify-shared
Nevertheless, your issue seems to be that you are not passing the required arguments to the library. Based on the code, in this case you should be calling the logger utility with one of the log level strings it supports:
if (!LEVEL_TO_STRING[options.level]) throw new Error('Options "level" must be one of log.LEVELS enum!');
It is hard to say what you should be doing based on the information shared, but note that you can pass in one of the logger's levels this way: log.LEVELS.
What are you trying to achieve?
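For illustration, a minimal sketch of passing a supported level, assuming the apify-shared logger is used through the Apify SDK's log utility (the import path and the chosen level are assumptions, not the asker's code):

const Apify = require('apify');

// The SDK's log utility wraps apify-shared; pass one of the log.LEVELS enum
// values instead of an arbitrary string, otherwise the "level" check throws.
const { log } = Apify.utils;
log.setLevel(log.LEVELS.DEBUG);
log.debug('Debug logging enabled.');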
QUESTION
I am creating a new actor in Apify with Cheerio to read an input file of URLs and return primarily two items: (1) the HTTP status code and (2) the HTML title. As part of our process, I would like to be able to try up to 4 variations of each input URL, such as:
- HTTP://WWW.SOMEURL.COM
- HTTPS://WWW.SOMEURL.COM
- HTTP://SOMEURL.COM
- HTTPS://SOMEURL.COM
If one of the 4 variations is successful, then the process should ignore the other variations and move to the next input URL.
I read the original input list into a RequestList, and then would like to create the variations in a RequestQueue. Is this the most efficient way to do it? Please see code below, and thank you!
...ANSWER
Answered 2021-May-10 at 14:40
You should create your URL list beforehand. The handlePageFunction is only used for the actual scraping part, and you should only have the Apify.pushData call there:
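As a sketch of that structure (not the answer's original snippet), the four variations could be generated before the crawl starts, with only Apify.pushData left inside the handlePageFunction; the input handling and domain list below are assumptions:

const Apify = require('apify');

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();

    // Build all four protocol/www variations for each bare domain up front.
    const domains = ['someurl.com']; // assumption: in practice these come from the actor input
    for (const domain of domains) {
        for (const url of [
            `http://www.${domain}`,
            `https://www.${domain}`,
            `http://${domain}`,
            `https://${domain}`,
        ]) {
            await requestQueue.addRequest({ url, userData: { domain } });
        }
    }

    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        // The handlePageFunction only records results; URL generation happens above.
        handlePageFunction: async ({ request, response, $ }) => {
            await Apify.pushData({
                url: request.url,
                statusCode: response.statusCode,
                title: $('title').text(),
            });
        },
    });

    await crawler.run();
});

The early-exit behaviour (skipping the remaining variations once one succeeds) is not shown here; it could, for example, be tracked per domain in memory or in a key-value store before enqueueing.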
QUESTION
We are using the Apify Web Scraper actor to create a URL validation task that returns the input URL, the page's title, and the HTTP response status code. We are using a set of 5 test URLs: 4 valid and 1 non-existent. The successful results are always included in the dataset, but the failed URL never is.
Logging indicates that the pageFunction is not even reached for the failed URL:
...ANSWER
Answered 2021-May-05 at 15:30
You can use https://sdk.apify.com/docs/typedefs/puppeteer-crawler-options#handlefailedrequestfunction:
You can then push the failed request to the dataset when all retries fail:
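If the task were rewritten against the SDK directly (rather than configured through the Web Scraper actor), a minimal sketch could look like this; the URLs and dataset fields are assumptions:

const Apify = require('apify');

Apify.main(async () => {
    const requestList = await Apify.openRequestList('urls', [
        { url: 'https://www.example.com' },
        { url: 'https://this-domain-does-not-exist-123456.com' },
    ]);

    const crawler = new Apify.PuppeteerCrawler({
        requestList,
        handlePageFunction: async ({ request, page, response }) => {
            await Apify.pushData({
                url: request.url,
                statusCode: response.status(),
                title: await page.title(),
            });
        },
        // Runs after all retries for a request have failed, so the URL still lands in the dataset.
        handleFailedRequestFunction: async ({ request }) => {
            await Apify.pushData({
                url: request.url,
                statusCode: null,
                errorMessages: request.errorMessages,
            });
        },
    });

    await crawler.run();
});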
QUESTION
Here is the scenario: I'm using the Cheerio scraper to scrape a website containing real estate announcements.
Each announcement links to the next one, so before scraping the current page I add the next page to the request queue. What always happens, at some seemingly random point, is that the scraper stops without any reason, even though the next page to scrape is still in the queue (I have attached an image).
Why does this happen when there is still a pending request in the queue? Many thanks.
Here is the message I get:
...ANSWER
Answered 2021-Mar-02 at 17:11
Missing await:
await context.enqueueRequest
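For context, a sketch of where that await belongs inside a cheerio-scraper pageFunction; the selector for the next-announcement link is an assumption:

async function pageFunction(context) {
    const { $, request } = context;

    // Enqueue the next announcement and await it, so the crawler does not
    // consider this page finished while the request is still being added.
    const nextUrl = $('a.next-announcement').attr('href'); // assumed selector
    if (nextUrl) {
        await context.enqueueRequest({ url: nextUrl }); // the missing await was here
    }

    // The returned object is pushed to the dataset by the cheerio scraper.
    return {
        url: request.url,
        title: $('title').text(),
    };
}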
QUESTION
I am trying to use Apify to get a website's title, but when I run the code I get error 403. Does anyone know a fix?
My Code:
...ANSWER
Answered 2021-Feb-15 at 12:45
You're using the run-sync-get-dataset-items endpoint, which returns the dataset (and only the dataset, i.e. the items array, not the run object). You're then trying to get the items assuming you have a defaultDatasetId, which is undefined in this case. In the end, you get an error. This would also explain why you can see the items via the link.
Not sure why it's error 403, not 404, but I don't see the implementation of getItemsFromDataset(). Could you please check the above first?
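To illustrate the difference, a sketch of consuming that endpoint's response directly as the items array; the actor ID, token and input fields are placeholders/assumptions:

const axios = require('axios');

async function getItems(actorId, token, url) {
    // run-sync-get-dataset-items responds with the dataset items themselves,
    // so there is no run object and no defaultDatasetId to resolve afterwards.
    const { data: items } = await axios.post(
        `https://api.apify.com/v2/acts/${actorId}/run-sync-get-dataset-items?token=${token}`,
        { startUrls: [{ url }] },
    );
    return items; // already the array of dataset items
}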
QUESTION
I need a little advice, because I am stuck with scraping one web page with Apify. I am using apify/web-scraper and basic scraping is already working (name, description, price etc.), but there are product variants on the page and I have no idea what would be the best method to scrape this data.
The product variant form looks like this:
...ANSWER
Answered 2021-Feb-03 at 04:47
You can use the tr as your item delimiter, so you can then extract from each td.
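A sketch of that approach; the table selector and the meaning of each cell are assumptions about the asker's markup:

// Inside the pageFunction, with $ available for the loaded product page.
const variants = [];
$('table.variants tr').each((index, row) => { // assumed selector for the variant table
    const cells = $(row).find('td');
    if (cells.length === 0) return; // skip header rows that only contain <th>
    variants.push({
        variant: $(cells[0]).text().trim(), // assumption: first cell holds the variant name
        price: $(cells[1]).text().trim(),   // assumption: second cell holds the price
    });
});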
QUESTION
How do I use the features of Apify to generate a full list of URLs for scraping from an index page in which items are added in sequential batches when the user scrolls toward the bottom? In other words, it's dynamic loading/infinite scroll, not operating on a button click.
Specifically, on this page - https://www.provokemedia.com/agency-playbook - I cannot make it identify anything other than the initially displayed 13 entries.
These elements appear to be at the bottom of each segment, with display: none changing to display: block as each segment is added. No style tag is visible here in the raw source, only via the DevTools Inspector.
ANSWER
Answered 2021-Jan-16 at 07:38
@LukášKřivka's answer at How to make the Apify Crawler to scroll full page when web page have infinite scrolling? provides the framework for my answer...
Summary:
- Create a function to force scrolling to the bottom of the page
- Get all elements
Detail:
- In a while loop, scroll to the bottom of the page.
- Wait for e.g. 5 seconds for new content to render.
- Keep a running count of the number of target-link selectors, for info.
- Repeat until no more items load.
Call this function only when the pageFunction is examining an index page (e.g. an arbitrary page label like START/LISTING in User Data).
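A sketch of such a scroll loop inside a Web Scraper pageFunction (which runs in the browser context); the link selector, the 5-second wait and the LISTING label are assumptions:

async function pageFunction(context) {
    const { request } = context;

    // Only force-scroll on the index/listing page, identified via User Data.
    if (request.userData.label === 'LISTING') {
        let previousCount = 0;
        while (true) {
            window.scrollTo(0, document.body.scrollHeight); // force-scroll to the bottom
            await context.waitFor(5000); // wait ~5 s for the next batch of entries to render
            const count = document.querySelectorAll('div.agency a').length; // assumed selector
            context.log.info(`Loaded ${count} entries so far`);
            if (count === previousCount) break; // nothing new appeared, stop scrolling
            previousCount = count;
        }
    }

    // ...all entries are now in the DOM and can be collected or enqueued here.
}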
QUESTION
Let's assume a server
...ANSWER
Answered 2020-Dec-21 at 16:40
You need to wrap all your code inside Apify.main:
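For completeness, a bare sketch of that wrapper (the request URL is a placeholder):

const Apify = require('apify');

// Apify.main() initializes the environment and makes sure the process exits
// correctly on errors; all crawler setup and runs belong inside its callback.
Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({ url: 'https://example.com' });
    // ...create the crawler here and `await crawler.run();`
});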
QUESTION
I am developing a web scraper and I want to integrate a proxy from Netnut into it.
The Netnut integration details given:
- Proxy URL: gw.ntnt.io
- Proxy Port: 5959
- Proxy User: igorsavinkin-cc-any
- Proxy Password: xxxxx
Example Rotating IP format (IP:PORT:USERNAME-CC-COUNTRY:PASSWORD): gw.ntnt.io:5959:igorsavinkin-cc-any:xxxxx
In order to change the country, please change 'any' to your desired country. (US, UK, IT, DE etc.) Available countries: https://l.netnut.io/countries
Our IPs are automatically rotated, if you wish to make them Static Residential, please add a session ID in the username parameter like the example below:
Username-cc-any-sid-any_number
The code:
...ANSWER
Answered 2020-Nov-16 at 15:20
Try to use it in this format:
http://username:password@host:port
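Applied to the Netnut values above and fed to the SDK's custom-proxy support, this could look roughly like the following sketch (the crawler setup and start URL are assumptions, and the password stays a placeholder):

const Apify = require('apify');

Apify.main(async () => {
    // Custom (non-Apify) proxy in the http://username:password@host:port format.
    const proxyConfiguration = await Apify.createProxyConfiguration({
        proxyUrls: ['http://igorsavinkin-cc-any:xxxxx@gw.ntnt.io:5959'],
    });

    const requestList = await Apify.openRequestList('start', [{ url: 'https://example.com' }]);

    const crawler = new Apify.CheerioCrawler({
        requestList,
        proxyConfiguration,
        handlePageFunction: async ({ request, $ }) => {
            await Apify.pushData({ url: request.url, title: $('title').text() });
        },
    });

    await crawler.run();
});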
QUESTION
So I am trying to get elements from a JSON array of objects. Example JSON data:
...ANSWER
Answered 2020-Nov-07 at 20:55
You don't need to use .get:
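A tiny illustration of the point with made-up data, since the original JSON is not shown: parsed JSON objects in JavaScript are read with plain property access rather than a .get() method:

const json = '[{"name": "Item A", "price": 10}, {"name": "Item B", "price": 20}]';
const items = JSON.parse(json); // an array of plain objects

for (const item of items) {
    // Dot or bracket notation works directly; no .get() is needed.
    console.log(item.name, item['price']);
}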
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install APIfy