facebot | A Facebook profile and reconnaissance system | Crawler library
kandi X-RAY | facebot Summary
An automated Facebook profile and reconnaissance system.
Top functions reviewed by kandi - BETA
- Get the friends list
- Parse command-line arguments with argparse
- Extract data from the cargo
- Initiate the server
- Add fbids to the database
- Send a random fbid
- Handle a start tag
- Generate a random link
- Return the content of the comic
- Execute a SQL statement
Community Discussions
Trending Discussions on facebot
QUESTION
Note (edit): I might be doing this completely wrong; any guidance would be appreciated if this is in fact the case (new to MVC).
In the solution, a robots.txt file exists to block all crawlers from the site. The only problem is that Facebook's crawler/scraper does not follow the rules and still crawls/scrapes the site, causing an error to be logged and emailed every couple of minutes: "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."
The solution for this is to create a filter on the controllers to check the agent name. If the agent name is Facebook's, the request is redirected to a "No Robots authentication page". The filter has to be on the controller because the site caters for three different routes, each with a custom link, and customers have access to the direct links that get shared on Facebook (thus creating a route for this in the route config will not work).
The problem I'm facing is that the solution does not redirect immediately from the controller filter. It proceeds into action methods (these action methods are partial pages) and then fails because it can no longer redirect (the view has already started rendering by then, which is correct). Is there a way to redirect immediately, the first time this filter is hit? Or is there maybe a better solution?
To test and troubleshoot, I am changing the user agent in code to match what is logged. The error when redirecting from the filter: "Child actions are not allowed to perform redirect actions."
The error currently logged due to Facebook's crawler: "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."
This is what I've done:
Custom Filter:
...ANSWER
Answered 2020-Oct-15 at 18:08
This is because you're using an ActionFilterAttribute. If you check the documentation here: https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-3.1 it explains the filter lifecycle; basically, by the time you arrive at action filters it's too late. You need an authorization filter or a resource filter so you can short-circuit the request.
Each filter type is executed at a different stage in the filter pipeline:
Authorization filters
- Authorization filters run first and are used to determine whether the user is authorized for the request.
- Authorization filters short-circuit the pipeline if the request is not authorized.
Resource filters
- Run after authorization.
- OnResourceExecuting runs code before the rest of the filter pipeline. For example, OnResourceExecuting runs code before model binding.
- OnResourceExecuted runs code after the rest of the pipeline has completed.
The example below is taken from the documentation; it's an implementation of a resource filter. Presumably a similar implementation is possible with an authorization filter, but I believe returning a valid HTTP status code after failing an authorization filter may be a bit of an anti-pattern.
QUESTION
I am trying to get the allowed and disallowed parts for a user agent in the robots.txt file of the Netflix website, using the following code:
...ANSWER
Answered 2020-Mar-22 at 14:46
The following script will read the robots.txt file from top to bottom, splitting on newlines. Most likely you won't be reading robots.txt from a string, but from something more like an iterator.
When the User-agent label is found, start creating a list of user agents. Multiple user agents share a set of Disallowed/Allowed permissions.
When an Allowed or Disallowed label is identified, emit that permission for each user-agent associated with the permission block.
Emitting the data in this manner will allow you to sort or aggregate it for whichever use case you need, as in the sketch below:
- Group by User-agent
- Group by permission: Allowed / Disallowed
- Build a dictionary of paths and the associated permission or user agent
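The answer's code block was not captured here; below is a minimal Python sketch of the parser it describes (the Netflix URL comes from the question, and the block-reset logic is an assumption about how consecutive User-agent lines should be grouped):

```python
import urllib.request

def emit_permissions(robots_txt):
    """Yield (user_agent, permission, path) tuples from robots.txt text."""
    agents = []        # user agents sharing the current permission block
    seen_rule = False  # True once the current block has emitted a rule
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        label, value = (part.strip() for part in line.split(":", 1))
        label = label.lower()
        if label == "user-agent":
            if seen_rule:                      # a new block starts here
                agents, seen_rule = [], False
            agents.append(value)
        elif label in ("allow", "disallow"):
            seen_rule = True
            for agent in agents:               # emit once per sharing agent
                yield agent, label, value

robots_txt = urllib.request.urlopen("https://www.netflix.com/robots.txt").read().decode()
for record in emit_permissions(robots_txt):
    print(record)
```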
QUESTION
I'm running .NET Core middleware and an AngularJS front end. On my main page, I have Google Analytics script tags and other script tags necessary for verifying with third-party providers. Prerender.io removes these by default; however, there's a plugin "removeScriptTags". Does anyone have experience turning this off with the .NET Core middleware?
A better solution may be to blacklist the crawlers you don't want seeing cached content, though I'm not sure this is configurable. In my case, it looks like all the user-agents below are accessing Prerender.io cached content.
Here is my "crawlerUserAgentPattern", which lists the crawlers that should be allowed to access the cached content. I don't see the ones above in this list, so I'm confused as to why they're allowed access.
"(SeobilityBot)|(Seobility)|(seobility)|(bingbot)|(googlebot)|(google)|(bing)|(Slurp)|(DuckDuckBot)|(YandexBot)|(baiduspider)|(Sogou)|(Exabot)|(ia_archiver)|(facebot)|(facebook)|(twitterbot)|(rogerbot)|(linkedinbot)|(embedly)|(quora)|(pinterest)|(slackbot)|(redditbot)|(Applebot)|(WhatsApp)|(flipboard)|(tumblr)|(bitlybot)|(Discordbot)"
...ANSWER
Answered 2019-Dec-26 at 16:21
It looks like you have (google) in your regex. You already have googlebot in there, so I'd suggest you remove (google) if you don't want to match any user agent that merely contains the word "google".
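To see why, here is a small Python check (an illustration only; the pattern is abbreviated from the one above, and the duplicate casings in the original suggest the matching is case-sensitive, so this is not the actual Prerender.io configuration):

```python
import re

# Abbreviated from the pattern above: (google) alone matches any user
# agent that contains the substring "google", not just Googlebot.
pattern = re.compile(r"(googlebot)|(google)|(facebot)|(facebook)")

samples = [
    "Mozilla/5.0 (compatible; bingbot/2.0)",            # no match in this subset
    "googlebot/2.1 (+http://www.google.com/bot.html)",  # matched via (googlebot)
    "some-analytics-tool powered by google",            # matched only via (google)
]
for ua in samples:
    m = pattern.search(ua)
    print(f"{ua!r:55} -> {m.group(0) if m else 'no match'}")
```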
QUESTION
How can I allow people to test my faceBot (Facebook Messenger bot)?
Until now I have had to add every tester and/or developer manually each time so they can test my chatbot, and I have to say that doesn't make sense.
...ANSWER
Answered 2017-Mar-19 at 22:28
When you're ready to make your bot live to the public, you need to submit it for approval by adding the Messenger platform and submitting all the required items for approval to go public. All the information you need on that can be found here: Messenger Bot Review
QUESTION
I'm having problems with a big Laravel project and Redis storage. We store our sessions in Redis and already have 28 GB of RAM there. However, it still fills up to the limit relatively quickly, because we get a very large number of hits (more than 250,000 per day) from search engine bots.
Is there any elegant way to completely disable sessions for bots? I have already implemented my own session middleware, which looks like this:
...ANSWER
Answered 2018-Jul-04 at 16:33
Your problem could be that you're not identifying robots correctly, so it would be helpful to provide the code for that.
Specific to writing middleware that disables sessions, you're much better off changing the session driver to the array driver, as that driver does not persist sessions, instead of changing the configuration of the real session driver at runtime.
QUESTION
I've created a simple app that server-renders some basic SPA content based on the user agent.
For example, if an AngularJS website link is shared on Facebook, I have an Apache rewrite rule to redirect that link to the rendering app. The rendering app then checks the URL that was passed as a query parameter and returns the specified rendered content.
Everything works as expected, but there's a problem with the rendered result. The canonical link showed in the Facebook post is the rendering app's link. Here's what's happening:
Shared Link: www.example.com/the-shared-link
Facebook's post result:
Instead of displaying the shared link (www.example.com/the-shared-link), Facebook shows the rendering app's link (rendering.app.com). But if I click on the Facebook post, it opens the correct website page.
Facebook Debugger result:
All the needed meta tags are added to the rendered result page:
...ANSWER
Answered 2018-May-28 at 11:51
Solved my issue!
The rendering.app.com domain had a rewrite rule to force HTTPS. This causes a 301 HTTP redirect (just as the Facebook Debugger showed). Using https://rendering.app.com solved my issue. Another way of solving the 301 HTTP redirect would be removing the HTTPS rewrite rule on the target domain.
QUESTION
Hi, I've recently been getting super high spikes in Apache CPU usage, Apache memory usage, and MySQL memory usage. It turns out that crawlers were accessing my site at a very aggressive rate, specifically Facebook's. I attempted to add a crawl delay for the Facebook crawler to the robots.txt file, as seen below:
...ANSWER
Answered 2018-May-03 at 08:50
Change the path to
QUESTION
I'd like to use a permalink slug that allows users to share the link (url.com/artist/songtitle) with its Facebook picture, URL, description, and so on (which redirects the users to url.com/#/artist/songtitle). So I decided to show the OG meta tags to the Facebook user agent and separate that from the redirector.
But the problem comes when I use the Facebook Debug Tools and try to fetch it: the crawler wasn't caught by my user-agent separator.
I'm using this code to detect Facebook crawlers. Any idea how to fix this problem?
...ANSWER
Answered 2018-Apr-18 at 16:04
You may want to use stristr or a regex instead of strpos. As it is right now, your code won't match FacebookExternalHit, because it contains capital letters and the strpos function is case-sensitive.
Facebook User-Agents are:
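The pitfall is language-independent; as a hedged illustration in Python (the UA strings here are commonly cited examples, not the answer's list, which was not captured above):

```python
# Substring checks are case-sensitive by default; normalizing case first
# is one simple fix. The UA strings below are illustrative examples.
def is_facebook_crawler(user_agent: str) -> bool:
    ua = user_agent.lower()
    return "facebookexternalhit" in ua or "facebot" in ua

assert is_facebook_crawler("FacebookExternalHit/1.1")
assert is_facebook_crawler("Facebot")
assert not is_facebook_crawler("Mozilla/5.0 (Windows NT 10.0)")
```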
QUESTION
I am loading data that looks like this (URL removed)
...ANSWER
Answered 2018-Apr-12 at 23:38
Use ENCLOSED BY and ESCAPED BY
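The answer's statement was not captured above; as a rough sketch (hypothetical table, file, and credentials), a LOAD DATA statement using those clauses, issued here through mysql-connector-python, might look like this:

```python
import mysql.connector  # pip install mysql-connector-python

# Hypothetical connection details and table/file names; the point is the
# ENCLOSED BY / ESCAPED BY clauses, which tell MySQL how quoted fields and
# escape characters in the input file are parsed. The server must also
# permit local_infile for LOAD DATA LOCAL to work.
conn = mysql.connector.connect(
    host="localhost", user="loader", password="secret",
    database="scrape", allow_local_infile=True,
)
cur = conn.cursor()
cur.execute(r"""
    LOAD DATA LOCAL INFILE 'pages.csv'
    INTO TABLE pages
    FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
""")
conn.commit()
```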
QUESTION
I have a condition in my .htaccess for crawlers and search engines which takes them to a "static" page where they can scrape all content.
Up until now I've had my domain {client}.realdomain.com, where {client} is a subdomain for one client.
When the client then shares something on a social network, e.g. Facebook/LinkedIn, their crawlers are caught by my .htaccess, which has the following conditions (and this works):
URL example: http://{client}.realdomain.com/s/token
ANSWER
Answered 2018-Mar-06 at 13:27RewriteCond %{HTTP_USER_AGENT} (LinkedInBot/[0-9]|facebookexternalhit/[0-9]|Facebot|Twitterbot|twitterbot|Pinterest|pinterest|Google.*snippet|baiduspider|rogerbot|embedly|quora\ link\ preview|showyoubot|outbrain|slackbot|vkShare|W3C_Validator)
RewriteCond %{HTTP_HOST} ^(.+?)\.realdomain\.com$
RewriteRule ^s/(.*)$ http://%1.realdomain.com/static.php?token=$1 [NC,L]
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install facebot
You can use facebot like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system. A typical setup is sketched below.
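A minimal sketch of that setup (shell commands; the repository URL is a placeholder, since the source location is not given here):

```sh
# Create and activate an isolated environment (POSIX shells)
python -m venv .venv
source .venv/bin/activate

# Keep the packaging tools current, as recommended above
python -m pip install --upgrade pip setuptools wheel

# Install from a local clone; <facebot-repo-url> is a placeholder
git clone <facebot-repo-url>
cd facebot
pip install .
```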