facebot | A Facebook profile and reconnaissance system | Crawler library
kandi X-RAY | facebot Summary
An automated Facebook profile and reconnaissance system.
Top functions reviewed by kandi - BETA
- Get the friends list
- Parse command-line arguments with argparse
- Extract data from the cargo
- Initiate the server
- Add fbids to the database
- Send a random fbid
- Handle a start tag
- Generate a random link
- Return the content of the comic
- Execute a SQL statement
Community Discussions
Trending Discussions on facebot
QUESTION
Note (edit): I might be doing this completely wrong; any guidance would be appreciated if this is in fact the case (new to MVC).
In the solution, a robots.txt file exists to block all crawlers from the site. The only problem is that Facebook's crawler/scraper does not follow the rules and still crawls/scrapes the site, causing an error to be logged and emailed every couple of minutes: "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."
The solution for this is to create a filter on the controllers to check the agent name. If the agent name is Facebook's, the request is redirected to a "No Robots authentication page". The filter has to be on the controller because the site caters for three different routes, each with a custom link, and customers have access to the direct links that get shared on Facebook (thus creating a route for this in the route config will not work).
The problem I'm facing is that the solution does not redirect immediately from the controller filter. It proceeds into action methods (these action methods are partial pages) and then fails because it can no longer redirect (the view has already started rendering by then, which is correct). Is there a way to redirect immediately, the first time this filter is hit? Or is there maybe a better solution?
To test and troubleshoot, I am changing the user agent in code to match what is logged. The error when redirecting from the filter: "Child actions are not allowed to perform redirect actions."
The error currently logged due to Facebook's crawler: "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."
This is what I've done:
Custom Filter:
...ANSWER
Answered 2020-Oct-15 at 18:08
This is because you're using an ActionFilterAttribute. If you check the documentation here: https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-3.1 it explains the filter lifecycle; basically, by the time you arrive at action filters it's too late. You need an authorization filter or a resource filter so you can short-circuit the request.
Each filter type is executed at a different stage in the filter pipeline:
Authorization filters
- Authorization filters run first and are used to determine whether the user is authorized for the request.
- Authorization filters short-circuit the pipeline if the request is not authorized.
Resource filters
- Run after authorization.
- OnResourceExecuting runs code before the rest of the filter pipeline. For example, OnResourceExecuting runs code before model binding.
- OnResourceExecuted runs code after the rest of the pipeline has completed.
The example below is taken from the documentation; it's an implementation of a resource filter. Presumably a similar implementation is possible with an authorization filter, but I believe returning a valid HTTP status code after failing an authorization filter may be a bit of an anti-pattern.
QUESTION
I am trying to get the allowed and disallowed parts for a user agent in the robots.txt file of the Netflix website, using the following code:
...ANSWER
Answered 2020-Mar-22 at 14:46
The following script will read the robots.txt file from top to bottom, splitting on newlines. Most likely you won't be reading robots.txt from a string, but from something more like an iterator.
When the User-agent label is found, start creating a list of user agents. Multiple user agents share a set of Disallowed/Allowed permissions.
When an Allowed or Disallowed label is identified, emit that permission for each user-agent associated with the permission block.
Emitting the data in this manner will allow you to sort or aggregate it for whichever use case you need, as in the sketch below:
- Group by User-agent
- Group by permission: Allowed / Disallowed
- Build a dictionary of paths and the associated permission or user agent
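The answer's code block was not captured here; below is a minimal Python sketch of the parser it describes (the Netflix URL comes from the question, and the block-reset logic is an assumption about how consecutive User-agent lines should be grouped):

```python
import urllib.request

def emit_permissions(robots_txt):
    """Yield (user_agent, permission, path) tuples from robots.txt text."""
    agents = []        # user agents sharing the current permission block
    seen_rule = False  # True once the current block has emitted a rule
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        label, value = (part.strip() for part in line.split(":", 1))
        label = label.lower()
        if label == "user-agent":
            if seen_rule:                      # a new block starts here
                agents, seen_rule = [], False
            agents.append(value)
        elif label in ("allow", "disallow"):
            seen_rule = True
            for agent in agents:               # emit once per sharing agent
                yield agent, label, value

robots_txt = urllib.request.urlopen("https://www.netflix.com/robots.txt").read().decode()
for record in emit_permissions(robots_txt):
    print(record)
```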
QUESTION
I'm running .NET Core middleware and an AngularJS front end. On my main page, I have Google Analytics script tags and other script tags necessary for verifying with third-party providers. Prerender.io removes these by default; however, there's a plugin "removeScriptTags". Does anyone have experience turning this off with the .NET Core middleware?
A better solution may be to blacklist the crawlers you don't want seeing cached content, though I'm not sure this is configurable. In my case, it looks like all the user-agents below are accessing Prerender.io cached content.
Here is my "crawlerUserAgentPattern", which lists the crawlers that should be allowed to access the cached content. I don't see the ones above in this list, so I'm confused as to why they're allowed access.
"(SeobilityBot)|(Seobility)|(seobility)|(bingbot)|(googlebot)|(google)|(bing)|(Slurp)|(DuckDuckBot)|(YandexBot)|(baiduspider)|(Sogou)|(Exabot)|(ia_archiver)|(facebot)|(facebook)|(twitterbot)|(rogerbot)|(linkedinbot)|(embedly)|(quora)|(pinterest)|(slackbot)|(redditbot)|(Applebot)|(WhatsApp)|(flipboard)|(tumblr)|(bitlybot)|(Discordbot)"
...ANSWER
Answered 2019-Dec-26 at 16:21
It looks like you have (google) in your regex. You already have googlebot in there, so I'd suggest you remove (google) if you don't want to match any user agent that merely contains the word "google".
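To see why, here is a small Python check (an illustration only; the pattern is abbreviated from the one above, and the duplicate casings in the original suggest the matching is case-sensitive, so this is not the actual Prerender.io configuration):

```python
import re

# Abbreviated from the pattern above: (google) alone matches any user
# agent that contains the substring "google", not just Googlebot.
pattern = re.compile(r"(googlebot)|(google)|(facebot)|(facebook)")

samples = [
    "Mozilla/5.0 (compatible; bingbot/2.0)",            # no match in this subset
    "googlebot/2.1 (+http://www.google.com/bot.html)",  # matched via (googlebot)
    "some-analytics-tool powered by google",            # matched only via (google)
]
for ua in samples:
    m = pattern.search(ua)
    print(f"{ua!r:55} -> {m.group(0) if m else 'no match'}")
```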
QUESTION
How can I allow people to test my faceBot (Facebook Messenger bot)?
Until now I have had to add every tester and/or developer manually each time so they can test my chatbot, and I have to say that doesn't make sense.
...ANSWER
Answered 2017-Mar-19 at 22:28
When you're ready to make your bot live to the public, you need to submit it for approval by adding the Messenger platform and submitting all the required items for approval to go public. All the information you need on that can be found here: Messenger Bot Review
QUESTION
I'm having problems with a big Laravel project and Redis storage. We store our sessions in Redis and already have 28 GB of RAM there. However, it still fills up to the limit relatively quickly, because we get a very large number of hits (more than 250,000 per day) from search engine bots.
Is there any elegant way to completely disable sessions for bots? I have already implemented my own session middleware, which looks like this:
...ANSWER
Answered 2018-Jul-04 at 16:33
Your problem could be that you're not identifying robots correctly, so it would be helpful to provide the code for that.
Specific to writing middleware that disables sessions, you're much better off changing the session driver to the array driver, as that driver does not persist sessions, instead of changing the configuration of the real session driver at runtime.
QUESTION
I've created a simple app that server-renders some basic SPA content based on the user agent.
For example, if an AngularJS website link is shared on Facebook, I have an Apache rewrite rule to redirect that link to the rendering app. The rendering app then checks the URL that was passed as a query parameter and returns the specified rendered content.
Everything works as expected, but there's a problem with the rendered result. The canonical link showed in the Facebook post is the rendering app's link. Here's what's happening:
Shared Link: www.example.com/the-shared-link
Facebook's post result:
Instead of displaying the shared link (www.example.com/the-shared-link), Facebook shows the rendering app's link (rendering.app.com). But if I click on the Facebook post, it opens the correct website page.
Facebook Debugger result:
All the needed meta tags are added to the rendered result page:
...ANSWER
Answered 2018-May-28 at 11:51
Solved my issue!
The rendering.app.com domain had a rewrite rule to force HTTPS. This causes a 301 HTTP redirect (just as the Facebook Debugger showed). Using https://rendering.app.com solved my issue. Another way of solving the 301 HTTP redirect would be removing the HTTPS rewrite rule on the target domain.
QUESTION
Hi, I've recently been getting super high spikes in Apache CPU usage, Apache memory usage, and MySQL memory usage. It turns out that crawlers were accessing my site at a very aggressive rate, specifically Facebook's. I attempted to add a crawl delay for the Facebook crawler to the robots.txt file, as seen below:
...ANSWER
Answered 2018-May-03 at 08:50
Change the path to
QUESTION
I'd like to use a permalink slug that allows users to share the link (url.com/artist/songtitle) with its Facebook picture, URL, description, and so on (which redirects the users to url.com/#/artist/songtitle). So I decided to show the OG meta tags to the Facebook user agent and separate that from the redirector.
But the problem comes when I use the Facebook Debug Tools and try to fetch it: the crawler wasn't caught by my user-agent separator.
I'm using this code to detect Facebook crawlers. Any idea how to fix this problem?
...ANSWER
Answered 2018-Apr-18 at 16:04
You may want to use stristr or a regex instead of strpos. As it is right now, your code won't match FacebookExternalHit, because it contains capital letters and the strpos function is case-sensitive.
Facebook User-Agents are:
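The pitfall is language-independent; as a hedged illustration in Python (the UA strings here are commonly cited examples, not the answer's list, which was not captured above):

```python
# Substring checks are case-sensitive by default; normalizing case first
# is one simple fix. The UA strings below are illustrative examples.
def is_facebook_crawler(user_agent: str) -> bool:
    ua = user_agent.lower()
    return "facebookexternalhit" in ua or "facebot" in ua

assert is_facebook_crawler("FacebookExternalHit/1.1")
assert is_facebook_crawler("Facebot")
assert not is_facebook_crawler("Mozilla/5.0 (Windows NT 10.0)")
```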
QUESTION
I am loading data that looks like this (URL removed)
...ANSWER
Answered 2018-Apr-12 at 23:38
Use ENCLOSED BY and ESCAPED BY
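The answer's statement was not captured above; as a rough sketch (hypothetical table, file, and credentials), a LOAD DATA statement using those clauses, issued here through mysql-connector-python, might look like this:

```python
import mysql.connector  # pip install mysql-connector-python

# Hypothetical connection details and table/file names; the point is the
# ENCLOSED BY / ESCAPED BY clauses, which tell MySQL how quoted fields and
# escape characters in the input file are parsed. The server must also
# permit local_infile for LOAD DATA LOCAL to work.
conn = mysql.connector.connect(
    host="localhost", user="loader", password="secret",
    database="scrape", allow_local_infile=True,
)
cur = conn.cursor()
cur.execute(r"""
    LOAD DATA LOCAL INFILE 'pages.csv'
    INTO TABLE pages
    FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
""")
conn.commit()
```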
QUESTION
I have a condition in my .htaccess for crawlers and search engines which takes them to a "static" page where they can scrape all content.
Up until now I've had my domain {client}.realdomain.com, where {client} is a subdomain for one client.
When the client then shares something on a social network, e.g. Facebook/LinkedIn, their crawlers are caught by my .htaccess, which has the following conditions (and this works):
URL example: http://{client}.realdomain.com/s/token
ANSWER
Answered 2018-Mar-06 at 13:27RewriteCond %{HTTP_USER_AGENT} (LinkedInBot/[0-9]|facebookexternalhit/[0-9]|Facebot|Twitterbot|twitterbot|Pinterest|pinterest|Google.*snippet|baiduspider|rogerbot|embedly|quora\ link\ preview|showyoubot|outbrain|slackbot|vkShare|W3C_Validator)
RewriteCond %{HTTP_HOST} ^(.+?)\.realdomain\.com$
RewriteRule ^s/(.*)$ http://%1.realdomain.com/static.php?token=$1 [NC,L]
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install facebot
You can use facebot like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system. A typical setup is sketched below.
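A minimal sketch of that setup (shell commands; the repository URL is a placeholder, since the source location is not given here):

```sh
# Create and activate an isolated environment (POSIX shells)
python -m venv .venv
source .venv/bin/activate

# Keep the packaging tools current, as recommended above
python -m pip install --upgrade pip setuptools wheel

# Install from a local clone; <facebot-repo-url> is a placeholder
git clone <facebot-repo-url>
cd facebot
pip install .
```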