twitterbot | A Python framework for creating interactive Twitter bots | Bot library
kandi X-RAY | twitterbot Summary
- Main thread loop
- Logs a message
- Log a Tweepy error
- Check for new followers
- Post a Tweet
- Post a mention
- Post a sentence
- Open a file
Trending Discussions on twitterbot
QUESTION
I'm trying to have fun with a Twitter bot. The idea: using the Art Institute of Chicago API, post a tweet with the information (artist, date, place...) and the media (picture). I can't manage to upload the media; below you can see the traceback that I am trying to fix. Any help will be appreciated!
import tweepy
import requests
import random
import time
import io

############################# My logs ######################################

def twitter_api():
    consumer_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    consumer_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    access_token = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    access_token_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    return api

############################# My functions #################################

############################# The Loop ######################################

while True:
    get_number()
    r = requests.get(f"https://api.artic.edu/api/v1/artworks/{get_number()}")
    a = r.json()
    get_Titre(), get_Artist(), get_Date(), get_Place(), get_Im()
    requests2 = (f"https://www.artic.edu/iiif/2/{get_Im()}/full/843,/0/default.jpg")
    print("Imge ok:....................", requests2)
    print(type(requests2))
    message = (get_Titre() + get_Artist() + str(get_Date()) + get_Place())
    print("La tête du tweet sera:", message)  # "The tweet will look like:"
    twitter_api().update_status_with_media(message, requests2)
    time.sleep(14400)
Here is the traceback:
Traceback (most recent call last):
  File "C:\PycharmProjects\TwitterBot\main.py", line 76, in <module>
    twitter_api().update_status_with_media(message, requests2)
  File "C:\PycharmProjects\TwitterBot\venv\lib\site-packages\tweepy\api.py", line 46, in wrapper
    return method(*args, **kwargs)
  File "C:\PycharmProjects\TwitterBot\venv\lib\site-packages\tweepy\api.py", line 1181, in update_status_with_media
    files = {'media[]': stack.enter_context(open(filename, 'rb'))}
OSError: [Errno 22] Invalid argument: 'https://www.artic.edu/iiif/2/904ea189-c852-5f84-c614-a26a851f9b74/full/843,/0/default.jpg'
ANSWER
Answered 2022-Apr-02 at 04:23
See the documentation for update_status_with_media - the second argument has to be a filename:
update_status_with_media(text, filename, file, ...)
But the third argument can be a file-like object, meaning an object that has a .read() method. If you use urllib.request, it gives you an object with .read(), and it works.
BTW: you still have to pass some text as the second argument - it can be a fake filename, but the function needs it.
import os
import urllib.request
import tweepy
url = "https://www.iheartradio.ca/image/policy:1.15731844:1627581512/rick.jpg?f=default&$p$f=20c1bb3"
text = "Testing module tweepy"
# --- create file_like_object ---
file_like_object = urllib.request.urlopen(url)
# --- send tweet ---
consumer_key = os.getenv('TWITTER_CONSUMER_KEY')
consumer_secret = os.getenv('TWITTER_CONSUMER_SECRET')
access_token = os.getenv('TWITTER_ACCESS_TOKEN')
access_token_secret = os.getenv('TWITTER_ACCESS_TOKEN_SECRET')
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
twitter_api = tweepy.API(auth)
# use any filename as second argument, and file-like object as third argument
twitter_api.update_status_with_media(text, 'fake_name.jpg', file=file_like_object)
With requests you have to use io.BytesIO to create the file-like object:
import os
import io
import requests
import tweepy
url = "https://www.iheartradio.ca/image/policy:1.15731844:1627581512/rick.jpg?f=default&$p$f=20c1bb3"
text = "Testing module tweepy"
# --- create file_like_object ---
response = requests.get(url)
file_like_object = io.BytesIO(response.content)
# --- send tweet ---
consumer_key = os.getenv('TWITTER_CONSUMER_KEY')
consumer_secret = os.getenv('TWITTER_CONSUMER_SECRET')
access_token = os.getenv('TWITTER_ACCESS_TOKEN')
access_token_secret = os.getenv('TWITTER_ACCESS_TOKEN_SECRET')
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
twitter_api = tweepy.API(auth)
# use any filename as second argument, and file-like object as third argument
twitter_api.update_status_with_media(text, 'fake_name.jpg', file=file_like_object)
EDIT: Alternatively, you can use stream=True, and then response.raw gives a file-like object, but this approach is less common.
# --- create file_like_object ---
response = requests.get(url, stream=True)
file_like_object = response.raw
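Applied back to the original bot loop, the fix with io.BytesIO would look roughly like this (a sketch only; the asker's twitter_api() and get_*() helpers are assumed to exist as in the elided "My functions" section):

import io
import time
import requests

while True:
    artwork_id = get_number()  # asker's (elided) helper
    r = requests.get(f"https://api.artic.edu/api/v1/artworks/{artwork_id}")
    a = r.json()
    image_url = f"https://www.artic.edu/iiif/2/{get_Im()}/full/843,/0/default.jpg"
    message = get_Titre() + get_Artist() + str(get_Date()) + get_Place()
    # download the image and wrap the bytes in a file-like object
    file_like_object = io.BytesIO(requests.get(image_url).content)
    # second argument is a fake filename; the file-like object does the work
    twitter_api().update_status_with_media(message, "artwork.jpg", file=file_like_object)
    time.sleep(14400)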
QUESTION
I want to redirect URLs to my OGP page when the User-Agent matches Twitter or Facebook. The redirect mapping I have in mind looks like this:
/news/detail.html?id=1 -> /api/v1/informations/ogp/1?lang=ja
/news/detail.html?id=1&lang=en -> /api/v1/informations/ogp/1?lang=en
/sport/detail.html?id=1 -> /api/v1/sports/ogp/1?lang=ja
/sport/detail.html?id=1&lang=en -> /api/v1/sports/ogp/1?lang=en
/event/common/detail.html?id=1 -> /api/v1/events/ogp/1?lang=ja
/event/special/detail.html?id=2&lang=en -> /api/v1/events/ogp/2?lang=en
/event/special/detail.html?id=2 -> /api/v1/events/ogp/2?lang=ja
So I wrote this .htaccess. The env params work OK, but the rewrite rules do not work when a single set of RewriteCond directives is followed by two or more RewriteRules.
SetEnvIfNoCase User-Agent "^facebookexternalhit.*$" UA_FACEBOOK=1
SetEnvIfNoCase User-Agent "^facebookplatform.*$" UA_FACEBOOK=1
SetEnvIfNoCase User-Agent "^Twitterbot.*$" UA_TWITTER=1
RewriteEngine on
RewriteBase /
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)&lang=(\w+)($|&)
RewriteRule ^news/detail.html$ /api/v1/informations/ogp/%2?lang=%3 [R,L]
RewriteRule ^sport/detail.html$ /api/v1/sports/ogp/%2?lang=%3 [R,L]
RewriteRule ^event/common/detail.html$ /api/v1/events/ogp/%2?lang=%3 [R,L]
RewriteRule ^event/special/detail.html$ /api/v1/events/ogp/%2?lang=%3 [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)($|&)
RewriteRule ^news/detail.html$ /api/v1/informations/ogp/%2?lang=ja [R,L]
RewriteRule ^sport/detail.html$ /api/v1/sports/ogp/%2?lang=ja [R,L]
RewriteRule ^event/common/detail.html$ /api/v1/events/ogp/%2?lang=ja [R,L]
RewriteRule ^event/special/detail.html$ /api/v1/events/ogp/%2?lang=ja [R,L]
I want to redirect them all with the same RewriteCond - how can I do that? Even dirty code is welcome!
ANSWER
Answered 2022-Mar-01 at 08:18
This dirty code did what I wanted it to do. (The underlying reason the original didn't work: RewriteCond directives apply only to the single RewriteRule that immediately follows them, so each rule needs its own copy of the conditions.)
SetEnvIfNoCase User-Agent "^facebookexternalhit.*$" UA_FACEBOOK=1
SetEnvIfNoCase User-Agent "^facebookplatform.*$" UA_FACEBOOK=1
SetEnvIfNoCase User-Agent "^Twitterbot.*$" UA_TWITTER=1
RewriteEngine on
RewriteBase /
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)&lang=(\w+)($|&)
RewriteRule ^news/detail.html$ /api/v1/informations/ogp/%2?lang=%3 [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)&lang=(\w+)($|&)
RewriteRule ^sport/detail.html$ /api/v1/sports/ogp/%2?lang=%3 [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)&lang=(\w+)($|&)
RewriteRule ^event/common/detail.html$ /api/v1/events/ogp/%2?lang=%3 [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)&lang=(\w+)($|&)
RewriteRule ^event/special/detail.html$ /api/v1/events/ogp/%2?lang=%3 [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)($|&)
RewriteRule ^news/detail.html$ /api/v1/informations/ogp/%2?lang=ja [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)($|&)
RewriteRule ^sport/detail.html$ /api/v1/sports/ogp/%2?lang=ja [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)($|&)
RewriteRule ^event/common/detail.html$ /api/v1/events/ogp/%2?lang=ja [R,L]
RewriteCond %{ENV:UA_FACEBOOK} ^1$ [OR]
RewriteCond %{ENV:UA_TWITTER} ^1$
RewriteCond %{QUERY_STRING} (^|&)id=(\d+)($|&)
RewriteRule ^event/special/detail.html$ /api/v1/events/ogp/%2?lang=ja [R,L]
I tried using the <If> directive, but it didn't work with my environment variables (code not shown), so I solved this problem by writing the same RewriteCond block again and again. If you have a better way to write this, I welcome it.
QUESTION
All of my AngularJS site works with prerender except for the home page. When crawled, it sends back a 404 page. I have reason to believe the culprit is this line in my .htaccess file, RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L], but I am not sure.
RewriteEngine On
# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]
# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]
# Handle Prerender.io
RequestHeader set X-Prerender-Token "notprovidingthiscode"
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L]
# If non existent
# Accept everything on index.html
RewriteRule ^ /index.html
ANSWER
Answered 2022-Feb-23 at 14:31
The issue turned out to be that the .htaccess file was serving example.com/index.html rather than just example.com when accessing the root of the AngularJS app. That in turn didn't play well with ui-router, because $stateProvider doesn't serve filenames at the end of URLs without being told explicitly. Accessing example.com/index.html did indeed cause my page to throw a 404 error via $urlRouterProvider.otherwise('404');
Adding the following code fixed my issue.
$urlRouterProvider.when('/index.html', '/');
This redirects example.com/index.html to example.com which points to the correct rendering in prerender.io.
QUESTION
I have WordPress+nginx in a Docker container that works perfectly through the browser, but when I try to send an HTTP request via curl without headers, the response is always empty:
❯ curl -vv localhost:8080
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.64.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
* Closing connection 0
It does work if I add any User-Agent header with the -H option, but I would like it to work even when there's no User-Agent in the headers.
Here are my nginx settings:
- nginx.conf
worker_processes 1;
daemon off;
events {
worker_connections 1024;
}
http {
root /var/www/html;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /dev/stdout main;
error_log /dev/stderr error;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
#keepalive low (5seconds), should force hackers to re-connect.
keepalive_timeout 5;
fastcgi_intercept_errors on;
fastcgi_buffers 16 16k;
fastcgi_buffer_size 32k;
default_type application/octet-stream;
#php max upload limit cannot be larger than this
client_max_body_size 40m;
gzip on;
gzip_disable "msie6";
gzip_min_length 256;
gzip_comp_level 4;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/javascript image/svg+xml;
limit_req_zone $remote_addr zone=loginauth:10m rate=15r/s;
include /etc/nginx/mime.types;
include /etc/nginx/nginx-server.conf;
}
- nginx-server.conf
server {
listen 8080 default_server;
server_name "localhost";
access_log /dev/stdout main;
error_log /dev/stdout error;
# pass the PHP scripts to FastCGI
location ~ \.php$ {
include fastcgi_params;
fastcgi_pass unix:/home/www-data/php-fpm.sock;
fastcgi_index index.php;
fastcgi_param DOCUMENT_ROOT $realpath_root;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_intercept_errors on;
}
#Deny access to .htaccess, .htpasswd...
location ~ /\.ht {
deny all;
}
location ~* .(jpg|jpeg|png|gif|ico|css|js|pdf|doc|docx|odt|rtf|ppt|pptx|xls|xlsx|txt)$ {
expires max;
}
location = /favicon.ico {
log_not_found off;
access_log off;
}
location = /robots.txt {
allow all;
log_not_found off;
access_log off;
}
#Block bad-bots
if ($http_user_agent ~* (360Spider|80legs.com|Abonti|AcoonBot|Acunetix|adbeat_bot|AddThis.com|adidxbot|ADmantX|AhrefsBot|AngloINFO|Antelope|Applebot|BaiduSpider|BeetleBot|billigerbot|binlar|bitlybot|BlackWidow|BLP_bbot|BoardReader|Bolt\ 0|BOT\ for\ JCE|Bot\ mailto\:craftbot@yahoo\.com|casper|CazoodleBot|CCBot|checkprivacy|ChinaClaw|chromeframe|Clerkbot|Cliqzbot|clshttp|CommonCrawler|comodo|CPython|crawler4j|Crawlera|CRAZYWEBCRAWLER|Curious|Curl|Custo|CWS_proxy|Default\ Browser\ 0|diavol|DigExt|Digincore|DIIbot|discobot|DISCo|DoCoMo|DotBot|Download\ Demon|DTS.Agent|EasouSpider|eCatch|ecxi|EirGrabber|Elmer|EmailCollector|EmailSiphon|EmailWolf|Exabot|ExaleadCloudView|ExpertSearchSpider|ExpertSearch|Express\ WebPictures|ExtractorPro|extract|EyeNetIE|Ezooms|F2S|FastSeek|feedfinder|FeedlyBot|FHscan|finbot|Flamingo_SearchEngine|FlappyBot|FlashGet|flicky|Flipboard|g00g1e|Genieo|genieo|GetRight|GetWeb\!|GigablastOpenSource|GozaikBot|Go\!Zilla|Go\-Ahead\-Got\-It|GrabNet|grab|Grafula|GrapeshotCrawler|GTB5|GT\:\:WWW|Guzzle|harvest|heritrix|HMView|HomePageBot|HTTP\:\:Lite|HTTrack|HubSpot|ia_archiver|icarus6|IDBot|id\-search|IlseBot|Image\ Stripper|Image\ Sucker|Indigonet|Indy\ Library|integromedb|InterGET|InternetSeer\.com|Internet\ Ninja|IRLbot|ISC\ Systems\ iRc\ Search\ 2\.1|jakarta|Java|JetCar|JobdiggerSpider|JOC\ Web\ Spider|Jooblebot|kanagawa|KINGSpider|kmccrew|larbin|LeechFTP|libwww|Lingewoud|LinkChecker|linkdexbot|LinksCrawler|LinksManager\.com_bot|linkwalker|LinqiaRSSBot|LivelapBot|ltx71|LubbersBot|lwp\-trivial|Mail.RU_Bot|masscan|Mass\ Downloader|maverick|Maxthon$|Mediatoolkitbot|MegaIndex|MegaIndex|megaindex|MFC_Tear_Sample|Microsoft\ URL\ Control|microsoft\.url|MIDown\ tool|miner|Missigua\ Locator|Mister\ PiX|mj12bot|Mozilla.*Indy|Mozilla.*NEWT|MSFrontPage|msnbot|Navroad|NearSite|NetAnts|netEstate|NetSpider|NetZIP|Net\ Vampire|NextGenSearchBot|nutch|Octopus|Offline\ Explorer|Offline\ Navigator|OpenindexSpider|OpenWebSpider|OrangeBot|Owlin|PageGrabber|PagesInventory|panopta|panscient\.com|Papa\ Foto|pavuk|pcBrowser|PECL\:\:HTTP|PeoplePal|Photon|PHPCrawl|planetwork|PleaseCrawl|PNAMAIN.EXE|PodcastPartyBot|prijsbest|proximic|psbot|purebot|pycurl|QuerySeekerSpider|R6_CommentReader|R6_FeedFetcher|RealDownload|ReGet|Riddler|Rippers\ 0|rogerbot|RSSingBot|rv\:1.9.1|RyzeCrawler|SafeSearch|SBIder|Scrapy|Scrapy|Screaming|SeaMonkey$|search.goo.ne.jp|SearchmetricsBot|search_robot|SemrushBot|Semrush|SentiBot|SEOkicks|SeznamBot|ShowyouBot|SightupBot|SISTRIX|sitecheck\.internetseer\.com|siteexplorer.info|SiteSnagger|skygrid|Slackbot|Slurp|SmartDownload|Snoopy|Sogou|Sosospider|spaumbot|Steeler|sucker|SuperBot|Superfeedr|SuperHTTP|SurdotlyBot|Surfbot|tAkeOut|Teleport\ Pro|TinEye-bot|TinEye|Toata\ dragostea\ mea\ pentru\ diavola|Toplistbot|trendictionbot|TurnitinBot|turnit|Twitterbot|URI\:\:Fetch|urllib|Vagabondo|Vagabondo|vikspider|VoidEYE|VoilaBot|WBSearchBot|webalta|WebAuto|WebBandit|WebCollage|WebCopier|WebFetch|WebGo\ IS|WebLeacher|WebReaper|WebSauger|Website\ eXtractor|Website\ Quester|WebStripper|WebWhacker|WebZIP|Web\ Image\ Collector|Web\ Sucker|Wells\ Search\ II|WEP\ Search|WeSEE|Wget|Widow|WinInet|woobot|woopingbot|worldwebheritage.org|Wotbox|WPScan|WWWOFFLE|WWW\-Mechanize|Xaldon\ WebSpider|XoviBot|yacybot|Yahoo|YandexBot|Yandex|YisouSpider|zermelo|Zeus|zh-CN|ZmEu|ZumBot|ZyBorg) ) {
return 444;
}
include /etc/nginx/nginx-locations.conf;
include /var/www/nginx/locations/*;
}
- nginx-locations.conf
# Deny all attempts to access hidden files such as .htaccess, .htpasswd, .DS_Store (Mac).
# Keep logging the requests to parse later (or to pass to firewall utilities such as fail2ban)
location ~ /\. {
deny all;
}
# Deny access to any files with a .php extension in the uploads directory for the single site
location ~ ^/wp-content/uploads/.*\.php$ {
deny all;
}
#Deny access to wp-content folders for suspicious files
location ~* ^/(wp-content)/(.*?)\.(zip|gz|tar|bzip2|7z)\$ { deny all; }
location ~ ^/wp-content/uploads/sucuri { deny all; }
location ~ ^/wp-content/updraft { deny all; }
location ~* ^/wp-content/uploads/.*.(html|htm|shtml|php|js|swf)$ {
deny all;
}
# Block PHP files in includes directory.
location ~* /wp-includes/.*\.php\$ {
deny all;
}
# Deny access to any files with a .php extension in the uploads directory
# Works in sub-directory installs and also in multisite network
# Keep logging the requests to parse later (or to pass to firewall utilities such as fail2ban)
location ~* /(?:uploads|files|wp-content|wp-includes)/.*\.php$ {
deny all;
}
# Block nginx-help log from public viewing
location ~* /wp-content/uploads/nginx-helper/ { deny all; }
# Deny access to any files with a .php extension in the uploads directory
# Works in sub-directory installs and also in multisite network
location ~* /(?:uploads|files)/.*\.php\$ { deny all; }
# Deny access to uploads that aren’t images, videos, music, etc.
location ~* ^/wp-content/uploads/.*.(html|htm|shtml|php|js|swf|css)$ {
deny all;
}
location / {
# This is cool because no php is touched for static content.
# include the "?$args" part so non-default permalinks doesn't break when using query string
index index.php index.html;
try_files $uri $uri/ /index.php?$args;
}
# More ideas from:
# https://gist.github.com/ethanpil/1bfd01a817a8198369efec5c4cde6628
location ~* /(\.|wp-config\.php|wp-config\.txt|changelog\.txt|readme\.txt|readme\.html|license\.txt) { deny all; }
# Make sure files with the following extensions do not get loaded by nginx because nginx would display the source code, and these files can contain PASSWORDS!
location ~* \.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)\$|^(\..*|Entries.*|Repository|Root|Tag|Template)\$|\.php_
{
return 444;
}
#nocgi
location ~* \.(pl|cgi|py|sh|lua)\$ {
return 444;
}
#disallow
location ~* (w00tw00t) {
return 444;
}
My aim is to get the server to respond to any request, even one with no User-Agent header. Thanks for your time!
ANSWER
Answered 2021-Nov-17 at 16:04
This has nothing to do with Docker or WordPress or anything else. It is your nginx configuration alone that is rejecting the request: you have Curl in your user-agent comparison in nginx-server.conf:
#Block bad-bots
if ($http_user_agent ~* (...|Curl|...) ) {
return 444;
}
and because ~* is a case-insensitive matching operator, every request from curl returns 444 here.
An example of how you can check it using grep:
$ echo 'curl/7.64.1' | grep -iPo '(...some...|Curl|...other...)'
curl
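The same check can be reproduced in Python (a quick sketch mirroring nginx's case-insensitive ~* operator with re.IGNORECASE; the pattern is an abbreviated stand-in for the full bad-bot list):

import re

# abbreviated stand-in for the nginx bad-bot pattern
bad_bots = re.compile(r"(360Spider|Curl|Wget|Zeus)", re.IGNORECASE)

user_agent = "curl/7.64.1"
if bad_bots.search(user_agent):
    print("nginx would return 444 and close the connection")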
Code 444 is a special non-standard nginx code which, when returned, forces nginx to close the connection immediately without sending anything to the client. This is comparable to a connection reset (closed by peer).
FWIW (for people searching for why a request is not processed as expected): nginx can enable debug logging (e.g. for a certain port listener), so the error log will contain detailed information about how the request is processed (which locations and rewrite rules are triggered, what happens to the request at every stage, and what response and error code are supplied to the client at the end).
QUESTION
I am trying to create a Lambda@Edge function to return Open Graph HTML for my Angular SPA application. I've installed it into the CloudFront "Viewer Request" lifecycle. This lambda checks the user agent, and if it's the Facebook or Twitter crawler, it returns HTML (currently hard-coded in the lambda for testing). If the request is from any other user agent, the request is passed through to the origin. The pass-through logic is working properly, but if I try to intercept and return the Open Graph HTML for the crawlers, I get an error.
In CloudWatch, the error reported by CloudFront is:
ERROR Validation error: The Lambda function returned an invalid body, body should be of object type.
In Postman (by faking the user-agent), I get a 502 with:
The Lambda function result failed validation: The body is not a string, is not an object, or exceeds the maximum size.
I'm pulling my hair out with this one. Any ideas? Here's my lambda.
'use strict';

function buildReleaseResponse( request ) {
    const content = `<\!DOCTYPE html>
Hello, World
Open Graph Test
`;
    return {
        statusCode: 200,
        statusDescription: 'OK',
        headers: {
            "content-type": [
                {
                    "key": "Content-Type",
                    "value": "text/html; charset=utf-8"
                }
            ]
        },
        body: content.toString()
    };
}

exports.handler = ( event, context, callback ) => {
    const { request, response } = event.Records[0].cf;
    let userAgentStr = "";
    if (request.headers['user-agent']) {
        if (request.headers['user-agent'].length > 0) {
            userAgentStr = request.headers['user-agent'][0].value;
        }
    }
    let newResponse = null;
    if ( userAgentStr.match(/facebookexternalhit|twitterbot/i) ) {
        if ( request.uri.startsWith("/radio/release/") ) {
            newResponse = buildReleaseResponse(request);
        }
    }
    if ( newResponse === null ) {
        console.log("Passthrough.");
        callback(null, request);
    }
    else {
        console.log("Overriding response with: " + JSON.stringify(newResponse));
        callback(null, newResponse);
    }
};
Here is the response shown in CloudWatch (console.log):
{
    "statusCode": 200,
    "statusDescription": "OK",
    "headers": {
        "content-type": [{
            "key": "Content-Type",
            "value": "text/html; charset=utf-8"
        }],
        "cache-control": [{
            "key": "Cache-Control",
            "value": "max-age=100"
        }]
    },
    "body": "\n \n \n \n \n \n \n \n \n \n \n \n Kompoz.com\n \n \n Yo Dog\n \n "
}
ANSWER
Answered 2021-Nov-15 at 20:59
SOLVED! I'm embarrassed to report that this issue was caused by a typo on my part. In my response object, I had:
"statusCode": 200,
But it should have been:
"status": 200,
Happy to report that it is now working. Having said that, I wish that the AWS error message was better. The message "The body is not a string, is not an object, or exceeds the maximum size" really threw me off.
QUESTION
I am using Selenium to log in to Twitter. The email and next steps work when I run the code, but the password and login steps do not work. I get the following error:
Terminal:
PS C:\Users\xxx\OneDrive - xxx\Folder\Chrome Webdriver> & C:/Users/xxx/Anaconda3/python.exe "c:/Users/xxx/OneDrive - xxx/Folder/Chrome Webdriver/Twitter Bot.py"
DevTools listening on ws://127.0.0.1:61543/devtools/browser/a9adea8e-a2bf-4a83-87ed-c39fb5a8f5aa
[32364:30428:1014/130628.297:ERROR:chrome_browser_main_extra_parts_metrics.cc(228)] crbug.com/1216328: Checking Bluetooth availability started. Please report if there is no report that this ends.
[32364:30428:1014/130628.297:ERROR:chrome_browser_main_extra_parts_metrics.cc(231)] crbug.com/1216328: Checking Bluetooth availability ended.
[32364:30428:1014/130628.298:ERROR:chrome_browser_main_extra_parts_metrics.cc(234)] crbug.com/1216328: Checking default browser status started. Please report if there is no report that this ends.
[32364:31252:1014/130628.301:ERROR:device_event_log_impl.cc(214)] [13:06:28.302] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[32364:31252:1014/130628.302:ERROR:device_event_log_impl.cc(214)] [13:06:28.303] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[32364:31252:1014/130628.302:ERROR:device_event_log_impl.cc(214)] [13:06:28.303] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[32364:30428:1014/130628.336:ERROR:chrome_browser_main_extra_parts_metrics.cc(238)] crbug.com/1216328: Checking default browser status ended.
Traceback (most recent call last):
  File "c:/Users/xxx/OneDrive - xxx/Folder/Chrome Webdriver/TwitterBot.py", line 35, in <module>
    driver.find_element_by_xpath(password_xpath).send_keys(password)
  File "C:\Users\xxx\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 478, in send_keys
    {'text': "".join(keys_to_typing(value)),
TypeError: sequence item 0: expected str instance, int found
[31296:25628:1014/130651.989:ERROR:gpu_init.cc(453)] Passthrough is not supported, GL is disabled, ANGLE is
[31488:15052:1014/130825.008:ERROR:gpu_init.cc(453)] Passthrough is not supported, GL is disabled, ANGLE is
Code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import time

def account_info():
    with open('account_info.txt', 'r') as f:
        info = f.read().split()
        email = info[0]
        password = [1]
        return email, password

email, password = account_info()

options = Options()
options.add_argument("start.maximized")
driver = webdriver.Chrome(options=options)
driver.get("https://twitter.com/i/flow/login")
time.sleep(1)

email_xpath = '//*[@id="layers"]/div[2]/div/div/div/div/div/div[2]/div[2]/div/div/div[2]/div[2]/div[1]/div/div[2]/label/div/div[2]/div/input'
next_xpath = '//*[@id="layers"]/div[2]/div/div/div/div/div/div[2]/div[2]/div/div/div[2]/div[2]/div[2]/div/div'
password_xpath = '//*[@id="layers"]/div[2]/div/div/div/div/div/div[2]/div[2]/div/div/div[2]/div[2]/div[1]/div/div[2]/div/label/div/div[2]/div/input'
login_xpath = '//*[@id="layers"]/div[2]/div/div/div/div/div/div[2]/div[2]/div/div/div[2]/div[2]/div[2]/div/div'

time.sleep(1)
driver.find_element_by_xpath(email_xpath).send_keys(email)
time.sleep(0.5)
driver.find_element_by_xpath(next_xpath).click()
time.sleep(0.5)
driver.find_element_by_xpath(password_xpath).send_keys(password)
time.sleep(0.5)
driver.find_element_by_xpath(login_xpath).click()
ANSWER
Answered 2021-Oct-14 at 17:28
You have a typo in here:
password = [1]
It should be:
def account_info():
    with open('account_info.txt', 'r') as f:
        info = f.read().split()
        email = info[0]
        password = info[1]
        return email, password
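One caveat worth adding (not part of the original answer): f.read().split() splits on any whitespace, so account_info() only works if the file contains exactly two whitespace-free tokens. A small sketch of that assumption, using a hypothetical account_info.txt:

# account_info.txt is assumed to contain: <email> <password>
# neither value may contain spaces, or split() will break them apart
with open('account_info.txt', 'w') as f:
    f.write('bot@example.com s3cretpass')

email, password = account_info()
print(email, password)  # bot@example.com s3cretpass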
QUESTION
I am working at a company and, to improve SEO, I am trying to set up our Angular (10) web app with prerender.io to send rendered HTML to crawlers visiting our website.
The app is dockerized and exposed using an nginx server. To avoid conflicts with the existing nginx conf (after a few tries using it), I (re)started the configuration from the .conf file provided in the prerender.io documentation (https://gist.github.com/thoop/8165802), but it is impossible for me to get any response from the prerender service.
I am always facing "502: Bad Gateway" (client side) and "could not be resolved (110: Operation timed out)" (server side) when I send a request with Googlebot as the User-Agent.
After building and running my Docker image, the website is correctly exposed on port 80. It is fully accessible when I use a web browser, but the error occurs when I try a request as a bot (using curl -A Googlebot http://localhost:80).
To verify whether the prerender service correctly receives my request when needed, I tried to use a URL generated on pipedream.com, but the request never arrives.
I tried using different resolvers (8.8.8.8 and 1.1.1.1) but nothing changed.
I tried to increase the resolver_timeout to allow more time, but still the same error.
I tried to install curl in the container (because my image is based on an Alpine image); curl was successfully installed but nothing changed.
Here is my nginx conf file :
server {
listen 80 default_server;
root /usr/share/nginx/html;
index index.html;
location / {
try_files $uri @prerender;
}
location @prerender {
proxy_set_header X-Prerender-Token TOKEN_HERE;
set $prerender 0;
if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
set $prerender 1;
}
if ($args ~ "_escaped_fragment_") {
set $prerender 1;
}
if ($http_user_agent ~ "Prerender") {
set $prerender 0;
}
if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
set $prerender 0;
}
#resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
resolver 8.8.8.8;
resolver_timeout 60s;
if ($prerender = 1) {
#setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
set $prerender "service.prerender.io";
rewrite .* /$scheme://$host$request_uri? break;
proxy_pass http://$prerender;
}
if ($prerender = 0) {
rewrite .* /index.html break;
}
}
}
And here is my Dockerfile:
FROM node:12.7-alpine AS build
ARG environment=production
WORKDIR /usr/src/app
COPY package.json package-lock.json ./
RUN npm install
COPY . .
RUN npm run build -- --configuration $environment
# Two stage build because we do not need node-related things
FROM nginx:1.17.1-alpine
RUN apk add --no-cache curl
COPY --from=build /usr/src/app/dist/app /usr/share/nginx/html
COPY prerender-nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
Hope you have a lead or an idea to help me!
ANSWER
Answered 2021-Aug-18 at 08:22
The erroneous part would be:
curl -A Googlebot http://localhost:80
The way prerender works, it accesses the FQDN you sent to the original web server, so localhost:80 will not be accessible from the prerender service.
Try passing a proper hostname, something like:
curl -H "Host: accessiblefrom.public.websitefqdn:80" http://localhost:80
Check out the example at https://github.com/Voronenko/self-hosted-prerender
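If you prefer Python over curl for this kind of test, a minimal sketch looks like this (the Host value is a placeholder; substitute your real, publicly resolvable FQDN):

import requests

# pretend to be Googlebot while telling nginx (and prerender) the public host
headers = {
    "Host": "accessiblefrom.public.websitefqdn",  # placeholder FQDN
    "User-Agent": "Googlebot",
}
response = requests.get("http://localhost:80", headers=headers)
print(response.status_code)
print(response.text[:200])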
QUESTION
I want to 301 redirect
https://www.example.com/th/test123
to
https://www.example.com/test123
In the URL above, "th" is removed; I want to redirect all website users to the version of the URL without the language prefix.
Here is my config file
server {
listen 80;
server_name localhost;
absolute_redirect off;
root /usr/share/nginx/html;
index index.html index.htm;
#charset koi8-r;
#access_log /var/log/nginx/host.access.log main;
location / {
try_files $uri @prerender;
#try_files $uri $uri/ /index.html; # force routes to index.html
}
#error_page 404 /404.html;
# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
gzip on;
gzip_vary on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
location @prerender {
proxy_set_header X-Prerender-Token JEWpuxaXuzdqvd7tKD1l;
set $prerender 0;
if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
set $prerender 1;
}
if ($args ~ "_escaped_fragment_") {
set $prerender 1;
}
if ($http_user_agent ~ "Prerender") {
set $prerender 0;
}
if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
set $prerender 0;
}
#resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
resolver 8.8.8.8;
if ($prerender = 1) {
#setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
set $prerender "service.prerender.io";
# rewrite .* /$scheme://$host$request_uri? break;
# Following domain need to be dynamic
rewrite .* /https://www.drivemate.asia$request_uri? break;
proxy_pass http://$prerender;
}
if ($prerender = 0) {
#try_files $uri $uri/ /index.html; # force routes to index.html
rewrite .* /index.html break;
}
}
I am using prerender for server-side rendering purposes.
ANSWER
Answered 2021-Jun-10 at 09:44
Assuming you have a list of locales like th, en, de, add this rewrite rule to the server context (for example, before the first location block):
rewrite ^/(?:th|en|de)(?:/(.*))?$ /$1 permanent;
Modify the (?:th|en|de) group according to your list of used locales.
QUESTION
I set up prerender.io for a CRA app and it works well, but when a bot hits a URL without parameters it appends the string ".var" to the end of the URL. I tried variations of (.*) but it doesn't seem to work. Any ideas?
Here is the .htaccess file:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
RequestHeader set X-Prerender-Token "TOKEN"
RequestHeader set X-Prerender-Version "prerender-apache@2.0.0"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} googlebot|bingbot|yandex|baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff|\.svg))
RewriteRule ^(index\.html|index\.php)?(.*) https://service.prerender.io/%{REQUEST_SCHEME}://%{HTTP_HOST}/$2 [P,END]
ANSWER
Answered 2021-Jun-07 at 18:36
Lately @MrWhite gave us another, better and simpler solution: just adding DirectoryIndex index.html to the .htaccess file will do the same.
From the beginning I wrote that DirectoryIndex was working, but NO! It seemed to work when testing with prerender.io, but in reality the website itself was displayed incorrectly, and I had to remove it. So it was not an issue with the .htaccess file; it was coming from the server.
What I did was go into WHM -> Apache Configurations -> DirectoryIndex Priority, where I saw the priority list - and yes, that was it!
To fix it I just moved index.html to the very top, with index.html.var second and the rest after them. I don't know what index.html.var is for, but I did not risk removing it. Hope it helps someone who struggled like me.
QUESTION
I've created an SPA (Single Page Application) with Angular 11, which I'm hosting on a shared hosting server.
The issue I have is that I cannot share any of its pages (except the first route, /) on social media (Facebook and Twitter), because the meta tags aren't updated per requested page (I have a service which handles the meta tags for each page). I know this is because Facebook and Twitter don't crawl JavaScript.
To fix this issue I tried Angular Universal (SSR, Server-Side Rendering) and Scully (which creates static pages). Both fix my issue, but I would prefer to keep the default Angular SPA build.
The approach I am taking:
- Files structure (shared hosting server /public_html/):
- crawlers/
- crawlers.php
- share/
- 404.json
- about.json
- work.json
- .htaccess
- index.html
- crawlers.php contains the following (the PHP and HTML markup was stripped during page extraction; from the surrounding description, the script reads share/<page>.json for the requested page and echoes an HTML document whose meta tags are built from the JSON's title, description and image fields, plus an http-equiv="refresh" redirect to the real URL):
[crawlers.php source omitted - the markup did not survive extraction]
og:url is not specified because I thought that by not specifying it, Facebook would be unaware of the actual content URL and would link its cards to the static file. It shouldn't be a problem, as I made use of http-equiv="refresh", which redirects normal users to the correct URL.
- For example, 404.json contains the following:
{
    "title": "404: Not Found | My Website",
    "description": "My awesome description.",
    "image": "https://www.mywebsite.com/assets/images/share/404.jpg",
    "url": "https://www.mywebsite.com"
}
- .htaccess contains the following:
RewriteEngine On
RewriteBase /
# Allow robots.txt to pass through
RewriteRule ^robots.txt - [L]
# Allow social media crawlers to work
RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit/[0-9]|Twitterbot)
RewriteRule ^(.+)$ /crawlers/crawlers.php?page=$1 [NC,L]
# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
# If the requested resource doesn't exist use index.html
RewriteRule ^ /index.html
When I test crawlers/crawlers.php?page=test-page directly (by accessing https://www.mywebsite.com/crawlers/crawlers.php?page=test-page), it works perfectly, which is why I believe the issue is in the .htaccess condition below # Allow social media crawlers to work. Sharing on Facebook still shows the meta tags of the first route (/), which means the redirect to crawlers/crawlers.php doesn't happen.
Also, on https://developers.facebook.com/tools/debug/sharing/ the URL https://www.mywebsite.com/about is not redirected to https://www.mywebsite.com/crawlers/crawlers.php?page=about.
I want the redirect to crawlers/crawlers.php to apply only to social media crawlers and only for pages like https://www.mywebsite.com/about, https://www.mywebsite.com/work, etc., but not for https://www.mywebsite.com (the first route, /).
Any help is very much appreciated. Thanks!
ANSWER
Answered 2021-May-31 at 15:19
Thanks to @CBroe's guidance, I managed to make the social media (Facebook and Twitter) crawlers work (without using Angular Universal, Scully, Prerender.io, etc.) for an Angular 11 SPA hosted on a shared hosting server.
The issue I had in the question above was in .htaccess. This is my .htaccess (which works as expected):
RewriteEngine On
# Force www.
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]
# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
# Allow robots.txt to pass through
RewriteRule ^robots.txt - [L]
# Allow social media crawlers to work
RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit|WhatsApp|LinkedInBot|Twitterbot)
RewriteRule ^(.+)$ /crawlers/social_media.php?page=$1 [R=301,L]
# If the requested resource doesn't exist use index.html
RewriteRule ^ /index.html
PS: I renamed crawlers.php to social_media.php, added the WhatsApp and LinkedIn user agents, and also added a redirect from mywebsite.com to www.mywebsite.com.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install twitterbot
Follow Steps 3-5 of this bot tutorial to create an account and obtain credentials for your bot.
Copy the template folder from twitterbot/examples/template to wherever you'd like to make your bot, e.g. cp -r twitterbot/examples/template my_awesome_bot.
Open the template file in my_awesome_bot in your favorite text editor. Many default values are filled in, but you MUST provide your API/access keys/secrets in the configuration in this part. There are also several other options which you can change or delete if you're okay with the defaults.
The methods on_scheduled_tweet, on_mention, and on_timeline are what define the behavior of your bot; they deal with making public tweets to your timeline, handling mentions, and handling tweets on your home timeline (e.g., from accounts your bot follows) respectively. Some methods that are useful here (see the sketch after these steps):
self.post_tweet(text) # post some tweet
self.post_tweet(text, reply_to=tweet) # respond to a tweet
self.favorite(tweet) # favorite a tweet
self.log(message) # write something to the log file
Remember to remove the NotImplementedError exceptions once you've implemented these! (I hope this line saves you as much grief as it would have saved me, ha.)
Once you've written your bot's behavior, run the bot using python mytwitterbot.py & (or whatever you're calling the file) in this directory. A log file corresponding to the bot's Twitter handle should be created; you can watch it with tail -f <bot's name>.log.
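A minimal sketch of what a filled-in template might look like, using the hooks and helpers named above (the TwitterBot base-class name, import path, handler signatures, and run() entry point are assumptions about the template, not confirmed API; the API-key configuration from the template is omitted):

# sketch of a bot built on the template; import path and run() are assumed
from twitterbot import TwitterBot

class MyAwesomeBot(TwitterBot):
    def on_scheduled_tweet(self):
        # make a public tweet on the configured schedule
        self.post_tweet("hello, world")

    def on_mention(self, tweet, prefix):
        # respond to whoever mentioned the bot
        self.post_tweet(prefix + " thanks for the mention!", reply_to=tweet)
        self.log("replied to a mention")

    def on_timeline(self, tweet, prefix):
        # favorite tweets from accounts the bot follows
        self.favorite(tweet)

if __name__ == "__main__":
    MyAwesomeBot().run()  # assumed entry point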