# Note (November 2016):
# This config is rather outdated and left here for historical reasons; please refer to prerender.io for the latest setup information
# Serving static html to Googlebot is now considered bad practice as you should be using the escaped fragment crawling protocol
server {
    listen 80;
    listen [::]:80;
    server_name yourserver.com;
    root /path/to/your/htdocs;
    error_page 404 /404.html;
    index index.html;
    location ~ /\. {
        deny all;
    }
    location / {
        try_files $uri @prerender;
    }
    location @prerender {
        #proxy_set_header X-Prerender-Token YOUR_TOKEN;
        set $prerender 0;
        if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_|prerender=1") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($prerender = 1) {
            rewrite .* /$scheme://$host$request_uri? break;
            #proxy_pass http://localhost:3000;
            proxy_pass http://service.prerender.io;
        }
        if ($prerender = 0) {
            rewrite .* /index.html break;
        }
    }
}
If anyone sees problems with their URL looking like this: http://myurl.com/post/123#!_escaped_fragment_= (notice the #! AND the _escaped_fragment_)
Change this line:
rewrite .* /$scheme://$host$request_uri break;
to:
rewrite .* /$scheme://$host$request_uri? break;
According to nginx.org (http://wiki.nginx.org/HttpRewriteModule):
If you specify a ? at the end of a rewrite then Nginx will drop the original $args (arguments). When using $request_uri or $uri&$args you should specify the ? at the end of the rewrite to avoid Nginx doubling the query string.
Any tips on how I can modify this to work with the Facebook Open Graph crawler (https://developers.facebook.com/tools/debug/)?
With the above, and with the canonical URL and og:url meta tags set to domain.com/#!/path, if a user likes that page Facebook will save the share URL as domain.com/?_escaped_fragment_=/path. So when a user clicks the share link on Facebook, they get directed through the proxy.
@saintberry Good question, you can try getting rid of the following lines:
if ($args ~ "_escaped_fragment_|prerender=1") {
set $prerender 1;
}
Since it's already checking for the user agent, I don't think there will be a problem with omitting that additional check. An alternative you could experiment with would be to set $prerender to 0 if the $http_referer matches Facebook.
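A minimal sketch of that referer idea, assuming it goes inside location @prerender after the user-agent and $args checks and before if ($prerender = 1). The facebook\.com pattern is an assumption about what referer a human clicking a share link sends; because this check overrides the earlier ones, it is worth confirming with the Facebook debugger that the crawler itself is still prerendered.

```nginx
# Inside location @prerender, after the user-agent and $args checks
# and before "if ($prerender = 1)".
if ($http_referer ~* "facebook\.com") {
    # Assumption: a visitor who followed a Facebook share link arrives with a
    # facebook.com referer, so send them to the normal app (index.html)
    # instead of through the prerender proxy.
    set $prerender 0;
}
```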
FYI:
Googlebot is starting to render JavaScript, so you may want to remove it from the user-agent list (after testing in Webmaster Tools).
See:
http://googlewebmastercentral.blogspot.com/2014/05/rendering-pages-with-fetch-as-google.html
Could this be used with another proxy pass to a node.js application using an upstream? I can't seem to get it to work.
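One way this might look, as a sketch assuming the Node.js app listens on 127.0.0.1:3000 (the upstream name node_app, the port and the shortened bot list are hypothetical): keep the prerender branch inside an if, and let every other request fall through to a location-level proxy_pass to the upstream instead of rewriting to index.html.

```nginx
upstream node_app {
    server 127.0.0.1:3000;  # hypothetical address of the Node.js app
}

server {
    listen 80;
    server_name yourserver.com;
    root /path/to/your/htdocs;

    location / {
        # Serve existing static files directly, otherwise hand off to @app.
        try_files $uri @app;
    }

    location @app {
        set $prerender 0;
        # Shortened bot list for the sketch; use the full list from the gist.
        if ($http_user_agent ~* "googlebot|bingbot|facebookexternalhit|twitterbot") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_|prerender=1") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($prerender = 1) {
            # Crawlers: forward to the prerender service, as in the gist.
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://service.prerender.io;
        }
        # Everyone else: proxy to the Node.js upstream with the original URI.
        proxy_pass http://node_app;
    }
}
```

Any proxy_set_header added at the @app level would also be inherited by the if block, so headers meant only for the Node.js app may need to be overridden inside the if.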
Missing a semicolon on line 8 (https://gist.github.com/Stanback/6998085#file-nginx-conf-L8)
As I am not familiar with all this, I'd like some general explanation and guidance. Forgive my ignorance. Based on the docs I have read, I am assuming this:
2: listen on port 80
3: I have no clue what this means
4: the server name
6: the root path where the documents are
8: error page
9: the index page
11-13: location with obviously a regex, probably this is saying that we are not serving yourserver.com/. ??
15-17: first we try serving the uri that is input, if we cannot we will default to named location 'prerender' ??
Now, I have a case where I want to serve every incoming request directly, except for Facebot, Twitterbot and maybe some other bots. Googlebot, for instance, processes the JS just fine.
BTW: I am not using <meta name="fragment" content="!" />
For Facebot I want to serve very plain HTML with some og meta tags. This is the first case I will focus on.
Ideally the regular website would not suffer the overhead of processing conditional logic, but that seems impossible, right?
What confuses me is the try_files. This is obviously an AJAX / Angular scenario. So, we have no static pages, only index.html. That is why try_files will always hit the fallback @prerender, am I right?
With regards to the prerender section, I would probably have something like:
if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
34 rewrite .* /$scheme://$host$request_uri? break;
35 #proxy_pass http://localhost:3000;
36 proxy_pass http://service.prerender.io;
}
I think the above means that the URI is rewritten first [line 34], but it seems to rewrite it to exactly what it was? And then the request is passed on to the server at http://service.prerender.io [line 36].
If prerendering does not apply, you serve index.html to the browser [line 39].
Finally, I have the following questions:
- I discussed this with the developer; he said he could check the http_user_agent in code (in this case Node.js).
- What are the pros and cons of doing it in nginx like this?
- I have also seen an alternative method outlined here: http://serverfault.com/questions/316541/check-several-user-agent-in-nginx
- Especially the map section seems a pretty elegant solution for checking multiple distinct cases; I would like your comment on this as well.
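For comparison, a sketch of the gist's $prerender decision written with map, assuming nginx 0.9.0 or later (a map value may be a variable) and that both map blocks sit in the http context, outside the server block. The name $prerender_args is hypothetical. Regex entries in a map are tried in order and the first match wins, so the Prerender exclusion is listed first; map variables are also only evaluated when used, which keeps the cost for ordinary requests low.

```nginx
# http context (outside any server block)
map $args $prerender_args {
    default                              0;
    "~(_escaped_fragment_|prerender=1)"  1;
}

map $http_user_agent $prerender {
    default       $prerender_args;  # no UA match: fall back to the query-string check
    "~Prerender"  0;                # never re-prerender Prerender's own requests
    "~*googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com" 1;
}

server {
    # ... listen, server_name, root, index and the other locations as in the gist ...

    location / {
        try_files $uri @prerender;
    }

    location @prerender {
        if ($prerender = 1) {
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://service.prerender.io;
        }
        if ($prerender = 0) {
            rewrite .* /index.html break;
        }
    }
}
```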
Last but not least, I have been trying / experimenting in order to learn something. That will never hurt.
So I had this scenario in mind:
IF origin = Facebot
use proxy_pass to relay to server that serves what Facebook needs
ELSE IF origin = Twitter
use proxy_pass to relay to server that serves what Twitter needs
...
ELSE
serve index.
I would really have liked to do this conditional processing with the location directive:
# route to server for prerendered stuff
location ????? {
proxy_pass http://localhost:8081;
}
But that does not seem possible. So we require IF or MAP or something similar.
Finally, I tried something like this:
location / {
if ($ua_redirect != '') {
rewrite .* /$scheme://$host$request_uri? break;
proxy_pass $scheme://localhost:$ua_redirect;
}
}
It seems proxy_pass does not like dynamic stuff? So I would need to add a resolver:
resolver 127.0.0.1;
And deploy a DNS server on my host machine. Am I right?
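A sketch of the Facebot/Twitter/else scenario that should not need a resolver, assuming each bot-specific server runs locally (the upstream names facebook_backend and twitter_backend and ports 8081/8082 are hypothetical). When a variable in proxy_pass evaluates to the name of a defined upstream block, nginx uses that group directly and only turns to the resolver for names it does not know. The upstream and map blocks go in the http context.

```nginx
# http context
upstream facebook_backend {
    server 127.0.0.1:8081;  # hypothetical server returning Facebook-specific HTML
}
upstream twitter_backend {
    server 127.0.0.1:8082;  # hypothetical server returning Twitter-specific HTML
}

map $http_user_agent $bot_backend {
    default                          "";
    "~*facebookexternalhit|facebot"  facebook_backend;
    "~*twitterbot"                   twitter_backend;
}

server {
    listen 80;
    server_name yourserver.com;
    root /path/to/your/htdocs;

    location / {
        # Non-empty $bot_backend: relay the request to that bot's upstream.
        if ($bot_backend) {
            proxy_pass http://$bot_backend;
        }
        # Everyone else: the static file if it exists, otherwise index.html.
        try_files $uri /index.html;
    }
}
```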
@dfmcphee Did you get that to work? Encountered the same problem right now. I've got one proxy_pass for my node-servers and the real clients requesting the web page, and want to redirect the rest to another proxy_pass. Anyone got a solution for this?
Hi Brian,
I am using an AngularJS-based website with nginx as the server and without Node.js.
I have used the nginx settings as mentioned here: https://gist.github.com/thoop/8165802
I have configured a two server approach:
Website is running on a server: www.mywebsite.com. Running on SSL port 443
Prerender is running on another server: www.myprerenderservice.com. Running on port 80
The prerender service is running with PM2 as the process manager
In the Google crawler, I have used ?_escaped_fragment_= as well.
The Google crawler is sometimes able to return the results and sometimes not. When it fails, I kill the Prerender PM2 service, start it again, clear memory on the server and retry. It then works once, but fails again. I don't know what is happening. Can you help?
I want to limit prerendering to only some pages. Do you have better settings?
# Except for the home page and the info page, other pages are not forwarded.
if ($document_uri !~ "/index.html|/info.html") {
set $prerender 0;
}
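The regex in that snippet is unanchored and its dots are unescaped, so it also matches paths such as /blog/index.html or /infoxhtml. A slightly tighter sketch anchors the pattern; it assumes the home page is / or /index.html and the info page is /info.html, and it belongs inside location @prerender after the other checks and before if ($prerender = 1).

```nginx
# Inside location @prerender, after the user-agent and $args checks and
# before "if ($prerender = 1)". Hypothetical allow-list: only /, /index.html
# and /info.html may be prerendered; every other path gets index.html.
if ($uri !~ "^/(index\.html|info\.html)?$") {
    set $prerender 0;
}
```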
Is there a way to avoid using prerender.io at all?
Let's say I have generated static files with rendertron, I want to store them in a sub-folder
How can I tell nginx: "if the user agent is googlebot|otherbot, load files from this directory"?
Why would I need a rendertron server running all the time, or why would I need prerender.io, if the end result is kinda the same as with static files...?
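If the snapshots already exist on disk, a prerender service may not be needed at all. A sketch, assuming the rendertron output is saved under a snapshots/ folder inside the document root with paths mirroring the site (e.g. /about stored as snapshots/about.html or snapshots/about/index.html); the folder name, $is_bot and the shortened bot list are hypothetical. Bots are rewritten into /snapshots/, which is marked internal so it cannot be requested directly, and a missing snapshot falls back to the normal index.html without looping back through the bot check.

```nginx
# http context
map $http_user_agent $is_bot {
    default  0;
    "~*(googlebot|bingbot|facebookexternalhit|twitterbot)"  1;
}

server {
    listen 80;
    server_name yourserver.com;
    root /path/to/your/htdocs;

    location / {
        if ($is_bot) {
            # /about -> /snapshots/about (query string is kept).
            rewrite ^(.*)$ /snapshots$1 last;
        }
        try_files $uri /index.html;
    }

    location /snapshots/ {
        internal;  # only reachable through the rewrite above
        try_files $uri $uri.html $uri/index.html @spa;
    }

    location @spa {
        # No snapshot for this path: serve the app shell instead.
        rewrite ^ /index.html break;
    }
}
```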
I encountered the following error when using this Gist:
To fix it, I commented out line #20: