How to improve SEO for Serverless Websites?
If you are willing to put CloudFront in front of your S3 bucket, there is a new way to solve your problem by prerendering on the fly. Lambda@Edge is a new feature that runs your code at the CloudFront edge locations, with low latency, whenever a page is requested. With it, you can check whether the user agent is a crawler and serve it a prerendered page.
01 Dec 2016 announcement: Lambda@Edge – Preview
Just last week, a comment that I made on Hacker News resulted in an interesting email from an AWS customer!
(...)
Here’s how he explained his problem to me:
In order to properly get indexed by search engines and in order for previews of our content to show up correctly within Facebook and Twitter, we need to serve a prerendered version of each of our pages. In order to do this, every time a normal user hits our site need for them to be served our normal front end from Cloudfront. But if the user agent matches Google / Facebook / Twitter etc., we need to instead redirect them the prerendered version of the site.
Without spilling any beans I let him know that we were very aware of this use case and that we had some interesting solutions in the works. Other customers have also let us know that they want to customize their end user experience by making quick decisions out at the edge.
This feature is currently in preview (December 2016), but you can ask AWS for access to try it.
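As a rough illustration of the crawler check, here is a minimal sketch of a Lambda@Edge function in Python. It assumes the function is attached as a viewer-request trigger (so the original User-Agent header is still present) and that prerendered copies already exist on the origin under a hypothetical `/prerendered` prefix. Note that it rewrites the URI to a stored copy rather than rendering anything itself. The crawler list and the names `CRAWLER_PATTERN`, `PRERENDER_PREFIX` and `edge_handler` are illustrative assumptions, not part of any AWS API.

```python
import re

# Hypothetical, incomplete list of crawler markers; extend it as needed.
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|facebookexternalhit|twitterbot", re.IGNORECASE
)

# Hypothetical prefix under which prerendered pages are stored on the origin.
PRERENDER_PREFIX = "/prerendered"


def is_crawler(user_agent):
    """Return True if the User-Agent string looks like a known crawler."""
    return bool(CRAWLER_PATTERN.search(user_agent or ""))


def edge_handler(event, context):
    """Viewer-request handler: send crawlers to the prerendered copy."""
    request = event["Records"][0]["cf"]["request"]

    # CloudFront lower-cases header names and stores each one as a list
    # of {"key": ..., "value": ...} entries.
    ua_entries = request["headers"].get("user-agent", [])
    user_agent = ua_entries[0]["value"] if ua_entries else ""

    if is_crawler(user_agent):
        request["uri"] = PRERENDER_PREFIX + request["uri"]

    # Returning the request lets CloudFront continue processing it.
    return request
```

Regular visitors pass through unchanged and get the normal front end, while a request whose User-Agent matches the pattern is rewritten to the prerendered path.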
If you are using S3 alone, you must prerender the pages before uploading them. S3 only serves static files, and calling Lambda functions from the page's JavaScript will not help because the crawler does not execute JavaScript. You can't even use Prerender.io with S3.
Suggestion:
- Host your website locally.
- Use PhantomJS to fetch the pages and write a prerendered version of each one.
- Upload each page to S3 following the page address*.
* E.g.: the address example.com/about/us must be stored as a us.html file inside an about folder in your bucket root.
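The address-to-file mapping described above can be sketched as a small Python helper. The function name `s3_key_for_path` is hypothetical, and mapping the site root to `index.html` is an assumption that the text does not specify.

```python
def s3_key_for_path(path):
    """Map a page path such as "/about/us" to an S3 object key."""
    trimmed = path.strip("/")
    if not trimmed:
        # Assumption: the site root is stored as index.html.
        return "index.html"
    if trimmed.endswith(".html"):
        # Already a file name; keep it as-is.
        return trimmed
    return trimmed + ".html"


# Example: the page at example.com/about/us
print(s3_key_for_path("/about/us"))  # about/us.html
```

The resulting key is what you would use as the object name when uploading each prerendered file to the bucket.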
Now your users and the crawlers will see exactly the same pages, without needing JavaScript to load the initial state. The difference is that, with JavaScript enabled, your framework (Angular?) will load the JS dependencies (routes, services, etc.) and take control like a normal SPA. When the user clicks through to another page, the SPA will replace the inner content without a full page reload.
Pros:
- Easy to set up.
- Very fast to serve content. You can also use CloudFront to improve the speed.
Cons:
- If you have 1000 pages (e.g. 1000 products that you sell in your store), you need to generate 1000 prerendered pages.
- If your page data changes frequently, you need to prerender frequently.
- Sometimes the crawler will index old content*.
* The crawler will see the old content, but the user will probably see the current content as the SPA framework will take control of the page and load the inner content again.
You said that you are using S3. If you want to prerender on the fly, you can't use S3. You need to use the following:
Route 53 => CloudFront => API Gateway => Lambda
Configure:
- Set the API Gateway endpoint as the CloudFront origin.
- Use "HTTPS Only" in the "Origin Protocol Policy" (CloudFront).
- The Lambda function must be configured as a proxy (Lambda proxy integration in API Gateway), so that it receives the full request.
In this case, your Lambda function will know the requested address and will be able to correctly render the requested HTML page.
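A minimal sketch of such a function in Python, assuming API Gateway's Lambda proxy integration, where the requested path arrives in `event["path"]` and the response is a dictionary with `statusCode`, `headers` and `body`. The `render_page` helper is a hypothetical placeholder; a real implementation would query your database and render your own templates.

```python
import html


def render_page(path):
    """Hypothetical renderer: build a complete HTML page for the path.

    A real implementation would load the page data and apply a template.
    """
    title = html.escape(path)
    return (
        "<!DOCTYPE html><html><head><title>{0}</title></head>"
        "<body><h1>{0}</h1></body></html>"
    ).format(title)


def lambda_handler(event, context):
    """Return the rendered HTML for the requested address."""
    path = event.get("path") or "/"
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html; charset=utf-8"},
        "body": render_page(path),
    }
```

Because every request reaches the function with its full path, the same handler can serve both crawlers and regular visitors with fully rendered HTML.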
Pros:
- As Lambda has access to the database, the rendered page will always be up to date.
Cons:
- Much slower to load the webpages.