pushState and SEO
Is `pushState` bad if you need search engines to read your content?

No. The talk about `pushState` is geared toward accomplishing the same general process as hashbangs, but with better-looking URLs. Think about what really happens when you use hashbangs...
You say:

> With hashbangs, Google knows to go to the escaped_fragment URL to get their static content.
So in other words:

- Google sees a link to `example.com/#!/blog`
- Google requests `example.com/?_escaped_fragment_=/blog`
- You return a snapshot of the content the user should see
As you can see, it already relies on the server. If you aren't serving a snapshot of the content from the server, then your site isn't getting indexed properly.
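The URL transformation in those steps can be sketched as a small function. This is illustrative, not Google's actual implementation; the real AJAX crawling scheme also percent-encodes special characters in the fragment, which this sketch skips for readability:

```javascript
// Sketch of the rewrite Googlebot performs on hashbang URLs: move everything
// after "#!" into the _escaped_fragment_ query parameter.
function toEscapedFragmentUrl(url) {
  var bangIndex = url.indexOf('#!');
  if (bangIndex === -1) return url; // no hashbang: nothing to rewrite
  var base = url.slice(0, bangIndex);
  var fragment = url.slice(bangIndex + 2);
  var separator = base.indexOf('?') !== -1 ? '&' : '?';
  // Note: the real crawling scheme percent-encodes the fragment value.
  return base + separator + '_escaped_fragment_=' + fragment;
}

console.log(toEscapedFragmentUrl('http://example.com/#!/blog'));
// → http://example.com/?_escaped_fragment_=/blog
```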
So how will Google see anything with pushState?

> With pushState, Google just sees nothing as it can't use the JavaScript to load the JSON and subsequently create the template.
Actually, Google will see whatever it can request at `site.example/blog`. A URL still points to a resource on the server, and clients still obey this contract. Of course, for modern clients, JavaScript has opened up new possibilities for retrieving and interacting with content without a page refresh, but the contracts are the same.
So the intended elegance of `pushState` is that it serves the same content to all users, old and new, JS-capable and not, but the new users get an enhanced experience.
How do you get Google to see your content?
- The Facebook approach — serve the same content at the URL `site.example/blog` that your client app would transform into when you push `/blog` onto the state. (Facebook doesn't use `pushState` yet that I know of, but they do this with hashbangs.)
- The Twitter approach — redirect all incoming URLs to the hashbang equivalent. In other words, a link to "/blog" pushes `/blog` onto the state. But if it's requested directly, the browser ends up at `#!/blog`. (For Googlebot, this would then route to `_escaped_fragment_` as you want. For other clients, you could `pushState` back to the pretty URL.)
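The client-side half of the Twitter approach could be sketched like this. The helper name is illustrative (not from any real Twitter code), and the browser wiring is shown in comments since it needs a real `window`:

```javascript
// If the browser landed on a hashbang URL, extract the pretty path so it
// can be restored via the History API without a page reload.
function prettyUrlFromHashbang(location) {
  // `location` is any object shaped like window.location (hash property).
  if (location.hash && location.hash.indexOf('#!') === 0) {
    return location.hash.slice(2); // "#!/blog" -> "/blog"
  }
  return null; // no hashbang; leave the URL alone
}

// In the browser, guarded by a feature check:
// var pretty = prettyUrlFromHashbang(window.location);
// if (pretty && window.history && history.replaceState) {
//   history.replaceState(null, '', pretty); // address bar shows /blog again
// }
```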
So do you lose the `_escaped_fragment_` capability with `pushState`?
In a couple of different comments, you said:

> escaped fragment is completely different. You can serve pure unthemed content, cached content, and not be put under the load that normal pages are. The ideal solution is for Google to either do JavaScript sites or implement some way of knowing that there's an escaped fragment URL even for pushstate sites (`robots.txt`?).
The benefits you mentioned are not isolated to `_escaped_fragment_`. That it does the rewriting for you and uses a specially-named `GET` param is really an implementation detail. There is nothing really special about it that you couldn't do with standard URLs — in other words, rewrite `/blog` to `/?content=/blog` on your own using mod_rewrite or your server's equivalent.
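As a sketch of that rewriting, assuming Apache (this config fragment is illustrative, not from the original answer):

```apache
# Map the pretty URL /blog to an internal query-string form that the
# server-side app already knows how to render.
RewriteEngine On
RewriteRule ^/?blog$ /?content=/blog [QSA,L]
```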
What if you don't serve server-side content at all?
If you can't rewrite URLs and serve some kind of content at `/blog` (or whatever state you pushed into the browser), then your server is really no longer abiding by the HTTP contract.
This is important because a page reload (for whatever reason) will pull content at this URL. (See https://wiki.mozilla.org/Firefox_3.6/PushState_Security_Review — "view-source and reload will both fetch the content at the new URI if one was pushed.")
It's not that drawing user interfaces once on the client side and loading content via JS APIs is a bad goal; it's just that it isn't really accounted for with HTTP and URLs, and it's basically not backward-compatible.
At the moment, this is the exact thing that hashbangs are intended for — to represent distinct page states that are navigated on the client and not on the server. A reload, for example, will load the same resource which can then read, parse, and process the hashed value.
It just happens to be that they have also been used (notably by Facebook and Twitter) to change the history to a server-side location without a page refresh. It is in those use cases that people are recommending abandoning hashbangs for pushState.
If you render all content client-side, you should think of `pushState` as part of a more convenient history API, and not a way out of using hashbangs.
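A minimal sketch of that "history convenience" usage, with assumed helper names (the browser wiring is commented out since it needs a real `window`; note every pushed path must still exist on the server for reloads to work):

```javascript
// Pure helper: decide which path to render after a popstate event. A
// popstate carries whatever object was passed to pushState; on the initial
// load there is no state, so fall back to the current pathname.
function pathToRender(eventState, currentPathname) {
  return (eventState && eventState.path) || currentPathname;
}

// In the browser:
// function navigate(path) {
//   history.pushState({ path: path }, '', path); // address bar shows /blog
//   render(path); // hypothetical client-side renderer
// }
// window.addEventListener('popstate', function (e) {
//   render(pathToRender(e.state, location.pathname));
// });
```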
What about using the meta tag that Google suggests for those who don't want hash-bangs in their URLs?

`<meta name="fragment" content="!">`
See here for more info: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
Unfortunately, I don't think Nicole clarified the issue that I thought the OP was having. The problem is simply that we don't know who we are serving content to if we don't use the hash-bang. `pushState` does not solve this for us. We don't want search engines telling end users to navigate to some URL that spits out unformatted JSON. Instead, we create URLs (that trigger other calls to more URLs) that retrieve data via AJAX and present it to the user in the manner we prefer.

If the user is not a human, then as an alternative we can serve an HTML snapshot, so that search engines can properly direct human users to the URL where they would expect to find the requested data (and in a presentable manner). But the ultimate challenge is: how do we determine the type of user? Yes, we can possibly use .htaccess or something to rewrite the URL for search engine bots we detect, but I'm not sure how foolproof and futureproof this is. It may also be possible that Google could penalize people for doing this sort of thing, but I have not researched it fully. So the (pushState + Google's meta tag) combo seems to be a likely solution.