What is the consensus in 2019 on handling pagination for big result sets?
I would consider removing pagination completely:
- It isn't good for search engines.
  - It doesn't pass link juice beyond page 2.
  - It creates tons of additional low quality pages.
- It isn't good for users.
  - Only a small percentage of users (less than 5%) ever use pagination.
  - Of those users who do use pagination, almost none get more than a few pages in.
There are better ways to handle large product catalogs, both for search engines and for users. Pagination isn't needed on a modern site.
What should you do instead of pagination?
- Have more products on page 1 than users are likely to need. I suggest listing 100 products on the category page.
- Implement infinite scroll. Infinite scroll is easier for users than pagination, but be sure to make a large number of products visible before scrolling. Googlebot never scrolls, and it is a common error to implement infinite scrolling in such a way that Googlebot sees blank pages or pages with just a couple of links above the fold (see the sketch after this list).
- Provide site search. Users like to search rather than browse through endless lists.
- Implement faceted navigation. Users like to be able to drill down to products by attributes such as "under $100", "with X feature", or "4+ stars". You can let search engines crawl facet pages that have exactly one attribute selected.
- Find other ways to link to each product page. It is far better for search engines if product pages link to each other. Many sites use "related products", "customers who bought this also bought", and "featured products" sections on product pages to highlight other products. This can be useful for users, but it is often primarily for search engines. That is why this site has the "Related" questions section on the left.
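Here is a minimal sketch of crawl-safe infinite scroll, assuming server-rendered markup with a product list (`#products`) and a plain "next page" link (`#next`); the selectors and URL pattern are placeholders, not anything Google prescribes. Because the next-page link is real HTML, a crawler that never scrolls still sees a full page of products and a followable link, while scrolling users get the next page fetched and appended automatically:

```typescript
// Crawl-safe infinite scroll: the initial HTML already contains a full page
// of products plus a real <a id="next" href="..."> link, so a non-scrolling
// crawler sees real content. The observer is a usability enhancement only.
const list = document.querySelector<HTMLUListElement>('#products')!;
const next = document.querySelector<HTMLAnchorElement>('#next')!;

const observer = new IntersectionObserver(async ([entry]) => {
  if (!entry.isIntersecting) return;
  const res = await fetch(next.href);           // fetch the next paginated page
  const doc = new DOMParser().parseFromString(await res.text(), 'text/html');
  // Append the next page's products to the current list.
  doc.querySelectorAll('#products > li').forEach((li) => list.append(li));
  const newNext = doc.querySelector<HTMLAnchorElement>('#next');
  if (newNext) {
    next.href = newNext.href;                   // keep the fallback link current
  } else {
    observer.disconnect();                      // last page reached
    next.remove();
  }
}, { rootMargin: '400px' });                    // start loading shortly before the link is visible

observer.observe(next);                         // the visible link doubles as the scroll sentinel
```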
If you do implement pagination (and you probably will, because it is so easy to program), I would suggest:
- First choice: Prevent robots from even crawling page 2+ by listing those pages in robots.txt. This might mean using a separate prefix for pages 2+, such as starting those URLs with `/pages/` so you can use `Disallow: /pages` (see the sketch after this list). Pagination doesn't pass link juice effectively, so blocking it won't hurt the rankings of product pages. It will, however, prevent new deep product pages from being discovered, so you absolutely need other links into every product page from other product pages before you do this.
- Second choice: Use `noindex` on page 2+ to prevent search engines from indexing the low quality pages. If you can't prevent the pages from being crawled, at least prevent them from being indexed.
- Third choice: Let search engines crawl and index all the pages. If you don't implement other links to every product page, I would go with this one. It won't actually hurt your SEO that much, and it will allow search engines to discover all your content. Google will probably notice that the pagination pages are low quality and choose not to index them anyway. While search engines will discover all your product pages, most won't have enough link juice to get indexed through pagination alone; expect mostly only the ones listed on pages 1 and 2 to get indexed. The ones beyond page 1 won't rank well, even if they do get indexed, until you find a way to get other links from your site to those pages.
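As a sketch of the first choice, assuming a Node/Express site (the `/pages/` prefix and route shapes here are made up for illustration), the whole pagination tree can be blocked with a single robots.txt rule:

```typescript
import express from 'express';

const app = express();

// One rule in robots.txt keeps crawlers out of every page 2+.
app.get('/robots.txt', (_req, res) => {
  res.type('text/plain').send('User-agent: *\nDisallow: /pages/\n');
});

// Page 1 of each category stays at a crawlable URL...
app.get('/category/:slug', (_req, res) => {
  res.send('category page 1');   // render page 1 here
});

// ...while pages 2+ live under the blocked /pages/ prefix.
app.get('/pages/:slug/:page', (_req, res) => {
  res.send('category page 2+');  // render page N here
});

app.listen(3000);
```

For the second choice, the equivalent is rendering `<meta name="robots" content="noindex">` in the head of every page 2+.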
As you noted, `rel=prev/next` isn't used by Google. You can use it if you want, but it won't change anything.
`noindex,follow` ends up being the same as `noindex,nofollow` for Google, because Google doesn't pass link juice through pages that are not in its index. As I noted, it doesn't really matter anyway; page 3 has almost no link juice available to pass.
`rel=canonical` to the first page isn't going to work anymore. These days Google ignores the canonical signal if the content doesn't appear to be nearly duplicate. Because the products listed on each page are different, Google is likely to ignore any canonical signals between paginated pages.
More About Pagination and PageRank
Google PageRank (PR) used to be reported by the Google Toolbar on a logarithmic scale from 0 to 10. In general, I would say you needed Toolbar PageRank (tPR) of 1 to get indexed, and 2 to rank for competitive tail terms. To make calculations of PR easier, I usually work in Linear Link Juice (LLJ) units, where LLJ = 10^tPR:
- 0 tPR = 1 LLJ
- 1 tPR = 10 LLJ
- 2 tPR = 100 LLJ
- 3 tPR = 1,000 LLJ
- 4 tPR = 10,000 LLJ
- ...
When I say that pagination doesn't pass PageRank beyond page 2, I'm assuming that each page in the pagination links only to the next page. In that case it is very easy to see what happens. Say your page 1 category page has a tPR of 3. It would have 1,000 units of LLJ, of which 900 are available to pass (due to the PageRank damping factor). If there are exactly 21 links on page 1 (20 to products and one to page 2), each of those linked pages gets 43 LLJ, or a tPR of about 1.6. That is enough to get indexed and rank decently well.
Page 2 then has 43 LLJ, of which about 39 are available to pass. Each of the 21 pages it links to gets only 1.8 LLJ, or 0.3 tPR. That is probably not enough to get those product pages indexed, and certainly not enough PageRank for page 3 to pass anything meaningful on at all.
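Here is that chain as a quick sketch, assuming the simplified model above (a flat 90% damping factor and 21 evenly weighted links per page):

```typescript
// Each page passes 90% of its link juice (the damping factor), split evenly
// across 21 links: 20 product links plus 1 link to the next pagination page.
const DAMPING = 0.9;
const LINKS_PER_PAGE = 21;
const tPR = (llj: number) => Math.log10(llj);  // convert back to the toolbar scale

let llj = 1000;                                // page 1 starts at tPR 3
for (let page = 1; page <= 3; page++) {
  const perLink = (llj * DAMPING) / LINKS_PER_PAGE;
  console.log(`page ${page}: each linked page gets ${perLink.toFixed(1)} LLJ (tPR ${tPR(perLink).toFixed(1)})`);
  llj = perLink;                               // the next pagination page gets one share
}
// page 1: each linked page gets 42.9 LLJ (tPR 1.6)
// page 2: each linked page gets 1.8 LLJ (tPR 0.3)
// page 3: each linked page gets 0.1 LLJ (tPR -1.1)
```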
When you introduce 10 pagination links on every page, the calculations get much harder because there are feedback loops: you have to build a link graph and calculate the flow over multiple iterations through it. You end up in a similar situation. You have 20 products that get a decent amount of PageRank from page 1. Pagination pages 2 through 11 get similar amounts of PageRank, and you might be able to get the 200 products they link to indexed. Beyond page 10, it is similar to page 3 in the single-link model.
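With the feedback loops there is no closed-form shortcut; you run the standard power iteration over the link graph. A toy sketch follows, where the graph shape (11 pagination pages, 20 products each, products linking back to their category page) is illustrative only:

```typescript
type Graph = Map<string, string[]>;  // node -> outbound links

// Standard PageRank power iteration with a 0.9 damping factor.
function pagerank(graph: Graph, damping = 0.9, iterations = 50): Map<string, number> {
  const nodes = [...graph.keys()];
  const uniform = (v: number) => new Map(nodes.map((n) => [n, v] as [string, number]));
  let rank = uniform(1 / nodes.length);
  for (let i = 0; i < iterations; i++) {
    const next = uniform((1 - damping) / nodes.length);
    for (const [node, links] of graph) {
      const share = (damping * rank.get(node)!) / links.length;
      for (const target of links) next.set(target, next.get(target)! + share);
    }
    rank = next;
  }
  return rank;
}

// Build the graph: each pagination page links to its 20 products and to the
// 10 other pagination pages; each product links back to its category page.
const graph: Graph = new Map();
for (let p = 1; p <= 11; p++) {
  const products = Array.from({ length: 20 }, (_, i) => `product-${p}-${i + 1}`);
  const others = Array.from({ length: 11 }, (_, i) => `page-${i + 1}`).filter((n) => n !== `page-${p}`);
  graph.set(`page-${p}`, [...products, ...others]);
  for (const prod of products) graph.set(prod, [`page-${p}`]);
}

const ranks = pagerank(graph);
console.log(ranks.get('page-2'), ranks.get('product-2-1'));
```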
In the end, calculating PR is mostly an academic exercise. Other factors, such as what Google identifies as low quality, end up playing big roles. Google will likely choose not to index your paginated pages because they don't make good landing pages. In my experience, Google appears to treat non-indexed pages the same as if they had `noindex` meta tags: even if you can get PR to them, it doesn't matter if Google doesn't want to index them, because it won't pass PageRank through them.
I second what @Stephen Ostermiller said, but I unfortunately have to disagree with his suggestion of implementing infinite scroll.
Infinite scroll and load more
According to John Mueller, Googlebot expands the viewport height to simulate rendering the page as if it were displayed on an extremely long display. If your infinite-scroll page is taller than that expanded viewport, the links beyond Googlebot's limit will never be crawled. In other words, Googlebot sees a page with infinite scroll as just one long page, so you might as well create one long page to be sure Googlebot crawls all of it.
On top of that, John Mueller mentioned that Googlebot does not "click" on JavaScript "load more" buttons.
Reference - Google webmaster hangout with John Mueller: https://youtu.be/WAagTHeF9N0?t=1320
Fewer paginated pages
Consequently, I think the only way to handle this is to create paginated pages with as much content as you can, to lower the number of paginated pages. For instance, I increased the number of posts displayed on my category pages from 15 to 60. My goal was to keep each page at about 1 MB of transferred file size, so I optimized my JPG files. My WordPress theme is responsive and displays tiny images on mobile phones, which is why it is important to use an image compression tool like https://squoosh.app/ (Squoosh is a free online compression tool maintained by Google Chrome Labs).
As a result, Googlebot has only three pages to crawl to see all the links. I could have removed the pictures from the category pages entirely, which would have let me add many more links per page.
So if you have 10 paginated pages with 10 links each, increasing the number of links per page to 50 means only 2 paginated pages are generated; you save 8 paginated pages, which is a huge benefit in terms of crawl budget. It also means that every time you add an article to the category, it will take longer before an existing link reaches the bottom of the first page and rolls over to the second page.
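The arithmetic, as a quick sketch:

```typescript
// Pages needed to list a fixed number of links at a given page size.
const pagesNeeded = (totalLinks: number, linksPerPage: number) =>
  Math.ceil(totalLinks / linksPerPage);

console.log(pagesNeeded(100, 10));  // 10 paginated pages
console.log(pagesNeeded(100, 50));  // 2 paginated pages -> 8 fewer to crawl
```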
My observation:
I have looked into my access logs several times. I have a website powered by WordPress that features paginated category pages. Googlebot often crawls the first page of a category, rarely crawls the second page, and crawls the third page even more rarely: in my case, every 2 months, to be precise. So the more paginated pages you have, the less likely Google is to crawl the later ones.
To this extent, Google treats paginated pages very much like regular pages that are buried deep in the website structure.
I have added and removed the (prev/next) tags several times, and I didn't see any changes on Google or Bing.
Also, I found this photo from a seminar with John Mueller. There is a slide entitled "Pagination without rel prev/next", and it reminded me of what he said:
- Link naturally between pages (granted, everybody does this).
- Use clean URLs (Google advises avoiding ?=parameters, or using at most one parameter).
- Paginated content vs. detail links: J. Mueller suggests noindexing paginated pages if the links within those pages can be found somewhere else.
Google wanted to kill the ranking of the kind of paginated pages you see on clickbait websites, in the ads at the bottom of some sites. For example: "the 10 richest men in the world". The webmaster divides the article into 10 pages, forcing readers to click "next" to discover each rich man, so that readers see every advertisement. However, such a page will never rank well in Google, since the subsequent paginated pages get (almost) no Google juice.