> how to fix blogger duplicate content m=1 m=0 problem
>
> I'm using this query `inurl:"m=" "site:mydomain.com"` to detect those posts with m=0 and m=1.
It would seem that what you are seeing is simply the result of a `site:` search. Using the `site:` operator is not a "normal" Google search and has been shown to return non-canonical (including redirected) URLs in the SERPs. These are URLs that don't ordinarily get returned in a "normal" organic search (when no search operators are used). Even URLs that are the source of 301 redirects have been shown to be returned for a `site:` search when they are not returned normally. These non-canonical URLs are still crawled (and processed) by Google, and they are often acknowledged in a `site:` search.
References:
- How to submit shortened URLs to Google so that they are included in the index
- Related question: Google indexing duplicate content despite a canonical tag pointing to an external URL. Am I risking a penalty from Google?
Normally, a `rel="canonical"` (which you have already done) is sufficient to resolve such conflicts with query parameters and duplicate content. But note that it doesn't necessarily prevent the non-canonical pages from being indexed (which is what you see when doing a `site:` search), only from being returned in a "normal" Google search.
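For illustration, the canonical link element on the mobile variant simply points at the parameter-free post URL. The domain and post path below are placeholders; Blogger's default templates generally output this element for you, which is what "already done" refers to:

```html
<!-- Served on the ?m=1 (mobile) variant of a post, e.g.
     https://www.mydomain.com/2024/01/sample-post.html?m=1 (placeholder URL) -->
<!-- The canonical points to the parameter-free version of the same post -->
<link rel="canonical" href="https://www.mydomain.com/2024/01/sample-post.html"/>
```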
> blocked m=0 and m=1 on `robots.txt`
> ....
You probably don't want to block these URLs from being crawled, as doing so could damage your ranking in mobile search.
> BTW what about `Disallow: /*.html`, `Allow: /*.html$`
Aside: This looks "dangerous". Google doesn't process the `robots.txt` directives in top-down order. They are processed in order of specificity (length of the URL path), but when it comes to the use of wildcards, the order is officially "undefined" (which also means it could change). The `Allow:` directive is also an extension to the "standard" and might not be supported by all search engines. It would be better to be more explicit, e.g. `Disallow: /*?m=`. But, as mentioned, you probably should not be blocking these URLs in `robots.txt` anyway.
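For reference only (again, blocking these URLs is not recommended on a Blogger site), a sketch of the more explicit form would look like this:

```
User-agent: *
# Not recommended here: blocking the ?m= URLs can hurt mobile crawling/ranking
Disallow: /*?m=
```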
See also my answer to this question for more info about `robots.txt` and how it is processed:
- Robots.txt with only Disallow and Allow directives is not preventing crawling of disallowed resources