A bug in Excel? Conditional formatting for marking duplicates also highlights unique value
It is indeed because *
is treated as a wildcard.
The way around it, is to use a formula to insert a tilde (~
) in order to escape the asterisk (*
) for your conditional formatting:
=COUNTIF(A:A,SUBSTITUTE(A1,"*","~*"))>1
After further research of the behavior and documentation, I can answer the question from high-level perspective:
This is rather an undocumented behavior than a defect.
Other answers reminded us of use of wildcard characters (*
, ?
, ~
) in Excel formulas. The question unanswered before is if they are expected also in conditional formatting. Microsoft documents the Duplicate Values mode of conditional formatting in the following articles:
- Find and remove duplicates
- Filter for unique values or remove duplicate values
- Filter for or remove duplicate values
- Highlight patterns and trends with conditional formatting
Nowhere in these articles is mentioned that the internal algorithm searching for duplicates still respects wildcard characters (*
, ?
, ~
). On the contrary, support of wildcards is explicitly named in functionalities, where presence of wildcards is obvious and expected:
- standard search box
- functions like SEARCH(), SEARCHB(), COUNTIF(), ...
But back to duplicates: it can be discussed whether sample values PT_INTERNAL2859736
and *736
are duplicates by definition. By common sense, no. In Excel, obviously yes. It is possible that Excel simply uses its standard search algorithm (which honors wildcard characters) also for searching of cells with duplicate values... and the results of showing duplicate values can be quite unexpected as you can see in the question or here:
Based on the above, I would say this is not a defect, but an unexpected and undocumented behavior.
And this poses a problem in real-life scenarios. You were not warned, that you need a special formulas to discover real duplicates. You were just presented by Duplicate values conditional formatting type without further explanation. Today, I took a list of 2000 values to provide data for the customer and false positives were spotted only by coincidence and after double checking of results. I almost deleted unique values considering them to have duplicates somewhere in the list.
Current behavior is logical from Excel viewpoint but draws a huge exclamation mark for use by inexperienced users. If something should be fixed, it is at least the documentation.
If you Google for excel asterisk wildcard conditional formatting
you'll find someone with the same issue.
There the proposed solution is to use a custom formula to check if the value is a duplicate.
The formula looks as follows:
=SUMPRODUCT(--(("~"&A2)=("~"&$A$2:$A$4)))>1
And the result is: