High short circuit failure rate of aluminum electrolytic capacitors during the production processes
A failure rate that high is unheard of for a top-quality supplier like Nichicon when the parts are properly assembled and operated conservatively. Even for no-name parts it’s not at all usual: one in 10,000 might be plausible, but that’s on the high side. Short circuit failures are very rare for aluminum electrolytics. I did once see a few in a bag of 1,000 from a Taiwan supplier that were completely missing the rubber seal, so the electrolyte had also gone AWOL; that was actually funny.
You can contact Nichicon directly to confirm the parts are genuine (or not). They may be able to tell just from photos or you might have to courier samples.
You can review their application information to make sure you are not abusing the parts in some way: not only voltage but also ripple current, possible reverse voltage, or reverse installation (that is one thing that will cause shorts). Poorly made counterfeits might be marked incorrectly, so they are reversed even though they appear to be installed correctly. Your transformerless supply might be stressing the part upon application of power.
Also confirm that the chemicals and processes used in the PCBA and any subsequent operations such as cleaning are approved.
I would definitely pull 100% of the parts from that batch of boards and replace them with known good ones. Use good tools and skilled technicians so that reliability isn’t unduly compromised by the rework. Field failures are extremely expensive in dollars and in reputation. Give them a good visual inspection under a microscope, or at least with a magnifier, and see if you can identify differences between batches or within a batch.
As for prevention in the future, that is a bit off-topic, but there are a few approaches to control “quality fade” and substitutions of inferior parts, one of which is third party inspections. The assembly house or supplier you used is suspect if they allowed counterfeit parts to be procured. You may have a better choice of suppliers at higher quantity levels and you can ask how they intend to guarantee genuine parts are used. The Shenzhen markets are a bit of the Wild West, so you need to take care. Nichicon will have authorized distribution channels there, but it’s also possible to procure parts of unknown history at the many retail shops in Huaqiangbei or online at Taobao etc.
There is this extensive document from Nichicon for the application of aluminium electrolytic caps.
They discuss the failure rate of aluminium caps, and for the example they have pictured there, failures only appear after a test time of 6000 hours. So I would not expect an immediate failure rate of 2 % for high quality Nichicon capacitors.
They have an extensive list of advice on how to correctly design in the aluminium capacitor. One point that gets mentioned, for example, is that with a halogen-containing cleaning agent, halogens might seep into the capacitor and then cause various failures.
Among the different failure modes of the capacitors, they have a paragraph on short circuits:
1) Short Circuit
Short circuits in the field are very rare. A short circuit between the electrodes can be caused by vibration, shock and stress on leads. It can also be caused by application of voltage above the rated voltage, application of extreme ripple or by application of pulse current.
You are saying that you have 300 boards running fine, which suggests that something fishy is going on with the new batch you received. I'd discuss this issue with the supplier and ask them if they changed anything in the manufacturing when going from (I guess) pre-production runs to a complete production run. Maybe something goes wrong during the mounting of the capacitors: are the leads getting bent in a bad way, are they bent by hand?
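To put a rough number on why the 300 good boards point at the new batch, here is a minimal sketch. It assumes each board fails independently with the same probability, which is a simplification, and it uses the 2 % figure mentioned above as the hypothetical rate. The function name is made up for illustration.

```python
def prob_zero_failures(n_boards: int, failure_rate: float) -> float:
    """Probability that n_boards independent boards all survive.

    Assumes every board fails independently with the same probability
    failure_rate (an idealisation, not a measured model).
    """
    if n_boards < 0:
        raise ValueError("n_boards must be non-negative")
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return (1.0 - failure_rate) ** n_boards


if __name__ == "__main__":
    n_good = 300        # boards reported as running fine
    rate_bad = 0.02     # assumed failure rate seen in the new batch
    p = prob_zero_failures(n_good, rate_bad)
    print(f"P(no failures in {n_good} boards at {rate_bad:.0%}) = {p:.4f}")
```

Under those assumptions the chance of 300 boards all surviving at a 2 % rate is only around 0.2 %, so the earlier boards and the new batch are very unlikely to share the same underlying failure rate.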
Read all the information from Nichicon and check if any of the points might be problematic in your design. If you have access to an X-ray machine, that might give some insight into the failed caps as well.
The failure rate is unacceptable.
The first step is to contact the manufacturer of the defective component and document your experience. In this case, with Google, finding them will not be a problem. Include macro photographs of the component to assist with identification of fakes. If your application is not covered by copyright or a trade secret, and you are not concerned about information leakage, then include a circuit diagram and a description of the circuit in normal and abnormal operating conditions. A partial circuit diagram may be sufficient. Offer to supply component samples to assist any investigation.
Once you have identified an unacceptable failure rate with a batch of components, the whole batch should be quarantined and all boards reworked. No ifs or buts. When I say all, I mean all. For sold items, issue a product recall. Document all the steps you have taken to address the problem. A company I used to work for sold a device that had a capacitor failure which caused an office fire. I don't know the level of compensation offered, but it would have been more than the cost of correctly specified components. Consumer protection legislation varies and I can't address that.
Soak testing of boards is usually performed at a raised temperature for a period to promote infant mortality failures. If applicable perform the tests with marginal supply voltage(s) to weed out marginal boards.
Start reading up on Lot Quality Assurance Testing; this is not a dark art and not a new problem.
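As a taste of the arithmetic behind lot sampling, here is a minimal sketch of a simple binomial acceptance plan. It assumes independent defects and a lot that is large compared to the sample. The function names, the 2 % limit and the 95 % confidence level are illustrative assumptions, not values from any standard.

```python
import math


def zero_failure_sample_size(max_defect_rate: float, confidence: float) -> int:
    """Smallest sample size n such that seeing zero defects in n parts
    rules out a defect rate of max_defect_rate at the given confidence,
    i.e. the smallest n with (1 - p)**n <= 1 - confidence."""
    if not 0.0 < max_defect_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_defect_rate))


def prob_accept(n: int, c: int, defect_rate: float) -> float:
    """Probability that a sample of n parts contains at most c defects
    (the lot is accepted) when the true defect rate is defect_rate."""
    return sum(
        math.comb(n, k) * defect_rate**k * (1.0 - defect_rate) ** (n - k)
        for k in range(c + 1)
    )


if __name__ == "__main__":
    n = zero_failure_sample_size(max_defect_rate=0.02, confidence=0.95)
    print(f"Sample size for a zero-failure plan: {n}")
    print(f"P(accept a 2% defective lot)   = {prob_accept(n, 0, 0.02):.4f}")
    print(f"P(accept a 0.01% defective lot) = {prob_accept(n, 0, 0.0001):.4f}")
```

The point of the second function is to see both sides of a plan: how often a bad lot slips through, and how often a good lot gets rejected. Real plans usually come from published sampling tables rather than a hand-rolled calculation like this.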