Minimum number of samples for kriging interpolation
When you use "default values" you aren't really kriging; you're just applying the kriging algorithm, which, as you have found, performs poorly with these data.
(I will step up on a soapbox for a brief rant: in my opinion, the fastest way to get bad results with a computer program is to accept its default parameters. ArcGIS is one of the richest, most powerful environments for getting bad results this way. The moral: do not use software for important work until you understand how to control it. Down from the soapbox now...)
For kriging to work, you have to conduct an intensive preliminary statistical analysis of the data known as "variography." How well this ultimately performs depends on the data as well as on your geostatistical skills. (Entire books have been written about variography, including the seminal Mining Geostatistics by Journel & Huijbregts and Variowin by Yvan Pannatier.) People have successfully kriged as few as seven data points (in a monograph by Robert Jernigan published by the US EPA in the late 1980s), and in principle you can krige using just two or three points (I have done this to demonstrate the algorithm). Nevertheless, rules of thumb in the literature range from a minimum of 20 points to 100 points, and the consensus appears to be around 30 points.
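To give a feel for where variography starts, here is a minimal sketch of an empirical semivariogram in Python with NumPy. It is only an illustration, not what ArcGIS or any particular package does internally: the function name, the equal-width lag bins, and the default cutoff of half the maximum distance are all my assumptions. It also returns the number of point pairs per lag bin, which is the practical reason small samples are troublesome: n points give only n(n-1)/2 pairs (21 pairs for 7 points, 435 for 30), and those pairs have to be shared among all the bins.

```python
import numpy as np

def empirical_semivariogram(coords, values, n_bins=10, max_dist=None):
    """Hypothetical helper: bin half squared differences by separation distance.

    Returns (mean lag distance, semivariance, pair count) for each non-empty bin.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Every unordered pair of points (i < j).
    i, j = np.triu_indices(len(values), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diffs = 0.5 * (values[i] - values[j]) ** 2

    # Assumption: ignore pairs farther apart than half the maximum distance.
    if max_dist is None:
        max_dist = dists.max() / 2.0

    # Equal-width lag bins; pairs at or beyond max_dist fall outside all bins.
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    bin_index = np.digitize(dists, edges) - 1

    lags, gammas, counts = [], [], []
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            lags.append(dists[in_bin].mean())
            gammas.append(half_sq_diffs[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)
```

Plotting the semivariance against the lag, and checking how many pairs support each plotted value, is the first step. Choosing and fitting a variogram model to that plot is where the skill comes in, and the sketch does not attempt it.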
In your case--although you do not describe the data--you have some clear problems, including a highly skewed distribution and a distinct lack of evidence of stationarity. These require special statistical treatment or specialized forms of kriging (such as a spatial generalized linear model). You will not get good results when kriging such data until you have a very large amount of data.
The legend suggests you might be trying to create a density grid rather than actually interpolate data: although the outputs of the two procedures may look the same, they do distinctly different things and have distinctly different interpretations. You interpolate when the data are considered samples from some hypothetical continuous surface. Interpolation predicts the unsampled values. Standard examples include elevation measurements (which sample the earth's surface) and temperature measurements (which sample a "temperature field"). You compute a density when you have complete information about the amount of something and you wish to represent a smoothed version of that amount per unit area. (In contrast with interpolation, there do not exist any unsampled values to predict.) The standard example is a population density: the data are counts of all individuals within an area; the output is a map of population density.
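The difference can be made concrete with a small sketch. I am using inverse distance weighting as a simple stand-in for an interpolator (it is not kriging) and a Gaussian kernel for the density; both function names and the `power` and `bandwidth` parameters are illustrative assumptions rather than anything taken from your software. The point to notice is what each surface preserves: the interpolator reproduces the sampled values at the sample locations, whereas the density surface integrates to the total count.

```python
import numpy as np

def idw_interpolate(sample_xy, sample_values, query_xy, power=2.0):
    """Interpolation: predict unsampled values as distance-weighted averages."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    d = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at a sample location
    w = 1.0 / d ** power
    return (w * sample_values).sum(axis=1) / w.sum(axis=1)

def kernel_density(event_xy, counts, query_xy, bandwidth):
    """Density: spread each known count over its neighborhood (amount per unit area)."""
    event_xy = np.asarray(event_xy, dtype=float)
    counts = np.asarray(counts, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    d2 = ((query_xy[:, None, :] - event_xy[None, :, :]) ** 2).sum(axis=2)
    # 2-D Gaussian kernel, normalized so each kernel integrates to 1.
    kernel = np.exp(-0.5 * d2 / bandwidth ** 2) / (2.0 * np.pi * bandwidth ** 2)
    return (kernel * counts).sum(axis=1)
```

Given the same input points and numbers, the two functions produce maps that can look alike, but the first answers "what is the value here?" and the second answers "how much is there per unit area around here?"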