Remove noise from data

You could try BilateralFilter:

ListLinePlot[{data, 
  BilateralFilter[data, 2, .5, MaxIterations -> 25]}, 
 PlotStyle -> {Thin, Red}]

enter image description here

Or alternatively, MeanShiftFilter can produce similar results:

ListLinePlot[{data, 
  MeanShiftFilter[data, 5, .5, MaxIterations -> 10]}, 
 PlotStyle -> {Thin, Red}]

enter image description here

Third alternative, as mentioned by @Xavier in the comments, is to apply TrimmedMean over a sliding window:

ListLinePlot[{data, ArrayFilter[TrimmedMean, data, 20]}, 
 PlotStyle -> {Thin, Red}]

enter image description here

As requested in the comments, a Savitzky Golay filter:

ListLinePlot[{
  data,
  ListConvolve[SavitzkyGolayMatrix[{10}, 2], 
   ArrayPad[data, 10, "Fixed"]]
  }, PlotStyle -> {Thin, Red}]

enter image description here

For comparison:

Show[
 ListPlot[data, PlotLegends -> {"Raw Data"}],
 ListLinePlot[{BilateralFilter[data, 2, .5, MaxIterations -> 25], 
   MeanShiftFilter[data, 5, .5, MaxIterations -> 10], 
   ArrayFilter[TrimmedMean, data, 20], 
   ListConvolve[SavitzkyGolayMatrix[{10}, 2], 
    ArrayPad[data, 10, "Fixed"]]},
  PlotLegends -> {"BilateralFilter", "MeanShiftFilter", 
    "ArrayFilter[TrimmedMean]", "SavitzkyGolay"}], ImageSize -> 800]

enter image description here

  • MeanShiftFilter and BilateralFilter produce a smooth result, and are almost indistinguishable with these parameters.
  • The sliding window TrimmedMean filter technique looks more "ragged" in comparison.
  • I couldn't get a smooth curve with the Savitzky Golay filter, probably because the large outliers aren't well suited to linear filtering.

You'll have to play with the parameters to each of them to get the results you want.


I would suggest using a median filter with small radius to eliminate the large spikes, then a mean filter to smooth the remaining signal. @Xavier essentially combines these two filters by using TrimmedMean.

Other than the large spikes, your data seem to have a strong signal with a period of about 10 points. You could use BandstopFilter to remove this, or LowpassFilter to eliminate this oscillation plus higher frequencies.

In the following, I removed the mean from your original data.

Manipulate[
   Module[{f},
      f = LowpassFilter[MedianFilter[data, r], w];
      ListLinePlot[{data, f - Mean[f]}, ImageSize -> 600, Frame -> True, 
                   PlotStyle -> {Thin, Thick}]
   ],
   {{r, 0, "Median Filter Radius"}, 0, 10, 1, Appearance -> "Labeled"},
   {{w, 3., "Cutoff Angular Frequency"}, 0.002,Pi, Appearance -> "Labeled"}]

median and lowpass filter


I am not sure what is called "noise" in the question, from the description, I think it is about removing outliers. This solution uses Quantile regression twice: to detect the outliers, and then to find quantile regression curves in the data without the outliers.

Load the package:

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/QuantileRegression.m"]

Adding x-coordinates to the data:

data = Transpose[{Range[Length[data]], data}];

Selection of quantiles to detect the outliers:

qs = {0.05, 0.5, 0.98};
{qs[[1]], 1 - qs[[-1]]}*Length[data]
(* {25.6, 10.24} *)

Quantile regression with the selected quantiles:

qfuncs = QuantileRegression[data, 15, qs];

Finding the top outliers:

topOutliers = Select[data, qfuncs[[-1]][#[[1]]] < #[[2]] &]

(* {{54, 8.16422}, {145, 8.16422}, {155, 8.14875}, {203, 
  8.19841}, {289, 8.14254}, {370, 8.17358}, {433, 8.17358}} *)

Finding the bottom outliers:

bottomOutliers = Select[data, qfuncs[[1]][#[[1]]] > #[[2]] &]

(* {{29, 7.9156}, {78, 8.08349}, {81, 7.98714}, {101, 
  8.03685}, {127, 7.99335}, {140, 8.01821}, {178, 7.8689}, {182, 
  8.07728}, {220, 8.08349}, {263, 7.98714}, {268, 8.00884}, {323, 
  8.12381}, {331, 8.12696}, {334, 8.09276}, {386, 8.0617}, {387, 
  8.05243}, {406, 8.08349}, {454, 8.0617}, {456, 8.05864}, {486, 
  8.07412}, {496, 8.10833}} *)

Plot data, regression quantiles, and outliers:

qfPlot = ListLinePlot[
   Table[{#, qfuncs[[i]][#]} & /@ 
     Rescale[Range[0, 1, 0.005], {0, 1}, MinMax[data[[All, 1]]]], {i, 
     1, Length[qfuncs]}], PerformanceGoal -> "Quality", 
   PlotRange -> All, PlotTheme -> "Detailed", PlotLegends -> qs];
Show[{ListPlot[data, PlotRange -> All, PlotStyle -> {GrayLevel[0.5]}, 
   PlotTheme -> "Detailed"], qfPlot, 
  ListPlot[{topOutliers, bottomOutliers}, PlotStyle -> {{
      Blue, PointSize[0.01]}, {Red, PointSize[0.01]}}]}, 
 ImageSize -> 600]

enter image description here

Remove the outliers from the data:

newData = Complement[data, Join[topOutliers, bottomOutliers]];

Make another quantile regression computation over the new data. (This time is to facilitate analysis instead of detecting outliers.)

Block[{data = newData, qfuncs, qs = {0.05, 0.25, 0.5, 0.75, 0.95}},
 qfuncs = QuantileRegression[data, 40, qs]; 
 Show[{ListPlot[data, PlotStyle -> GrayLevel[0.5], PlotRange -> All, 
    PlotTheme -> "Detailed"], 
   ListLinePlot[
    Transpose@
     Map[Thread[{#, Through[qfuncs[#]]}] &, 
      Rescale[Range[0, 1, 0.005], {0, 1}, MinMax[data[[All, 1]]]]], 
    PlotStyle -> Map[If[# == 0.5, Thick, Thin] &, qs], 
    PlotLegends -> qs]}, ImageSize -> 600]]

enter image description here

Obviously other methods of signal analysis can be applied to the cleaned data. In this particular case, the cleaned data would give better results for the conditional PDF/CDF reconstruction shown in this blog post "Estimation of conditional density distributions".