To remove large spikes from this MATLAB example data set, we replace only the unusually large data points, not every data point, with a local median. To decide which values are unusually large, we need a threshold: any value beyond the threshold is replaced by the median of a window of neighboring points. A histogram of all the data values helps in picking the threshold levels, because it lets us judge by eye which values belong to the real data (the tall bins) and which are outliers in the tails. In this example, two threshold levels were chosen from the histogram: 5 (threshold1) and -5 (threshold2). Two thresholds are needed because the original signal contains large spikes in both the positive and the negative direction. Any value above threshold1 or below threshold2 is treated as noise and replaced, which yields the filtered signal (filtsig).
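The thresholded median step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the .pdf: the test signal is synthetic (a noisy sine with spikes placed every 50 samples), and the window half-width k and the names spikeIdx and suprathresh are hypothetical choices. Only threshold1, threshold2 and filtsig come from the description above.

```matlab
% Synthetic stand-in for the data set (assumption: a noisy sine wave)
n = 2000;
signal = sin(linspace(0, 6*pi, n))' + 0.1*randn(n, 1);

% Add large spikes of alternating sign every 50 samples (illustrative only)
spikeIdx  = (50:50:n)';
spikeSign = (-1).^(1:numel(spikeIdx))';
signal(spikeIdx) = signal(spikeIdx) + 12 * spikeSign;

% Inspect the distribution to choose thresholds, e.g.:
% histogram(signal, 100)

threshold1 = 5;    % upper threshold, as in the text
threshold2 = -5;   % lower threshold, as in the text
k = 20;            % half-width of the median window (assumed value)

% Indices of the points that lie outside the two thresholds
suprathresh = find(signal > threshold1 | signal < threshold2);

% Replace only those points with the median of their neighborhood
filtsig = signal;
for i = 1:numel(suprathresh)
    idx = suprathresh(i);
    lo  = max(1, idx - k);   % clip the window at the start of the signal
    hi  = min(n, idx + k);   % clip the window at the end of the signal
    filtsig(idx) = median(signal(lo:hi));
end
```

Because the median is taken from the original signal and each window here contains a single spike, the spike itself barely influences the replacement value. With denser spikes a wider window or a repeated pass might be needed.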

Next, a running-mean filter is applied in the time domain: each data point of the previously filtered signal (filtsig) is set to the average of the surrounding points of that signal, which yields the cleaned signal (cleanedsignal). This filter is not appropriate for every kind of noise. It is specific to cases where the noise is distributed both positively and negatively relative to the signal of interest.
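A minimal sketch of such a running-mean filter is shown below. It is self-contained, so filtsig here is a synthetic stand-in (a noisy sine) rather than the despiked signal from the example, and the half-width k is an assumed value. The output is initialized with zeros, so the first and last k points stay at zero.

```matlab
% Stand-in for the despiked signal (assumption: noisy sine wave)
n = 2000;
filtsig = sin(linspace(0, 6*pi, n))' + 0.3*randn(n, 1);

k = 20;   % half-width of the averaging window (assumed value); window length is 2*k+1

% Initialize with zeros; the first and last k points are never overwritten
cleanedsignal = zeros(size(filtsig));

% Each interior point becomes the mean of itself and its k neighbors on each side
for i = k+1 : n-k
    cleanedsignal(i) = mean(filtsig(i-k : i+k));
end
```

A larger k gives a smoother result but also flattens genuine fast changes in the signal, so the value is worth tuning against the data.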

Every time we apply a temporal filter to the data, regardless of the type of filter, the points at the start and end of the time series lack a full window of neighbors. These are the ‘edge effects’. We can set the edge points to the original signal, set them to zero, or simply ignore them. The best way to deal with ‘edge effects’ usually has to be worked out case by case for the specific application. In this example, the edges of the filtered output were set to zero.
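The three edge treatments mentioned here can be compared side by side in a short sketch. The signal is synthetic and the names edgeZero, edgeOrig and edgeNaN are hypothetical. Using NaN is just one possible way to mark edge points so that they can be ignored later.

```matlab
% Small synthetic signal (assumption) and window half-width (assumed value)
n = 200;
x = sin(linspace(0, 2*pi, n))' + 0.2*randn(n, 1);
k = 10;

edgeZero = zeros(n, 1);   % option 1: edges set to zero
edgeOrig = x;             % option 2: edges keep the original signal
edgeNaN  = nan(n, 1);     % option 3: edges marked as missing, to be ignored

% The interior points are filtered identically in all three cases
for i = k+1 : n-k
    m = mean(x(i-k : i+k));
    edgeZero(i) = m;
    edgeOrig(i) = m;
    edgeNaN(i)  = m;
end
```

The three outputs differ only in the first and last k samples, which makes it easy to see how much of the record each choice affects.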

The MATLAB code that illustrates the steps above is provided in the following .pdf file: