The values of col are passed through a median (i.e. low-pass) filter to obtain .<col>_median. The squared differences between the original and the filtered values are stored in .<col>_eps2, and their mean in .<col>_sigma. A col value is considered an outlier if its .<col>_eps2 is greater than .<col>_sigma.
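The steps above can be sketched on a plain numeric vector. This is only an illustration, not the package's implementation: `sketch_outlier` is a hypothetical helper, `stats::runmed` stands in for the median filter, and the mean is assumed to be taken over the whole column, which the description does not state.

```r
# Hypothetical helper (not part of trrrj): flag outliers in a numeric vector.
sketch_outlier <- function(x, ksize) {
  stopifnot(ksize %% 2 == 1)                      # kernel size must be odd
  med   <- as.numeric(stats::runmed(x, k = ksize)) # running median (low-pass filter)
  eps2  <- (x - med)^2                            # squared difference to the median
  sigma <- mean(eps2)                             # assumption: mean over all values
  data.frame(x = x, median = med, eps2 = eps2,
             sigma = sigma, outlier = eps2 > sigma)
}

x   <- c(10, 11, 10, 12, 50, 11, 10, 12, 11)      # made-up values with one spike
res <- sketch_outlier(x, ksize = 3)

# Analogue of fill = TRUE: replace flagged values with the running median
filled <- ifelse(res$outlier, res$median, res$x)
```

In this toy input only the spike at position 5 exceeds the mean squared difference, so it is the only value replaced.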

Usage

filter_outlier(df, col, ksize, fill, keep = FALSE)

Arguments

df

a data frame of trajectory data

col

the variable to filter for outliers

ksize

the kernel size (must be odd)

fill

whether to substitute each outlier with the corresponding median-filtered value

keep

whether to keep intermediate results

Value

a data frame with outliers corrected (if fill is TRUE). If keep is TRUE, the intermediate columns .<col>_median, .<col>_eps2, .<col>_sigma and .<col>_outlier are included.

Details

This approach is inspired by (copied from ;-) the filter function in Xavier Olive's Python library traffic.

Examples

if (FALSE) {
library(readr)
library(dplyr)
library(anytime)
library(trrrj)
library(ggplot2)

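# Read the sample file bundled with trrrj and parse its timestamps as UTC
ifile <- system.file("extdata", "belevingsvlucht.csv", package = "trrrj")
df <- readr::read_csv(ifile) %>%
  mutate(timestamp = anytime::anytime(time, tz = "UTC"))
# Replace outliers in barometric altitude using a median filter of kernel size 17
df1 <- df %>%
  filter_outlier(col = baroaltitude, ksize = 17, fill = TRUE, keep = FALSE)
# Plot the original (blue) and the filtered (red) altitude profiles
ggplot() +
  geom_line(data = df,  mapping = aes(x = timestamp, y = baroaltitude), colour = "blue") +
  geom_line(data = df1, mapping = aes(x = timestamp, y = baroaltitude), colour = "red")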
}