The col values are processed through a median (i.e. low-pass) filter to get .<col>_median.
The squared differences between the original and the filtered values give .<col>_eps2, and their mean gives .<col>_sigma.
A col value is considered an outlier if .<col>_eps2 is greater than .<col>_sigma.
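The steps above can be sketched in base R on a plain numeric vector. This is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_filter_outlier is hypothetical, stats::runmed() stands in for the median filter, the end points are left unfiltered, and .<col>_sigma is taken to be the overall mean of the squared differences (the actual function may compute it differently, e.g. over a window).

```r
# Hypothetical helper (not part of trrrj): median filter + squared-difference threshold.
sketch_filter_outlier <- function(x, ksize = 3, fill = TRUE) {
  # the kernel size must be odd, as required by a running median
  stopifnot(ksize %% 2 == 1)

  # running median; endrule = "keep" leaves the first/last values unfiltered
  med <- as.numeric(stats::runmed(x, k = ksize, endrule = "keep"))

  # squared differences between raw and filtered values
  eps2 <- (x - med)^2

  # ASSUMPTION: sigma is the overall mean of the squared differences
  sigma <- mean(eps2)

  # a value is an outlier when its squared difference exceeds sigma
  outlier <- eps2 > sigma

  # optionally substitute outliers with the median-filtered value
  if (fill) x[outlier] <- med[outlier]

  data.frame(value = x, median = med, eps2 = eps2,
             sigma = sigma, outlier = outlier)
}

# a short series with one obvious spike at position 4
x <- c(10, 11, 10, 50, 11, 10, 11)
res <- sketch_filter_outlier(x, ksize = 3, fill = TRUE)
print(res)
```

In this toy series only the spike is flagged, and with fill = TRUE it is replaced by the local median. Because the threshold compares a squared difference against a mean of squared differences, a single large spike raises the threshold for the whole series, so smaller deviations may go unflagged.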
Arguments
- df: a data frame of trajectory data
- col: the variable to filter for outliers
- ksize: the kernel size (must be odd)
- fill: whether to substitute the outlier with the median
- keep: whether to keep intermediate results
Value
a data frame with corrected outliers (if fill is TRUE).
If keep is TRUE, the intermediate columns .<col>_median, .<col>_eps2, .<col>_sigma and .<col>_outlier are included.
Details
This approach is inspired by (copied from ;-) the filter function in Xavier Olive's Python library traffic.
See also
Other analysis: cumulative_distance(), cumulative_time(), extract_segment(), smooth_positions()
Examples
if (FALSE) {
  library(readr)
  library(dplyr)
  library(anytime)
  library(trrrj)
  library(ggplot2)

  # read the sample flight shipped with the package and parse its timestamps
  ifile <- system.file("extdata", "belevingsvlucht.csv", package = "trrrj")
  df <- readr::read_csv(ifile) %>%
    mutate(timestamp = anytime::anytime(time, tz = "UTC"))

  # replace outliers in the barometric altitude with the median-filtered value
  df1 <- df %>%
    filter_outlier(col = baroaltitude, ksize = 17, fill = TRUE, keep = FALSE)

  # compare the raw (blue) and filtered (red) altitude profiles
  ggplot() +
    geom_line(data = df, mapping = aes(x = timestamp, y = baroaltitude), colour = "blue") +
    geom_line(data = df1, mapping = aes(x = timestamp, y = baroaltitude), colour = "red")
}