Event Detection in Eye Tracking and Fixation Time Analysis. Demonstration of the kollaR package

2025-05-07

The majority of eye tracking analyses rely on accurate detection of fixations and/or saccades using event detection algoritms (also known as fixation filters). These measures are often automatically exported from software provided by the eye tracker manufacturer. However, pre-processing and choice of event detection algorithms can have a large impact on the results, especially if there is some level of noise in the data. This is typically the case in research about children or individuals with neurodevelopmental conditions such as autism or intellectual disability.

kollaR is a library for eye tracking analysis and visualizations specifically designed to facilitate compare and evaluate event detection algorithms. It includes functions for pre-processing, event detection and visualizations for comparing the results of different data processing pipelines.

Once an event detection algorithm has been selected, kollaR can be used to create static and dynamic visualizations of gaze behavior for presentation and dissemination and conduct analyses based on areas of interest (AOIs).

This demonstration illustrates pre-processing, event detection and comparisons between different event detection algorithms.

These demonstrations use the kollaR (version 1.1.0), ggplot2 and dplyr libraries.

Installation

kollaR is available on CRAN. Install it by writing the following line of code in the R prompt:

install.packages(“kollaR”)

Pre-processing


We will start working with example data from one participant. The data come from a preferential looking task. Participants view two images presented to the left and right on a monitor. The images are the paintings Angelus Novus and Senecio by Paul Klee (1879-1940). Data were recorded using a Tobii Pro Spectrum at 1200 HZ. The screen resolution is 1920 x 1080 pixels. Let’s look at data from one participant.

You can get an overview by typing:

#Show the first parts of the data
head(sample.data.unprocessed)

The data contains timestamps and unprocessed x and y coordinates. There are several missing samples at the onset of the recording.

Let’s do some pre-processing of the data. The kollaR function process_gaze has parameters determining 1) the maximum length of gaps in the data to be interpolated over in ms, and 2) the size of the moving average filter used to smooth the X and Y coordinates (in ms).

gaze_processed <- preprocess_gaze(sample.data.unprocessed, max_gap_ms = 75, filter_ms = 15)

#Plot the X coordinates of the processed data


What effects do these settings have? Take a look at the plot. It can be useful to plot the unprocessed and processed data together. We will use the kollaR function filt_plot_temporal to do this. The function process_gaze will save the unprocessed x and y variables under the names x.unprocessed and y.unprocessed. Let’s plot the unprocessed and processed x coordinates during a 5000 milliseconds interval which starts at 1000 ms and ends at 6000 ms in the recording (Note that “raw” here indicates that these are data before fixation and saccade detection, although other pre-processing may have been done)


fixation_plot_ts(data = gaze_processed, var1 = "x.unprocessed", var2 ="x.raw", plot.window = c(1000, 6000))


The blue line (data after pre-processing) seems to follow the gray line (data before pre-processing) closely. Note that the pre-processing procedure removes some samples that are found right before and after gaps in the data. Several sudden shifts in gaze position can be seen, probably reflecting saccades. We’ll change the settings and run the code again. This time, we increase the maximum length of gaps in the data accepted for interpolation and the length of the smoothing window. This time, we will plot data from both pre-processing procedures in the same plot for comparison. To do this, we will use the kollaR function fixation_plot_ts. We will name the data resulting from the first pre-processing procedure “Strict”, filter the data again with a setting we will call “Wide” and plot them together.


#Tag the pre-processed data
gaze_processed$filter <- "Strict"

#Pre-process data again.
gaze_processed.wide <- preprocess_gaze(sample.data.unprocessed, max_gap_ms = 200, filter_ms = 150)
gaze_processed.wide$filter <- "Wide"

gaze.processed.combined <- rbind(gaze_processed, gaze_processed.wide)

#Plot the X coordinates of both time series. 
plot <- fixation_plot_ts(gaze.processed.combined, plot.window = c(1000, 6000), var1 = "x.unprocessed",var2 ="x.raw", algorithm.name = "filter")


A higher threshold for interpolation leads fewer gaps in the data. Due to the longer smoothing window, some rapid changes in x either disappear or look like slower, continuous movements of the gaze position. This would likely result in a smaller number of detected fixations and saccades.

Fixation Detection


Now, let’s apply an event detection algorithm to identify fixations in the data. We will use kollaR to apply three event detection algorithms: the I-VT algorithm, the I-DT algorithm, and the identification by two-means clustering (I2MC) algorithm. All algorithms can be adapted by changing the parameters of the functions. By default, fixations that appear very closely in time and space are merged. Set the input values distance.threshold and merge.ms.threshold to 0 to skip this.


I-VT classification algorithm

ivt <- algorithm_ivt(gaze_processed,velocity.threshold = 35, min.fixation.duration = 40, one_degree = 40, distance.threshold = 0.7, merge.ms.threshold = 75, min.saccade.duration = 12)


Adaptive velocity threshold algorithm

Run a an adaptive velocity threshold algorithm algorithm to detect fixations. Fixations must have a minimum duration of 40 milliseconds.

  adaptive <- algorithm_adaptive(gaze_processed, min.fixation.duration = 40, min.saccade.duration = 10, xcol = "x.raw", ycol = "y.raw",  merge.ms.threshold = 75, distance.threshold = 0.7, one_degree = 40) 


Dispersion-based threshold algorithm

Run a dispersion-based (ID-T) algorithm to detect fixations. Fixations must have a minimum duration of 40 milliseconds. All samples counted as belonging to a fixations should be within a 1 degree radius of the centroid of the fixation.

  dispersion <- algorithm_idt(gaze_processed, dispersion.threshold = 1, min.duration = 40, one_degree = 40, distance.threshold = 0.7, merge.ms.threshold = 75)


Identification by two-means clustering algorithm

Run a two-means clustering algorithms to detect fixations

  c2m <- algorithm_i2mc(gaze_processed, windowlength.ms = 200, distance.threshold = 0.7, min.fixation.duration = 40, weight.threshold = 2, downsampling.factors = c(10, 20), merge.ms.threshold = 75)

Each function returns a list which includes fixations, raw and fixation samples, and saccades (only the I-VT algorithm). To look at the fixations detected by the I-VT algorithm, run the code below. The data frame includes coordinates, duration, onset and offset (relative to the first sample in the input data).

The output includes two data quality metrics, the proportion of missing samples during the period classified as a fixation, and RMSD. RMSD is the sample-to-sample root mean square deviations (RMSD) during the fixation period. Higher RMSD values indicate lower precision (more noise) in the data.

head(ivt[["fixations"]])



Visualizing the Results of Fixation Detection Algorithms


To understand what the fixation classification algorithm does to your data, it’s helpful to visualize fixation data and raw samples together. This illustrates which periods the algorithm identifies as fixations and which not. We can plot the raw and fixation x coordinates against sample (that is, time) using the function fixation_plot_temporal(). The parameter ‘plot.window’ determines the onset and offset of the interval you want to plot in ms. By default, we plot the X coordinates.

Here, we plot data from the I-VT algorithm

new_plot <- fixation_plot_temporal(ivt[["filt.gaze"]], plot.window = c(1000, 3000), var1 = "x.raw", var2 = "x")


As a comparison, we will plot the same interval in the data using the I-DT (dispersion-based) algorithm.

new_plot <- fixation_plot_temporal(dispersion[["filt.gaze"]], plot.window = c(1000, 3000), var1 = "x.raw", var2= "x")


Create a plot of the fixations and raw samples in 2D space instead. Here, we will use the kollaR function fixation_plot_2d. This function uses two data frames: fixation.data is here a data frame with fixations from one or more fixation detection algorithm, and raw.data is a sample-by-sample data frame with x and y coordinates before fixation detection. It is possible to change the radius of the plotted fixations. Here, we set it to 40 pixels which is approximately 1 degree of the visual field on the screen where stimuli were presented.

fixation_plot_2d(fixation.data = ivt[["fixations"]], raw.data = ivt[["filt.gaze"]], plot.window = c(1000, 8000), fixation.radius = 40)



Visualizing the Results of Multiple Fixation Detection Algorithms


Now, we will compare the output from three different fixation detection algorithms. The first step is to add all detected fixations in a combined data frame. The second step is to visualize them with the kollaR function fixation_plot_2d. This function needs a fixation data frame and sample-by-sample raw data (in this case included in the data frame ivt[[“filt.gaze]]). We will shrink the X axis to make the gaze data more visible.

combined.fix = rbind(
             ivt[["fixations"]], adaptive[["fixations"]], c2m[["fixations"]])

new_plot <- fixation_plot_2d(raw.data = ivt[["filt.gaze"]], 
           fixation.data = combined.fix,
          plot.window = c(3000, 8000), xres = 1600, yres = 1080, xmin = 300, fixation.radius = 40, order.vertical = TRUE)


It’s reassuring that all three algorithms give similar results. What happens if we change the settings of the I-VT algorithm? The following example runs algorithm_ivt() with two extreme (and most likely invalid) settings. A velocity threshold of 10 degrees will identify relatively small accelerations in eye movement speed as saccades. In contrast, a velocity threshold of 75 degrees is conservative and may miss slow saccades.


  ivt_high <- algorithm_ivt(gaze_processed,velocity.threshold = 75, min.fixation.duration = 40, one_degree = 40)

  ivt_low <- algorithm_ivt(gaze_processed,velocity.threshold = 10, min.fixation.duration = 40, one_degree = 40)
  combined.fix = rbind(ivt_high[["fixations"]], ivt_low[["fixations"]])

  new_plot <- fixation_plot_2d(raw.data = ivt[["filt.gaze"]], 
           fixation.data = combined.fix, plot.window = c(1000, 3000), xres = 1920, yres = 1080, fixation.radius = 40, order.vertical = TRUE)


This time, we see some clear differences. As expected, the lower threshold identifies a higher number of fixations. Let’s have a look at the duration of the fixations detected by the I-VT algorithm and the I-DT algorithm. We will create a data frame that includes fixations detected by both algorithms, and then plot some summary statistics with the kollaR function plot_algorithm_results.



fixations.to.plot <- rbind(ivt[["fixations"]], dispersion[["fixations"]])
plot_algorithm_results(fixations.to.plot, "duration")




Data from multiple participants


Let’s look at data from multiple participants and compare the two fixation filters. Here, we load a file with fixations detected using default settings from the I-VT and I-DT algorithms.

We will plot the the RMSD values from threee algorithms. It is not uncommon that fixation classification that work well for certain groups of participants (for example, neurotypical adults) do not work well for others (for example, young autistic children). Here, we will use the output from the functions algorithm_ivt(), algorithm_i2mc and algorithm_idt() and plot sample-to-sample RMSD for seven participants using the function plot_algorithm_results(). We will use example data included in kollaR.

fixations.to.plot <- filter(sample.data.fixations, fixation.algorithm %in% c("ivt", "idt", "i2mc"))
  plot_algorithm_results(fixations.to.plot)


The I-DT and I2MC algorithms seem to include periods with more noise (higher RMSD) in their classified fixations than the I-VT algorithm. Does this mean that they are better? Or worse? That depends on your research question and data quality. In studies with young children with neurodevelopmental conditions, noisy is the rule rather than the exception, and it is very helful to have an algorithm that is able to detect fixations at all. In other other research contexts where noise is less of a problem, there may be reasons to exclude data periods with more noise rather than trying to include them in the analysis.

There are some extreme outliers in the data. You may want to inspect those fixations closer to examine your algorithms.



Visualizing Fixations



Static Plots

After deciding on an event classification algorithm, it’s useful to visualize the data. First, let’s create a plot of the scanpath for one participant using the function static_plot()

The kollaR function ‘static_plot’ shows fixations within the interval plot.onset - plot.offset and correspond to values in the variable “onset”. Fixations must have the variables x, y, onset and offset. Here, we will plot data for the participant “id3” and include only fixations detected with the I-VT algorithm (as implemented in the function algorithm_ivt).


ivt.fixations <- dplyr::filter(sample.data.fixations, fixation.algorithm=="ivt")
data.by.id <- dplyr::filter(sample.data.fixations, id == "id3")
static <- static_plot(data.by.id, plot.onset = 1, plot.offset = 8000, background.images = NA)


Let’s plot data from all participants. Does it look reasonable?

static <- static_plot(ivt.fixations, plot.onset = 1, plot.offset = 8000, background.images = NA)


The plot will be easier to understand if we can plot the fixations on top of the images participants were watching. We can do this in static_plot() by adding a data frame with background images. This data frame must include image names an full file paths if they are not located in R:s working directory as well as the position of the images.

This type of plot is useful for presenting results, but also for validating our analysis methods. For example, if the plot indicates that participants did not look at the actual stimuli but at the side of them, there is a high chance that something went wrong in the analysis.


background.images <- data.frame(path = c("klee2.jpg", "angelus.jpg"), 
                                min.x = c(34, 1186),
                                min.y = c(190, 190),
                                max.x = c(734, 1886),
                                max.y = c(890, 890))


static <- static_plot(ivt.fixations, plot.onset = 1, plot.offset = 8000, background.images = background.images)



Dynamic Plots


Eye tracking data can be visualized in dynamic plots which show how gaze behaviors unfold in real time. The kollaR function animated.fixation.plot() creates a .gif animation of the eye movements of one or multiple participants. As in static_plot(), it is possible to add background images and select an interval to visualize. The resulting animation can be stored in a file and used in presentations. If background images are included, their onset and offset should be specified in the data frame.


background.images <- data.frame(path = c("klee2.jpg", "angelus.jpg"), 
                                min.x = c(34, 1186),
                                min.y = c(190, 190),
                                max.x = c(734, 1886),
                                max.y = c(890, 890),
                                onset = c(1, 1),
                                offset = c(8000, 8000))


animated_fixation_plot(ivt.fixations, plot.onset = 2000, plot.offset = 8000, background.images = background.images, save.gif = FALSE, n.loops = 0, resolution.scaling = 0.5, gif.dpi = 150, show.progress = FALSE)




Areas of Interest (AOIs)


Once we have visualized the data and validated the fixation detection algorithm, we will want to know whether fixations fall into the areas or interest (AOI) we define in our study. This can be done using the kollaR function aoi_test().

aoi_test() needs two data frames: one for the gaze data and one for the AOIs. The gaze data are fixations in this example, but the function can be used to analyze saccadic latency as well. Here, we define two AOIs called “left” and “right”:


aois <- data.frame(
  name =c("left", "right"),
  x0 = c(34, 1186),
  y0 = c(190, 190),
  x1 = c(734, 1886),
  y1 = c(890, 890),
  type = c("rect", "rect"))


Here, we assumed that both AOIs are rectangular. Change the variable ‘type’ to “circle” to test whether fixations fall into a circular/elliptical AOI. The kollaR function draw_aois() can be used to draw rectangular or circular AOIs on a stimulus image.

Summarize fixation time and number of fixations in each AOI with the function aoi_test. Start with one participant and see the results:


data.by.id <- dplyr::filter(ivt.fixations, id =="id4")
data <- aoi_test(data.by.id, aois)
head(data)


The function will give you the total fixation time in the AOI, the number of detected fixations and the latency to the first fixation detected in the AOI. Send subsets of the fixation data frame to the function if you are interested in specific intervals of the data (for example, one trial.)

Run the analysis for all participants and combine them in a data frame called dataset


dataset <- data.frame()
for (this.id in unique(ivt.fixations$id)) {
  dat <- aoi_test(dplyr::filter(ivt.fixations, id == this.id), aois)
  dat$id <- this.id
  dataset <- rbind(dataset, dat)
  
}
head(dataset)


Plot fixation time as a function of AOI. What do the data look like? Any outliers?


ggplot(data= dataset, aes(x = aoi, y = total.duration, fill = aoi))+ geom_violin()+geom_boxplot(width = 0.2)+geom_jitter(width = 0.3)+theme_minimal()


It can be useful to determine whether fixations were outside rather than inside a specific AOI. If aoi_test is run with the parameter outside = TRUE, the output will summarize the number of fixations, total duration, and latency to a first fixation outside each AOI.

dataset <- data.frame()
for (this.id in unique(ivt.fixations$id)) {
  dat <- aoi_test(dplyr::filter(ivt.fixations, id == this.id), aois, outside = TRUE)
  dat$id <- this.id
  dataset <- rbind(dataset, dat)
  
}
head(dataset)




Saccades


Saccades are the rapid eye movements occuring in between fixations. They have several properties which are interesting in eye tracking research. The kollaR function algorithm_ivt() detects saccades and returns their onset, offset, amplitude and peak velocity. Here, we will plot the distribution of saccade amplitude in the kollaR example data.


ggplot(data = sample.data.saccades, aes(x = amplitude,))+geom_histogram(bins = 15, fill = "darkred", color = "darkgrey")


While most saccades are shorter than 10 degrees, there is also a small number of very long saccades. Are these valid? A careful analysis of the pre-processing and event detection algorithms settings can help us understand whether these are truly saccades or could be artifacts in the data.

Let’s compare saccade amplitude between the I-VT algorithm at different threshold values (75, 30, and 10 degrees) and the adaptive velocity threshold algorithm which sets a threshold based on characteristics of the data. This time, we will only plot saccades with an amplitude shorter than 10 degrees.


ivt[["saccades"]]$algorithm <- "ivt"
ivt_high[["saccades"]]$algorithm <- "ivt_high"
ivt_low[["saccades"]]$algorithm <- "ivt_low"
adaptive[["saccades"]]$algorithm <- "adaptive"

#Remove some unneccessary fields
adaptive[["saccades"]] <- dplyr::select(adaptive[["saccades"]], -firstline, -lastline)

combined.saccades <- rbind(ivt[["saccades"]],
                           ivt_high[["saccades"]],
                           ivt_low[["saccades"]],
                           adaptive[["saccades"]]
                           )

ggplot(data = dplyr::filter(combined.saccades, amplitude<10), aes (x = algorithm, y = amplitude, fill = algorithm)) + geom_violin(alpha = 0.5) + geom_boxplot(width =0.5)+ geom_jitter() + theme_minimal()


There are clear differences in both the number and the amplitude of the detected saccades. The I-VT algorithm with a very low threshold (10 degrees) detects many short saccades. In contrast, the I-VT algorithm with a threshold set to 75 degrees detects only a small number with large amplitude. The medium threshold I-VT and the adaptive threshold algorithms seem to identify the same number of saccades with a similar mean amplitude.

Why can this be the case?

Velocity profiles

Run the I-VT algorithm again. This time, we will save the velocity profiles:


ivt <- algorithm_ivt(gaze_processed,velocity.threshold = 35, min.fixation.duration = 40, one_degree = 40, save.velocity.profiles = TRUE)


Saccades have a characteristic profile of increase and decrease in eye movement velocity. Peak velocity is associated with arousal, and as such of interest to many researchers. Use the kollaR function plot.velocity.profiles() to plot saccade velocity.
Plot the velocity profiles of all detected saccades with a higher amplitude than 2 degrees:


velocity_plot <- plot_velocity_profiles(filter(sample.data.saccades, amplitude >2))