Casper Albers

Research Interests


Bayesian Longitudinal Models and their use in the social sciences

(The text below is based on the successful NWO grant applications 406-11-018 and 406-13-006, both (co-)written by me.)

Πάντα ῥεῖ (panta rhei, "everything flows") is a famous aphorism summarising the thought of the Greek philosopher Heraclitus, as handed down by Plato. Everything changes constantly and nothing stays the same. To properly understand the processes behind, for instance, human behaviour, a static observation at a single time point is inadequate. Even in classical "repeated measures" contexts, where measurements have usually been taken at 3 to 6 points in time, the latent process generating the behaviour cannot be studied in sufficient detail.

Research designs with intensive measurements across time for individual subjects are becoming increasingly popular in psychological research. Such designs are necessary to gain insight into the extremely complex phenomena of human behaviour. This complexity is expressed in behaviour that fluctuates over time. Since these fluctuations depend on contextual and interindividual differences, understanding the underlying dynamics is extremely challenging.

Advances in modern technology have led to a sharp rise in the automatic collection of vast quantities of data at high frequency. Whether it concerns daily sales figures of products in all supermarkets in a country, minute-by-minute personal interaction data from social networks, real-time information on traffic densities in a motorway system, or hourly updates of the energy consumption of all clients of an energy provider, such "big data" sets have become increasingly abundant. They call for new methods of analysis: classical statistical methods are inadequate or, at best, suboptimal for this purpose.

The main aim of this research line is to develop novel statistical techniques that are well suited to the analysis of such data, from both a statistical and a computational point of view. These new methods will offer essential insight into the dynamic processes underlying the data, insight that cannot be obtained with currently available methods. The methods are based on Bayesian dynamic models, in particular the linear multiregression dynamic model (LMDM).

Although the merits of the principles underlying time series analysis have been convincingly demonstrated in psychology, the static models used so far suffer from important limitations. In a static model, the underlying parameters are fixed in time, which makes the model unsuitable for highly unpredictable and dynamic data. Important patterns can remain hidden in the analysis, and wrong conclusions may be drawn from the model. Dynamic models, whose parameters are allowed to evolve over time, do not have this limitation.
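To make the static/dynamic contrast concrete, here is a minimal sketch that compares one-step-ahead forecasts from a static model (a constant level, estimated by the mean of all past observations) with those from the simplest Bayesian dynamic model, the local level model, whose level follows a random walk and is updated by the Kalman filter. The simulated series, the sudden level shift and the variances V and W are illustrative assumptions, not data from the research described here.

```python
import numpy as np

def local_level_forecasts(y, V=1.0, W=0.1, m0=0.0, C0=100.0):
    """One-step-ahead forecasts from a local level dynamic linear model:
    y_t = mu_t + v_t,       v_t ~ N(0, V)
    mu_t = mu_{t-1} + w_t,  w_t ~ N(0, W)   (the level is allowed to drift)
    """
    m, C = m0, C0
    f = np.empty(len(y))
    for t, obs in enumerate(y):
        a, R = m, C + W          # prior for mu_t given data up to t-1
        f[t] = a                 # forecast mean of y_t
        Q = R + V                # forecast variance
        K = R / Q                # adaptive coefficient (Kalman gain)
        m = a + K * (obs - a)    # posterior mean of mu_t
        C = R - K * R            # posterior variance of mu_t
    return f

def static_mean_forecasts(y, m0=0.0):
    """Static model: a constant level, forecast by the mean of all past observations."""
    f = np.empty(len(y))
    f[0] = m0
    f[1:] = np.cumsum(y)[:-1] / np.arange(1, len(y))
    return f

rng = np.random.default_rng(1)
level = np.where(np.arange(200) < 100, 0.0, 5.0)   # hypothetical sudden level shift at t = 100
y = level + rng.normal(0.0, 1.0, size=200)

mse_dynamic = np.mean((y - local_level_forecasts(y)) ** 2)
mse_static = np.mean((y - static_mean_forecasts(y)) ** 2)
print(f"static MSE: {mse_static:.2f}, dynamic MSE: {mse_dynamic:.2f}")
```

After the shift, the static forecast keeps averaging in the pre-shift observations and lags far behind, whereas the dynamic model discounts old information and adapts within a few time points.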

In a Bayesian dynamic model, the number of variables and parameters is allowed to change over time. The LMDM is a graphical dynamic model that decomposes a multivariate time series model into separate, simple (conditional) univariate dynamic models. These univariate models are relatively simple and computationally fast, irrespective of the size and complexity of the original multivariate time series. The model is robust to changes in conditions as well as to changes in the structure of the graphical representation of the data.
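One common way of writing the decomposition is sketched below, for series $Y_t(1), \dots, Y_t(n)$ ordered along a directed acyclic graph. Each series gets its own univariate dynamic regression, where the regressor vector $\mathbf{F}_t(i)$ contains the current values of the parents of $Y(i)$ in the graph (plus, for example, an intercept). The notation is an assumption for illustration rather than a quotation from the cited papers.

```latex
\begin{aligned}
Y_t(i) &= \mathbf{F}_t(i)^\top \boldsymbol{\theta}_t(i) + v_t(i),
  & v_t(i) &\sim N\bigl(0,\, V_t(i)\bigr),\\
\boldsymbol{\theta}_t(i) &= \mathbf{G}_t(i)\, \boldsymbol{\theta}_{t-1}(i) + \mathbf{w}_t(i),
  & \mathbf{w}_t(i) &\sim N\bigl(\mathbf{0},\, \mathbf{W}_t(i)\bigr),
\end{aligned}
\qquad i = 1, \dots, n.
```

Because the errors are independent across components and the parameter vectors $\boldsymbol{\theta}_t(i)$ start out independent, they remain independent a posteriori, so each $\boldsymbol{\theta}_t(i)$ can be updated by its own univariate Kalman filter. This is what keeps the computations simple regardless of $n$.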

I have worked extensively with the LMDM, demonstrating its usefulness for analysing time series of vehicle counts in traffic networks (Anacleto et al., 2013a, 2013b; Queen et al., 2007, 2009) and of sales figures (Queen et al., 2008). There are, however, many other areas that could benefit from the LMDM, including areas closer to psychology and two of Groningen's key research priorities: energy and sustainable society. The LMDM has several favourable properties that make it eminently suitable for these analyses: it is multivariate by design, dynamic and flexible, and it maintains forecast performance through times of sudden change. The model can easily be represented graphically, making it accessible to those without statistical expertise. Furthermore, the calculations are quick (essential for big data) and the model handles missing data by design.
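The following minimal sketch illustrates two of these properties on a hypothetical two-node graph Y1 → Y2: each node is filtered by its own small univariate dynamic model, and a missing observation simply means the update step is skipped (the prior becomes the posterior). The class `UnivariateDLM`, the random-walk evolution (G = I), the known variances V and W, and the simulated data are all assumptions for illustration. For simplicity, only the child series has missing values, so the parent value needed in the regressor vector is always observed.

```python
import numpy as np

class UnivariateDLM:
    """Hypothetical LMDM component: y_t = F_t' theta_t + v_t, theta_t = theta_{t-1} + w_t.
    V and W are assumed known here (a simplification)."""

    def __init__(self, dim, V=1.0, W=0.01, C0=100.0):
        self.m = np.zeros(dim)          # posterior mean of theta
        self.C = C0 * np.eye(dim)       # posterior covariance of theta
        self.V = V
        self.W = W * np.eye(dim)

    def step(self, F, y):
        """Return the one-step forecast mean, then update with y (skipped if y is NaN)."""
        a, R = self.m, self.C + self.W  # prior for theta_t (random-walk evolution)
        f = F @ a                       # forecast mean of y_t
        if np.isnan(y):                 # missing data: posterior = prior
            self.m, self.C = a, R
            return f
        Q = F @ R @ F + self.V          # forecast variance
        A = R @ F / Q                   # adaptive vector (Kalman gain)
        self.m = a + A * (y - f)
        self.C = R - np.outer(A, A) * Q
        return f

rng = np.random.default_rng(0)
T = 200
y1 = 10 + np.cumsum(rng.normal(0, 0.1, T)) + rng.normal(0, 1, T)  # root node Y1
beta = np.where(np.arange(T) < 100, 0.5, 1.5)                      # slope changes at t = 100
y2 = beta * y1 + rng.normal(0, 1, T)                               # child node: Y1 -> Y2
y2[120:130] = np.nan                                               # a stretch of missing data

root = UnivariateDLM(dim=1)     # Y1: local level, F = [1]
child = UnivariateDLM(dim=1)    # Y2: dynamic regression on its parent, F = [Y1_t]
for t in range(T):
    root.step(np.array([1.0]), y1[t])
    child.step(np.array([y1[t]]), y2[t])

print(f"final slope estimate for Y2 on Y1: {child.m[0]:.2f}")  # true slope is 1.5 after t = 100
```

Each `step` call only involves matrices of the size of one component's parameter vector, which is why the cost grows roughly linearly in the number of nodes rather than with the dimension of the full multivariate series.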


Multivariate Data Analysis

A plethora of problems in psychometrics, from canonical analysis (Albers and Gower, 2013; Gower and Albers, 2011) to Procrustes analysis (Albers and Gower, 2010), share the same mathematical backbone: the minimisation of a positive (semi-)definite quadratic function under a quadratic constraint (Albers et al., 2011a, 2011b). This forms the basis of a wide range of mathematical-statistical applications. I plan to extend this constrained quadratic optimisation problem to a general constrained convex optimisation problem. This will vastly increase the number of possible applications, but will also raise (mathematical) challenges. Furthermore, I am studying the application of this construct to interpreting and visualising interactions in multivariate data sets (Gower et al., in preparation; Albers and Gower, in preparation).
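As an illustration of the backbone problem, the sketch below treats only its simplest, homogeneous special case: minimise x'Ax subject to x'Bx = 1, with A positive semi-definite and B positive definite. Setting the gradient of the Lagrangian to zero gives the generalised eigenproblem Ax = λBx, and at such a point x'Ax = λ, so the minimiser is the generalised eigenvector with the smallest eigenvalue. The function name and the random matrices are hypothetical; the problems in the cited papers include further terms and structure that this sketch does not cover.

```python
import numpy as np
from scipy.linalg import eigh

def min_quadratic_under_quadratic_constraint(A, B):
    """Minimise x' A x subject to x' B x = 1 (A symmetric PSD, B symmetric PD).
    Stationary points satisfy A x = lambda B x, where x' A x = lambda,
    so the minimiser is the generalised eigenvector with the smallest eigenvalue."""
    eigvals, eigvecs = eigh(A, B)   # eigenvalues ascending; eigenvectors satisfy V' B V = I
    return eigvecs[:, 0], eigvals[0]

rng = np.random.default_rng(42)
M = rng.normal(size=(5, 5))
A = M @ M.T                          # positive semi-definite
N = rng.normal(size=(5, 5))
B = N @ N.T + 5 * np.eye(5)          # positive definite
x, value = min_quadratic_under_quadratic_constraint(A, B)
print(x @ B @ x, x @ A @ x, value)   # constraint is 1; objective equals the smallest eigenvalue
```

Relaxing the quadratic objective to a general convex function removes this closed-form eigenvector solution, which is one source of the mathematical challenges mentioned above.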


Statistical analysis of study success and study progress

I am involved in statistical analyses of the implementation in Groningen of the Dutch academic dismissal policy Bindend Studieadvies (BSA, "binding study advice"). First-year students who do not obtain a minimum number of credit points are barred from further study at the faculty. I carried out a statistical study analysing what percentage of "good students" would be (unfairly) expelled, and what percentage of "bad students" would be (unfairly) allowed to continue, for given levels of the BSA threshold (Albers et al., in preparation).
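The trade-off behind this analysis can be illustrated with a small simulation. Everything in it is a hypothetical assumption: a standard-normal latent ability, "good" defined as above-average ability, first-year credits (out of an assumed 60) equal to ability plus exam luck, and the thresholds tried. The printed percentages are therefore illustrative only and are not results of the study cited above.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 10_000
ability = rng.normal(0.0, 1.0, n)     # hypothetical latent ability
good = ability > 0.0                  # hypothetical definition of a "good student"
# Hypothetical first-year credits: ability plus exam luck, clipped to [0, 60]
credits = np.clip(45 + 10 * ability + rng.normal(0.0, 8.0, n), 0, 60)

for threshold in (30, 40, 45, 50):
    expelled_good = np.mean(credits[good] < threshold)    # good students unfairly expelled
    continue_bad = np.mean(credits[~good] >= threshold)   # bad students unfairly allowed to continue
    print(f"threshold {threshold}: {expelled_good:.1%} of good students expelled, "
          f"{continue_bad:.1%} of bad students continue")
```

Raising the threshold can only increase the first error rate and decrease the second, so choosing a BSA threshold amounts to deciding how to weigh these two kinds of misclassification.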