WORK IN PROGRESS
This package contains convenience functions for doing shape analyses of 2D and 3D face images in
R (see also here), as well as tutorials for some common analyses: calculating sexual dimorphism, facial similarity and averageness, and asymmetry.
The following tutorial will provide a brief guide to loading 2D face data into
R, Procrustes-aligning landmark templates and conducting a principal component analysis of shape using the package
geomorph. It will conclude by introducing the function
facefuns::facefuns, which will automatically conduct these steps and create outputs for use in subsequent analyses.
Steps for importing your delineated landmark templates into
R will vary depending on the software you used to delineate and the
R package(s) you want to use.
For this tutorial, we will work with the CC-BY licensed Face Research London Set. This is a set of 102 faces that were delineated with the standard Face Research Lab (FRL) template of 189 points.
We will use
facefuns::read_lmdata to read the template files.
path_to_tem <- system.file("extdata", "tem", package="facefuns") data <- read_lmdata(lmdata = path_to_tem)
As you can see,
read_lmdata will by default plot the loaded data. Use
plot = FALSE to turn off plotting.
You might not want to retain all of the standard template’s 189 landmarks for shape analyses. For this example, we will remove the following landmarks:
We can define which landmarks to exclude by creating a vector that contains the number of every landmark to be deleted. Note that Webmorph starts counting at 0 - you will have to add +1 to each landmark!
If deleting multiple landmarks, it’s easier to save them all into a vector; for the current example, this would look like:
remove_points <- c(45:50, 100:104, 116:125, 146:158, 159:164, 165:170, 171:174, 175:179, 184:185)
Let’s read in the template files again, but this time we will remove the points we decided to eliminate:
data <- read_lmdata(lmdata = path_to_tem, remove_points = remove_points)
The black dots show the sample average, the gray points represent individual faces. As you can see, there is a lot of “noise” in the individual data - some of this variation will be due to
differences in the size of delineated faces, e.g., some peoples’ faces might just be genuinely bigger, but it could also they were slightly closer to the camera;
their position on screen, e.g., some people’s faces might have been a bit more in the center of the captured image than others;
and their rotation, e.g., some people might have been tilting their face slightly to the left or right
We are not interested in that sort of variation, but seek to eliminate it - Generalized Procrustes Analysis does exactly that!
geomporph::gpagen to Procrustes-align our data. Check out
help(gpagen) for details and plenty of references!
data_gpa <- gpagen(data, print.progress = FALSE)
The aligned coordinates are stored in
data_gpa$coords. Let’s plot one of the aligned faces.
plot(data_gpa$coords[, , 1], asp = 1)
The GPA will sometimes rotate landmark templates, but this can be easily fixed by rotating them back into the right position.
Better. Let’s have another look at the whole sample again.
Much less messy! What we are left with after the GPA is actual shape data - data that is free of variation due to differences in size, location and orientation between individual subjects.
You might want to explore which shape dimensions explain the most variance in your data. Or, you might just be keen to get out principal component scores for each face for use in subsequent analyses.
geomorph function for conducting a shape PCA is
pca_output <- gm.prcomp(data_aligned) # summary(pca_output)
summary(pca_output) will return a list of all principal components. That is usually not what we want. Instead, we will use
facefuns::select_pcs to decide which PCs we would like to retain for subsequent analyses.
london_pcs <- select_pcs(pca_output)
select_pcs returns a list of principal components as chosen by one of three methods. It defaults to the most conservative “broken_stick” method. Other methods are “Kaiser_Guttman” (retains PCs with eigenvalues greater than the mean eigenvalue) and “total_variance” (retains PCs explaining at least total_var variance). It returns three items: the number of selected PCs (
london_pcs$n), their importance (
london_pcs$selected) and the criterion that was used to choose PCs (
london_pcs$method). The London Set has 8 selected PCs; the first three PCs explain 36.2%, 11.1% and 6.7% of variance, respectively.
london_pcs$selected #> SD Variance Cum Var #> PC1 0.046 0.362 0.362 #> PC2 0.026 0.111 0.474 #> PC3 0.020 0.067 0.541 #> PC4 0.018 0.058 0.599 #> PC5 0.018 0.056 0.655 #> PC6 0.017 0.050 0.705 #> PC7 0.014 0.033 0.739 #> PC8 0.013 0.027 0.766
The principal component scores for each face are stored in
pca_output$x. To save them for any subsequent analyses you might want to do, we could use the following code:
LondonSet_scores <- pca_output$x %>% as_tibble() %>% # tidy colnames rename_with(~gsub("Comp", "PC", .x)) %>% # re-add IDs tibble::add_column(.before = 1, id = gsub("^ID=", "", dimnames(data)[])) %>% tibble::column_to_rownames(var = "id") %>% dplyr::select(1:(london_pcs$n + 1))
What do the shape PCs for the London Set look like? You can use
geomorph::plotRefToTarget function to visualize differences, e.g., between the average face and the sample min and max values on individual PCs.
We will first define the reference face (here, the sample average) and then plot the difference to the sample min and max on PC1.
facefuns is a convenience function for doing all of the above in one step.
shapedata <- facefuns(data = data, pc_criterion = "broken_stick", plot_sample = TRUE, auto_rotate = TRUE, quiet = FALSE)
#> All landmarks were used in analyses. #> The loaded data set contains 102 specimen, delineated with 132 2-D landmarks. #> The broken_stick criterion was used to select 8 principal components. #> Templates were rotated using "rotateC" after the GPA.
facefuns returns a list with the following elements:
array: Procrustes-aligned data as three-dimensional array
matrix: Procrustes-aligned data as two-dimensional matrix [MAKE OPTIONAL?]
average: sample average
selected: contains elements
n: number of selected PCs
method: criterion used to select PCs
pc_scores: principal component scores
pc_plot: PCs for plotting
summary: List of key descriptives
facefuns will create all selected PCs at plus and minus 3 standard deviations. These can again be plotted using
plotRefToTarget or using
plot_2dpcs. Note that while
plotRefToTarget is based on the sample range of a specific PC,
plot_2dpcs is based on units of standard deviation (see also
plot_2dpcs(input = shapedata$pc_plot, ref = shapedata$average, which_pcs = 1:2)