Calculate similarity of faces • facefuns

WORK IN PROGRESS

This tutorial will explain how to calculate facial (dis)similarity in a set of 2-D faces. We will calculate the Euclidean distance between Procrustes-aligned landmark templates; to calculate it dissimilarity in face space, replace your matrix of landmark coordinates with the matrix of PC scores.

library(facefuns)
library(geomorph)

Read and prepare data

path_to_tem <- system.file("extdata", "tem", package="facefuns")

shapedata <- facefuns(data = read_lmdata(lmdata = path_to_tem,
                                         plot = FALSE),
                      remove_points = "frlgmm",
                      pc_criterion = "broken_stick",
                      plot_sample = FALSE,
                      quiet = TRUE)

Calculate facial similarity

We will use facefuns::calc_ed to calculate similarity. calc_ed requires two arguments:

Landmark coordinates (or PC scores) in a matrix format
A table specifying for which pairs of faces you would like to calculate similarity

Create landmark matrix

Currently, our landmark templates are stored in a three-dimensional array: a list of n matrices of dimensions p x k

p The number of landmarks, here 132
k The number of landmark dimensions, here 2
n The number of faces, here 102

str(shapedata$aligned)
#>  num [1:132, 1:2, 1:102] -0.0526 0.054 -0.0522 -0.0603 -0.0622 ...
#>  - attr(*, "dimnames")=List of 3
#>   ..$ : chr [1:132] "1" "2" "3" "4" ...
#>   ..$ : chr [1:2] "X" "Y"
#>   ..$ : chr [1:102] "001" "002" "003" "004" ...

We will use facefuns::convert_array_to_matrix to convert our array into a matrix with n rows and p x k columns.

data_matrix <- convert_array_to_matrix(shapedata$aligned)
str(data_matrix)
#>  num [1:102, 1:264] -0.0526 -0.0515 -0.0514 -0.0474 -0.0505 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:102] "001" "002" "003" "004" ...
#>   ..$ : chr [1:264] "X1" "Y1" "X2" "Y2" ...

Create list of faces

Most times, you will already have a list of face pairs for which you want to calculate similarity.

For this example, we will calculate the similarity between all possible combinations of face pairs in our data set.

We start by assigning all face IDs in our sample to a variable …

face_names <- dimnames(shapedata$aligned)[[3]]

… and then create a list of all possible combinations

pairs <- expand.grid(A = face_names,
                     B = face_names)

`calc_ed`

We now have everything we need to run our function

sim_table <- calc_ed(coords_matrix = data_matrix,
                     pairs_table = pairs)

head(sim_table)
#> # A tibble: 6 x 3
#>   A     B     EuclideanDistance
#>   <fct> <fct>             <dbl>
#> 1 001   001              0     
#> 2 002   001              0.0857
#> 3 003   001              0.0663
#> 4 004   001              0.129 
#> 5 005   001              0.112 
#> 6 006   001              0.119

Let’s display our data in a wide format and round the values. It is a rather big table, so we will only print a small subset

sim_table %>%
  dplyr::mutate(EuclideanDistance = round(EuclideanDistance, 2)) %>%
  tidyr::spread(B, EuclideanDistance) %>%
  dplyr::select(1:10) %>%
  dplyr::slice(1:9)
#> # A tibble: 9 x 10
#>   A     `001` `002` `003` `004` `005` `006` `007` `008` `009`
#>   <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 001    0     0.09  0.07  0.13  0.11  0.12  0.08  0.14  0.07
#> 2 002    0.09  0     0.07  0.1   0.08  0.08  0.12  0.11  0.11
#> 3 003    0.07  0.07  0     0.1   0.09  0.1   0.09  0.11  0.09
#> 4 004    0.13  0.1   0.1   0     0.09  0.08  0.14  0.11  0.14
#> 5 005    0.11  0.08  0.09  0.09  0     0.1   0.13  0.1   0.12
#> 6 006    0.12  0.08  0.1   0.08  0.1   0     0.14  0.09  0.13
#> 7 007    0.08  0.12  0.09  0.14  0.13  0.14  0     0.16  0.06
#> 8 008    0.14  0.11  0.11  0.11  0.1   0.09  0.16  0     0.15
#> 9 009    0.07  0.11  0.09  0.14  0.12  0.13  0.06  0.15  0

Averageness

Averageness can be quantified as distinctiveness from the sample average. For each face, we will calculate the Euclidean distance to the sample average, and then reverse scores, so higher scores mean “more average”.

You could use calc_ed, but this will require a wee bit of data wrangling: you will need to attach the average template to the array holding the aligned templates, then convert this new array to a matrix and finally reverse the distinctiveness scores. calc_avg does all of that! It only takes one argument - a facefuns object:

avg <- calc_avg(shapedata)

head(avg)
#> # A tibble: 6 x 3
#>   id      dist    avg
#>   <chr>  <dbl>  <dbl>
#> 1 001   0.0879 0.0597
#> 2 002   0.0650 0.0826
#> 3 003   0.0608 0.0867
#> 4 004   0.0701 0.0774
#> 5 005   0.0700 0.0775
#> 6 006   0.0631 0.0844