Mass identifications

Online material for the book "Mass identifications"

Authors: Daniel Kling, Thore Egeland, Andreas Tillmar and Lourdes Prieto

Publisher: Elsevier

Order your copy here

Updated: 2021-04-27

Background

In a large number of applications, there is a need to identify individuals. For instance, the identity of a genotyped dead individual can be established if DNA is available from known relatives. The title embodies the scope of the book and encompasses the initial example, namely identification problems using DNA. However, "Mass" typically indicates large-scale cases involving many deceased persons. This includes, but is not limited to, disaster victim identification (DVI), mass graves and family reunification, as well as other scenarios where the aim is to connect unidentified persons to some reference data.

The second part of the title constrains the contents of the book. A number of important topics, such as sample collection and laboratory analysis, will not be addressed in detail. Rather, the focus is on matching, searching and statistical evaluation. Throughout, the freely available software Familias and the recently released R library dvir will be used to illustrate the methods through applications.

This book aims to fill the gaps where previous publications are lacking or are difficult for practitioners to find. A comprehensive book that summarizes the state of the art and provides extensions of methods is needed. In particular, recent developments in genetics (sequencing) will be addressed. We focus on practical application and implementation: how does one approach the problem of identifying individuals using DNA and statistically summarise the evidence? Previous publications have focused on topics such as collection of samples, technical questions about DNA analyses (e.g. DNA extraction efficiency in difficult biological samples) or descriptions of particular cases. Less information is available on how to perform mass comparisons of genetic data and on the identification process from a statistical point of view.

Courses

Please see http://familias.name/courses.html for a comprehensive list of courses (previous and future) given by the authors.

Software

The book recommends using dedicated software (or tools) that enable validated likelihood ratio calculations. We provide details on the freely available Familias software in addition to some useful R libraries. Familias is widely used by forensic laboratories, in particular for paternity/kinship testing. It can deal with extended pedigrees using autosomal, unlinked markers (SNPs or STRs), incorporating complications such as mutations, population substructure and more. Familias has extended functionality for mass identification, including a blind search feature, conditional simulation/evaluation of reference families and screening capabilities (pairwise ante-mortem (AM) to post-mortem (PM) searches). We further outline the R library dvir, coded by Thore Egeland. The library leverages the forrel and pedtools packages by developer Magnus Vigeland to calculate likelihood ratios in situations involving multiple missing persons where a so-called global solution is sought.
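As a minimal sketch of what a likelihood ratio calculation involves, the Python snippet below computes the LR for a parent-child duo versus unrelated individuals at autosomal, unlinked markers. It is not the algorithm implemented in Familias or dvir: it assumes Hardy-Weinberg equilibrium, no mutations and no population substructure. The function names (duo_lr, combined_lr), the marker names, the genotypes and the allele frequencies are all made up for illustration.

```python
def duo_lr(parent, child, freq):
    """LR for 'parent-child' versus 'unrelated' at a single marker.

    parent, child: genotypes as 2-tuples of alleles, e.g. (15, 16).
    freq: dict mapping allele -> population frequency (illustrative values).
    Assumes Hardy-Weinberg equilibrium and no mutation.
    """
    a, b = child

    def transmit(allele):
        # Probability that the parent passes on this allele (0, 0.5 or 1).
        return parent.count(allele) / 2.0

    if a == b:
        # Homozygous child: one copy from the parent, one from the population.
        numerator = transmit(a) * freq[a]
        denominator = freq[a] ** 2
    else:
        # Heterozygous child: either allele may have come from the parent.
        numerator = transmit(a) * freq[b] + transmit(b) * freq[a]
        denominator = 2.0 * freq[a] * freq[b]
    return numerator / denominator


def combined_lr(parent_profile, child_profile, freqs):
    """Multiply single-marker LRs, which is valid only for unlinked markers."""
    total = 1.0
    for marker in parent_profile:
        total *= duo_lr(parent_profile[marker], child_profile[marker], freqs[marker])
    return total


# Made-up example data for two hypothetical markers M1 and M2.
freqs = {"M1": {15: 0.1, 16: 0.3, 17: 0.2}, "M2": {17: 0.25, 18: 0.75}}
parent = {"M1": (15, 16), "M2": (17, 17)}
child = {"M1": (15, 17), "M2": (17, 17)}
print(combined_lr(parent, child, freqs))
```

A single-marker LR of zero (no shared allele) shows why validated software is recommended in practice: without a mutation model, one inconsistency would exclude a true relative.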

Familias installs through a downloadable .exe file on all Windows-based systems (including virtual machines). Instructions for Mac and other users are available by following the links on the Familias website.
Installation of the R package dvir is outlined at https://github.com/thoree/dvir. You may first have to run install.packages(c("arrangements", "pedtools", "forrel")) to install the dependencies. If GitHub is unavailable, you can instead use install.packages("https://familias.name/BookKETP/Files/dvir_2.0.zip", repos=NULL, type = "win.binary"), which installs a Windows binary.

Exercises

Below follow links to the exercises and solutions contained in the book. Solutions may be updated if errors are detected; please see the version history. All files can be downloaded as a single zip file.

Chapter 3

In Chapter 3 we focus on the step prior to the actual search/matching. The first objective is to merge PM samples coming from the same individual. Next, close relatives among the victims need to be determined. This procedure - identifying relationships between pairs of DNA profiles - is called blind search. This work is typically the initial step of the matching and some cases may already be solved after this stage. The blind search typically involves many comparisons or tests. This increases the risk of errors and therefore the general problem of multiple testing is addressed. To evaluate the blind search and prepare for the next steps involving the reference families, we describe simulation methods which allow us to assess if the reference families and the genotyped individuals will suffice. More precisely, we address questions like: is the LR likely to reach the prescribed threshold?
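The scale of the multiple testing problem in a blind search can be sketched with a few lines of Python. The snippet below counts the pairwise comparisons among a set of PM profiles, approximates the probability of at least one false positive under the simplifying assumption of independent tests, and flags pairs of identical profiles as candidates for merging. The function names, the marker names and the profiles are hypothetical, and the exact-match rule is a deliberate simplification: it ignores partial profiles, drop-out and typing errors, which real software must handle.

```python
from itertools import combinations


def n_pairwise(n_profiles):
    """Number of pairwise comparisons among n profiles: n(n-1)/2."""
    return n_profiles * (n_profiles - 1) // 2


def prob_any_false_positive(n_tests, alpha):
    """P(at least one false positive), assuming independent tests,
    each with false positive rate alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def find_identical_profiles(profiles):
    """Return pairs of sample ids whose genotypes agree at all shared markers.

    profiles: dict sample id -> dict marker -> 2-tuple of alleles.
    """
    matches = []
    for a, b in combinations(sorted(profiles), 2):
        shared = set(profiles[a]) & set(profiles[b])
        if shared and all(
            sorted(profiles[a][m]) == sorted(profiles[b][m]) for m in shared
        ):
            matches.append((a, b))
    return matches


# Made-up PM profiles; PM1 and PM2 carry the same genotypes.
pm_profiles = {
    "PM1": {"M1": (15, 16), "M2": (17, 17)},
    "PM2": {"M1": (16, 15), "M2": (17, 17)},
    "PM3": {"M1": (14, 16), "M2": (17, 18)},
}
print(n_pairwise(100))
print(prob_any_false_positive(n_pairwise(100), 0.001))
print(find_identical_profiles(pm_profiles))
```

With 100 PM samples there are already 4950 pairs to compare, so even a small per-comparison error rate makes at least one false positive likely.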

Exercises: Available as a single pdf
Files: Available as a single zip file (or as individual files)
Solutions: Available as a single pdf

Chapter 4

The final part of a mass identification operation is the matching of unidentified remains with missing persons, ultimately providing a list of candidates for further review. This chapter discusses the process and compares a screening search, i.e. pairwise comparisons, with a complete comparison where all available family members are included. Advantages and drawbacks of the two approaches are discussed. The chapter also addresses subjects such as population frequency databases, prior probabilities, statistical thresholds and posterior probabilities.
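The relation between the LR, the prior probability and the posterior probability can be sketched in Python using Bayes' theorem on the odds scale. This is a simplified two-hypothesis version (the remains belong to a given missing person, or they do not) and not necessarily the formulation used in the book or in Familias. The flat prior 1/N for N missing persons is one common choice and is used here only as an assumption; the function names and numbers are illustrative.

```python
def flat_prior(n_missing):
    """Flat prior: each of n_missing persons is equally likely a priori."""
    return 1.0 / n_missing


def posterior_probability(lr, prior):
    """Posterior probability from posterior odds = LR * prior odds."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    posterior_odds = lr * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)


def required_lr(target_posterior, prior):
    """Smallest LR giving at least the target posterior for a given prior."""
    target_odds = target_posterior / (1.0 - target_posterior)
    prior_odds = prior / (1.0 - prior)
    return target_odds / prior_odds


# Illustrative numbers: 100 missing persons and an LR of one million.
prior = flat_prior(100)
print(posterior_probability(1e6, prior))
print(required_lr(0.999, prior))
```

The second function shows how a statistical threshold on the LR follows from a chosen posterior probability: the larger the number of missing persons, the smaller the flat prior and the higher the LR that is required.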

Exercises: Available as a single pdf
Files: Available as a single zip file (or as individual files)
Solutions: Available as a single pdf

Chapter 5

Advances in genetics are rapid - new technologies and possibilities are introduced each year. This chapter explores potential future directions in relation to mass identifications and we discuss how data from emerging sequencing methods can be utilized to improve identification. Approaches to infer phenotypes and biogeographic ancestry from such data will also be discussed. Furthermore, methods for genealogy searches will be outlined and we exemplify how these may potentially aid the forensic investigator in providing leads for positive identification.

Exercises: Available as a single pdf
Files: Available as a single zip file (or as individual files)
Solutions: Available as a single pdf

Chapter 6

This final chapter deals with a mock DVI case involving all aspects of a mass identification. Providing case examples is an integral part of educating forensic scientists. The chapter includes a scenario where the data have been generated using simulations. Several challenges are dealt with, including co-mingled remains, relatives among the victims, complex reference families, inconsistent family data and mismatches between missing persons and their reference families. All data are available at an online repository providing learning material for further education.

Exercises: Available as a single pdf
Files: Available as a single zip file (or as individual files)

Please note that solutions to the exercises in Chapter 6 are still in preparation; the link below points to the current working version.

Solutions: Available as a single pdf

You may send comments to daniel.kling@rmv.se