Sequence analysis in social sciences

Index plot of 10 family life sequences

In social sciences, sequence analysis (SA) is concerned with the analysis of sets of categorical sequences that typically describe longitudinal data. The analyzed sequences are encoded representations of, for example, individual life trajectories such as family formation, school-to-work transitions, and working careers, but they may also describe daily or weekly time use, or represent the evolution of observed or self-reported health, of political behaviors, or of the development stages of organizations. Such sequences are chronologically ordered, unlike words or DNA sequences, for example.
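As an illustration of what an encoded categorical sequence can look like, the following minimal Python sketch stores three invented family life trajectories as chronologically ordered lists of states, one state per year. The state alphabet, the data, and the helper names `spells` and `state_distribution` are assumptions made for this example only; they are not taken from any of the software packages cited in this article.

```python
from collections import Counter
from itertools import groupby

# Hypothetical alphabet of family states (illustrative only).
STATES = {
    "P": "living with parents",
    "A": "living alone",
    "U": "in a union",
    "C": "in a union with child",
}

# Invented data: each sequence holds one state per year, in chronological order.
sequences = [
    list("PPPAAUUCCC"),
    list("PPPPPUUUCC"),
    list("PPAAAAAUUU"),
]

def spells(seq):
    """Collapse a sequence into ordered (state, duration) spells."""
    return [(state, len(list(run))) for state, run in groupby(seq)]

def state_distribution(seqs, position):
    """Share of each state across all sequences at one time position."""
    counts = Counter(seq[position] for seq in seqs)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

print(spells(sequences[0]))
print(state_distribution(sequences, 4))
```

The first call summarizes a single trajectory as a succession of spells, while the second looks across all trajectories at one point in time; both views rely on the chronological ordering of the states.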

SA is a longitudinal analysis approach that is holistic in the sense that it considers each sequence as a whole. SA is essentially exploratory. Broadly, SA provides a comprehensible overall picture of sets of sequences with the objective of characterizing the structure of the set of sequences, finding the salient characteristics of groups, identifying typical paths, comparing groups, and more generally studying how the sequences are related to covariates such as sex, birth cohort, or social origin.

Introduced in the social sciences in the 1980s by Andrew Abbott,[1][2] SA gained much popularity after the release of dedicated software such as the SQ[3] and SADI[4] add-ons for Stata and the TraMineR R package[5] with its companions TraMineRextras[6] and WeightedCluster.[7]

Despite some connections, the aims and methods of SA in social sciences strongly differ from those of sequence analysis in bioinformatics.

  1. Abbott, Andrew (1983). "Sequences of Social Events: Concepts and Methods for the Analysis of Order in Social Processes". Historical Methods: A Journal of Quantitative and Interdisciplinary History. 16 (4): 129–147. doi:10.1080/01615440.1983.10594107. ISSN 0161-5440.
  2. Abbott, Andrew; Forrest, John (1986). "Optimal Matching Methods for Historical Sequences". Journal of Interdisciplinary History. 16 (3): 471. doi:10.2307/204500. JSTOR 204500.
  3. Brzinsky-Fay, Christian; Kohler, Ulrich; Luniak, Magdalena (2006). "Sequence Analysis with Stata". The Stata Journal: Promoting Communications on Statistics and Stata. 6 (4): 435–460. doi:10.1177/1536867X0600600401. ISSN 1536-867X. S2CID 15581275.
  4. Halpin, Brendan (2017). "SADI: Sequence Analysis Tools for Stata". The Stata Journal: Promoting Communications on Statistics and Stata. 17 (3): 546–572. doi:10.1177/1536867X1701700302. hdl:10344/3783. ISSN 1536-867X. S2CID 62691156.
  5. Gabadinho, Alexis; Ritschard, Gilbert; Müller, Nicolas S.; Studer, Matthias (2011). "Analyzing and Visualizing State Sequences in R with TraMineR". Journal of Statistical Software. 40 (4). doi:10.18637/jss.v040.i04. ISSN 1548-7660. S2CID 4603927.
  6. Ritschard, Gilbert; Studer, Matthias; Buergin, Reto; Liao, Tim; Gabadinho, Alexis; Fonta, Pierre-Alexandre; Muller, Nicolas; Rousset, Patrick (2021-06-24), TraMineRextras: TraMineR Extension, CRAN, retrieved 2021-09-26
  7. Studer, Matthias (2013). "WeightedCluster Library Manual: A practical guide to creating typologies of trajectories in the social sciences with R". LIVES Working Papers. 24. doi:10.12682/lives.2296-1658.2013.24.