Clustering Heterogeneous Semi-Structured Social Science Datasets

D. B. Skillicorn, Christian Leuprecht

    Research output: Contribution to conference › Paper › peer-review


    Social scientists have begun to collect large datasets that are heterogeneous and semi-structured, but the ability to analyze such data has lagged behind its collection. We design a process to map such datasets to a numerical form, apply singular value decomposition clustering, and explore the impact of individual attributes or fields by overlaying visualizations of the clusters. This provides a new path for understanding such datasets, which we illustrate with three real-world examples: the Global Terrorism Database, details of every terrorist attack since 1970; a Chicago police dataset, details of every drug-related incident over a period of approximately a month; and a dataset describing members of a Hezbollah crime/terror network within the U.S.
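    The pipeline the abstract describes (map records to numbers, then apply singular value decomposition) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the mapping, so the hash-based encoding of text fields, the per-column z-scoring, the treatment of missing values, and the names `field_to_number` and `svd_embed` are all assumptions made for the example.

    ```python
    import hashlib
    import numpy as np

    def field_to_number(value):
        """Map one field to a float (assumed encoding, not taken from the paper).

        Numbers pass through, missing values become NaN, and any other value
        is hashed to a stable number in [0, 1).
        """
        if value is None or value == "":
            return np.nan
        if isinstance(value, (int, float)):
            return float(value)
        digest = hashlib.md5(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") / 2**32

    def svd_embed(records, fields, k=2):
        """Project semi-structured records (dicts) onto the top-k singular directions."""
        X = np.array(
            [[field_to_number(r.get(f)) for f in fields] for r in records],
            dtype=float,
        )
        # Z-score each column, ignoring missing entries, so no field dominates.
        mean = np.nanmean(X, axis=0)
        std = np.nanstd(X, axis=0)
        std[std == 0] = 1.0
        Z = (X - mean) / std
        # Missing entries are placed at the column mean (zero after z-scoring).
        Z[np.isnan(Z)] = 0.0
        U, s, _ = np.linalg.svd(Z, full_matrices=False)
        # Row coordinates in the reduced space; nearby rows are similar records.
        return U[:, :k] * s[:k]
    ```

    Plotting the returned coordinates and coloring each point by the value of a single field would be one way to approximate the overlay step: a field whose values separate cleanly across the plot is one that shapes the clustering.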

    Original language: English
    Number of pages: 5
    Publication status: Published - 1 Jan 2015
    Event: 15th Annual International Conference on Computational Science -
    Duration: 1 Jun 2015 → …




    • Chicago policing
    • Clustering
    • Crime
    • Global Terrorism Database
    • Hashing
    • Hezbollah
    • Terrorism


