July 24–29, 2022
Frankfurt am Main, Germany
Tomas Paus and Hye-Chung Kum, Chairpersons
Program Advisory Committee
Over the past decade, large-scale genomic studies have identified common genetic variations associated with complex traits in health (e.g., educational attainment) and disease (e.g., psychiatric disorders). These genomic studies were made possible through technological and conceptual advancements sparked by the Human Genome Project and by pooling together numerous datasets to increase statistical power (e.g., ~1 million individuals in a genetic study of educational attainment). Discoveries from these studies have yielded valuable insights into the molecular pathways that underlie complex traits, yet they can explain only a small proportion of inter-individual variability for a given trait. Although some of the “missing” heritability might be carried by rare genetic variants, it is generally accepted that environmental influences contribute a much larger portion of variance.
Environment, however, is difficult to measure on a large scale. Still, the ubiquitous presence of information technology in our lives has created a vast body of digital information that provides a detailed record of many human activities. Harvesting this digital footprint for research purposes lags behind its partisan and for-profit use. Key barriers include a much lower tolerance for error, higher ethical and legal standards for data use, and demanding requirements for cyberinfrastructure and technical skills. The potential exists for this information to be extracted from multiple sources, so that a rich picture of the human environment can be obtained and related to various phenomena (e.g., brain maturation [Parker et al. 2017], well-being [Kardan et al. 2015], obesity [Maharana and Nsoesie 2018], health [Abnousi et al. 2018], social relationships [David-Barrett et al. 2016]).
How can this potential be realized, especially in light of differences in conceptualization and methodology across disciplines as well as national differences, governance issues, and ethical considerations? To foster greater understanding within and between disciplines and to promote future research, the Ernst Strüngmann Forum is convening this transdisciplinary dialogue.
This Forum will explore how digital ethology (the study of human behavior as captured by its digital footprint) can be used to quantify the human environment and facilitate understanding of its impact on health and well-being. The behavior that we seek to understand can be direct (e.g., tweets) or indirect, as inferred by its effect on the physical environment (e.g., broken windows on Google Street View). Key concepts will be examined, as will methods needed to quantify the human environment from existing data sources at the aggregate (e.g., neighborhood) level. Requirements for using the resultant information, in conjunction with individual-level data derived from administrative databases, will be explored, as will privacy issues and ethical, legal, and societal implications. Ways of linking aggregate- and individual-level data through geospatial coding will be examined at different levels of spatial granularity. In summary, this Forum aims to:
Group 1: How concepts of ethology can be applied to large-scale digital data
The ethological approach is used to study naturally occurring behavior. In the modern world, such behavior is connected to, and recorded by, a wide array of digital services (e.g., communication and social networking, online shopping, information search). How can ethological concepts be applied to help us characterize the modern environment in which humans live? What aspects of the ethological approach can guide us to obtain measures captured directly from digital data generated by our social activities? What kinds of models do we need to understand how human behavior can be inferred from the physical and built environment? The bidirectional nature of these relationships will be explored; namely, how individuals create their environment, and how the environment shapes the individual.
Group 2: Quantifying and geocoding the physical and built environment
Human activities (behaviors) influence the physical environment (e.g., air, green space) and generate the built environment (e.g., roads, sidewalks, stores, service and digital infrastructures); in turn, the physical and built environments influence human behavior (e.g., by imposing barriers). Which sources of information (e.g., aerial and satellite images, Google Street View) are relevant to the study of human development in these two domains? This group will explore ways to extract meaningful signals from these sources and to map these signals at different levels of spatial and temporal granularity. It will also suggest models (e.g., predictive, statistical, physical/mathematical) and platforms for sharing tools to facilitate their use.
Group 3: Quantifying and geocoding the social environment
Individuals both create and respond to their social environment through their behavior. Which types of data from heterogeneous digital streams (e.g., Twitter, Facebook, Google search, call detail records, smartphone locations) are relevant to the study of the human environment and, in turn, human development and health across the lifespan? This group will propose ways to extract meaningful signals from these sources (e.g., natural language processing) and to map these signals at different levels of spatial and temporal granularity. It will also explore ways to analyze these measures, using modeling approaches (e.g., predictive, physical/mathematical, statistical) and platforms for sharing tools to facilitate their use.
Group 4: Integrating knowledge from individual- and aggregate-level data
National- and local- (e.g., municipality) level administrative data (e.g., health, education, income, housing, civil, social services) systematically and continuously capture information relevant for health and well-being, thus providing an ideal source for research in these domains. Large population databases from these sources efficiently capture individual-level data and have grown over the last decade, albeit to different degrees in different countries. This group will discuss how individual-level data derived from (existing) heterogeneous databases can be brought together with (newly derived) aggregate-level data about the physical, built, and social environments. Strategies for knowledge generation from these linked data will be explored (e.g., data fidelity, efficient workflow, statistical modeling and validation, high dimensionality, interpretation). Issues related to data governance and barriers to data access will be considered, as will the ethical, legal, and societal implications of this line of research.
Parker, N., A. P. Wong, G. Leonard, et al. 2017. Income Inequality, Gene Expression, and Brain Maturation during Adolescence. Sci. Rep. 7:7397
Kardan, O., P. Gozdyra, B. Misic, et al. 2015. Neighborhood Greenspace and Health in a Large Urban Center. Sci. Rep. 5:11610
Maharana, A., and E. Okanyene Nsoesie. 2018. Use of Deep Learning to Examine the Association of the Built Environment with Prevalence of Neighborhood Adult Obesity. JAMA Network Open 1:e181535
Abnousi, F., J. S. Rumsfeld, and H. M. Krumholz. 2018. Social Determinants of Health in the Digital Age: Determining the Source Code for Nurture. JAMA doi:10.1001/jama.2018.19763
David-Barrett, T., et al. 2016. Communication with Family and Friends across the Life Course. PLoS One 11:e0165687