Thesis ID: CBB117865546

Topic Modeling the Reading and Writing Behavior of Information Foragers (2019)


Murdock, Jaimie (Author)
Allen, Colin (Advisor)
Milojevic, Stasa (Advisor)

Indiana University
Publication date: 2019
Language: English

Physical Details: 143 pp.

How do individuals build a knowledge base over a lifetime? Charles Darwin left detailed records of every book he read from The Voyage of the Beagle to just after the publication of The Origin of Species, as well as copies of his drafts before publication. I use these records to build a case study of how reading and writing interact to create conceptual novelties, such as the theory of natural selection and descent with modification. I then extend the model to entire disciplines by bootstrapping reading and writing histories from the bibliographies of scientific publications, scaling it up to address the question of how we move from individual psychology to society.

Two central components from cognitive science inform the proposed models. The first is bounded cognition: people have limited attention, and that attention is further limited by each individual’s information-processing ability. The second is information foraging, a framework for managing the trade-off between exploring new information and exploiting existing knowledge when searching for information. Most existing work on information foraging and bounded cognition examines short-term problems, such as formulating web search queries in a laboratory setting with a known information goal. Through the case study of Charles Darwin, I use real-world datasets to explore this problem at a timescale of decades with unknown information goals.

The reading model is built on topic modeling with Latent Dirichlet Allocation (LDA). This method reduces the dimensionality of text by representing each document as a distribution over topics, where each topic is itself a probability distribution over the words in the collection. Because every document is represented as a probability distribution, information-theoretic measures can be applied to calculate the divergence between texts.
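The divergence step can be illustrated with a minimal sketch. The abstract does not name the information-theoretic measure used; this example assumes Kullback-Leibler (KL) divergence, a standard choice for comparing probability distributions, and takes each text's LDA topic distribution as already computed. The function name and the three-topic vectors are made up for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q), in bits.

    ASSUMPTION: KL is one possible measure; the source does not specify it.
    p and q are topic distributions over the same K topics (each sums to 1).
    Topics with p_i == 0 contribute nothing; if q_i == 0 where p_i > 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same topics")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i <= 0:
                return math.inf
            total += p_i * math.log2(p_i / q_i)
    return total


# Made-up topic distributions for two texts under a three-topic model.
text_a = [0.7, 0.2, 0.1]
text_b = [0.1, 0.3, 0.6]
# Extra bits needed to encode text_a's topics using text_b's distribution.
print(kl_divergence(text_a, text_b))
# Generally a different value: KL divergence is asymmetric.
print(kl_divergence(text_b, text_a))
```

For example, `kl_divergence([0.5, 0.5], [0.25, 0.75])` is about 0.2075 bits. Because KL divergence is asymmetric, the direction of comparison matters when a new reading is compared against earlier ones; a symmetric measure such as Jensen-Shannon divergence would be a different modeling choice.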
These divergences characterize a particular reading decision as either exploiting the topics exposed by previously read texts or exploring new topics. Rather than training the topic models on the reading records themselves, I identify each recorded volume in the HathiTrust Digital Library and train the models on the full text of the books. While Darwin’s reading notebooks and manuscript drafts provide relatively precise, day-level information on his reading and writing behavior, that type of data is rare. I explore three extensions of the models, each dealing with progressively “fuzzier” data. First, I use the contents of Darwin’s library at the time of his death to infer his readings from 1860 to 1882. These readings support a preliminary analysis of his work on The Descent of Man and the later editions of The Origin of Species. Second, I turn to another historical figure, Thomas Jefferson, whose working library formed the basis of the Library of Congress. I examine the bibliography of his retirement library and link it to his correspondence to find possible evidence of when certain volumes were read. Finally, I scale the model up to the discipline of neuroscience. I extract citation graphs from the Web of Science to infer neuroscientists’ reading histories from the articles they cited, and I use the text of those articles’ abstracts to perform an analysis of reading and writing similar to the Darwin case study. These extensions highlight the potential of working with less precise data and point to open problems for future work. Throughout the work, I emphasize multiple realizability and interpretive pluralism: each model is itself a population of models, and while simpler term-frequency-based models may show many of the same effects as the topic models, I argue for the explanatory power of the topic model with respect to causality.
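The exploration/exploitation characterization of reading decisions can be sketched as follows. This is a hypothetical illustration, not the thesis's procedure: it assumes KL divergence as the measure, represents the "past" as the mean topic distribution of all earlier readings, and labels a reading as exploration when its divergence from that past exceeds an arbitrary threshold. The function names, the threshold, and the toy data are all assumptions.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p || q) in bits; infinite if q is zero where p is positive."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i <= 0:
                return math.inf
            total += p_i * math.log2(p_i / q_i)
    return total


def past_to_text_divergences(readings: Sequence[Sequence[float]]) -> List[float]:
    """For each reading after the first, return D(reading || past).

    ASSUMPTION: 'past' is the mean topic distribution of all earlier
    readings, a hypothetical way to aggregate previously read texts.
    """
    if len(readings) < 2:
        return []
    running_sum = list(readings[0])
    scores = []
    for n, doc in enumerate(readings[1:], start=1):
        past = [s / n for s in running_sum]  # mean of the n earlier readings
        scores.append(kl_divergence(doc, past))
        running_sum = [s + d for s, d in zip(running_sum, doc)]
    return scores


def label_readings(scores: Sequence[float], threshold: float) -> List[str]:
    """HYPOTHETICAL rule: high divergence from the past counts as exploration."""
    return ["explore" if s > threshold else "exploit" for s in scores]


# Made-up topic distributions over two topics, listed in reading order.
readings = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9]]
scores = past_to_text_divergences(readings)
print(label_readings(scores, threshold=1.0))  # ['exploit', 'explore']
```

The second reading stays close to the first and reads as exploitation; the third shifts weight to the other topic and reads as exploration. In practice a fixed threshold is a crude stand-in; comparing each reading's divergence against a baseline (for example, a shuffled reading order) would be one more principled alternative.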

This citation is part of the Isis database.

Similar Citations

Article: Grant Ramsey; Charles H. Pence (2016)
evoText: A new tool for analyzing the biological sciences (/p/isis/citation/CBB014385965/)

Article: Abraham Gibson; Manfred D. Laubichler; Jane Maienschein (2019)
Introduction to Focus: Computational History and Philosophy of Science (/p/isis/citation/CBB323182392/)

Article: Anu Masso; Maris Männiste; Andra Siibak (2020)
‘End of Theory’ in the Era of Big Data: Methodological Practices and Challenges in Social Media Studies (/p/isis/citation/CBB632756299/)

Article: Theodore M. Porter (2018)
Digital Humanism (/p/isis/citation/CBB751955814/)

Article: Melinda Baldwin (2018)
A Perspective from the History of Scientific Journals (/p/isis/citation/CBB030888393/)

Thesis: James David Currier (2007)
“Greedy for Facts”: Charles Darwin's Information Needs and Behaviors (/p/isis/citation/CBB001560886/)

Article: Deryc T. Painter; Bryan C. Daniels; Jürgen Jost (2019)
Network Analysis for the Digital Humanities: Principles, Problems, Extensions (/p/isis/citation/CBB443684783/)

Thesis: Julia Damerow (2014)
A Quadruple-Based Text Analysis System for History and Philosophy of Science (/p/isis/citation/CBB001567603/)

Article: Kenneth D. Aiello; Michael Simeone (2019)
Triangulation of History Using Textual Data (/p/isis/citation/CBB253321424/)

Article: Francesco Beretta (2016)
Pour une annotation sémantique des textes: le projet et la Text encoding initiative [Towards a semantic annotation of texts: the project and the Text Encoding Initiative] (/p/isis/citation/CBB744237198/)

Book: Antonio Badia (2019)
The Information Manifold: Why Computers Can't Solve Algorithmic Bias and Fake News (/p/isis/citation/CBB524320511/)

Article: Sébastien de la Fosse (2013)
Media and Cognition: The Relationship between Thought Structures and Media Structures (/p/isis/citation/CBB001201747/)

Book: Peter Janich (2018)
What Is Information? (/p/isis/citation/CBB403064080/)

Article: Gavan McCarthy (2011)
Mapping the Past: Building Public Knowledge Places to Meet Community Needs (/p/isis/citation/CBB001251178/)

Thesis: Josh Rowe (2011)
The Public Life of Information (/p/isis/citation/CBB001562733/)

Thesis: Inna Kouper (2011)
The Meanings of (Synthetic) Life: A Study of Science Information as Discourse (/p/isis/citation/CBB001567283/)
