Murdock, Jaimie (Author)
Allen, Colin (Advisor)
Milojevic, Stasa (Advisor)
How do individuals create a knowledge base over a lifetime? Charles Darwin left detailed records of every book he read from The Voyage of the Beagle to just after publication of The Origin of Species. He also left copies of his drafts before publication. I use these records to build a case study of how reading and writing interact to create conceptual novelties, such as the theory of natural selection and modification by descent. I then extend the model to cover entire disciplines by bootstrapping reading and writing histories from bibliographies in scientific publications, scaling it up to address the question of how we move from an individual psychology to society. Two central components from cognitive science inform the proposed models. The first is bounded cognition: people have limited attention, and that attention is further limited by an individual’s information processing ability. The second is information foraging, a framework for managing the trade-off between exploration of new information and exploitation of existing knowledge when searching for information. Most existing work on information foraging and bounded cognition examines short-term information foraging problems, such as formulating web search queries in a laboratory setting with a known information goal. Through the case study of Charles Darwin, I use real-world datasets to explore this problem at a timescale of decades with unknown information goals. The base of the reading model is topic modeling with Latent Dirichlet Allocation (LDA). This method reduces the dimensionality of text by representing each document as a topic distribution, where each topic is defined as a probability distribution over the words in the collection. With these probability distributions, I can apply information-theoretic measures to calculate the divergence between texts.
These divergences characterize a particular reading decision as exploiting the topics exposed by previously read texts or exploring new topics. I train these topic models not on the reading records themselves but on the full text of the books, identifying each volume in the HathiTrust Digital Library. While Darwin’s reading notebooks and manuscript drafts provide relatively precise information on reading and writing behaviors at a day-level granularity, that type of data is rare. I explore three extensions of the models, dealing with progressively more “fuzzy” data. First, I look at the contents of Darwin’s Library at the time of his death to infer readings from 1860 to 1882. These readings are used to provide a preliminary analysis of his work on The Descent of Man and the later editions of The Origin of Species. Then, I look at another historical figure: Thomas Jefferson, whose working library formed the basis of the Library of Congress. I examine the bibliography of his retirement library and tie it to his correspondence to find possible evidence for when certain volumes were read. Finally, I scale the model up to the discipline of neuroscience. I extract citation graphs from the Web of Science to infer reading histories for neuroscientists based on the articles they cited. I use the text of the abstracts of these articles to perform an analysis of readings and writings similar to the Darwin case study. These extensions of the model highlight the potential to work with less precise data and illuminate future problems. Throughout the work, I emphasize the notions of multiple realizability and interpretive pluralism. Each model is itself a population of models, and while simpler term-frequency-based models may show many of the same effects as the topic models, I argue for the explanatory power of the topic model with respect to causality.
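The divergence step described in the abstract can be illustrated with a minimal sketch. The abstract says only that information-theoretic measures are applied to topic distributions; the use of Kullback-Leibler divergence here, the averaging of past readings into a single history, and all names and numbers below are assumptions made for illustration, not the thesis's actual implementation or data.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, between two topic distributions.

    Assumes q is strictly positive wherever p is, which typically holds for
    smoothed LDA output.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def average_distribution(distributions):
    """Element-wise mean of several topic distributions (a hypothetical 'reading history')."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]

# Hypothetical three-topic distributions; the values are invented for illustration.
past_readings = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
]
candidate_similar = [0.65, 0.25, 0.10]  # close to what was already read
candidate_novel = [0.10, 0.10, 0.80]    # dominated by a previously minor topic

history = average_distribution(past_readings)
exploit_score = kl_divergence(candidate_similar, history)
explore_score = kl_divergence(candidate_novel, history)

print(f"similar text: {exploit_score:.3f} bits")
print(f"novel text:   {explore_score:.3f} bits")
```

Under these assumptions, a low divergence from the history suggests a reading that exploits familiar topics, and a high divergence suggests exploration. Where to draw the line between the two is a modeling choice that this sketch does not settle.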
Article
Grant Ramsey;
Charles H. Pence;
(2016)
evoText: A new tool for analyzing the biological sciences
(/isis/citation/CBB014385965/)
Article
Jo Guldi;
(2022)
The Climate Emergency Demands a New Kind of History: Pragmatic Approaches from Science and Technology Studies, Text Mining, and Affiliated Disciplines
(/isis/citation/CBB144261765/)
Article
Melinda Baldwin;
(2018)
A Perspective from the History of Scientific Journals
(/isis/citation/CBB030888393/)
Article
Theodore M. Porter;
(2018)
Digital Humanism
(/isis/citation/CBB751955814/)
Article
Ivan Flis;
(2018)
Digital Humanities as the Historian’s Trojan Horse: Response to Commentary in the Special Section on Digital History
(/isis/citation/CBB638759086/)
Article
Anu Masso;
Maris Männiste;
Andra Siibak;
(2020)
‘End of Theory’ in the Era of Big Data: Methodological Practices and Challenges in Social Media Studies
(/isis/citation/CBB632756299/)
Article
Abraham Gibson;
Manfred D. Laubichler;
Jane Maienschein;
(2019)
Introduction to Focus: Computational History and Philosophy of Science
(/isis/citation/CBB323182392/)
Thesis
Currier, James David;
(2007)
“Greedy for Facts”: Charles Darwin's Information Needs and Behaviors
(/isis/citation/CBB001560886/)
Thesis
Damerow, Julia;
(2014)
A Quadruple-Based Text Analysis System for History and Philosophy of Science
(/isis/citation/CBB001567603/)
Article
Francesco Beretta;
(2016)
Pour une annotation sémantique des textes: le projet symogih.org et la Text encoding initiative
(/isis/citation/CBB744237198/)
Article
Deryc T. Painter;
Bryan C. Daniels;
Jürgen Jost;
(2019)
Network Analysis for the Digital Humanities: Principles, Problems, Extensions
(/isis/citation/CBB443684783/)
Article
Kenneth D. Aiello;
Michael Simeone;
(2019)
Triangulation of History Using Textual Data
(/isis/citation/CBB253321424/)
Book
Peter Janich;
(2018)
What Is Information?
(/isis/citation/CBB403064080/)
Article
McCarthy, Gavan;
(2011)
Mapping the Past: Building Public Knowledge Places to Meet Community Needs
(/isis/citation/CBB001251178/)
Thesis
Scharf, Sara Tovah;
(2007)
Identification Keys and the Natural Method: The Development of Text-Based Information Management Tools in Botany in the Long 18th Century
(/isis/citation/CBB001561510/)
Thesis
Peters, Benjamin;
(2010)
From Cybernetics to Cyber Networks: Norbert Wiener, the Soviet Internet, and the Cold War Dawn of Information Universalism
(/isis/citation/CBB001562760/)
Chapter
Downey, Greg;
(2007)
The librarian and the Univac: automation and labor at the 1962 Seattle World's Fair
(/isis/citation/CBB001180032/)
Book
Antonio Badia;
(2019)
The Information Manifold: Why Computers Can't Solve Algorithmic Bias and Fake News
(/isis/citation/CBB524320511/)
Article
Fosse, Sébastien de la;
(2013)
Media and Cognition: The Relationship between Thought Structures and Media Structures
(/isis/citation/CBB001201747/)
Thesis
Kouper, Inna;
(2011)
The Meanings of (Synthetic) Life: A Study of Science Information as Discourse
(/isis/citation/CBB001567283/)