The story of a remarkable scientific tool that uses big data sets to examine cultural trends in human history.
In this debut, Aiden (Genetics/Baylor Coll. of Medicine) and Michel, founder of data science company Quantified Labs, describe research with big data that led to their teaming up with Google to develop the Ngram Viewer, an online tool that searches more than 30 million digitized books to reveal how words and phrases have been used over time. Launched in 2010 as part of Google Books, the viewer’s search of ngrams (sequences of words) serves the needs of lexicographers and historians while providing endless diversion for others. Calling Google’s digitized data “an unprecedented précis of humanity’s cultural record,” the authors show how such data can be made to reveal important changes over time, from when the early expression “the United States are” gave way to “the United States is” to how censorship can cause the sudden disappearance of particular words and phrases, such as “Tiananmen Square.” Having met at Harvard, the authors began seven years ago to experiment with their new scope on historical trends to learn how English grammar changes, how people get famous, and how societies learn and forget. While recounting the copyright, privacy and other issues they faced in developing their tool, they offer fascinating insights into how dictionaries work, the half-lives of irregular verbs and the most famous people of the last two centuries (Hitler heads the list). In an appendix, some two dozen charts graph the relative frequency of use of certain words, such as “London” and “New York,” since 1800. (New York began its ascendancy in 1911.) The authors also consider the moral issues raised by the prospect of a future in which personal, digital and historical records reveal more and more about human experience.
A fun, revealing exploration of a new way to view the past.