Machine Learning for Social Science Lab (MSSL)

MSSL is an institutional division of the Center for Peace and Security Studies (cPASS).

The Machine Learning for Social Science Lab (MSSL) is dedicated to the intersection of questions from the social sciences and methods from computer science and mathematics. MSSL will serve as a long-term institutional home for data collection efforts and methodological tools that otherwise exist only as informal, scattered, and temporary collaborations between individual scholars.

More about MSSL

Today, the social science and computer science communities share a wonderful problem. The ever-decreasing cost of data collection and processing has made more data available on more subjects than ever before. At the same time, algorithms for making sense of that data have progressed faster than expected, with advances like deep neural networks quickly making machine reading, vision, and inference a reality. Meanwhile, the social sciences are increasingly accepting of large-scale computational approaches, but computational social science projects remain relatively isolated. Technically savvy social scientists constantly reinvent the wheel to solve a specific problem in their own project and then move on without institutionalizing the capability for the next generation. What is needed is a concerted effort to institutionalize the skills, tools, and tricks we’re already developing in the social sciences and to turn the attention of our computer scientist colleagues to the specific problems we face in our domains.

Machine learning is a type of artificial intelligence that gives computers the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can change when exposed to new data. MSSL fills an institutional gap by providing instructional, technical, logistical, and financial support to fledgling machine learning projects. Further, it will fill an intellectual gap by connecting the new generation of advanced data collection, mining, and machine learning capabilities to specific domains in the social sciences.

The mandate of the Machine Learning for Social Science Lab is to identify and institutionalize recent but mature technologies that could be quickly adapted to domain-specific questions in the social sciences. To be concrete, here are just some of the projects already underway: (1) the creation and management of a centralized repository of unstructured text for mining, soon to include hundreds of millions of books, government documents, and news reports; (2) the creation of the Social Science Knowledge Inference Network (SSKI-Net), a domain-specific knowledge graph of billions of people, places, and events throughout history; (3) a repository of satellite imagery and digitized military and political maps spanning centuries, along with geospatial data scraped from them; (4) a hub for social communications data such as Twitter and cell phone calls; (5) the Very Wide Cross-National dataset of the Nuclear Age, compiling thousands of measures of countries in one place; (6) a web-based graphical user interface for human tagging of events and concepts in unstructured texts; and much more.

Creating, managing, processing, and making these resources available to the community require advanced capabilities not typically found in a social science department. MSSL will handle outreach efforts that are already proving difficult under the heading of Political Science alone, including negotiating memorandums of understanding for data sharing and collaboration with other labs; recruiting postdocs, graduate students, and undergraduates in other fields; and hiring programmers and other technical experts. It will provide a rallying point for our own efforts in the social sciences, and for scholars and students whose interests include technological areas that are not recognized or served by core social science methods. Finally, it will pursue grant writing and collaboration so that these capabilities can be expanded and made available to future scholars over a long time horizon.