Earlham College | Data Science

Data Science

Data Science is a relatively new interdisciplinary area of study that combines knowledge and skills from statistics, mathematics, and computer science in novel ways to address a broad range of real-world applications.

Data Science majors are currently in high demand across the board, in industries that include:

  • Technology
  • Banking and finance
  • Entertainment and video gaming
  • Hospitals and pharmaceuticals
  • And many more

Glassdoor's annual report on the "50 Best Jobs in America" has ranked Data Scientist among the top three jobs every year for the past five years (2016-2020).

What does a data scientist do?

A data scientist finds solutions to problems using data from a multitude of different sources. These sources include not only different disciplinary domains and channels, but also a variety of platforms such as cell phones, social media, e-commerce outlets, medical datasets, internet searches, and more. Thus, a data scientist must cultivate skills in all the areas related to working with large, complex datasets, and produce the information necessary for planning, forecasting, and decision-making.

Studying Data Science at Earlham

A student may enter Earlham's Data Science major through one of our introductory statistics, calculus, or computer programming courses. In the first two years of the major, students typically take courses that build a rigorous foundation in the necessary analytical and computational skills. After that, the upper-division courses emphasize hands-on, project-oriented learning in different application contexts, culminating in the capstone project in the final year.

As a data scientist, you may be asked to:

  • analyze medical data to see whether there could be a correlation between neighborhood, water quality parameters, and public health issues, e.g. cancer (think A Civil Action).
  • analyze voting district maps along with census data and election results to look for signs of gerrymandering.
  • track the movement of city buses in relation to residential and commercial neighborhood traffic patterns.
  • analyze Premier League statistics to assist playing Fantasy Premier League online.

Our Faculty

David Barbella
Assistant Professor of Computer Science

Malik Barrett
Assistant Professor of Mathematics

Fariba Khoshnasib-Zeinabad
Visiting Assistant Professor of Mathematics | Co-Director Institutional Effectiveness

Igor Minevich
Visiting Assistant Professor of Computer Science, Mathematics and Physics

Anand Pardhanani
Associate Professor of Mathematics

Roberta Cayard-Roberts
Administrative Assistant

The Major

The Data Science Major will consist of 13 courses (42 credits) with 12 core courses (39 credits):

  • MATH 120 Elementary Statistics (3 credits)
  • MATH 180 Calculus A (4 credits)
  • MATH 280 Calculus B (optional but strongly recommended) (4 credits)
  • MATH 310 Linear Algebra (3 credits)
  • CS 128 Programming and Problem Solving (4 credits)
  • CS/MATH 190 Math Discovery (2 credits)
  • CS 256 Data Structures (4 credits)
  • CS 310 Algorithms (3 credits)
  • CS 430 Database Systems (3 credits)
  • MATH 300 Mathematical Statistics (3 credits)
  • DS 401 Data Science (3 credits)
  • DS 488 Senior Capstone (3 credits)

and one of the following courses (3 credits):

  • CS 365 Artificial Intelligence and Machine Learning (3 credits) (strongly recommended)
  • CS 290 Computational Modeling / CS 340 Scientific Computing (3 credits)
  • CS 345 Software Engineering (3 credits)
  • CS 360 Parallel and Distributed Computing (3 credits)
  • CS 383 Bioinformatics (3 credits)
  • DS 481 Internship (3 credits)
  • MATH 330 Mathematical Modeling (3 credits)
  • BIOL 241 Care and Use of Collections (3 credits)
  • BIOL 410 or ENSU 310 Applications of GIS (3 credits)
  • PSYC 245 Research Design and Statistics (3 credits)
  • ECON 305 Econometrics (3 credits)

Data Science minor: Credit range: 24 credits (28 with credit inflation)
Course range: 7 courses

  1. MATH 120 Elementary Statistics (3 credits)
  2. MATH 180 Calculus A (4 credits)
  3. CS 128 Programming and Problem Solving (4 credits)
  4. CS 256 Data Structures (4 credits)
  5. MATH 300 Mathematical Statistics or MATH 330 Mathematical Modeling (3 credits each)
  6. DS 401 Data Science or CS 430 Database Systems or CS 365 Artificial Intelligence and Machine Learning (3 credits each)
  7. One of the following courses (each 3 credits):
    • Any additional course from items 5 or 6 above
    • CS 310 Algorithms (required for the CS 430 option in item 6)
    • CS 290 Computational Modeling
    • CS 340 Scientific Computing
    • CS 345 Software Engineering
    • CS 360 Parallel and Distributed Computing
    • CS 383 Bioinformatics
    • DS 481 Internship
    • BIOL 241 Care and Use of Collections
    • BIOL 340 Applied Biostatistics
    • BIOL 410 or ENSU 310 Applications of GIS
    • PSYC 245 Research Design and Statistics
    • ECON 305 Econometrics


* Key

Courses that fulfill General Education Requirements:

  • (A-AR) = Analytical - Abstract Reasoning
  • (A-QR) = Analytical - Quantitative Reasoning
  • (D-D) = Diversity - Domestic
  • (D-I) = Diversity - International
  • (D-L) = Diversity - Language
  • (RCH) = Research
  • (W) = Wellness
  • (WI) = Writing Intensive
  • (AY) = Offered in Alternate Years

*MATH 120 ELEMENTARY STATISTICS (3 credits)
Topics include exploratory data analysis; measures of central tendency, dispersion and correlation; nonparametric methods; confidence intervals; hypothesis tests; and the design of statistical studies. Also listed as MGMT 120. (A-QR) (WI)

*CS 128 PROGRAMMING AND PROBLEM SOLVING (4 credits)
An introduction to computers, computer science and programming with an emphasis on problem analysis and algorithmic solutions. (A-AR, A-QR)

*MATH 180 CALCULUS A (5 credits)
Calculus is the mathematical study of quantities that change with time and of areas and volumes. The development of calculus is one of the great discoveries of humanity, and the resulting discipline is of fundamental importance not only for students of the natural sciences, but also for graduate work in the social sciences. Introduces major issues in calculus: functions, limits, derivatives and integrals. Concludes with the fundamental theorem of calculus, which relates areas to rates of change. (A-AR, A-QR)

CS/MATH 190 MATH DISCOVERY (2 credits)
An introduction to the principal topics in mathematics needed by Computer Science majors. Topics include writing numbers in various bases, set theory, proof by induction, relations and functions, logic, matrices, complex numbers, recursion and recurrences, and rates of growth of various functions.

BIOL 241 CARE AND USE OF COLLECTIONS (3 credits)
Natural history, or biological, collections have provided the foundation for the field of biology and the discovery of the processes that underlie the diversity of life on earth. The importance of such collections over time cannot be overstated. Yet formal training in caring for, expanding, and using biological collections is surprisingly lacking. This course aims to introduce students to the wealth of possibilities that exist in biological collections and the practical responsibilities of preserving them. As part of a team, students will gain hands-on practice accessioning, organizing, databasing, communicating with the public about, and conducting research with specimens in the collection. The second half of the course is devoted to research uses of biological collections. Students will read examples in the primary scientific literature of how research using natural history collections has made important contributions to our understanding of the natural world. Because students will read scientific articles that use museum specimens, they should feel comfortable with reading scientific papers and with the content covered in BIOL 111 (i.e., have earned a grade of B or better). Prerequisite: BIOL 111 or instructor approval. Also listed as MUSE 241.

*PSYC 245 RESEARCH DESIGN AND STATISTICS (3 credits)
Introduction to experimental design and the analysis of research data in psychology. Topics include methods for observing, measuring and describing behavior. Students will learn to use the statistical software JASP or R in data description and analysis. Offered every semester. (A-QR)

CS 256 DATA STRUCTURES (4 credits)
A systematic introduction to the methodology of problem solving with computers. Emphasizes the design and development process, data abstraction and fundamental data structures, programming for reuse and the development of large programs. Introduces the basic notions of software engineering and analysis of algorithms. Discusses ethical issues in computing. Also listed as CS 256. Prerequisite: CS 128. Co-Requisite: CS/MATH 195.

*MATH 280 CALCULUS B (optional but strongly recommended) (5 credits)
A continuation of MATH 180, including techniques of integration, applications of the definite integral, infinite sequences and series and elementary differential equations. Prerequisite: MATH 180. (A-AR, A-QR)

CS 290 COMPUTATIONAL MODELING (3 credits)
Designed for students majoring in any of the natural sciences. An introduction to the tools and techniques of interdisciplinary, computationally based research in the natural sciences. Computational research uses computers to simulate laboratory experiments or to perform experiments which have no laboratory analog. Lab exercises come from a variety of disciplines. Recommended prerequisites: CS 128 or a lab science course. (AY)

*MATH 300 MATHEMATICAL STATISTICS (3 credits)
Topics include exploratory data analysis; measures of central tendency, dispersion and correlation; nonparametric methods; confidence intervals; inference testing; probability distributions; and the design of statistical studies. Prerequisite: MATH 180. (A-AR, A-QR)

ECON 305 ECONOMETRICS (3 credits)
Introduces the basics of econometric analysis. Topics include regression analysis, multicollinearity, heteroskedasticity and autocorrelation. Emphasizes the applied aspects of econometrics through the use of standard computer packages. Prerequisite: ECON 204.

*MATH 310 LINEAR ALGEBRA (3 credits)
Topics include matrices, vector spaces, linear transformations and their applications. Prerequisite: MATH 280. (A-AR)

CS 310 ALGORITHMS (3 credits)
A study of algorithms and the data structures on which they are based, with a focus on the analysis of their correctness and complexity in terms of running time and space. Prerequisites: MATH 180, MATH 190 and CS 256.


CS 340 SCIENTIFIC COMPUTING (3 credits)
Introduces computer science tools and techniques that support computational science and high performance computing. Computational methods are an integral part of modern science, including multidisciplinary research into climate change, the origins of the universe and the underlying cause of diseases such as Alzheimer's. Topics include scientific libraries and kernels; parallel, distributed and grid resources; and the principal software patterns found in this domain. Prerequisite: CS 310 or consent of the instructor. (AY)

CS 345 SOFTWARE ENGINEERING (3 credits)
The theory, techniques and technologies associated with the design, construction, and testing of software systems, particularly large software systems. Students learn various approaches to procedural decomposition and system architecture. Explores the tools used for building and testing software systems, particularly in the context of open source software. Prerequisite: CS 310. (AY)

CS 360 PARALLEL AND DISTRIBUTED COMPUTING (3 credits)
The application of parallel programming and problem-solving techniques to solve computationally intensive problems in a variety of disciplines. Parallel computation invites new ways of thinking about problems and is an increasingly important skill in corporate and research environments. Students learn about programming paradigms used in parallel computation, the organization of parallel systems, and the application of programs and systems to solving problems in mathematics, physics, chemistry and other areas. Prerequisite: CS 310. (AY)

CS 365 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING (3 credits)
This course offers an introduction to topics in Artificial Intelligence and Machine Learning, and covers their theoretical underpinnings while providing opportunities to put various techniques into practice. Topics covered may include search, planning, game-playing, neural networks and other machine learning approaches. Prerequisite: CS 310 or consent of the instructor.

CS 383 BIOINFORMATICS (4 credits)
Bioinformatics is the application of statistics and computer science to the field of biology. This course is a wide-ranging introduction to the field, the tools, and the techniques used to work with large datasets, and will principally concentrate on the analysis and visualization of novel genomic and metagenomic data. The course is centered around doing research and using tools, with much of the course time dedicated to active learning. Prerequisites: BIOL 111, 112, CS 128 or CS 290. Also listed as BIOL 383. (AY)

ENSU 310 APPLICATIONS OF GIS (3 credits)
This course is designed to provide a foundational knowledge of Geographic Information Systems (GIS) and its applications to the Social Sciences. Students in this course will use ArcGIS. The course will cover basic GIS concepts such as mapping, projections, geo-referencing and spatial analysis. It will be taught using a combination of lectures, demonstrations, and hands-on, interactive tutorials in the classroom. Students will constantly apply spatial analytical tools to address questions, solve problems and complete independent projects in and outside the classroom. Prerequisite: Sophomore or Junior standing. Also listed as BIOL 410.

CS 430 DATABASE SYSTEMS (3 credits)
An introduction to database management systems. Database design and development are viewed from the perspective of a user, an application program and the database kernel itself. Focuses primarily on relational and object-oriented data models and related software. Prerequisite: CS 256. Co-Requisite: CS 310. (AY)

DS 401 DATA SCIENCE (3 credits)
Topics include the mathematics of linear regression, multilinear regression, logistic regression, time series and PCA, and their applications using the Python programming language. Students apply these concepts in the context of projects. (RCH)

Requires departmental approval in order to meet the major requirement for 300+ level courses.

Individual and collective investigations into topics of common data science interest not covered in the department's regular course offerings. A significant part of this course consists of students reading new data science materials and presenting them to one another.
