Course list

By some estimates, 90% of the data that has ever existed has been created in the last two years. This is a staggering figure and has given rise to new challenges and opportunities in almost every industry: what kind of data do you need to collect to compete, and how can you make sense of it once you have collected it? As technology evolves and the volume of data increases, how can you make the best use of all this information? How can you use the data to help drive your decision-making? How can you make data work for you? How can you ensure your data accurately reflects the population in which you're interested?

In this course, you will determine the types of engineering and business questions you can answer, the kinds of problems you can solve, and the decisions you can make using data analytics. You will explore best practices for collecting information so that you can make informed predictions, develop insights, and better inform organizational decision-making, and you will have a chance to apply some of the concepts to your own work. You will also explore best practices for sampling and examine how different types of sampling suit different situations. Finally, you will see real-world examples that demonstrate how those tools work and practice sampling techniques in case study scenarios.
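As a rough illustration of how different types of sampling behave, the sketch below compares a simple random sample with a proportionate stratified sample. It is written in Python for brevity, while the certificate's own work is done in R; the population, the `small`/`large` labels, and the function names are hypothetical and not taken from the course materials.

```python
import random
from collections import Counter


def simple_random_sample(population, n, seed=0):
    """Draw n items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n, seed=0):
    """Proportionate stratified sample: each stratum is sampled in
    proportion to its share of the population.

    Rounding can make the total differ slightly from n for awkward sizes.
    """
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 900 "small" accounts and 100 "large" accounts.
population = [("small", i) for i in range(900)] + [("large", i) for i in range(100)]

srs = simple_random_sample(population, 50)
strat = stratified_sample(population, lambda row: row[0], 50)

# The stratified sample reproduces the 90/10 split exactly; the simple
# random sample only matches it on average.
srs_counts = Counter(label for label, _ in srs)
strat_counts = Counter(label for label, _ in strat)
print(srs_counts, strat_counts)
```

Stratifying is most useful when a small group (here the "large" accounts) matters to the question being asked and could be under-represented by chance in a simple random draw.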

  • Apr 22, 2026
  • Jul 1, 2026
  • Sep 9, 2026
  • Nov 18, 2026
  • Jan 27, 2027
  • Apr 7, 2027
  • Jun 16, 2027

Visualization is one of the simplest and most effective ways to find patterns in data. It can help answer questions such as: What is the general range and shape of the data set? Are there any clusters of observations? Which variables correlate with each other? Are there any obvious outliers?

As your data set grows in terms of the number of data points and variables, however, it becomes increasingly difficult to visualize all this information at once. At most, you can plot data points in three dimensions and add further distinctions of size, color, shape, and so on. Yet this can easily become too busy and difficult to read. How, then, do we find patterns in really big data sets?

In this course, you will explore several powerful and commonly utilized techniques for distilling patterns from data. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.
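One of the questions raised above, which variables correlate with each other, can be answered numerically when there are too many variables to plot. The following is a minimal Python sketch (the course itself uses R) that computes the Pearson correlation for every pair of columns. The column names and values are invented for illustration, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_table(columns):
    """Correlation for every unordered pair of named columns."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical measurements, invented for illustration.
columns = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "output": [2.0, 4.0, 6.0, 8.0, 10.0],
    "defects": [5.0, 4.0, 3.0, 2.0, 1.0],
}
table = correlation_table(columns)
for pair, r in table.items():
    print(pair, round(r, 3))
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship, although a nonlinear one may still exist.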

You are required to have completed the following course or have equivalent experience before taking this course:

  • Understanding Data Analytics

  • May 6, 2026
  • Jul 15, 2026
  • Sep 23, 2026
  • Dec 2, 2026
  • Feb 10, 2027
  • Apr 21, 2027
  • Jun 30, 2027

When you have large groups of objects, it is often helpful to split them into meaningful groups or clusters. One example of this would be to identify different types of customers so that a company can more efficiently route their calls to a helpline. As a second example, suppose an automobile manufacturer wanted to segment their market to target the ads more carefully. One approach might be to take a database of recent car sales, including the social demographics associated with each customer, and segment the population purchasing each type of automobile into meaningful groups.
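To make the idea of splitting observations into clusters concrete, here is a minimal Python sketch of k-means, one common clustering technique (the course implements its methods in R, and this is not presented as the course's own code). The customer points, the starting centroids, and all function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean_point(members) if members else old)
        if updated == centroids:  # no movement: converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical customers: (calls per month, average minutes per call).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

The result depends on the starting centroids and on the chosen number of clusters, so in practice several starts and several values of k are usually compared.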

Specialized approaches exist if your data contains information that relates to time and geography. You can use this additional information to identify geographical and temporal hotspots. Hotspots are regions of high activity or a high value of a particular variable. These results can help you focus your attention on a particular region where a problem is occurring more than usual, such as the incidence of asthma in a large city. In both cluster and hotspot analysis, the results can help you discover new and interesting features, problems, and red flags regarding the data being analyzed.
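One widely used hotspot measure is the Getis-Ord Gi* statistic, which compares the total in a region's neighborhood with what the overall average would predict. The Python sketch below is a simplified illustration under stated assumptions: regions lie in a single row, each region's neighborhood is itself plus its adjacent regions, weights are binary, and a score above 1.96 is treated as a hotspot with no correction for multiple comparisons. The values are invented, and the course's own implementation (in R) may differ.

```python
from math import sqrt


def getis_ord_gi_star(values, neighbors):
    """Gi* z-score per region with binary weights.

    neighbors[i] lists the regions in i's neighborhood, including i itself.
    """
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        # With binary weights, the sum of weights equals the sum of squares.
        w_sum = len(neighbors[i])
        local_sum = sum(values[j] for j in neighbors[i])
        numerator = local_sum - mean * w_sum
        denominator = std * sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical incidence counts for ten regions arranged in a row.
values = [1, 1, 1, 1, 10, 12, 11, 1, 1, 1]
neighbors = [
    [j for j in (i - 1, i, i + 1) if 0 <= j < len(values)]
    for i in range(len(values))
]

scores = getis_ord_gi_star(values, neighbors)
hotspots = [i for i, z in enumerate(scores) if z > 1.96]
print(hotspots)
```

Only the region whose whole neighborhood is high is flagged; regions on the edge of the high-value run share a neighborhood with low-value regions and fall below the threshold.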

In this course, you will explore several powerful and commonly utilized techniques for performing both cluster and hotspot analysis. You will implement these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible and applicable to your work.

You are required to have completed the following courses or have equivalent experience before taking this course:

  • Understanding Data Analytics
  • Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis

  • May 20, 2026
  • Jul 29, 2026
  • Oct 7, 2026
  • Dec 16, 2026
  • Feb 24, 2027
  • May 5, 2027

A story can play an important role in understanding data. It can help distill complex information into something manageable: something we can think about easily, relate to, and use to make decisions. For many problems that we encounter globally, however, a story that describes what already happened is not precise enough for the job we want to perform. Often, we would like to use available data to make numerically accurate predictions about what might happen in the future. This task requires the construction of mathematical models that are well suited to our real-world problems.

In this course, you will explore several types of statistical models used with data to make predictions. These models bring with them a whole batch of important concerns, such as estimation and validation, that make the entire process into both an art and a science. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.
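The two concerns named above, estimation and validation, can be sketched with the simplest predictive model, a straight line fitted by ordinary least squares. This Python example is illustrative only (the course works in R): the data are synthetic, the 8/2 split is an arbitrary choice, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic data: y = 1 + 2x plus a small alternating disturbance.
xs = [float(x) for x in range(10)]
ys = [1 + 2 * x + (0.5 if x % 2 == 0 else -0.5) for x in xs]

# Estimation: fit on the first 8 points. Validation: score on the last 2.
train_x, test_x = xs[:8], xs[8:]
train_y, test_y = ys[:8], ys[8:]

intercept, slope = fit_line(train_x, train_y)
predictions = [intercept + slope * x for x in test_x]
mae = mean_absolute_error(test_y, predictions)
print(round(intercept, 3), round(slope, 3), round(mae, 3))
```

Scoring the model on points it was not fitted to gives a more honest picture of predictive accuracy than the fit on the training data alone.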

You are required to have completed the following courses or have equivalent experience before taking this course:

  • Understanding Data Analytics
  • Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
  • Finding Patterns in Data Using Cluster and Hotspot Analysis

  • Jun 3, 2026
  • Aug 12, 2026
  • Oct 21, 2026
  • Dec 30, 2026
  • Mar 10, 2027
  • May 19, 2027

Supervised learning is a general term for any machine learning technique that attempts to discover the relationship between a data set and some associated labels for prediction. In regression, the labels are continuous numbers. This course will focus on classification, where the labels are taken from a finite set of numbers or characters. The prototypical and perhaps best-known example of classification is image recognition. The goal is to take an image (represented by its pixel values) and determine what objects are in the image. Is it a dog? A grapefruit? A stop sign?

There are many practical classification tasks, such as determining whether an individual's financial history makes them high risk for a loan, whether there is a defect in a material based on some sensor readings, or whether a new email is spam or not. These problems share the same basic form and can be solved with many different types of mathematical, statistical, and probabilistic models developed by the machine learning community.
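The shared form of these problems, labeled examples in and a predicted label out, can be shown with k-nearest neighbors, one of the simplest classifiers. This Python sketch is an illustration only (the course uses R, and the text above does not say which models it covers); the two email features and the training rows are invented.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to query.

    train is a list of (features, label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical emails described by (number of links, count of the word "free").
train = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
    ((7, 5), "spam"), ((8, 4), "spam"), ((6, 6), "spam"),
]

print(knn_predict(train, (7, 4)))
print(knn_predict(train, (1, 1)))
```

Swapping the features and labels for loan histories or sensor readings leaves the code unchanged, which is the sense in which these tasks share the same basic form.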

In this course, you will explore several powerful and commonly utilized techniques for supervised learning. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.

You are required to have completed the following courses or have equivalent experience before taking this course:

  • Understanding Data Analytics
  • Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
  • Finding Patterns in Data Using Cluster and Hotspot Analysis
  • Regression Analysis and Discrete Choice Models

  • Jun 17, 2026
  • Aug 26, 2026
  • Nov 4, 2026
  • Jan 13, 2027
  • Mar 24, 2027
  • Jun 2, 2027

Neural networks, a nonlinear supervised learning modeling tool, have become hugely popular within the last two decades because they have been successfully applied to a wide range of problems, including automatic language processing, image classification, object detection, speech recognition, and pattern recognition. They are mathematical models loosely based on an analogy to the interconnected neurons in the brain. They take in a vector or matrix of input data and output either a classification value or an approximation to a functional value. The beauty is that the relationships between the inputs and outputs can be highly nonlinear and complex.
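A small example shows how a network can capture a nonlinear relationship. The Python sketch below (illustrative only; the course uses R packages) runs a forward pass through a network with one hidden layer that computes XOR, a function no single linear model can represent. The weights are hand-picked rather than fitted to data, and the function names are hypothetical.

```python
def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def xor_network(x1, x2):
    # Hand-picked weights, not learned from data.
    hidden = dense([x1, x2], [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu)
    output = dense(hidden, [[1.0, -2.0]], [0.0], lambda v: v)
    return output[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Fitting a network means finding such weights automatically from data, which is where the intricacies of optimization and validation come in.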

In this course, you will explore the mechanics of neural networks and the intricacies involved in fitting them to data for prediction. Using packages in the free and open-source statistical programming language R with real-world data sets, you will implement these techniques. The focus will be on making these methods accessible for you in your own work.

You are required to have completed the following courses or have equivalent experience before taking this course:

  • Understanding Data Analytics
  • Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
  • Finding Patterns in Data Using Cluster and Hotspot Analysis
  • Regression Analysis and Discrete Choice Models
  • Supervised Learning Techniques

  • Apr 22, 2026
  • Jul 1, 2026
  • Sep 9, 2026
  • Nov 18, 2026
  • Jan 27, 2027
  • Apr 7, 2027
  • Jun 16, 2027

eCornell Online Workshops are live, interactive 3-hour learning experiences led by Cornell faculty experts. These premium short-format sessions focus on AI topics and are designed for busy professionals who want to gain immediately applicable skills and strategic perspectives. Workshops include faculty presentations, breakout discussions, and guided hands-on practice.

The AI Workshops All-Access Pass provides you with unlimited participation for 6 months from your date of purchase. Whether you choose to attend one workshop per month, or several per week, the All-Access Pass will allow you to customize your AI journey and stay on top of the latest AI trends.

Workshops cover a range of cutting-edge AI topics applicable across industries, hosted by Cornell faculty at the forefront of their fields. Whether you are just getting started with AI, seeking to build your AI skillset, or exploring advanced applications of AI, Workshops will provide you with an action-oriented learning experience for immediate application in your career. Sample Workshops include:

  • Work Smarter with AI Agents: Individual and Team Effectiveness
  • Leading AI Transformation: Bigger Than You Imagine, Harder Than You Expect
  • Using AI at Work: Practical Choices and Better Results
  • Search & Discoverability in the Era of AI
  • Don't Just Prompt AI - Govern it
  • AI-Powered Product Manager
  • Leverage AI and Human Connection to Lead through Uncertainty



I like to think outside of the box, and this program from eCornell helped me conceptualize how I want to approach data problems going forward. I was able to actually apply new course concepts to my work, rather than simply repeat steps with different values.
‐ Mark T.

Frequently Asked Questions

Data has never been more available, but the professionals who stand out are the ones who can translate messy inputs into clear, evidence-based decisions. Cornell’s Data Science Certificate helps you build that end-to-end capability, combining statistical thinking with practical machine learning so you can move confidently from data collection to analysis, modeling, and interpretation.

In this certificate program, authored by faculty from Cornell’s Duffield College of Engineering, you will work through a structured analytics workflow using R and real-world datasets, building skill in topics like sampling and bias, pattern discovery with unsupervised methods, predictive modeling with regression and classification, and modern neural network approaches. Along the way, you will practice validating models, diagnosing common issues, and explaining results so your work can actually be used in business, engineering, research, or policy settings.

The Data Science Certificate experience is designed to be hands-on and supported, with expert facilitation, interactive discussions, and project-based assignments that push you to apply methods to realistic scenarios instead of just watching videos.

If you want practical R-based data science skills, a structured path from data to decision, and expert-guided project feedback that helps you apply what you learn at work, you should choose Cornell’s Data Science Certificate.

Many online programs stop at content delivery. Cornell’s Data Science Certificate is built to help you practice doing data science in a way that holds up in real professional settings, with structure, accountability, and feedback.

You learn in a small cohort (typically up to 35 professionals) where discussion and peer insight are part of the experience, and an expert facilitator supports your progress with guidance and graded, project-based feedback. Instead of treating analytics as isolated topics, you build capability across a connected workflow: collecting and evaluating data quality, discovering patterns with unsupervised methods, building predictive models (including regression and classification), and working with neural networks and deep learning concepts in R.

Cornell’s Data Science Certificate is also unusually tool-rich for an online certificate. You will implement methods in R on real datasets, and you’ll encounter practical elements like model validation, cross-validation, parameter tuning, and interpretability techniques (including using LIME to understand what is influencing a model’s predictions).

Enrolling in this certificate also provides you with a 6-month All-Access Pass to eCornell's live online AI Workshops, interactive sessions led by world-class Cornell faculty that combine Ivy League insight with practical applications for busy professionals. Each 3-hour Workshop features structured instruction, guided practice, and real tools to build competitive AI capabilities, plus the opportunity to connect with a global cohort of growth-oriented peers. While AI Workshops are not required, they enhance certificate programs through:

  • Integrating AI perspectives across most curricula
  • Responding to emerging AI developments and trends
  • Offering direct engagement with Cornell faculty at the forefront of AI research

The best fit for Cornell’s Data Science Certificate is a working professional who wants a rigorous, applied way to strengthen data-driven decision making using modern analytics and machine learning.

The Data Science Certificate is designed for:

  • Current and aspiring data scientists and analysts
  • Engineers, researchers, and technical managers who work with quantitative data
  • Professionals who want to move beyond descriptive reporting into modeling, prediction, and interpretable machine learning

To be ready to succeed, you should come in with experience in at least one programming language, a foundation in basic probability and statistics, and comfort with college-level calculus. Those prerequisites make it easier to move quickly into hands-on modeling and coding in R.

Project work in Cornell’s Data Science Certificate is designed to mirror the kind of analysis you would be expected to do in practice, using real datasets and a repeatable workflow in R. You will build models, evaluate them, and learn to explain what your results mean for decisions, not just what the code outputs.

Examples of projects completed by past learners include:

  • Building an H2O deep learning model to predict home prices, applying dropout and L1/L2 regularization to control overfitting, and using LIME and variable importance to explain what drives each prediction
  • Training and evaluating a neural network that predicts county-level election vote share from demographic and socioeconomic inputs, using normalization, train-test splits, and MAE/R-squared to quantify performance
  • Running a county-level hotspot analysis of vaccination rates by joining shapefiles to public health data, computing Getis-Ord local G statistics, and mapping statistically significant hot and cold clusters
  • Clustering U.S. metro areas using the Texas Transportation Institute congestion dataset, comparing k-means with hierarchical methods, and interpreting clusters as distinct tiers of congestion and travel reliability
  • Identifying and interpreting extreme outliers in multivariate city congestion patterns by inspecting distance matrices and dendrogram structure, then refining clustering choices to improve interpretability

Across Cornell’s Data Science Certificate, these assignments help you build a portfolio of concrete analytical outputs and strengthen how you communicate findings to stakeholders who need clear, defensible recommendations.

Cornell’s Data Science Certificate equips you to turn data into credible, explainable analysis that supports better decisions in your role.

After completing the Data Science Certificate, you will have the skills to:

  • Explore the data analytics process and examine the tools available to improve decision making
  • Use unsupervised learning techniques to help identify patterns in data and create visualizations to better spot those patterns
  • Categorize data using supervised learning algorithms
  • Predict the value of continuous variables with linear regression
  • Use neural networks to make predictions about new data
  • Make forecasts from data collected over time and measure their accuracy

In student feedback, learners consistently describe gaining practical confidence with R and real analytics workflows, strengthening their ability to interpret results and explain what findings mean for decision making. They also highlight building usable skills with regression modeling (including interaction effects and nonlinear terms), working with a broad set of methods like clustering, association rules, PCA, factor analysis, and deep learning concepts, and completing applied projects that feel directly connected to workplace problem solving. Many note that the structure and deadlines fit busy schedules while still feeling rigorous, and that responsive facilitators and detailed feedback helped them improve the quality of their work.

What truly sets eCornell apart is how our programs unlock genuine career transformation. Learners earn promotions to senior positions, enjoy meaningful salary growth, build valuable professional networks, and navigate successful career transitions.

Cornell’s Data Science Certificate, which consists of 6 short courses, is designed to be completed in 4 months. Each course runs for 2 weeks, with a typical weekly time commitment of 8 to 10 hours.

In practice, the schedule is designed to work alongside full-time responsibilities. You complete most learning activities asynchronously, including short videos, readings, coding exercises, discussions, and graded project work, with clear deadlines to keep you on track.

Because the program is facilitated, you can ask questions, get feedback on your work, and stay connected to a small cohort while still managing your learning time around your own calendar.

Students in Cornell’s Data Science Certificate consistently describe a hands-on, tool-rich learning experience that helps them turn data into actionable insights at work. They often point to the program’s strong mix of real-world datasets, applied projects, and practical statistical and machine learning methods that build confidence not just in running analyses, but in interpreting results and explaining what they mean for decision making.

Learners frequently highlight outcomes such as:

  • Building skill with R and common analytics workflows in a structured environment
  • Applying regression modeling techniques, including interaction effects and nonlinear terms, to real business questions
  • Exploring a broad toolkit of methods, such as clustering, association rules, PCA, factor analysis, and deep learning concepts
  • Strengthening the ability to move from data collection and evaluation to evidence-based recommendations
  • Gaining experience with course projects that mirror workplace problem solving

Beyond the technical curriculum, students commonly mention that the online format is designed for working professionals, with clear module organization, digestible lessons, and deadlines that keep them accountable while remaining compatible with demanding schedules. Many also emphasize the value of responsive facilitators, detailed feedback on assignments, and a learning experience that feels rigorous, current, and immediately relevant to their careers.

You don’t need advanced R skills to be confident entering Cornell’s Data Science Certificate program, but you should already be comfortable programming in at least one language, with the ability to write and troubleshoot basic code, work with data, use functions, and follow core programming logic. Prior R experience is helpful but not required if you have a solid coding foundation and you’re ready to apply those skills in R.

To be successful in your coursework, you should also have:

  • Knowledge of basic probability and statistics
  • Comfort with college-level calculus

Meeting these expectations allows you to focus on analytics rather than foundational math or coding. You will be able to quickly move into hands-on work like sampling and bias considerations, building and validating regression and classification models, and experimenting with neural network approaches in R.

Hands-on coding is a central part of Cornell’s Data Science Certificate. You will use R to implement core analytics and machine learning techniques, and you’ll spend time running code, interpreting outputs, and troubleshooting issues that come up in real analysis.

Along the way, you will work with practical tooling that supports modern workflows, including:

  • R and RStudio-style coding environments for analysis and visualization
  • Packages and approaches for model evaluation and validation
  • Deep learning workflows in R using H2O, including cross-validation and hyperparameter tuning
  • Model interpretability techniques such as LIME to understand which inputs are influencing predictions

You will also encounter how structured data is stored and queried in relational databases, including the role SQL queries play in turning stored data into usable information, while keeping the program’s main focus on doing data science in R.

Machine learning is woven throughout Cornell’s Data Science Certificate, moving from foundational modeling to more advanced approaches so you understand both how models work and when to use them.

You will study supervised learning methods for classification, learn how to build and evaluate predictive models, and then go deeper into neural networks, including practical topics like activation functions, parameter optimization, cross-validation, hyperparameter tuning, and ways to interpret black-box predictions.

The goal of Cornell’s Data Science Certificate program is not to treat neural networks as magic but to help you evaluate their performance and explain results in a way that supports real decisions.