Data Science - eCornell

Overview and Courses

From data to decision, R is quickly becoming one of the most popular and effective programming languages for data science.

In this program, you’ll apply data science tools to the collection of data and the translation of data into information, constructing models that can be used to address the questions that you're investigating. You’ll have the opportunity to apply data analytics as a four-part process: gathering data, looking for patterns in that data, finding insights in any patterns you discover, and using those insights to make decisions. This process does not make decisions for you, but it will help you to better understand the effects of the decisions you might make. Through an examination of real-world data sets and different modeling techniques, as well as an in-depth look at how the programming language R can be used to find patterns and derive insights, you will gain valuable experience working in each stage of the data analytics process. That experience will help you and your organization make better decisions and gain a sound scientific understanding of why you're making them.

To be successful in this program, you will need proficiency in R programming, a working knowledge of basic probability and statistics, and college-level calculus.

The courses in this certificate program must be completed in the order in which they appear.

Understanding Data Analytics

By some estimates, 90% of the data that has ever existed has been created in the last two years. This is a staggering figure and has given rise to new challenges and opportunities in almost every industry: what kind of data do you need to collect to compete, and how can you make sense of it once you have collected it? As technology evolves and the volume of data increases, how can you make the best use of all this information? How can you use the data to help drive your decision-making? How can you make data work for you? How can you ensure your data accurately reflects the population in which you're interested?

In this course, you will determine the types of engineering and business questions you can answer, the kinds of problems you can solve, and the decisions you can make, all through using data analytics. You will explore best practices for collecting information so that you can make informed predictions, develop insights, and better inform organizational decision-making, and you will have a chance to apply some of these concepts to your own work. You will also explore best practices for sampling and examine how different types of sampling are each suited to different situations. Finally, you will see real-world examples that demonstrate how these tools work and practice sampling techniques in case study scenarios.

Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis

Visualization is one of the simplest and most effective ways to find patterns in data. It can help answer questions such as: What is the general range and shape of the data set? Are there any clusters of observations? Which variables correlate with each other? Are there any obvious outliers?

As your data set grows in terms of the number of data points and variables, however, it becomes increasingly difficult to visualize all this information at once. At most, you can plot data points on three axes and add further distinctions of size, color, shape, and so on. Yet this can easily become too busy and difficult to read. How, then, do we find patterns in really big data sets?

In this course, you will explore several powerful and commonly utilized techniques for distilling patterns from data. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.

You are required to have completed the following course or have equivalent experience before taking this course:

Understanding Data Analytics

Finding Patterns in Data Using Cluster and Hotspot Analysis

When you have a large collection of objects, it is often helpful to split them into meaningful groups or clusters. One example of this would be to identify different types of customers so that a company can more efficiently route their calls to a helpline. As a second example, suppose an automobile manufacturer wanted to segment its market to target its ads more carefully. One approach might be to take a database of recent car sales, including the social demographics associated with each customer, and segment the population purchasing each type of automobile into meaningful groups.

Specialized approaches exist if your data contains information that relates to time and geography. You can use this additional information to identify geographical and temporal hotspots. Hotspots are regions of high activity or a high value of a particular variable. These results can help you focus your attention on a particular region where a problem is occurring more than usual, such as the incidence of asthma in a large city. In both cluster and hotspot analysis, the results can help you discover new and interesting features, problems, and red flags regarding the data being analyzed.

In this course, you will explore several powerful and commonly utilized techniques for performing both cluster and hotspot analysis. You will implement these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible and applicable to your work.

You are required to have completed the following courses or have equivalent experience before taking this course:

Understanding Data Analytics
Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis

Regression Analysis and Discrete Choice Models

A story can play an important role in understanding data. It can help distill complex information into something manageable: something we can think about easily, relate to, and use to make decisions. For many problems that we encounter globally, however, a story that describes what already happened is not precise enough for the job we want to perform. Often, we would like to use available data to make numerically accurate predictions about what might happen in the future. This task requires the construction of mathematical models that are well suited to our real-world problems.

In this course, you will explore several types of statistical models used with data to make predictions. These models bring with them a whole batch of important concerns, such as estimation and validation, that make the entire process both an art and a science. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.

You are required to have completed the following courses or have equivalent experience before taking this course:

Understanding Data Analytics
Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
Finding Patterns in Data Using Cluster and Hotspot Analysis

Supervised Learning Techniques

Supervised learning is a general term for any machine learning technique that attempts to discover the relationship between a data set and some associated labels for prediction. In regression, the labels are continuous numbers. This course will focus on classification, where the labels are taken from a finite set of numbers or characters. The prototypical and perhaps best-known example of classification is image recognition. The goal is to take an image (represented by its pixel values) and determine what objects are in the image. Is it a dog? A grapefruit? A stop sign?

There are many practical classification tasks, such as determining whether an individual's financial history makes them high risk for a loan, whether there is a defect in a material based on some sensor readings, or whether a new email is spam or not. These problems share the same basic form and can be solved with many different types of mathematical, statistical, and probabilistic models developed by the machine learning community.

In this course, you will explore several powerful and commonly utilized techniques for supervised learning. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.

You are required to have completed the following courses or have equivalent experience before taking this course:

Understanding Data Analytics
Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
Finding Patterns in Data Using Cluster and Hotspot Analysis
Regression Analysis and Discrete Choice Models

Neural Networks and Machine Learning

Neural networks, a nonlinear supervised learning modeling tool, have become hugely popular within the last two decades because they have been successfully applied to a wide range of problems, including automatic language processing, image classification, object detection, speech recognition, and pattern recognition. They are mathematical models loosely based on an analogy to the interconnected neurons in the brain. They take in a vector or matrix of input data and output either a classification value or an approximation to a functional value. The beauty is that the relationships between the inputs and outputs can be highly nonlinear and complex.

In this course, you will explore the mechanics of neural networks and the intricacies involved in fitting them to data for prediction. Using packages in the free and open-source statistical programming language R with real-world data sets, you will implement these techniques. The focus will be on making these methods accessible for you in your own work.

You are required to have completed the following courses or have equivalent experience before taking this course:

Understanding Data Analytics
Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
Finding Patterns in Data Using Cluster and Hotspot Analysis
Regression Analysis and Discrete Choice Models
Supervised Learning Techniques

How It Works

Format

All Online

Time Commitment

4 months with 8-10 hours of study per week

Cost

$3,600

Learn From Top Minds

Courses are developed by Cornell faculty.

Power Your Career

Gain today’s most in-demand skills to stand apart.

Flexibility Fits Your Life

Learn on your schedule without stepping out of your job.

Small-class Experience

Participate in facilitated discussions and live sessions with industry peers.

Real-world Projects

Apply learnings and insights to your work to make an impact right away.

Personalized Feedback

Enjoy meaningful feedback on assignments from expert facilitators.

Faculty Author

Linda Nozick

Director of Civil and Environmental Engineering

Cornell's College of Engineering

Linda Nozick is Professor and Director of Civil and Environmental Engineering at Cornell University. She is co-founder and a past director of the College Program in Systems Engineering and has been the recipient of several awards, including a CAREER award from the National Science Foundation and a Presidential Early Career Award for Scientists and Engineers from President Clinton for “the development of innovative solutions to problems associated with the transportation of hazardous waste.” Dr. Nozick has authored over 60 peer-reviewed publications, many focused on transportation, the movement of hazardous materials, and the modeling of critical infrastructure systems. She has been an associate editor for Naval Research Logistics and a member of the editorial board of Transportation Research Part A. Dr. Nozick has served on two National Academy Committees to advise the U.S. Department of Energy on renewal of their infrastructure. During the 1998-1999 academic year, she was a Visiting Associate Professor in the Operations Research Department at the Naval Postgraduate School in Monterey, California. Dr. Nozick holds a B.S. in Systems Analysis and Engineering from the George Washington University and an MSE and Ph.D. in Systems Engineering from the University of Pennsylvania.

Key Course Takeaways

Explore the data analytics process and examine the tools available to improve decision making
Use unsupervised learning techniques to help identify patterns in data and create visualizations to better spot those patterns
Categorize data using supervised learning algorithms
Predict the value of continuous variables with linear regression
Use neural networks to make predictions about new data
Make forecasts from data collected over time and measure their accuracy

Discover More

Download a Brochure

Not ready to enroll but want to learn more? Download the certificate brochure to review program details.

What You'll Earn

Data Science Certificate from Cornell College of Engineering
120 Professional Development Hours (12 CEUs)

Watch the Video

Hear eCornell students share their stories.

Who Should Enroll

Current and aspiring data scientists
Analysts
Engineers
Researchers
Technical managers

“I went into this data science program not fully knowing what to expect other than the content aligned with the skills I needed for my career. I was blown away by the thought put into the curriculum, the projects, and the help of expert facilitators. I’ve used other online resources to learn these skills, but until this program, I hadn’t walked away with both the technical skills AND an understanding of how to apply them in real-world scenarios.”

Michael T.

Data Science Student

Select Payment Method	Cost
Determine Your Own Course Schedule	$3,600
Learn and Pay as You Go

Address:	950 Danby Rd.
	Suite 150
	Ithaca, NY 14850
