Linda Nozick is Professor and Director of Civil and Environmental Engineering at Cornell University. She is co-founder and a past director of the College Program in Systems Engineering and has received several awards, including a CAREER award from the National Science Foundation and a Presidential Early Career Award for Scientists and Engineers from President Clinton for "the development of innovative solutions to problems associated with the transportation of hazardous waste." Dr. Nozick has authored more than 60 peer-reviewed publications, many focused on transportation, the movement of hazardous materials, and the modeling of critical infrastructure systems. She has been an associate editor for Naval Research Logistics and a member of the editorial board of Transportation Research Part A. Dr. Nozick has served on two National Academy committees advising the U.S. Department of Energy on the renewal of its infrastructure. During the 1998-1999 academic year, she was a Visiting Associate Professor in the Operations Research Department at the Naval Postgraduate School in Monterey, California. Dr. Nozick holds a B.S. in Systems Analysis and Engineering from the George Washington University and an MSE and Ph.D. in Systems Engineering from the University of Pennsylvania.
Overview and Courses
From data to decision, R is quickly becoming one of the most popular and effective programming languages for data science.
In this program, you'll apply data science tools to collecting data and translating it into information, constructing models that address the questions you're investigating. You'll have the opportunity to apply data analytics as a four-part process: gathering data, looking for patterns in that data, finding insights in any patterns you discover, and using those insights to make decisions. This process does not make decisions for you, but it will help you better understand the effects of the decisions you might make. Through real-world data sets, a range of modeling techniques, and an in-depth look at how the R programming language can help you find patterns and derive insights, you will gain experience in each stage of the data analytics process. That experience will help you and your organization make better decisions, with a sound scientific understanding of why you're making the choices you're making.
To succeed in this program, you will need experience with at least one programming language, a working knowledge of basic probability and statistics, and college-level calculus.
The courses in this certificate program must be completed in the order in which they appear.
Course list
Understanding Data Analytics
By some estimates, 90% of the data that has ever existed has been created in the last two years. This staggering growth has created new challenges and opportunities in almost every industry: What kind of data do you need to collect to compete, and how can you make sense of it once you have collected it? As technology evolves and the volume of data increases, how can you make the best use of all this information? How can you use data to drive your decision-making? How can you make data work for you? How can you ensure your data accurately reflects the population you're interested in?
In this course, you will determine the types of engineering and business questions you can answer, the kinds of problems you can solve, and the decisions you can make using data analytics. You will explore best practices for collecting information so that you can make informed predictions, develop insights, and better inform organizational decision-making, and you will examine how different types of sampling suit different situations. Throughout, you will see real-world examples that demonstrate how these tools work, apply some of the concepts to your own work, and practice sampling techniques in case study scenarios.
Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
Visualization is one of the simplest and most effective ways to find patterns in data. It can help answer questions such as: What is the general range and shape of the data set? Are there any clusters of observations? Which variables correlate with each other? Are there any obvious outliers?
As your data set grows in the number of data points and variables, however, it becomes increasingly difficult to visualize all this information at once. At most, you can plot data points along three axes and add further distinctions through size, color, shape, and so on. Yet such plots quickly become too busy to read. How, then, do we find patterns in very large data sets?
In this course, you will explore several powerful and commonly utilized techniques for distilling patterns from data. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.
You are required to have completed the following course or have equivalent experience before taking this course:
- Understanding Data Analytics
Finding Patterns in Data Using Cluster and Hotspot Analysis
When you have a large set of objects, it is often helpful to split them into meaningful groups, or clusters. For example, a company might identify different types of customers so that it can route calls to its helpline more efficiently. As a second example, suppose an automobile manufacturer wants to segment its market to target its ads more carefully. One approach would be to take a database of recent car sales, including demographic information about each customer, and segment the population purchasing each type of automobile into meaningful groups.
If your data also include time and location information, specialized approaches can use it to identify geographic and temporal hotspots: regions of high activity or high values of a particular variable. These results can help you focus your attention on a region where a problem, such as the incidence of asthma in a large city, is occurring more often than usual. In both cluster and hotspot analysis, the results can help you discover new and interesting features, problems, and red flags in the data being analyzed.
In this course, you will explore several powerful and commonly utilized techniques for performing both cluster and hotspot analysis. You will implement these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible and applicable to your work.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Understanding Data Analytics
- Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
Regression Analysis and Discrete Choice Models
A story can play an important role in understanding data. It can help distill complex information into something manageable: something we can think about easily, relate to, and use to make decisions. For many real-world problems, however, a story that describes what has already happened does not offer enough precision. Often, we would like to use available data to make numerically accurate predictions about what might happen in the future. This task requires constructing mathematical models that are well suited to our real-world problems.
In this course, you will explore several types of statistical models used with data to make predictions. These models raise important concerns, such as estimation and validation, that make the entire process both an art and a science. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Understanding Data Analytics
- Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
- Finding Patterns in Data Using Cluster and Hotspot Analysis
Supervised Learning Techniques
Supervised learning is a general term for any machine learning technique that learns the relationship between a data set and its associated labels in order to make predictions. In regression, the labels are continuous numbers. This course will focus on classification, where the labels are taken from a finite set of numbers or characters. The prototypical and perhaps best-known example of classification is image recognition: taking an image (represented by its pixel values) and determining what objects it contains. Is it a dog? A grapefruit? A stop sign?
There are many practical classification tasks, such as determining whether an individual's financial history makes them a high risk for a loan, whether a material has a defect based on sensor readings, or whether a new email is spam. These problems share the same basic form and can be solved with many different types of mathematical, statistical, and probabilistic models developed by the machine learning community.
In this course, you will explore several powerful and commonly utilized techniques for supervised learning. You will implement each of these techniques using the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Understanding Data Analytics
- Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
- Finding Patterns in Data Using Cluster and Hotspot Analysis
- Regression Analysis and Discrete Choice Models
Neural networks, a nonlinear supervised learning modeling tool, have become hugely popular within the last two decades because they have been successfully applied to a wide range of problems, including automatic language processing, image classification, object detection, speech recognition, and pattern recognition. They are mathematical models loosely inspired by the interconnected neurons of the brain. They take in a vector or matrix of input data and output either a class label or an approximation of a function's value. Their strength is that the relationships between inputs and outputs can be highly nonlinear and complex.
In this course, you will explore the mechanics of neural networks and the intricacies involved in fitting them to data for prediction. You will implement these techniques using packages in the free and open-source statistical programming language R with real-world data sets. The focus will be on making these methods accessible for you in your own work.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Understanding Data Analytics
- Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
- Finding Patterns in Data Using Cluster and Hotspot Analysis
- Regression Analysis and Discrete Choice Models
- Supervised Learning Techniques
Faculty Author
Key Course Takeaways
- Explore the data analytics process and examine the tools available to improve decision making
- Use unsupervised learning techniques to help identify patterns in data and create visualizations to better spot those patterns
- Categorize data using supervised learning algorithms
- Predict the value of continuous variables with linear regression
- Use neural networks to make predictions about new data
- Make forecasts from data collected over time and measure their accuracy
What You'll Earn
- Data Science Certificate from Cornell College of Engineering
- 120 Professional Development Hours (12 CEUs)
Who Should Enroll
- Current and aspiring data scientists
- Analysts
- Engineers
- Researchers
- Technical managers
| Program | Cost |
|---|---|
| Data Science | $3,600 |