Linda Nozick is Professor and Director of Civil and Environmental Engineering at Cornell University. She is co-founder and a past director of the College Program in Systems Engineering and has been the recipient of several awards, including a CAREER award from the National Science Foundation and a Presidential Early Career Award for Scientists and Engineers from President Clinton for “the development of innovative solutions to problems associated with the transportation of hazardous waste.” Dr. Nozick has authored over 60 peer-reviewed publications, many focused on transportation, the movement of hazardous materials, and the modeling of critical infrastructure systems. She has been an associate editor for Naval Research Logistics and a member of the editorial board of Transportation Research Part A. Dr. Nozick has served on two National Academy Committees to advise the U.S. Department of Energy on renewal of their infrastructure. During the 1998-1999 academic year, she was a Visiting Associate Professor in the Operations Research Department at the Naval Postgraduate School in Monterey, California. Dr. Nozick holds a B.S. in Systems Analysis and Engineering from the George Washington University and an MSE and Ph.D. in Systems Engineering from the University of Pennsylvania.
Supervised learning is a general term for any machine learning technique that learns the relationship between a data set and its associated labels so that labels can be predicted for new data. In regression, the labels are continuous numbers. This course will focus on classification, where the labels are taken from a finite set of values, such as numbers or characters. The prototypical and perhaps best-known example of classification is image recognition. The goal is to take an image (represented by its pixel values) and determine what objects are in the image. Is it a dog? A grapefruit? A stop sign?
There are many practical classification tasks, such as determining whether an individual's financial history makes them high risk for a loan, whether there is a defect in a material based on some sensor readings, or whether a new email is spam or not. These problems share the same basic form and can be solved with many different types of mathematical, statistical, and probabilistic models developed by the machine learning community.
In this course, you will explore several powerful and commonly used techniques for supervised learning. You will implement each of these techniques on real-world data sets using R, a free and open-source statistical programming language. The focus will be on making these methods accessible for you in your own work.
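To make the distinction between regression and classification concrete, here is a minimal sketch in R. It is not taken from the course materials. It assumes R's built-in mtcars data set and uses only the base functions lm and glm; the choice of variables (wt, hp, mpg, am) and the 0.5 cutoff are illustrative assumptions.

```r
# Illustrative sketch only: built-in data, base R functions, assumed variables.
data(mtcars)

# Regression: the label (mpg) is a continuous number.
reg_fit <- lm(mpg ~ wt + hp, data = mtcars)
reg_pred <- predict(reg_fit)  # numeric predictions, one per car

# Classification: the label (am) comes from a finite set, here 0 or 1.
# A logit model estimates the probability that am equals 1.
clf_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
prob <- predict(clf_fit, type = "response")

# Turn probabilities into class labels with an assumed 0.5 cutoff.
pred <- as.integer(prob > 0.5)

# Compare predicted and actual labels on the training data.
conf <- table(actual = mtcars$am, predicted = pred)
accuracy <- mean(pred == mtcars$am)
print(conf)
print(accuracy)
```

The same pattern of fitting a model to labeled data and then predicting labels carries over to the other classification techniques listed in this course, although each makes different modeling assumptions. The accuracy here is measured on the training data, so it is likely to overstate performance on new data.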
You are required to have completed the following courses or have equivalent experience before taking this course:
- Understanding Data Analytics
- Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
- Finding Patterns in Data Using Cluster and Hotspot Analysis
- Regression Analysis and Discrete Choice Models
Key Course Takeaways
- Use linear discriminant analysis
- Build a logit model and an ordered logit model
- Examine naïve Bayes for classification
- Examine how to use support vector machines
- Develop the skills to use all of these techniques in R
Who Should Enroll
- Current and aspiring data scientists
- Technical managers