Kara Karpman is an Adjunct Assistant Professor of Data Science and Statistics at Cornell University, as well as an Assistant Professor of Mathematics at Middlebury College in Middlebury, VT. Her research focuses on statistical modeling techniques for studying biological and financial data. Professor Karpman holds a B.S. in Mathematics from Duke University and an M.S. and Ph.D. in Applied Mathematics from Cornell University.
Data Cleaning With the Tidyverse
Cornell Course
Course Overview
Data scientists use data collected from the real world to answer questions and solve problems that would otherwise be intractable. But since the world is complex, data collected to describe the world can also be complex, which makes it messy and difficult to work with. To successfully analyze data, data scientists need to spend time cleaning their data, that is, organizing and manipulating it into a form that is easier to work with and understand.
In this course, you will delve into the world of data cleaning by presenting and manipulating your data with the Tidyverse in R. You will organize data by selecting only the variables you're interested in, creating new groups of data, and summarizing data in a way that makes sense for the questions you're trying to ask. You will also create high-quality plots to quickly summarize complex data. You will become familiar with the concept of tidy data and organize data sets in a way that allows for the most efficient analysis. Finally, you will work with more complex data types so that you can answer increasingly difficult questions as you take your new skills into your workplace. You will practice all these skills by working with four real-world, complex data sets. This course involves many hands-on coding exercises that will help you take your programming skills to the next level.
System requirements: This course contains a virtual programming environment that does not support the use of Safari, Edge, tablets, or mobile devices. Please use Chrome, Firefox, or Internet Explorer on a computer for this course.
“Exploring Data Sets With R” and “Measuring Relationships and Uncertainty” must be completed prior to starting this course.
Key Course Takeaways
- Tidy and clean a data set to answer specific questions
- Create data visualizations that you can use for exploratory data analysis and presentations
- Combine two data sets on a common variable so you can answer more complex and varied questions
- Format date-time objects and text data to gain information from a data set and perform analyses of greater complexity
Who Should Enroll
- Current and aspiring data scientists and analysts
- Business decision makers
- Marketing analysts
- Consultants
- Executives
- Anyone seeking to gain deeper exposure to data science