Sumanta Basu is an Assistant Professor in the Department of Statistics and Data Science at Cornell University. Broadly, his research interests are structure learning and the prediction of large systems from data, with a particular emphasis on developing learning algorithms for time series data. Professor Basu also collaborates with biological and social scientists on a wide range of problems, including genomics, large-scale metabolomics, and systemic risk monitoring in financial markets. His research is supported by multiple awards from the National Science Foundation and the National Institutes of Health. At Cornell, Professor Basu teaches “Introductory Statistics” for graduate students outside the Statistics Department and “Computational Statistics” for Statistics Ph.D. students. He also serves as a faculty consultant at the Cornell Statistical Consulting Unit, which assists the broader Cornell community with various aspects of analyzing empirical research. Professor Basu received his Ph.D. from the University of Michigan and was a postdoctoral scholar at the University of California, Berkeley, and Lawrence Berkeley National Laboratory. Before he received his Ph.D., Professor Basu was a business analyst, working with large retail companies on the design and data analysis of their promotional campaigns.
Data scientists make decisions by inferring the characteristics of a large population from the characteristics of samples drawn from that population. Basing a decision on samples is necessary because it is rarely possible to measure every individual or unit in a population. However, it also means that data scientists need to consider the potential variability among samples before using those samples to draw conclusions about the population. The variability across samples leads to uncertainty in decision-making, and understanding and quantifying that uncertainty is a key aspect of data science.
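As a rough illustration of sample-to-sample variability, the sketch below draws repeated samples from a simulated population and looks at how much the sample mean moves around. The population, its mean and standard deviation, the sample size of 50, and all variable names are assumptions invented for this example; they do not come from the course materials.

```r
# Hypothetical population: 100,000 simulated measurements (all values made up).
set.seed(1)
population <- rnorm(100000, mean = 170, sd = 10)

# Draw 1,000 samples of 50 units each and record the mean of every sample.
sample_means <- replicate(1000, mean(sample(population, size = 50)))

# The spread of the sample means quantifies the variability across samples.
pop_mean <- mean(population)
se_simulated <- sd(sample_means)

# For comparison: the textbook standard error of a mean, sd / sqrt(n).
se_theory <- sd(population) / sqrt(50)
```

Comparing `se_simulated` with `se_theory` shows that the simulation recovers roughly the same amount of uncertainty that the formula predicts, without needing the formula.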
Throughout this course, Professor Basu will guide you through the nuances of understanding and quantifying the uncertainty around your results, and through making decisions in the face of that uncertainty. In data science, simulations offer a powerful framework for understanding the uncertainty around your data, so you will learn to perform simulations in R and use a simulation-based framework to quantify uncertainty when studying the relationship between categorical variables. You will also use resampling techniques to understand numerical variables and compare their summary statistics across different levels of a categorical variable. Data scientists often search for relationships between numerical variables and use one numerical variable to predict another. You will do this by building a prediction rule with linear regression while keeping the uncertainty of your results in mind. Finally, you will use the errors from linear regression to compare prediction rules and determine which prediction rules fit your data best. This course involves many hands-on coding exercises in R, which will help you gain confidence in your programming skills.
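One common way to study the relationship between two categorical variables by simulation is a permutation test, sketched below. This is an illustrative assumption about what such a framework can look like, not necessarily the method taught in the course. The data, the group labels, and the helper `diff_prop` are hypothetical.

```r
set.seed(2)
# Hypothetical data: 100 units per group; the counts are invented for illustration.
group <- rep(c("A", "B"), each = 100)
outcome <- c(rep(c("yes", "no"), times = c(60, 40)),
             rep(c("yes", "no"), times = c(45, 55)))

# Test statistic: difference in the proportion of "yes" between the two groups.
diff_prop <- function(g, y) {
  mean(y[g == "A"] == "yes") - mean(y[g == "B"] == "yes")
}
observed <- diff_prop(group, outcome)

# Simulate the "no relationship" scenario by shuffling the group labels.
simulated <- replicate(5000, diff_prop(sample(group), outcome))

# Share of shuffles with a difference at least as extreme as the observed one.
p_value <- mean(abs(simulated) >= abs(observed))
```

A small `p_value` means that differences as large as the observed one rarely arise when the labels are shuffled at random, which is evidence against the hypothesis of no relationship.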
System requirements: This course contains a virtual programming environment that does not support the use of Safari, Edge, tablets, or mobile devices. Please use Chrome, Firefox, or Internet Explorer on a computer for this course.
“Exploring Data Sets With R” and “Summarizing and Visualizing Data” must be completed prior to starting this course.
Key Course Takeaways
- Formulate hypotheses, use simulations to test those hypotheses, and understand the level of confidence you should place in your results
- Use resampling techniques to understand the uncertainty present in groups of numerical variables
- Use linear regression to build prediction rules with one numerical predictor
- Use multiple linear regression to build prediction rules with more than one numerical predictor and compare prediction rules
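The last two takeaways can be sketched with base R's `lm()`. The example below fits one prediction rule with a single numerical predictor and another with two, then compares their errors. The simulated data, the coefficients, and the helper `rmse` are assumptions made up for illustration.

```r
set.seed(3)
# Hypothetical data: y depends on x1 and x2; the coefficients are invented.
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Prediction rule with one numerical predictor.
fit_one <- lm(y ~ x1, data = dat)
# Prediction rule with two numerical predictors.
fit_two <- lm(y ~ x1 + x2, data = dat)

# Root mean squared error of each rule on the data used to fit it.
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
rmse_one <- rmse(fit_one)
rmse_two <- rmse(fit_two)
```

Because the error here is measured on the same data used for fitting, the larger model can never do worse than the smaller one it contains, so a fair comparison usually also checks the rules on data that was held out from fitting.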
Who Should Enroll
- Current and aspiring data scientists and analysts
- Business decision makers
- Marketing analysts
- Anyone seeking to gain deeper exposure to data science