Sumanta Basu is an Assistant Professor in the Department of Statistics and Data Science at Cornell University. His research focuses broadly on structure learning and prediction for large systems from data, with a particular emphasis on developing learning algorithms for time series data. Professor Basu also collaborates with biological and social scientists on a wide range of problems, including genomics, large-scale metabolomics, and systemic risk monitoring in financial markets. His research is supported by multiple awards from the National Science Foundation and the National Institutes of Health. At Cornell, Professor Basu teaches “Introductory Statistics” for graduate students outside the Statistics Department and “Computational Statistics” for Statistics Ph.D. students. He also serves as a faculty consultant at the Cornell Statistical Consulting Unit, which assists the broader Cornell community with the analysis of empirical research. Professor Basu received his Ph.D. from the University of Michigan and was a postdoctoral scholar at the University of California, Berkeley, and Lawrence Berkeley National Laboratory. Before earning his Ph.D., he worked as a business analyst, helping large retail companies design their promotional campaigns and analyze the resulting data.
Data Science Modeling
Cornell Certificate Program
Overview and Courses
In today's data-driven world, advanced data modeling techniques are essential for enabling informed decision making and strategic planning.
This certificate program is designed to help you understand predictive modeling, with a focus on making accurate predictions from various types of data. Throughout the program, you will explore models such as polynomial regression, splines, and generalized additive models, which are used to analyze complex relationships in datasets that may include both numerical and categorical variables. You will also gain practical experience building these models in R and examining how different types of predictors can be combined to make predictions. You will practice modeling interactions between categorical and numerical predictors, and you will use decision trees to capture complex relationships that linear models may miss. By the end of the program, you will be able to build and evaluate predictive models, skills that support decision making in a variety of industries.
To be successful in this certificate program, you should have a foundation in R programming and be able to use it to create, clean, and summarize datasets, build visualizations, interpret data, run simulations, and fit linear regression models. Experience with R is critical, because this certificate does not explicitly teach how to use R. High school or college-level math and algebra are also recommended. If you do not have this experience, start with the Data Science Essentials certificate program.
The courses in this certificate program must be completed in the order in which they appear.
Course list
- Apr 16, 2025
- Jul 9, 2025
- Oct 1, 2025
- Dec 24, 2025
In this course, you will explore strategies for incorporating categorical predictors into a regression model, including using dummy variables to represent different categories. You will work with both binary and nonbinary categorical variables and learn how to interpret the estimated coefficients of dummy variables.
As you progress through the course, you will practice modeling and interpreting interactions between categorical and quantitative predictors in a linear model. Finally, you will define and implement decision trees, which can capture complex interactions between predictors that linear models may miss. By the end of the course, you will be able to encode categorical variables numerically, fit regression models with categorical predictors, interpret dummy variable coefficients, and use decision trees to model complex relationships between predictors.
You are required to have completed the following course or have equivalent experience before taking this course:
- Nonlinear Regression Models
- Apr 30, 2025
- Jul 23, 2025
- Oct 15, 2025
This course introduces the fundamental concepts and techniques of predictive modeling. You will evaluate the trade-off between model flexibility and interpretability, use cross-validation to select tuning parameters, and practice building models that generalize well to new data. You will also explore techniques for splitting datasets and for fitting models by minimizing a loss function. By the end of the course, you will have a solid understanding of model flexibility, interpretability, and the bias-variance trade-off, and you will be equipped to build and evaluate predictive models effectively.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Nonlinear Regression Models
- Modeling Interactions Between Predictors
- May 14, 2025
- Aug 6, 2025
- Oct 29, 2025
When working with real-world datasets, a single model may not be enough to capture the complexity of the data. Ensemble methods address this by combining simpler models so that, together, they capture patterns that any one of them might miss, which often improves predictive performance.
In this course, you'll learn to use two ensemble methods: random forests and boosted decision trees. You'll apply these methods to datasets in R to build robust predictive models. You'll use random forests to improve on the performance of a single decision tree and practice interpreting the resulting models. You'll then apply boosting, which fits decision trees sequentially so that each new tree reduces the errors of the previous ones, and aggregates their predictions.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Nonlinear Regression Models
- Modeling Interactions Between Predictors
- Foundations of Predictive Modeling
- May 28, 2025
- Aug 20, 2025
- Nov 12, 2025
Faculty Author
Key Course Takeaways
- Select an optimal model based on modeling goals and characteristics of a dataset
- Recognize when data characteristics call for a nonlinear model, and implement one
- Detect when an interaction between predictors would improve a model
- Improve predictive accuracy by combining different models into an ensemble
Download a Brochure
Not ready to enroll but want to learn more? Download the certificate brochure to review program details.
What You'll Earn
- Data Science Modeling Certificate from Cornell’s Ann S. Bowers College of Computing and Information Science
- 64 Professional Development Hours (6.4 CEUs)
Who Should Enroll
- Current and aspiring data scientists and analysts
- Business decision makers
- Marketing analysts
- Consultants
- Executives
- Anyone seeking to gain deeper exposure to data science
| Program | Cost |
|---|---|
| Data Science Modeling | $3,600 |