Finding Patterns in Data Using Cluster and Hotspot Analysis

Course Overview

When you have a large collection of objects, it is often helpful to split them into meaningful groups, or clusters. For example, a company might identify different types of customers so that it can route calls to its helpline more efficiently. As a second example, suppose an automobile manufacturer wants to segment its market so it can target its ads more precisely. One approach would be to take a database of recent car sales, including demographic information about each customer, and divide the buyers of each type of automobile into meaningful groups.

When your data includes information about time and location, specialized approaches can use it to identify geographic and temporal hotspots. Hotspots are regions with a high level of activity or a high value of a particular variable. These results can help you focus on a region where a problem is occurring more often than usual, such as a high incidence of asthma in part of a large city. In both cluster and hotspot analysis, the results can help you uncover new and interesting features, problems, and red flags in the data being analyzed.

In this course, you will explore several powerful and widely used techniques for both cluster and hotspot analysis. You will implement these techniques on real-world data sets using R, a free and open-source statistical programming language. The focus will be on making these methods accessible and applicable to your work.

You are required to have completed the following courses or have equivalent experience before taking this course:

Understanding Data Analytics
Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis

Key Course Takeaways

Partition multidimensional data sets into clusters using k-means and hierarchical clustering algorithms
Identify communities in social networks using the Louvain algorithm
Identify clusters in space and time using spatial scan statistics
Perform hotspot analysis using the Getis-Ord statistic


How It Works

Course Length

2 weeks

Effort

8 to 10 hours of study per week

Format

100% online, instructor-led

Course Author


Linda Nozick

Associate Dean

Cornell Duffield Engineering


Linda Nozick is a professor in Cornell’s School of Civil and Environmental Engineering. She served as director of the school for more than a decade. Before that, Nozick was director of the college’s Systems Engineering Program, which she co-founded. Nozick has received several awards, including a CAREER award from the National Science Foundation and a Presidential Early Career Award for Scientists and Engineers from President Clinton for “the development of innovative solutions to problems associated with the transportation of hazardous waste.” She is a past presidential appointee to the U.S. Nuclear Waste Technical Review Board. Nozick has authored more than 100 peer-reviewed publications, many focused on transportation, the movement of hazardous materials, and the modeling of critical infrastructure systems. Nozick holds a B.S. in systems analysis and engineering from the George Washington University and an M.S.E. and a Ph.D. in systems engineering from the University of Pennsylvania.

Who Should Enroll

Current and aspiring data scientists
Analysts
Engineers
Researchers
Technical managers

Get It Done 100% Online

Our programs are expressly designed to fit the lives of busy professionals like you.

Learn From Cornell's Top Minds

Courses are personally developed by faculty experts to help you gain today's most in-demand skills.

Power Your Career

Cornell's internationally recognized standard of excellence can set you apart.

Stack To A Certificate
