Linda Nozick is Professor and Director of Civil and Environmental Engineering at Cornell University. She is co-founder and a past director of the College Program in Systems Engineering and has been the recipient of several awards, including a CAREER award from the National Science Foundation and a Presidential Early Career Award for Scientists and Engineers from President Clinton for “the development of innovative solutions to problems associated with the transportation of hazardous waste.” Dr. Nozick has authored over 60 peer-reviewed publications, many focused on transportation, the movement of hazardous materials, and the modeling of critical infrastructure systems. She has been an associate editor for Naval Research Logistics and a member of the editorial board of Transportation Research Part A. Dr. Nozick has served on two National Academy Committees to advise the U.S. Department of Energy on renewal of their infrastructure. During the 1998-1999 academic year, she was a Visiting Associate Professor in the Operations Research Department at the Naval Postgraduate School in Monterey, California. Dr. Nozick holds a B.S. in Systems Analysis and Engineering from the George Washington University and an MSE and Ph.D. in Systems Engineering from the University of Pennsylvania.
Finding Patterns in Data Using Cluster and Hotspot Analysis
Cornell Course
When you have a large collection of objects, it is often helpful to split it into meaningful groups, or clusters. For example, a company might identify different types of customers so that it can route their helpline calls more efficiently. As a second example, suppose an automobile manufacturer wants to segment its market to target its ads more carefully. One approach might be to take a database of recent car sales, including the sociodemographic characteristics of each customer, and segment the population purchasing each type of automobile into meaningful groups.
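As a rough illustration of the customer segmentation idea, here is a minimal k-means sketch. The course itself works in R; this example uses plain Python only to keep it self-contained. The customer records (age, annual income in thousands) are made up for illustration, and the function name `kmeans` and its parameters are assumptions of this sketch, not course material.

```python
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Basic k-means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: give each point the label of its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = list(centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave a centroid in place if its cluster is empty
                new_centroids[c] = tuple(sum(col) / len(members)
                                         for col in zip(*members))
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (age, annual income in thousands of dollars).
customers = [(22, 30), (25, 35), (24, 32), (58, 90), (61, 95), (60, 88)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

In practice the variables would usually be standardized first so that no single one (here, income) dominates the distance calculation, and the number of clusters would be chosen by inspecting the data rather than fixed in advance.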
Specialized approaches exist if your data contains information that relates to time and geography. You can use this additional information to identify geographical and temporal hotspots. Hotspots are regions of high activity or a high value of a particular variable. These results can help you focus your attention on a particular region where a problem is occurring more than usual, such as the incidence of asthma in a large city. In both cluster and hotspot analysis, the results can help you discover new and interesting features, problems, and red flags regarding the data being analyzed.
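To make the idea of a hotspot concrete, the sketch below computes the Getis-Ord Gi* statistic, a z-score that compares the values in each location's neighborhood with the overall average. It is a simplified illustration in plain Python (the course itself uses R). The counts are invented, the locations are assumed to lie along a single strip, and each location's neighborhood is assumed to be itself plus its immediate left and right neighbors. The function name `getis_ord_gi_star` is hypothetical.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for each location.

    values  : list of observed values, one per location
    weights : square list-of-lists; weights[i][j] is the spatial weight
              between locations i and j (Gi* includes i itself)
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of all values.
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w_sq = sum(x * x for x in w)
        # Weighted local sum minus what the global mean would predict.
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w_sq - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Invented counts (for example, asthma cases) for ten adjacent neighborhoods.
cases = [3, 2, 4, 3, 2, 15, 18, 16, 3, 2]

# Binary weights: 1 for a location itself and its immediate neighbors, else 0.
n = len(cases)
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

scores = getis_ord_gi_star(cases, weights)
for index, z in enumerate(scores):
    print(index, round(z, 2))
```

Large positive scores mark candidate hotspots and large negative scores mark candidate cold spots. Real analyses work with two-dimensional geography and a carefully chosen neighborhood definition, and they typically account for the fact that many locations are tested at once.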
In this course, you will explore several powerful and widely used techniques for performing both cluster and hotspot analysis. You will implement these techniques on real-world data sets using R, a free and open-source statistical programming language. The focus will be on making these methods accessible and applicable to your work.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Understanding Data Analytics
- Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
Key Course Takeaways
- Partition multidimensional data sets into clusters using k-means and hierarchical clustering algorithms
- Identify communities in social networks using the Louvain algorithm
- Identify clusters in temporal and geographic data using spatial scan statistics
- Perform hotspot analysis using the Getis-Ord statistic
Who Should Enroll
- Current and aspiring data scientists
- Technical managers