David Mimno is an Associate Professor and Chair of the Department of Information Science in the Ann S. Bowers College of Computing and Information Science at Cornell University. He holds a Ph.D. from UMass Amherst and was previously the head programmer at the Perseus Project at Tufts as well as a researcher at Princeton University. Professor Mimno’s work has been supported by the Sloan Foundation, the NEH, and the NSF.
Language Models and Next-Word Prediction
Cornell Course
Course Overview
In this course, you will use Python to quantify the next-word predictions of large language models (LLMs) and understand how these models assign probabilities to text. You'll compare raw scores from LLMs, transform them into probabilities, and explore uncertainty measures like entropy. You'll also build n-gram language models from scratch, apply smoothing to handle unseen words, and interpret log probabilities to avoid numerical underflow.
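To make the "raw scores to probabilities" and entropy ideas concrete, here is a minimal sketch (not course material): it applies a softmax to a set of hypothetical model scores and measures the uncertainty of the resulting distribution with Shannon entropy. The example logits are invented for illustration.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in bits; skip zero-probability entries
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical raw scores (logits) for four candidate next words
logits = [4.0, 2.0, 1.0, 0.5]
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(round(entropy(probs), 3))
```

A peaked distribution (one dominant candidate) gives low entropy; a flat one gives high entropy, which is one way a model's confidence can be checked.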
By the end of this course, you will be able to evaluate entire sentences for their likelihood, implement your own model confidence checks, and decide when and how to suggest completions for real-world text applications.
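As a preview of sentence-level scoring, the sketch below builds a tiny bigram model with add-one (Laplace) smoothing over a toy corpus and sums log probabilities rather than multiplying raw probabilities, which avoids numerical underflow on long sequences. The corpus and function names are illustrative assumptions, not taken from the course.

```python
import math
from collections import Counter

# Toy corpus; a real model would be trained on far more text
corpus = "the cat sat on the mat the cat ran".split()
vocab = set(corpus)
V = len(vocab)

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_logprob(prev, word):
    # Add-one smoothing gives unseen bigrams a small nonzero probability
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + V))

def sentence_logprob(words):
    # Summing log probabilities is equivalent to multiplying probabilities,
    # but stays in a numerically safe range
    return sum(bigram_logprob(p, w) for p, w in zip(words, words[1:]))

print(round(sentence_logprob("the cat sat".split()), 3))
```

More probable sentences score closer to zero (log probabilities are negative), so the function can be used to compare candidate completions.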
Before enrolling, you are required to have completed the following course or to have equivalent experience:
- LLM Tools, Platforms, and Prompts
Key Course Takeaways
- Implement Python code to generate and sample next-word probabilities for a given text corpus
- Apply methods that extend predictions to new, unseen contexts
- Use log probabilities to quantify text likelihood and develop heuristics for interpreting them
- Leverage the Hugging Face API to evaluate and compare next-word probabilities over larger sequences

Who Should Enroll
- Engineers
- Developers
- Analysts
- Data scientists
- AI engineers
- Entrepreneurs
- Data journalists
- Product managers
- Researchers
- Policymakers
- Legal professionals