Leland McInnes: UMAP, HDBSCAN & the Geometry of Data | Learning from Machine Learning #10
- 2024/10/25
- Length: 55 min
- Podcast
Summary
Synopsis and commentary
In this episode of Learning from Machine Learning, we explore the intersection of pure mathematics and modern data science with Leland McInnes, the mind behind an ecosystem of tools for unsupervised learning, including UMAP, HDBSCAN, PyNNDescent, and DataMapPlot. As a researcher at the Tutte Institute for Mathematics and Computing, Leland has fundamentally shaped how we approach and understand complex data.
Leland views data through a geometric lens, drawing on his background in algebraic topology to uncover hidden patterns and relationships within complex datasets. This perspective led to the creation of UMAP, a dimensionality reduction technique that aims to preserve local structure while retaining much of the global structure of the data, which makes it well suited to visualization and as a preprocessing step for clustering. Similarly, his clustering algorithm HDBSCAN tackles the messy reality of real-world data, handling clusters of varying density and labeling outlying points as noise rather than forcing them into a cluster.
But perhaps what's most striking about Leland isn't just his technical achievements – it's his philosophy toward algorithm development. He champions the concept of "decomposing black box algorithms," advocating for transparency and understanding over blind implementation. By breaking down complex algorithms into their fundamental components, Leland argues, we gain the power to adapt and innovate rather than simply consume.
For those entering the field, Leland offers pointed advice: resist the urge to chase the hype. Instead, find your unique angle, even if it seems unconventional. His own journey, applying concepts from algebraic topology and fuzzy simplicial sets to data science, demonstrates how breakthrough innovations often emerge from unexpected connections.
Throughout our conversation, Leland's passion for knowledge and commitment to understanding shine through. His approach reminds us that the most powerful advances in data science often come not from following the crowd, but from diving deep into fundamentals and drawing connections across disciplines.
There's immense value in understanding the tools you use, questioning established approaches, and bringing your unique perspective to the field. As Leland shows us, sometimes the most significant breakthroughs come from seeing familiar problems through a new lens.
Resources for Leland McInnes
Leland's GitHub
- UMAP
- HDBSCAN
- PyNNDescent
- DataMapPlot
- EVoC
References
- Maarten Grootendorst
- Learning from Machine Learning Episode 1
- Vincent Warmerdam - Calmcode
- Learning from Machine Learning Episode 2
- Matt Rocklin
- Emily Riehl - Category Theory in Context
- Lorena Barba
- David Spivak - Fuzzy Simplicial Sets
- Improving Mapper’s Robustness by Varying Resolution According to Lens-Space Density
Learning from Machine Learning
- YouTube
- https://mindfulmachines.substack.com/