Contents
1: Introduction
2: Recommendation systems
3: Item-based filtering
4: Classification
5: More on classification
6: Naïve Bayes
7: Unstructured text
8: Clustering


A guide to applied data mining, collective intelligence, and building recommendation systems by Ron Zacharski. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. It is available as a free download under a Creative Commons license. You are free to share the book, translate it, or remix it.

About the book

Before you is a tool for learning basic data mining techniques. Most data mining textbooks focus on providing a theoretical foundation for data mining and, as a result, may seem notoriously difficult to understand. Don't get me wrong, the information in those books is extremely important. However, if you are a programmer interested in learning a bit about data mining, you might be interested in a beginner's hands-on guide as a first step. That's what this book provides. This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques.

Table of Contents

This book's contents are freely available as PDF files. When you click on a chapter title below, you will be taken to a webpage for that chapter. That page contains links for the PDF, the Python code used for the chapter, and the chapter's sample data sets. Please let me know if you see an error in the book, if some part of the book is confusing, or if you have some other comment. I will use these to revise the chapters.

Download the entire book

You can also download the book as one large (~150MB) PDF and all the source code at https://github.com/zacharski/pg2dm-python.

Chapter 1: Introduction

Finding out what data mining is and what problems it solves. What will you be able to do when you finish this book?

Chapter 2: Get Started with Recommendation Systems

Introduction to social filtering. Basic distance measures including Manhattan distance, Euclidean distance, and Minkowski distance. Pearson Correlation Coefficient. Implementing a basic algorithm in Python.
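To make the distance measures named above concrete, here is a minimal sketch, not the book's own code. It assumes each user's ratings are stored as a dictionary mapping item name to a numeric rating; the function names and the two sample users are hypothetical. The Minkowski distance with `r=1` is the Manhattan distance and with `r=2` is the Euclidean distance, so one function covers all three.

```python
from math import sqrt


def minkowski(ratings1, ratings2, r):
    """Minkowski distance over the items both users rated.

    r=1 gives Manhattan distance, r=2 gives Euclidean distance.
    Returns None when the users share no rated items.
    """
    common = [item for item in ratings1 if item in ratings2]
    if not common:
        return None
    total = sum(abs(ratings1[item] - ratings2[item]) ** r for item in common)
    return total ** (1 / r)


def pearson(ratings1, ratings2):
    """Pearson correlation coefficient over the items both users rated.

    Returns 0.0 when there is no overlap or when either user's
    ratings have zero variance (a choice made for this sketch).
    """
    common = [item for item in ratings1 if item in ratings2]
    n = len(common)
    if n == 0:
        return 0.0
    xs = [ratings1[item] for item in common]
    ys = [ratings2[item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


# Hypothetical users and ratings, for illustration only.
alice = {"Band A": 5, "Band B": 3, "Band C": 1}
bob = {"Band A": 4, "Band B": 1, "Band C": 2, "Band D": 5}

print(minkowski(alice, bob, 1))  # Manhattan distance: 4.0
print(minkowski(alice, bob, 2))  # Euclidean distance: sqrt(6)
print(pearson(alice, bob))
```

A smaller distance means the two users rate more alike, whereas a Pearson value closer to 1 means stronger agreement. Pearson is less sensitive to one user habitually rating higher than another.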

Chapter 3: Implicit ratings and item-based filtering

A discussion of the types of user ratings we can use. Users can explicitly give ratings (thumbs up, thumbs down, 5 stars, or whatever) or they can rate products implicitly–if they buy an mp3 from Amazon, we can view that purchase as a 'like' rating.

Chapter 4: Classification

In previous chapters we used people's ratings of products to make recommendations. Now we turn to using attributes of the products themselves to make recommendations. This approach is used by Pandora among others.

Chapter 5: Further Explorations in Classification

A discussion on how to evaluate classifiers including 10-fold cross-validation, leave-one-out, and the Kappa statistic. The k Nearest Neighbor algorithm is also introduced.
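As a rough illustration of two of the evaluation ideas mentioned above, the sketch below splits a data set into k folds and computes the Kappa statistic from a confusion matrix. It is an assumption-laden example rather than the chapter's implementation: the function names are hypothetical, the fold split is not stratified by class, and the confusion matrix values are made up.

```python
def k_folds(items, k=10):
    """Partition items into k folds by dealing them out round-robin.

    In k-fold cross-validation each fold serves once as the test set
    while the remaining k-1 folds are used for training.
    Simplification: this split ignores class labels (no stratification).
    """
    return [items[i::k] for i in range(k)]


def kappa(confusion):
    """Kappa statistic for a square confusion matrix.

    confusion[i][j] is the number of instances of actual class i
    that the classifier labeled as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed accuracy: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Accuracy expected by chance, from the row and column proportions.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(n)
    )
    return (observed - expected) / (1 - expected)


# Made-up two-class confusion matrix: 70% accuracy, 50% expected by chance.
matrix = [[40, 10],
          [20, 30]]
print(kappa(matrix))

folds = k_folds(list(range(25)), k=10)
print(len(folds))  # 10
```

Kappa compares the classifier's accuracy with what a chance classifier matching the same class proportions would achieve, so a value near 0 indicates performance little better than chance.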

Chapter 6: Naïve Bayes

An exploration of Naïve Bayes classification methods. Dealing with numerical data using probability density functions.
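One common way to handle a numeric attribute in Naïve Bayes is to assume its values within each class follow a normal (Gaussian) distribution and use the density as the conditional likelihood. The sketch below shows that density function. Treating the attribute as Gaussian is an assumption of this example, and the function name and sample numbers are hypothetical.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, std):
    """Probability density of x under a normal distribution.

    mean and std would be estimated from the training values of one
    attribute within one class (an assumption of this sketch).
    """
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return exp(exponent) / (std * sqrt(2 * pi))


# Made-up example: density of the value 100 for a class whose
# attribute values have mean 106 and standard deviation 21.
print(gaussian_pdf(100, 106, 21))
```

The returned density stands in for P(attribute value | class) when multiplying likelihoods across attributes.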

Chapter 7: Naïve Bayes and unstructured text

This chapter explores how we can use Naïve Bayes to classify unstructured text. Can we classify Twitter posts about a movie as to whether the post was a positive review or a negative one?

Chapter 8: Clustering

Clustering – both hierarchical and k-means clustering.

Attributions

To keep the book entertaining, it includes numerous pictures. I relied on a large number of people who generously made their photos available either under a Creative Commons license or under public domain. I would like to thank these photographers for their generosity. A Google Spreadsheet lists the source of each picture in the book.