ISLR Chapter 1 - Introduction to Statistical Learning

Overview of Statistical Learning

Statistical learning refers to a broad set of tools for understanding data. These tools fall into two main types: supervised and unsupervised.

Supervised Learning

Supervised learning involves building statistical models to predict outputs \( (Y) \) from inputs \( (X) \). For example, assume that we have a salary dataset for statisticians. The dataset consists of the years of experience and the salary for 10 different statisticians.

Years of Experience (X)	Salary (Y)
0.5	70000
1.0	74000
1.5	75000
2.0	75000
2.5	77000
3.0	80000
3.5	78000
4.0	79000
4.5	82000
5.0	85000

We could build a simple linear regression model to predict the salary of statisticians by using experience level as a predictor. This is an example of supervised learning, where we have supervising outputs (salary values) that guide us in developing a statistical model to determine the relationship between experience level and salary.
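As a minimal sketch of the model described above, the snippet below fits a simple linear regression to the ten rows of the salary table using the closed-form least-squares formulas. The function and variable names (`fit_simple_linear_regression`, `predict_salary`) are illustrative choices, not something the source prescribes, and only the Python standard library is assumed.

```python
# Data copied from the salary table above.
years = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
salary = [70000, 74000, 75000, 75000, 77000, 80000, 78000, 79000, 82000, 85000]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_linear_regression(years, salary)


def predict_salary(years_of_experience):
    """Predict salary (Y) from years of experience (X) with the fitted line."""
    return intercept + slope * years_of_experience


print(f"salary = {intercept:.2f} + {slope:.2f} * years")
print(f"predicted salary at 3 years: {predict_salary(3.0):.2f}")
```

On this table the fitted slope is roughly 2,703 per additional year of experience and the intercept is roughly 70,067. The salary column plays the role of the supervising output: it is what the fit is measured against.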

In general, there are two main types of supervised learning: regression and classification.

Regression

Predicting a quantitative output is known as a regression problem. For example, predicting someone's salary is a regression problem.

Classification

Predicting a qualitative output is known as a classification problem. For example, predicting whether a stock will go up or down is a classification problem.
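To make the contrast with regression concrete, here is a small sketch of a classifier. No stock data is given in this chapter, so as a stand-in it reuses the salary table from the Supervised Learning section and turns salary into a two-class label. The 77,500 cutoff, the "high"/"low" labels, and the choice of a 1-nearest-neighbor rule are all assumptions made purely for illustration.

```python
# Data copied from the salary table in the Supervised Learning section.
years = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
salary = [70000, 74000, 75000, 75000, 77000, 80000, 78000, 79000, 82000, 85000]

# Assumption: an arbitrary cutoff (the mean salary in the table) is used only
# to turn the quantitative output into a qualitative one.
CUTOFF = 77500
labels = ["high" if s > CUTOFF else "low" for s in salary]


def predict_label(years_of_experience):
    """1-nearest-neighbor: return the label of the closest training point."""
    nearest = min(
        range(len(years)),
        key=lambda i: abs(years[i] - years_of_experience),
    )
    return labels[nearest]


print(predict_label(1.2))
print(predict_label(4.2))
```

The output is now a category rather than a number, which is what makes this a classification problem even though the input is the same as in the regression setting.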

Unsupervised Learning

Unsupervised learning involves building statistical models to uncover relationships or structure among the inputs \( (X) \). There are no supervising outputs. For example, assume that we have a customer dataset. The dataset consists of the annual salary and annual spend on Amazon for 10 different individuals.

Customer ID	Salary	Spend
1	54425	1103
2	67953	1353
3	53135	3216
4	61452	4146
5	59374	2890
6	66544	975
7	55550	1240
8	58064	956
9	58918	6791
10	54162	4800
...	...	...

We could use a statistical clustering algorithm to group customers by their purchasing behavior. This is an example of unsupervised learning, where we do not have supervising outputs that already tell us which customers are low spenders, average spenders, or high spenders. Instead, the groups have to be inferred from the data itself.
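One way to sketch this grouping is k-means clustering. The snippet below is a minimal one-dimensional version that uses only the spend column of the ten rows shown above. The choice of k-means, of three clusters, of clustering on spend alone, and of seeding the centers with the smallest, middle, and largest values are all assumptions made for illustration; the source does not name a specific algorithm.

```python
# Spend column copied from the customer table above (the ten rows shown).
spend = [1103, 1353, 3216, 4146, 2890, 975, 1240, 956, 6791, 4800]


def kmeans_1d(values, centers, max_iter=100):
    """Plain k-means on one variable; returns (assignments, centers)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, a in zip(values, assignments) if a == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:  # nothing moved, so the clustering is stable
            break
        centers = new_centers
    return assignments, centers


# Assumption: deterministic starting centers (smallest, middle, largest spend).
ordered = sorted(spend)
initial_centers = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]

assignments, centers = kmeans_1d(spend, initial_centers)
names = ["low", "average", "high"]  # valid because the centers start in ascending order
for customer_id, (amount, cluster) in enumerate(zip(spend, assignments), start=1):
    print(customer_id, amount, names[cluster])
```

No label column is consulted at any point: the "low", "average", and "high" names are attached afterwards, based only on where the cluster centers end up.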

In general, there are two main types of unsupervised learning: clustering and association.

Clustering

Determining groupings is known as a clustering problem. For example, grouping customers together based on purchasing behavior is a clustering problem.

Association

Determining rules that describe large portions of a dataset is known as an association problem. For example, determining that people who buy one product also tend to buy another is an association problem. A modern real-world example of this is Amazon's "frequently bought together" product recommendations.
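Association rules are commonly scored with two quantities: support (the fraction of transactions containing an itemset) and confidence (among transactions containing the first item, the fraction that also contain the second). The sketch below computes both. The baskets are made-up toy data and the function names are hypothetical; nothing here reflects how Amazon actually builds its recommendations.

```python
# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"case"},
    {"phone", "case"},
]


def count_containing(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset):
    """Fraction of all baskets that contain the itemset."""
    return count_containing(itemset) / len(baskets)


def confidence(antecedent, consequent):
    """Among baskets with the antecedent, the fraction that also hold the consequent."""
    return count_containing(antecedent | consequent) / count_containing(antecedent)


rule_support = support({"phone", "case"})
rule_confidence = confidence({"phone"}, {"case"})
print(f"support(phone, case) = {rule_support:.2f}")
print(f"confidence(phone -> case) = {rule_confidence:.2f}")
```

In these toy baskets, 3 of 5 contain both a phone and a case, and 3 of the 4 baskets with a phone also contain a case, so the rule "phone buyers also buy a case" has support 0.60 and confidence 0.75.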


Bijen Patel
