Welcome to our tutorial on Supervised Learning, a fundamental pillar of Machine Learning (ML). Our objective is to introduce you to the basics of Supervised Learning and help you understand its applications.
By the end of this tutorial, you will be able to:
- Understand what Supervised Learning is
- Know how it works and where it can be applied
- Write and understand basic code implementing Supervised Learning algorithms
Prerequisites: Basic knowledge of Python, an understanding of fundamental ML concepts, and familiarity with libraries such as NumPy and pandas.
Supervised Learning is a subset of machine learning where the model is trained on a labeled dataset. That means the dataset used to train the model also contains the solutions (labels) the model should predict.
The algorithm learns from the labeled dataset, and this learned model is then used to predict the labels of new, unseen data. Supervised learning is split into two categories: classification and regression.
In classification problems, the output variable is a category, such as 'spam' or 'not spam'. In regression problems, the output variable is a real or continuous value, such as a house price.
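To make the distinction concrete, here is a minimal sketch on a tiny made-up dataset. The "hours studied" feature, the pass/fail labels, and the exam scores are all hypothetical values invented for illustration. The same inputs are paired once with categorical labels (classification) and once with continuous labels (regression).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature (hours studied) for six students
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])

# Classification: each label is a category (0 = fail, 1 = pass)
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)

# Regression: each label is a continuous value (exam score)
y_score = np.array([10.0, 20.0, 30.0, 60.0, 70.0, 80.0])
reg = LinearRegression().fit(X, y_score)

# New, unseen inputs
X_new = np.array([[2.5], [7.5]])
class_pred = clf.predict(X_new)  # discrete class labels
score_pred = reg.predict(X_new)  # real-valued estimates

print("Predicted classes:", class_pred)
print("Predicted scores:", score_pred)
```

The classifier can only return one of the labels it saw during training, while the regressor can return any real number.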
Let's look at some examples. We will use the popular ML library Scikit-learn.
We'll use the Iris dataset to classify iris flowers into three species.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Load iris dataset
iris = load_iris()
# Split dataset into training set (70%) and test set (30%); random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
# Create KNN Classifier
knn = KNeighborsClassifier(n_neighbors=3)
# Train the model using the training sets
knn.fit(X_train, y_train)
# Predict the response for test dataset
y_pred = knn.predict(X_test)
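The predictions in y_pred only become meaningful once they are compared with the true labels in y_test. Below is a minimal sketch of that check. It assumes accuracy (the fraction of correct predictions) is the metric of interest. The fixed random_state and the stratify argument are choices made here for a reproducible, class-balanced split.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# stratify keeps the three species equally represented in both splits
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Fraction of test samples whose predicted species matches the true one
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the score is computed on data the model never saw during training, it gives a rough estimate of how the classifier would do on new flowers.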
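We'll use the California Housing dataset to predict house prices. (The Boston House Prices dataset and its load_boston loader were removed in scikit-learn 1.2, so the original example no longer runs on current versions.)
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load California housing dataset (downloaded and cached on first use)
housing = fetch_california_housing()
# Split dataset into training set (70%) and test set (30%); random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.3, random_state=42)
# Create linear regression object
lr = LinearRegression()
# Train the model using the training sets
lr.fit(X_train, y_train)
# Make predictions using the testing set (median house value, in units of $100,000)
y_pred = lr.predict(X_test)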
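Accuracy does not apply to regression, since predictions are rarely exactly equal to the true values. Two common alternatives are mean squared error (MSE) and the R² score. The sketch below shows both. As an assumption made so it runs offline, it uses the diabetes dataset bundled with scikit-learn rather than a housing dataset. The evaluation calls are the same for any regressor.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in regression dataset that ships with scikit-learn (no download needed)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# MSE: average squared difference between predictions and true values (lower is better)
mse = mean_squared_error(y_test, y_pred)
# R^2: share of the target's variance explained by the model (1.0 is a perfect fit)
r2 = r2_score(y_test, y_pred)

print(f"MSE: {mse:.2f}")
print(f"R^2: {r2:.3f}")
```

MSE is expressed in squared units of the target, so it is mainly useful for comparing models on the same dataset. R² is unitless and easier to interpret on its own.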
In this tutorial, we introduced Supervised Learning, explained how it works, and looked at its two main types: classification and regression. We also provided two practical examples to demonstrate these concepts.
Next, you can try implementing other Supervised Learning algorithms like Support Vector Machines (SVM) or Decision Trees. For further learning, you can refer to the Scikit-learn documentation.
With the Iris dataset, try implementing a different classifier, such as a Decision Tree, and compare its accuracy with that of the KNN classifier.
With the housing dataset from the regression example, try implementing a different regression algorithm, such as Support Vector Regression (SVR), and compare the results with those of Linear Regression.
Try implementing a classification or regression model on a different dataset. You can find many datasets in the UCI Machine Learning Repository.
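As a possible starting point for the first exercise, the sketch below trains two classifiers on the same Iris split and prints their test accuracies side by side. The dictionary of models and the random_state values are illustrative choices, not the only way to structure the comparison.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Use one fixed split so both models are judged on the same test samples
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

models = {
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

A single split on a dataset this small can favor either model by chance, so try several random_state values before drawing conclusions.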
Remember, the key to mastering these concepts is practice. Happy coding!