Lecture 01: Machine Learning


 

Prerequisites for these notes:

  • Statistics
  • A general knowledge of computer programming and how it works
  

Introduction:

1.0. Machine Learning:

Machine Learning is a subfield of Artificial Intelligence that enables systems to automatically learn and improve from experience without being explicitly programmed. The goal is to develop algorithms that can analyze data, identify patterns, and make predictions or decisions with minimal human intervention.

Problems of ML:

  1. Machine Learning is highly dependent on data: the more data we have, the more computation is required. Research is ongoing into building accurate models from less data. Machine Learning remains a huge field of research that is of interest to almost everyone.

Before jumping into the Linear Regression Model (one of the most basic machine learning algorithms), one has to consider selecting the right data and processing it. We would like to discuss a few things before starting to train and test the model. Keep in mind that you might have questions while reading the points below; have patience, because most of the things you may not understand in this numbered list will become clearer when we start discussing the Linear Regression Model.

  1. Feature Engineering: Identifying and selecting relevant features that have strong correlation with the target variable. This includes creating new features, transforming existing ones, and handling missing or irrelevant data.
  2. Data Cleaning: Removing any missing or incorrect values and dealing with outliers, as they can have a significant impact on the model's performance.
  3. Feature Scaling: Scaling the features so that no bias is introduced towards high-magnitude features.
  4. Train-Test Split: Dividing the dataset into training and testing sets to evaluate the model's performance on unseen data (see the sketch after this list).
  5. Model Selection: Choosing the appropriate model based on the type of problem, data characteristics, and computational resources.
  6. Model Evaluation Metrics: Selecting appropriate evaluation metrics to quantify the model's performance, such as R-squared, Mean Squared Error, or Root Mean Squared Error.
  7. Model Assumptions: Verifying the assumptions of linear regression, such as linearity, independence, homoscedasticity, and normality of residuals.
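To make points 2-4 concrete, here is a minimal preprocessing sketch in Python using pandas and scikit-learn. The file name data.csv and the column names x1, x2, and target are hypothetical placeholders; adapt them to your own dataset.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("data.csv")   # hypothetical dataset
    df = df.dropna()               # data cleaning: drop rows with missing values

    X = df[["x1", "x2"]]           # feature columns (assumed names)
    y = df["target"]               # target column (assumed name)

    # Train-test split: hold out 20% of the rows to evaluate on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Feature scaling: fit the scaler on the training set only, then apply
    # the same transformation to the test set, so that no information from
    # the test set leaks into training.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

Fitting the scaler on the training set alone is what keeps the evaluation honest: the test set stays truly unseen until the end.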

1.1. Linear Regression Model

Informal Idea:

This informal idea is meant just to give you a flavor of what is coming; if you are unable to understand some bits of the formal treatment, you can always come back here for the rough idea.

The Linear Regression Model is, in a sense, a hit-and-trial method. You guess an equation with 'n' parameters and put your known values into the equation. Most of the time it is not pure guessing: you start with a basic equation, such as a linear equation, and improve the parameters (and their powers) as the amount of data grows. Solving the equation identifies our parameters. Once the parameters are known, the training part is complete, and we test our model. If it works, it works! But if it does not, one has to feed new data points into the machine's already saved collection of data and determine the parameters again to improve the quality of the model.

Formally,

Linear Regression is a statistical method for modeling the relationship between a dependent (response) variable and one or more independent (predictor) variables. The goal is to find the best-fitting straight line (or hyperplane in higher dimensions) that describes the linear relationship between the response and predictor variables. The line is represented by an equation with coefficients that represent the strength and direction of the relationship between the variables. The coefficients can be estimated using various optimization techniques, such as ordinary least squares.

The hypothesis function for the linear regression model is:

y=ax+b

The parameters in this case are {a, b}.
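In code, the hypothesis is simply a parameterized line; a minimal sketch in Python:

    def hypothesis(x, a, b):
        # Predicted value y = ax + b for input x, given parameters a and b.
        return a * x + b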

To identify the parameters for the given data points, we minimize the error using differentiation. To give you a clearer idea, take an example:

  x    y
  2    20
  3    30
  4    40
  5    55
  7    77
  9    89


The squared error for a single data point can be evaluated as

δ_i=(y_i-(ax_i+b))^2

and the total squared error is

δ=Σδ_i=Σ(y_i-(ax_i+b))^2

To evaluate the values of a and b, we take the partial derivative of δ with respect to a (keeping b constant) and with respect to b (keeping a constant).

\frac{∂δ}{∂a}=\frac{∂Σ(y_i-(ax_i+b))^2}{∂a} & ⇒\frac{∂δ}{∂b}=\frac{∂Σ(y_i-(ax_i+b))^2}{∂b}

\frac{∂δ}{∂a}=Σ2(y_i-(ax_i+b))(-x_i) & ⇒\frac{∂δ}{∂b}=Σ2(y_i-(ax_i+b))(-1)

We want to evaluate the values of a and b such that δ is minimum. Setting both derivatives to zero,

0=Σx_i(y_i-(ax_i+b)) & ⇒0=Σ(y_i-(ax_i+b))

The second equation gives

b=\frac{Σ(y_i-ax_i)}{n}=\frac{Σy_i-aΣx_i}{n}

and the first equation expands to

0=Σx_iy_i-aΣx_i^2-bΣx_i —(I)

Substituting b into eq^n (I), we can solve for the values of a and b.

In our example,

Σx_i=30 , Σy_i=311 , Σx_iy_i=1905 , Σx_i^2=184 & n=6

b=\frac{311-30a}{6}

Substituting into eq^n (I):

0=1905-184a-30\cdot\frac{311-30a}{6}

0=1905-184a-5(311-30a)

0=350-34a

a=\frac{350}{34}≈10.29

b=\frac{311-30(350/34)}{6}≈0.36
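A quick numerical check of this closed-form result, sketched in Python with numpy:

    import numpy as np

    x = np.array([2, 3, 4, 5, 7, 9])
    y = np.array([20, 30, 40, 55, 77, 89])
    n = len(x)

    # Closed-form least-squares solution, equivalent to solving eq (I) above.
    a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    b = (np.sum(y) - a * np.sum(x)) / n

    print(a, b)  # approximately 10.29 and 0.36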

By our hypothesis, the graph would now show the six data points together with the fitted line y≈10.29x+0.36:

[Figure: scatter plot of the data points with the fitted regression line]

We have successfully fit a line to our data collection.

In technical terms, this part is called machine training. Now that we have trained our machine on the data we had, we move on to testing. We evaluate the result we want using a held-out data point. If the error between the expected and actual output is small, the values of the parameters do not change. But if the error is high, one has to determine the values of the parameters again to minimize the error. Remember, our goal is to predict with minimum error.
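A minimal sketch of this testing step in Python; the held-out point (6, 62) is a hypothetical example, not part of the training data above:

    a, b = 10.29, 0.36        # parameters from the training step above

    def predict(x):
        return a * x + b

    x_new, y_new = 6, 62      # hypothetical held-out data point
    squared_error = (y_new - predict(x_new)) ** 2
    print(predict(x_new), squared_error)  # prediction ≈ 62.1, error ≈ 0.01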

We have only discussed the linear regression model on the surface so far. One can wonder: to determine the values of the parameters, why squared error, and why not

  • the modulus (absolute value) of the error
  • the variance
  • or any other statistical way of measuring the errors in the data?

The modulus of the error is not preferred because of its mathematical nature: the derivative of |x| cannot be determined at x=0, which breaks the calculus-based minimization we used above. The modulus is also less sensitive than the squared error to data points that are significantly different from the other data points in a dataset, since large deviations are penalized only linearly rather than quadratically. The squared error, in contrast, is differentiable everywhere and strongly penalizes large errors, which makes a poorly fitting line easy to detect and the minimization straightforward.
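To illustrate, a small sketch comparing the two error measures on the residuals of our fitted line:

    import numpy as np

    x = np.array([2, 3, 4, 5, 7, 9])
    y = np.array([20, 30, 40, 55, 77, 89])
    residuals = y - (10.29 * x + 0.36)

    print(np.sum(residuals**2))       # squared error: weights large residuals heavily
    print(np.sum(np.abs(residuals)))  # modulus error: not differentiable at zero residual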

Similarly, the other error calculations are acceptable in a broad sense. The choice of how the parameters are calculated depends on the type of data we have and the kind of prediction we want to make.