Course:CPSC312-2017-Linear Regression

From UBC Wiki

Linear Regression

Authors: Trevor Stokvis, Kevin Lapeyre

What is the problem?

State the general problem. If applicable, tell us what information you will use, e.g., a link to some web site that provides the information you used.

Linear regression is a method for fitting a line to a data set, which can then be used to make predictions for new data points. We will create a predict function that takes a list of d numbers (the new data point, x_hat) and a list of n lists of d+1 numbers each (the data set), where each row has the form (yi|xi): ((y1|x1),(y2|x2),...,(yn|xn)). Using the data set, the program will predict y_hat for the new data point x_hat.
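To make the input layout concrete, here is a minimal sketch in Python (not the project's Prolog). The helper name `split_dataset` and the sample numbers are hypothetical; the leading 1 in each feature list is an assumed convention for modelling an intercept, which the description above does not specify.

```python
def split_dataset(rows):
    """Split rows of the form [y, x1, ..., xd] into targets and feature rows."""
    d = len(rows[0]) - 1
    if any(len(row) != d + 1 for row in rows):
        raise ValueError("every row must have the same length d + 1")
    ys = [row[0] for row in rows]    # first entry of each row is the target y
    xs = [row[1:] for row in rows]   # remaining d entries are the features x
    return ys, xs


# Hypothetical data set with n = 3 rows and d = 2 features per row.
rows = [[3.0, 1.0, 1.0],
        [5.0, 1.0, 2.0],
        [7.0, 1.0, 3.0]]
ys, xs = split_dataset(rows)
```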

What is the something extra?

What is the in-depth aspect you will do? If the problem is related to some other groups, tell us how they fit together. If in doubt, include it.

We will represent a matrix as a list of equal-length lists, one per row. To solve this problem we will need to build several matrix operations, including multiplication, inversion, adjugation and transposition. We will calculate our "betas" with the following formula: B = inv(trans(X)*X) * trans(X) * y. This gives us a column vector B (a list of one-value lists). The prediction is then found by multiplying x_hat, taken as a row vector, by B; the outcome is a single value, our estimated prediction y_hat.
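The formula above can be sketched with list-of-lists matrices as follows. This is an illustration in Python under stated assumptions, not the authors' Prolog code: all function names are hypothetical, inversion is done through the adjugate and a cofactor-expansion determinant (practical only for small matrices), and no intercept is fitted unless the caller adds a column of ones to the features.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def minor(m, i, j):
    # Matrix m with row i and column j removed.
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    # Recursive cofactor expansion along the first row.
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjugate(m):
    n = len(m)
    if n == 1:
        return [[1]]
    cofactors = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
                 for i in range(n)]
    return transpose(cofactors)


def inverse(m):
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[v / d for v in row] for row in adjugate(m)]


def fit(data):
    # Each row of data is [y, x1, ..., xd].
    y = [[row[0]] for row in data]
    x = [row[1:] for row in data]
    xt = transpose(x)
    # B = inv(trans(X)*X) * trans(X) * y, returned as a list of one-value lists.
    return matmul(matmul(inverse(matmul(xt, x)), xt), y)


def predict(x_hat, data):
    beta = fit(data)
    return sum(b[0] * v for b, v in zip(beta, x_hat))


# Hypothetical data following y = 1 + 2x, with an explicit column of ones.
data = [[3, 1, 1], [5, 1, 2], [7, 1, 3]]
y_hat = predict([1, 4], data)  # close to 9.0, up to floating-point rounding
```

The recursive `det` and `minor` functions were chosen to echo the recursion-heavy, list-based style the project describes; a numerically robust implementation would more likely use elimination or a decomposition instead.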

For more information on linear regression: wiki

What did we learn from doing this?

(This should be written after you have done the work.) What is the bottom-line? Is logic programming suitable for (part-of) the task? Make sure you include the evidence for your claims.

We managed to create a working logic program that solves the intended problem. Given a data set, the Prolog program outputs a predicted value for a given example. We implemented both linear regression and ridge regression for making these predictions.
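For reference, the textbook closed forms of the two techniques differ only by a regularization term. The page does not say how the ridge variant was implemented, so the notation below (penalty \lambda \ge 0 and identity matrix I) is an assumption rather than a description of the project's code.

```latex
B_{\text{linear}} = (X^{\top} X)^{-1} X^{\top} y
\qquad
B_{\text{ridge}} = (X^{\top} X + \lambda I)^{-1} X^{\top} y
```

Setting \lambda = 0 recovers ordinary linear regression, and any \lambda > 0 makes the matrix being inverted nonsingular even when X^{\top} X itself is not.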

Logic programming was suitable for the task: given a data set, we are asking whether a linear fit exists (a yes-or-no, or 'true or false', question) and, if it does, what predicted value fits the given example data. The matrix manipulation, which was recursion-heavy (especially for data sets with multiple features), was probably the most difficult aspect of the program. Once that was worked out, we set up the predicates to mirror the mathematical functions needed to calculate the linear model.

Our code for the project can be found in the following GitHub repository: