Programming Multi-Linear Regression equation for faster problem-solving and decision-making
Abstract
Background: Multi-linear regression is a widely used mathematical modeling technique that data scientists employ to make predictions from a given dataset (Y, X), where X is the set of independent variables, also called predictor variables, and Y is the dependent variable, also called the response variable. Once the dataset (Y, X) is known, future values of Y can be predicted whenever X varies. The accuracy of such predictions depends, by and large, on the number of predictor variables involved: the larger the number, the better the prediction. This in turn increases the complexity of the regression equation. Solving problems that involve such an equation becomes extremely laborious without an appropriate computer programming language, let alone manually, and the resulting delays hinder quick decision-making in a competitive business world. Objectives: This work shows how the R programming language can be used to automate a multi-linear regression model for faster processing and quicker decision-making. Methods: A multi-linear regression equation was formulated from a sample dataset (Y, X) containing 28 values on the market sales of an establishment. The spreadsheet software MS Excel was used to store the dataset as a ‘comma-separated values’ (CSV) file on the hard disk of a local computer. The R function read.csv() was used to read the dataset from the computer. Another R function, cor(), was used to check the dataset for linearity. Finally, the coefficients of the formulated regression equation were determined using the R function lm(). Results: It took the computer less than 5 seconds to determine the coefficients of the multi-linear regression equation involving 5 predictor variables (x1, x2, x3, x4, and x5) for the response variable, Y, from a dataset (Y, X) containing 28 values.
Predictions of the response variable, Y, for arbitrary values of the market forces represented by the predictor variables, X, were easily performed with the computer, supporting quicker and better decision-making. Conclusions: Data processing tasks involving multi-linear regression equations would be extremely slow, and an obstacle to quick decision-making in a competitive business environment, if a suitable programming language such as R were not employed.
Keywords: Multi-Linear regression, problem-solving, decision-making, dataset, R-programming
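The workflow summarized in the Methods (store the data as CSV, read it with read.csv(), inspect it with cor(), fit it with lm()) might look roughly like the R sketch below. It is only an illustration under stated assumptions: the article's sales dataset is not reproduced here, so the data frame `sales`, its column values, the temporary file path, and the values in `new_x` are all hypothetical stand-ins, and the response is generated from an assumed linear rule plus noise. The predict() call is likewise one plausible way to obtain the predictions mentioned in the Results; the abstract does not say which function the authors used.

```r
# Illustrative sketch only: the data are synthetic, not the article's sales dataset.
set.seed(1)   # make the synthetic data reproducible
n <- 28       # the abstract reports a dataset of 28 values

# Hypothetical predictors x1..x5 (assumed ranges, chosen for illustration).
sales <- data.frame(
  x1 = runif(n, 10, 50),
  x2 = runif(n, 1, 10),
  x3 = runif(n, 100, 500),
  x4 = runif(n, 0, 5),
  x5 = runif(n, 20, 80)
)

# Hypothetical response Y built from an assumed linear rule plus random noise.
sales$Y <- 5 + 2.0 * sales$x1 - 1.5 * sales$x2 + 0.1 * sales$x3 +
  3.0 * sales$x4 + 0.5 * sales$x5 + rnorm(n, sd = 2)

# Store the data as CSV (a stand-in for the spreadsheet export), then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)
dataset <- read.csv(csv_path)

# Pairwise correlations give a first look at linear association with Y.
cor_with_y <- cor(dataset)[, "Y"]
print(round(cor_with_y, 2))

# Fit Y = b0 + b1*x1 + ... + b5*x5 by ordinary least squares.
fit <- lm(Y ~ x1 + x2 + x3 + x4 + x5, data = dataset)
coefs <- coef(fit)
print(coefs)

# Predict Y for one arbitrary (hypothetical) set of predictor values.
new_x <- data.frame(x1 = 30, x2 = 5, x3 = 300, x4 = 2, x5 = 50)
predicted_y <- predict(fit, newdata = new_x)
print(predicted_y)
```

With a real dataset, the synthetic block would be replaced by a single read.csv() call on the exported file, and summary(fit) could be used to inspect standard errors and R-squared alongside the coefficients.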