Logistic regression is a statistical method used to model the relationship between a binary dependent variable and one or more independent variables.

The logistic regression model is widely used in fields such as medicine, finance, and the social sciences to analyze the relationship between a set of predictors (continuous or categorical) and a binary outcome.

In this tutorial, we will walk through the steps of fitting a logistic regression model in R and interpreting the results.

Getting Started

Before we begin, we need to install and load the required packages for this tutorial.

We will be using the “tidyverse” package for data manipulation and visualization, and the “broom” package for model summary output.

# Install and load required packages
install.packages(c("tidyverse", "broom"))
library(tidyverse)
library(broom)

Loading the Data

For this tutorial, we will be using the built-in “mtcars” dataset in R, which contains information about various car models.

Our goal is to predict whether a car has a straight engine (vs = 1) or a V-shaped engine (vs = 0) based on its weight and transmission type.

# Load the mtcars dataset
data(mtcars)

# View the first few rows of the dataset
head(mtcars)

The “mtcars” dataset has 32 rows and 11 columns; each row is a car model, and the columns record variables such as miles per gallon (mpg), horsepower (hp), and weight (wt).
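As a quick sanity check before modelling, we can confirm the dimensions and variable types:

```r
# Check the dimensions and variable types of mtcars
dim(mtcars)   # 32 rows, 11 columns
str(mtcars)
```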

Preparing the Data

Before fitting a logistic regression model, we need to prepare the data by selecting the relevant variables and converting the dependent variable to a binary factor. In this case, we will use the “wt” and “am” variables as our independent variables, and the “vs” variable as our dependent variable.

# Select the relevant variables
data <- mtcars %>%
  select(vs, wt, am)

# Convert the dependent variable to a binary factor
data$vs <- factor(data$vs, levels = c(0, 1), labels = c("V-shaped", "Straight"))

Fitting the Model

Now that we have prepared the data, we can fit the logistic regression model.

In R, we can use the glm() function to fit a logistic regression model, specifying the dependent variable (“vs”), the independent variables (“wt” and “am”), and the binomial family for logistic regression.

# Fit a logistic regression model
model <- glm(vs ~ wt + am, data = data, family = binomial())

Interpretation of the Results

After fitting the model, we can use R’s built-in summary() function to generate a summary of the model coefficients and fit statistics.

# Generate a summary of the model coefficients and fit statistics
summary(model)
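Since we loaded the “broom” package earlier, we can also extract the coefficients as a tidy data frame, and exponentiate the estimates to read them as odds ratios (a common convenience step, not required for the model itself):

```r
# Coefficients as a tidy data frame (from the broom package)
tidy(model)

# Exponentiate the log-odds coefficients to get odds ratios
exp(coef(model))
```

An odds ratio above 1 means the predictor increases the odds of vs = 1; below 1 means it decreases them.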

Assembling the Predicted Results

This is how we add the predicted probabilities to the data and convert them into a 0/1 classification using a 0.5 cutoff (1 = predicted vs = 1, 0 = predicted vs = 0):

data$pred <- predict(model, data, type = "response")

data$result <- ifelse(data$pred <= 0.5, 0, 1)
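We can peek at the first few predicted probabilities alongside the resulting classifications to make sure they look sensible:

```r
# Inspect the actual class, predicted probability, and 0/1 classification
head(data[, c("vs", "pred", "result")])

# Predicted probabilities from type = "response" always lie between 0 and 1
range(data$pred)
```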

Assembling and Interpreting the Confusion Matrix

With the confusion matrix, we can assess the model and see whether the variables we picked generate a good-enough predictor of the dependent variable. Note that we compare the predicted classes against the actual dependent variable, “vs”.

confusion <- table("predicted" = data$result, "actual" = data$vs)
correct <- confusion[1, 1] + confusion[2, 2]
total_cases <- nrow(data)

correct_results <- correct / total_cases
correct_results

Printing “confusion” and “correct_results” shows how the predictions break down and what share of cars the model classifies correctly. If the accuracy is not very high, it suggests that weight and transmission type alone are only modest predictors of engine shape.

We can always rerun the regression with a different set or combination of the predictors available to us.
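For example, horsepower (“hp”) is another plausible predictor of engine shape in mtcars. Purely as an illustration, an alternative model might look like this:

```r
# A hypothetical alternative: predict engine shape from weight and horsepower
model2 <- glm(vs ~ wt + hp, data = mtcars, family = binomial())
summary(model2)
```

The same prediction and confusion-matrix steps from above can then be repeated to compare the accuracy of the two models.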