Discuss@GL4L

Logistic Regression - Multicollinearity Issue


Hi,

I am trying to build a logistic regression model on a weather dataset. When I fit the models I get the warnings below, which I believe are caused by highly correlated variables.

Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

To work around the problem, I iteratively built several logistic models, each time removing the variables with a high VIF. I am now left with only a few variables, all with a VIF below 2, but I still get the same warnings. Please advise how to deal with this.

The R code is below.

mydata=read.csv("weather.csv")
variable=names(mydata)

#### Missing value imputation using kNN


library(VIM)

# Which variables have missing values

colnames(mydata)[colSums(is.na(mydata)) > 0]

mydata_imputed=kNN(mydata,variable = colnames(mydata)[colSums(is.na(mydata)) > 0],k=5)
colSums(is.na(mydata_imputed))

mydata_imputed=mydata_imputed[,1:24]
str(mydata_imputed)

#Building model

library(caret)
set.seed(1234)
Index=createDataPartition(mydata_imputed$RainTomorrow,p=0.75,list = FALSE)
Train=mydata_imputed[Index,]
Test=mydata_imputed[-Index,]
unique(mydata_imputed)
LogM1=glm(RainTomorrow~., data = mydata_imputed[,-c(1,2)], family = "binomial")

library(car)
car::vif(LogM1)

LogM2=glm(RainTomorrow~., data = mydata_imputed[,-c(1:4,16:17,20:21)], family = "binomial")
names(mydata_imputed)
car::vif(LogM2)

LogM3=glm(RainTomorrow~., data = mydata_imputed[,-c(1:4,12,15:17,20:21)], family = "binomial")
car::vif(LogM3)

LogM4=glm(RainTomorrow~., data = mydata_imputed[,-c(1:4,12,15:17,20:22)], family = "binomial")
car::vif(LogM4)

LogM5=glm(RainTomorrow~., data = mydata_imputed[,-c(1:4,12,13,15:17,20:22)], family = "binomial")
car::vif(LogM5)

LogM6=glm(RainTomorrow~., data = mydata_imputed[,-c(1:4,6:7,12,13,15:17,20:22)], family = "binomial")
car::vif(LogM6)
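
The manual drop-and-refit steps above could also be written as a loop. This is only a minimal sketch under stated assumptions, not the code I actually ran: it uses simulated data rather than weather.csv, the helper names `compute_vif` and `prune_by_vif` are hypothetical, the threshold of 5 is arbitrary, and the VIF is computed by hand as 1 / (1 - R^2) for numeric predictors instead of calling car::vif, so that it runs with base R only.

```r
# Hypothetical helper: VIF of each numeric column, computed as 1 / (1 - R^2)
# from regressing that column on all the other columns.
compute_vif <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: repeatedly drop the predictor with the largest VIF
# until every remaining VIF is below `threshold` (an assumed cut-off).
prune_by_vif <- function(X, threshold = 5) {
  dropped <- character(0)
  while (ncol(X) >= 2) {
    v <- compute_vif(X)
    if (max(v) < threshold) break
    worst <- names(v)[which.max(v)]
    dropped <- c(dropped, worst)
    X <- X[, setdiff(names(X), worst), drop = FALSE]
  }
  list(kept = names(X), dropped = dropped)
}

# Simulated predictors (not the weather data): x2 is almost a copy of x1.
set.seed(1234)
n <- 200
x1 <- rnorm(n)
sim <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(n, sd = 0.05),
                  x3 = rnorm(n),
                  x4 = rnorm(n))
sim_y <- rbinom(n, 1, plogis(0.5 * sim$x1 - 0.5 * sim$x3))

res <- prune_by_vif(sim, threshold = 5)

# Refit the logistic model on the predictors that survived the pruning.
dat <- cbind(sim[, res$kept, drop = FALSE], sim_y = sim_y)
fit <- glm(sim_y ~ ., data = dat, family = "binomial")
```

Here `res$dropped` lists the removed columns and `res$kept` the ones passed to `glm`. Dropping one variable per iteration and recomputing matters because removing a single collinear predictor usually changes the VIF of the others.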