TRAVEL INSURANCE PROJECT
Introduction
In this dataset, a tour and travel company is offering a travel insurance package to its customers. The dataset includes variables that could influence whether a customer buys the package, such as age and whether the customer is a frequent flyer. In this project, I use R to explore the data visually and to build a logistic regression model that serves as a classifier. After building the model, I evaluate it with several metrics.
Link for Data: https://www.kaggle.com/tejashvi14/travel-insurance-prediction-data
Data Visualization using 'ggplot2'
Link to R Display and Code: https://rpubs.com/tracylam1/801047
In this section, I visualize the data with bar and pie charts. The visualizations use age, employment type, frequent-flyer status, and annual income, because I expected these variables to have the most influence on whether a customer buys the travel insurance.
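As a rough illustration of this kind of chart, here is a minimal ggplot2 sketch. It is not the project's actual code (that is at the RPubs link above). The data is simulated, and the column names FrequentFlyer and TravelInsurance are assumptions based on the variables described in this writeup.

```r
library(ggplot2)

# Simulated stand-in data; the column names are assumptions, not the real file.
set.seed(3)
travel <- data.frame(
  FrequentFlyer   = sample(c("No", "Yes"), 200, replace = TRUE, prob = c(0.8, 0.2)),
  TravelInsurance = sample(c("No", "Yes"), 200, replace = TRUE, prob = c(0.65, 0.35))
)

# Bar chart: share of buyers within each frequent-flyer group.
bar_chart <- ggplot(travel, aes(x = FrequentFlyer, fill = TravelInsurance)) +
  geom_bar(position = "fill") +
  labs(x = "Frequent flyer", y = "Proportion of customers", fill = "Bought insurance")

# Pie chart: a single stacked bar drawn in polar coordinates.
pie_chart <- ggplot(travel, aes(x = "", fill = TravelInsurance)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Bought insurance")

print(bar_chart)
print(pie_chart)
```

Using position = "fill" puts every bar on a 0 to 1 scale, which makes the purchase rate comparable between groups of different sizes.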

Logistic Regression Model
In this section, I build a logistic regression model as a binary classifier to identify which variables affect whether a customer purchases the travel insurance package. In the model output, the statistically significant predictors of the binary variable TravelInsurance are age, annual income, number of family members, and frequent-flyer status: the p-value for each of these variables is below the 0.05 significance level. This mostly matched my prediction of which variables would be most influential in a customer's decision to buy the travel insurance.
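The model fit and the p-value check described above could look roughly like the following sketch. It runs on simulated data rather than the Kaggle file, so the numbers it prints say nothing about the real results. The predictor column names (Age, AnnualIncome, FamilyMembers, FrequentFlyer) and the simulated effect sizes are assumptions made for illustration.

```r
# Simulated stand-in for the Kaggle data; column names and effects are assumptions.
set.seed(1)
n <- 500
travel <- data.frame(
  Age           = sample(25:35, n, replace = TRUE),
  AnnualIncome  = round(runif(n, 3e5, 1.8e6)),
  FamilyMembers = sample(2:9, n, replace = TRUE),
  FrequentFlyer = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)

# Simulated outcome: purchase probability rises with income and frequent-flyer status.
lin <- -3 + 2e-6 * travel$AnnualIncome + 0.8 * (travel$FrequentFlyer == "Yes")
travel$TravelInsurance <- rbinom(n, 1, plogis(lin))

# Logistic regression: binomial family with the default logit link.
fit <- glm(TravelInsurance ~ Age + AnnualIncome + FamilyMembers + FrequentFlyer,
           data = travel, family = binomial)

# Coefficient table: estimate, standard error, z value, and p-value per term.
coefs <- summary(fit)$coefficients
p_values <- coefs[, "Pr(>|z|)"]

# Terms whose p-value falls below the 0.05 significance level.
significant <- names(p_values)[p_values < 0.05]

print(round(coefs, 4))
print(significant)
```

A p-value below 0.05 only indicates that a coefficient is unlikely to be zero given the model. It does not measure how large the effect is, so the estimates themselves are worth reading alongside it.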

Evaluating the Logistic Regression Model
In this section, I evaluate the logistic regression model using cross-validation, the misclassification error rate, a confusion matrix, specificity, sensitivity, and the ROC curve. In the results, the misclassification error rate and the cross-validation estimate are roughly equal. However, because the error rate is fairly high at around 22%, this may not be the best model for predicting whether someone will buy the travel insurance.
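One way to compare the two error estimates is sketched below, using cv.glm from the boot package. The data is simulated and the column names are assumptions. The 0.5 cutoff and the choice of 10 folds are also assumptions, since the writeup does not state which settings were used.

```r
library(boot)  # provides cv.glm for K-fold cross-validation of glm fits

# Simulated stand-in data; column names and effect sizes are assumptions.
set.seed(2)
n <- 500
travel <- data.frame(
  Age          = sample(25:35, n, replace = TRUE),
  AnnualIncome = round(runif(n, 3e5, 1.8e6))
)
travel$TravelInsurance <- rbinom(n, 1, plogis(-3 + 2e-6 * travel$AnnualIncome))

fit <- glm(TravelInsurance ~ Age + AnnualIncome, data = travel, family = binomial)

# Training misclassification rate: classify as a buyer when P(buy) > 0.5.
pred_class  <- as.integer(fitted(fit) > 0.5)
train_error <- mean(pred_class != travel$TravelInsurance)

# Cost function for cv.glm: the same 0.5-cutoff misclassification rate,
# computed from observed 0/1 outcomes and predicted probabilities.
misclass_cost <- function(actual, predicted_prob) {
  mean(abs(actual - predicted_prob) > 0.5)
}

# 10-fold cross-validated error; delta[1] is the raw CV estimate.
cv_error <- cv.glm(travel, fit, cost = misclass_cost, K = 10)$delta[1]

print(c(train_error = train_error, cv_error = cv_error))
```

The training error is measured on the same rows used to fit the model, so it tends to be optimistic. When the cross-validated error comes out close to it, that suggests the model is not overfitting much, even if the error itself is high.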

The confusion matrix gives more insight into the results from the misclassification rate and cross-validation. The matrix shows many false negatives, and the sensitivity is correspondingly low at around 49%: the model misses roughly half of the customers who actually bought the insurance.
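To make the link between false negatives and sensitivity concrete, here is a small worked sketch in base R. The ten outcomes and predicted probabilities are made-up illustrative values, not results from this project, and the 0.5 cutoff is an assumption.

```r
# Made-up example: 4 actual buyers (1) and 6 non-buyers (0),
# each with a hypothetical predicted probability of buying.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred_prob <- c(0.9, 0.7, 0.4, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4)

# Classify as a buyer when the predicted probability exceeds 0.5.
predicted <- as.integer(pred_prob > 0.5)

# Fix the factor levels so the table is always 2 x 2.
cm <- table(
  Predicted = factor(predicted, levels = c(0, 1)),
  Actual    = factor(actual, levels = c(0, 1))
)

TP <- unname(cm["1", "1"])  # buyers predicted as buyers
FN <- unname(cm["0", "1"])  # buyers the model missed
TN <- unname(cm["0", "0"])  # non-buyers predicted as non-buyers
FP <- unname(cm["1", "0"])  # non-buyers predicted as buyers

sensitivity <- TP / (TP + FN)        # share of actual buyers caught
specificity <- TN / (TN + FP)        # share of non-buyers correctly rejected
error_rate  <- (FP + FN) / sum(cm)   # overall misclassification rate

print(cm)
print(c(sensitivity = sensitivity, specificity = specificity, error_rate = error_rate))
```

In this toy example two of the four buyers fall below the cutoff, so the sensitivity is 0.5 even though the overall error rate is only 0.3. The same pattern can hide behind a moderate error rate whenever non-buyers outnumber buyers.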
