The hospitality industry is an essential part of travel and tourism. As the hotel industry’s market has grown, so has the rate of cancelled bookings, which at one point reached as much as 40%. While some cancellations are inevitable due to unforeseen circumstances, there is merit in trying to reduce the cancellation rate.

Our group aims to determine the factors that may lead to booking cancellations. We trained several machine learning classifiers to predict cancellation status using a hotel booking demand dataset that contains features such as arrival date and daily rate. The best model was the Random Forest Classifier, with a test accuracy of 86.16%. It identified the top three predictors as lead time, average daily rate, and week number. To look deeper into customer behavior, we plotted a decision tree from a Decision Tree classifier that predicts cancellation status with 82.73% accuracy, and from it identified 7 distinct customer personas with a high likelihood of cancelling their bookings.
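
The workflow described above can be illustrated with a minimal sketch, assuming scikit-learn. It is not our actual pipeline: the data below is synthetic, the column names `lead_time`, `adr`, and `arrival_week` are hypothetical stand-ins, and the labelling rule is made up purely so the toy example has some signal. It shows the general pattern of fitting a Random Forest, ranking features by importance, and fitting a shallow Decision Tree whose branches can be read as candidate customer personas.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: NOT the hotel booking demand dataset.
rng = np.random.default_rng(0)
n = 2000
feature_names = ["lead_time", "adr", "arrival_week"]  # hypothetical column names
X = np.column_stack([
    rng.integers(0, 400, n),   # lead time in days
    rng.uniform(40, 300, n),   # average daily rate
    rng.integers(1, 54, n),    # week number
])

# Made-up labelling rule (an assumption for this toy example only):
# bookings with longer lead times are more likely to be cancelled.
p_cancel = 1 / (1 + np.exp(-(X[:, 0] - 150) / 50))
y = (rng.random(n) < p_cancel).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Random Forest: report test accuracy and rank features by importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
forest_acc = forest.score(X_test, y_test)
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Shallow Decision Tree: each root-to-leaf path is a readable rule.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)
rules = export_text(tree, feature_names=feature_names)

print(f"Random Forest test accuracy: {forest_acc:.4f}")
print("Feature importances:", ranked)
print(f"Decision Tree test accuracy: {tree_acc:.4f}")
print(rules)
```

The accuracies printed by this sketch come from the synthetic data and are unrelated to the 86.16% and 82.73% reported above. Limiting the tree depth trades some accuracy for rules that are short enough to interpret by hand.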

Our findings can help businesses through a proposed system in which potential cancellations are flagged so that hotel management can tailor mitigation strategies to prevent them. They can also help customers manage their booking behavior, for example by not booking too far in advance and by following through with requests, since management identifies those requests as commitments.
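
As a rough illustration of the flagging idea, the sketch below assumes each booking already carries a predicted cancellation probability from a trained classifier. The `Booking` class, the `flag_at_risk` function, the booking IDs, and the 0.5 threshold are all hypothetical choices made for this example, not details of the system itself.

```python
from dataclasses import dataclass


@dataclass
class Booking:
    booking_id: str            # hypothetical identifier
    cancel_probability: float  # assumed to come from a trained classifier


def flag_at_risk(bookings, threshold=0.5):
    """Return bookings at or above the threshold, riskiest first."""
    flagged = [b for b in bookings if b.cancel_probability >= threshold]
    return sorted(flagged, key=lambda b: b.cancel_probability, reverse=True)


# Made-up example bookings.
bookings = [
    Booking("B001", 0.82),
    Booking("B002", 0.15),
    Booking("B003", 0.57),
]
flagged_ids = [b.booking_id for b in flag_at_risk(bookings)]
print(flagged_ids)
```

In practice the threshold would be tuned to how much follow-up effort hotel management can spend on each flagged booking.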