Connor Patsel

Movie Success Prediction in R

A common application of data science is predicting, from pre-existing factors, which changes can increase profit, revenue, and similar outcomes. To this end, three other students and I set out to determine whether a movie's success (measured by its revenue) is correlated with other factors related to its production.

The dataset covered around 45,000 movies, with entries for each movie's revenue, run time, IMDb rating, and popularity score. The data had to be cleaned first, which left just under 10,000 entries.
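
The write-up does not say which cleaning rules were applied, so the following is only a minimal sketch of what such a step could look like in base R. The toy data frame, its column names (revenue, runtime, rating, popularity), and the rule itself (drop rows with missing values or non-positive revenue or run time) are all assumptions made for illustration, not the project's actual code.

```r
# Hypothetical toy data standing in for the real dataset.
# Column names are assumptions, not taken from the original data.
movies <- data.frame(
  revenue    = c(1.2e8, 0, NA, 3.5e7, 9.0e6),
  runtime    = c(120, 95, 110, NA, 101),
  rating     = c(7.1, 5.4, 6.0, 6.8, 5.9),
  popularity = c(21.3, 2.1, 4.4, 8.7, 3.0)
)

# Assumed cleaning rule: keep only rows with no missing values
# and with strictly positive revenue and run time.
keep <- complete.cases(movies) & movies$revenue > 0 & movies$runtime > 0
cleaned <- movies[which(keep), ]

# Report how many rows survive the filter.
cat(nrow(cleaned), "of", nrow(movies), "rows kept\n")
```

A filter of this kind can easily remove most of a dataset when many entries have no recorded revenue, which would be consistent with the drop from around 45,000 to just under 10,000 entries.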

Our first approach to modeling this relationship was simple linear regression. Although its results were adequate, we felt another model was worth trying. We eventually settled on a generalized additive model, which fits a spline through the data.
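
To make the comparison between the two models concrete, here is a small sketch that fits a linear regression and a generalized additive model to the same data. It assumes the mgcv package (the write-up does not name the package that was used) and runs on simulated data with a single hypothetical predictor, popularity, so it only illustrates the technique and does not reproduce the project's results.

```r
library(mgcv)  # assumption: the original write-up does not name its GAM package

set.seed(1)
n <- 200

# Simulated stand-in data with a curved relationship (not the real movie data).
sim <- data.frame(popularity = runif(n, 0, 30))
sim$revenue <- 50 * sqrt(sim$popularity) + rnorm(n, sd = 10)

# Simple linear regression: revenue as a straight-line function of popularity.
linear_fit <- lm(revenue ~ popularity, data = sim)

# Generalized additive model: s() requests a penalized spline smooth of popularity.
gam_fit <- gam(revenue ~ s(popularity), data = sim)

# Compare the two fits; a lower AIC indicates a better trade-off of fit and complexity.
print(AIC(linear_fit, gam_fit))

# Predict revenue for a few hypothetical popularity scores.
new_movies <- data.frame(popularity = c(5, 15, 25))
gam_pred <- predict(gam_fit, newdata = new_movies)
print(gam_pred)
```

The spline lets the fitted curve bend where the data call for it, whereas the linear model is restricted to a single slope across the whole range of the predictor.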