Making a Sci-Fi Film for Success

an EDA project in collaboration with Jude Buenaseda

The Scenario

CompanyX sees all the other movie companies creating original video content, and they want to get in on the fun. CompanyX decides that their breakout original content will be a sci-fi movie. They really want to capture the niche and create a success with their first original feature. CompanyX has enlisted your team for consultation.

With that in mind, this project used exploratory data analysis (EDA) to study the sci-fi movie genre, identify what drives success within it, and recommend what type of film CompanyX should create to capture the sci-fi market.

The Approach

The first step we took was to source data on sci-fi movies that included the necessary markers of success: revenue, popularity score, and average voter score. Since the company's goal is to create a successful movie, we needed to quantify which prior releases were successful and build from there.

Our next step was to source data on the defining characteristics of each film, namely sub-genre and rating. With those characteristics at our disposal, we could break down exactly what kind of sci-fi movie should be created to maximize success.

Lastly, after compiling our data into a single sample dataset of sci-fi films, containing basic information on each movie along with our markers of success and defining characteristics, we could perform our exploratory analysis.

The Dataset

Data was compiled from three sources: web scraping Box Office Mojo, and API requests to The Movie DB and OMDB. Our final dataset was limited to movies released in the 2000s, with revenue measured as North American gross, since this was most relevant to CompanyX's goal and the current market.
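A minimal pandas sketch of how that scoping step might look, assuming the scraped and API data have already been combined into one table. The column names (`year`, `domestic_gross`, `worldwide_gross`) and the toy rows are hypothetical, not the project's actual schema or data, and "released in the 2000s" is read here as a release year of 2000 or later.

```python
import pandas as pd

# Toy rows for illustration only; column names and values are hypothetical.
movies = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "year": [1997, 2004, 2015],
    "domestic_gross": [120_000_000, 85_000_000, 300_000_000],    # North America, USD
    "worldwide_gross": [400_000_000, 200_000_000, 900_000_000],  # all markets, USD
})

def scope_to_market(df: pd.DataFrame, min_year: int = 2000) -> pd.DataFrame:
    """Keep films released in or after `min_year`, using North American gross as revenue."""
    recent = df[df["year"] >= min_year]
    # Drop the worldwide figure so every revenue number refers to the same market.
    return (recent[["title", "year", "domestic_gross"]]
            .rename(columns={"domestic_gross": "revenue"})
            .reset_index(drop=True))

scoped = scope_to_market(movies)
print(scoped)
```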

The 10 sci-fi sub-genres analyzed, as labeled by one of the data sources, were: Time Travel, Supernatural, Superhero, Space Opera, Robot, Post Apocalypse, Person vs Machine, Future, Alien Invasion, and Affliction.

Our final dataset contained around 300 films. After cleaning the acquired datasets and merging them on title, we kept only the films for which we had every data point (no NaN values).
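The merge-then-drop step could be sketched in pandas as below. The three small tables stand in for the Box Office Mojo, The Movie DB, and OMDB extracts; their column names and values are made up for illustration. Inner joins keep only titles present in every source, and `dropna()` then removes any row with a missing field.

```python
import pandas as pd

# Hypothetical per-source tables; names, columns, and values are illustrative only.
box_office = pd.DataFrame({"title": ["Film A", "Film B", "Film C"],
                           "revenue": [85e6, 300e6, 40e6]})
tmdb = pd.DataFrame({"title": ["Film A", "Film B", "Film D"],
                     "popularity": [25.1, 60.4, 12.0],
                     "vote_average": [6.8, 7.5, 5.9]})
omdb = pd.DataFrame({"title": ["Film A", "Film B", "Film C"],
                     "rating": ["PG-13", None, "R"],
                     "subgenre": ["Robot", "Supernatural", "Future"]})

def build_dataset(*frames: pd.DataFrame) -> pd.DataFrame:
    """Inner-join every source on `title`, then keep only fully populated rows."""
    merged = frames[0]
    for frame in frames[1:]:
        merged = merged.merge(frame, on="title", how="inner")
    return merged.dropna().reset_index(drop=True)

films = build_dataset(box_office, tmdb, omdb)
print(films["title"].tolist())  # ['Film A']
```

Joining on title alone can mismatch remakes or unrelated films that share a name; merging on title plus release year is one way to reduce that risk.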

The Analysis

We aimed to understand which type of sci-fi movie generated the most revenue and, similarly, which type was the most popular. We also checked whether revenue and popularity were strongly correlated, and they were: a sub-genre that performed well in popularity also performed well in gross revenue. The sub-genre and rating that led in both indicators therefore gave us a specific combination to recommend for a successful sci-fi film.
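One way to run both checks in pandas is sketched below, using toy values rather than the project's data. `Series.corr` computes a Pearson correlation by default, and the `groupby` ranks sub-genres by mean revenue alongside mean popularity. The column names are assumptions.

```python
import pandas as pd

# Toy values for illustration only; these are not the project's data or results.
films = pd.DataFrame({
    "subgenre": ["Supernatural", "Supernatural", "Robot", "Robot", "Future", "Future"],
    "revenue": [400e6, 250e6, 90e6, 60e6, 120e6, 80e6],
    "popularity": [70.0, 55.0, 20.0, 18.0, 30.0, 25.0],
})

# Pearson correlation between the two success markers across individual films.
r = films["revenue"].corr(films["popularity"])

# Average of each marker per sub-genre, ranked by revenue.
by_subgenre = (films.groupby("subgenre")[["revenue", "popularity"]]
               .mean()
               .sort_values("revenue", ascending=False))

print(f"Pearson r = {r:.2f}")
print(by_subgenre)
```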

EDA Visualizations

The Supernatural sub-genre performed the best of all sci-fi sub-genres in terms of average revenue and popularity (a score based on a film's online interactions).

As shown by the green line in the distribution graph above, the median revenue for Supernatural films is around $329M, about four times the average revenue of all sci-fi films.
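The comparison above pairs a sub-genre median with an all-film average. A small helper that keeps those two statistics explicit might look like this; the toy revenues and column names are hypothetical.

```python
import pandas as pd

# Toy revenues in USD; illustrative only, not the project's figures.
films = pd.DataFrame({
    "subgenre": ["Supernatural", "Supernatural", "Supernatural", "Robot", "Robot", "Future"],
    "revenue": [300e6, 350e6, 500e6, 40e6, 60e6, 50e6],
})

def median_vs_overall_mean(df: pd.DataFrame, subgenre: str) -> float:
    """Ratio of one sub-genre's median revenue to the mean revenue of all films."""
    sub_median = df.loc[df["subgenre"] == subgenre, "revenue"].median()
    return sub_median / df["revenue"].mean()

ratio = median_vs_overall_mean(films, "Supernatural")
print(f"{ratio:.2f}x")
```

Because a few blockbusters can pull a mean well above a typical film's revenue, replacing `df["revenue"].mean()` with `.median()` would give a like-for-like comparison if that is preferred.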

The summary graph brings together the trends we found throughout our analysis. Each sub-genre's average revenue is compared, with Supernatural ahead of the other sub-genres. Revenue was also broken down by each combination of sub-genre and movie rating, and PG-13 (purple) had a higher count than the other ratings.
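The sub-genre by rating breakdown could be produced with `pd.crosstab` for counts and `pivot_table` for mean revenue, as in this sketch. The rows are toy values and the column names are assumptions.

```python
import pandas as pd

# Toy rows; sub-genre labels follow the article, values are illustrative only.
films = pd.DataFrame({
    "subgenre": ["Supernatural", "Supernatural", "Supernatural", "Robot", "Robot", "Future"],
    "rating": ["PG-13", "PG-13", "R", "PG-13", "PG", "R"],
    "revenue": [400e6, 300e6, 150e6, 90e6, 60e6, 80e6],
})

# Number of films for each sub-genre/rating pair (missing pairs filled with 0).
counts = pd.crosstab(films["subgenre"], films["rating"])

# Mean revenue for each sub-genre/rating pair (missing pairs become NaN).
mean_revenue = films.pivot_table(index="subgenre", columns="rating",
                                 values="revenue", aggfunc="mean")

# Which rating appears most often overall.
top_rating = films["rating"].value_counts().idxmax()

print(counts)
print(mean_revenue)
print(top_rating)
```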

Results and Last Thoughts

As shown in the visualizations above, our EDA concluded that the winning combination for success in the sci-fi movie market is a Supernatural, PG-13 film!

However, Marvel films accounted for about 60% of the Supernatural sub-genre's revenue. A natural next step is therefore to exclude all films associated with Marvel and see how the analysis changes. To dig deeper, a regression analysis comparing audiences who watch a Marvel film solely because it is Marvel with those who watch it because it belongs to the Supernatural sub-genre would be another telling follow-up.
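The proposed follow-up, measuring Marvel's share and re-running the sub-genre ranking without Marvel titles, might be sketched as follows. The `is_marvel` flag is a hypothetical column (the article does not say how Marvel films were identified), and the rows are toy values chosen only to show that a ranking can change once a dominant franchise is removed.

```python
import pandas as pd

# Toy rows; `is_marvel` is a hypothetical flag, and all values are illustrative only.
films = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "subgenre": ["Supernatural", "Supernatural", "Supernatural", "Robot", "Future"],
    "is_marvel": [True, True, False, False, False],
    "revenue": [500e6, 400e6, 100e6, 90e6, 120e6],
})

def marvel_share(df: pd.DataFrame, subgenre: str) -> float:
    """Fraction of a sub-genre's total revenue that comes from Marvel films."""
    sub = df[df["subgenre"] == subgenre]
    return sub.loc[sub["is_marvel"], "revenue"].sum() / sub["revenue"].sum()

def mean_revenue_by_subgenre(df: pd.DataFrame, exclude_marvel: bool = False) -> pd.Series:
    """Mean revenue per sub-genre, optionally re-run without Marvel titles."""
    if exclude_marvel:
        df = df[~df["is_marvel"]]
    return df.groupby("subgenre")["revenue"].mean().sort_values(ascending=False)

print(marvel_share(films, "Supernatural"))
print(mean_revenue_by_subgenre(films))
print(mean_revenue_by_subgenre(films, exclude_marvel=True))
```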
