Dataset - Winners of the Oscars Award




Description

This dataset covers four categories of the Academy Awards: Best Picture, Best Director, Best Actor, and Best Actress.

A scraper written in R collects the following information about movies since 1928 (from imdb.com and filmaffinity.com) for each of the above categories

Challenges




Download

The following .csv files contain the dataset for the best pictures, directors, actors, and actresses, plus all four categories together




Implementation

The R implementation is available on GitHub




Statistical Analysis

We can perform two types of analysis on this data: qualitative and quantitative. In the qualitative analysis, we can ask questions such as "Which movie won all four awards?", "Which winning movie had the lowest IMDb rating?", "Which winning movie has the longest duration?", and so on.
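As an illustration, qualitative questions like these reduce to simple lookups over the rows of the dataset. The sketch below uses Python for brevity (the original scraper is written in R), and the rows and column names (`title`, `category`, `rating`, `duration`) are placeholders assumed for the example, not the actual schema of the .csv files.

```python
# Toy rows standing in for the real data. Titles, values, and column
# names are hypothetical and only serve to make the sketch runnable.
rows = [
    {"title": "Movie A", "category": "picture", "rating": 8.1, "duration": 120},
    {"title": "Movie A", "category": "director", "rating": 8.1, "duration": 120},
    {"title": "Movie A", "category": "actor", "rating": 8.1, "duration": 120},
    {"title": "Movie A", "category": "actress", "rating": 8.1, "duration": 120},
    {"title": "Movie B", "category": "picture", "rating": 6.4, "duration": 201},
    {"title": "Movie C", "category": "actor", "rating": 7.2, "duration": 95},
]

ALL_CATEGORIES = {"picture", "director", "actor", "actress"}


def movies_with_all_awards(rows):
    """Return titles that appear as a winner in all four categories."""
    won = {}
    for row in rows:
        won.setdefault(row["title"], set()).add(row["category"])
    return sorted(t for t, cats in won.items() if cats >= ALL_CATEGORIES)


def lowest_rated(rows):
    """Return the title of the winning movie with the lowest user rating."""
    return min(rows, key=lambda r: r["rating"])["title"]


def longest(rows):
    """Return the title of the winning movie with the longest duration."""
    return max(rows, key=lambda r: r["duration"])["title"]


print(movies_with_all_awards(rows))
print(lowest_rated(rows))
print(longest(rows))
```

With the real files, the rows would come from `csv.DictReader`, and matching on the title alone would need care, since two different movies can share a title.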


P.S. If you perform a qualitative analysis on this data and want to make it public, please contact me and I will post it here with full credit to you.

We wish to use this data for predictive purposes; therefore, we computed the following summary statistics for each category:

Statistic                                 Best Picture      Best Director     Best Actor        Best Actress
Mean Number of Nominations                9.16              8.83              6.78              6.36
Std. Deviation of Number of Nominations   2.42              2.65              2.91              3.19
Maximum Number of Nominations             14                14                13                13
Minimum Number of Nominations             3                 3                 2                 2
Mean Users Rating                         7.86              7.91              7.77              7.55
Std. Deviation of Users Rating            0.59              0.54              0.51              0.46
Maximum Users Rating                      9.2               9                 9.2               8.7
Minimum Users Rating                      6                 6.1               5.8               6.4
Mean Duration (minutes)                   138.63            137.52            122.41            115.57
Std. Deviation of Duration (minutes)      31.57             33.85             26.18             22.35
Maximum Duration (minutes)                238               238               212               238
Minimum Duration (minutes)                90                85                85                69
Mean MCR                                  83.75             83.87             80.71             76.6
Std. Deviation of MCR                     9.26              8.11              11.06             9.88
Maximum MCR                               100               100               100               91
Minimum MCR                               64                65                56                53
Majority Month of Release                 December          December          December          December
Two Most Occurring Genres                 Drama, Romance    Drama, Romance    Drama, Biography  Drama, Romance
Mean Sentiment*                           -0.60             -0.47             -1.12             -0.50
Std. Deviation of Sentiment               3.04              3.09              3.36              2.94
Maximum Sentiment                         6                 6                 8                 6
Minimum Sentiment                         -11               -11               -11               -11
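
Summary rows like the ones above can be computed per category with a few lines of code. The following is a minimal sketch in Python (not the original R implementation), using only the standard library. The function names are made up for illustration, the input lists are toy values rather than the dataset, and the use of the sample standard deviation is an assumption, since the text does not say which variant was used.

```python
import statistics
from collections import Counter


def summarize(values):
    """Mean, standard deviation, maximum, and minimum of a numeric column."""
    return {
        "mean": round(statistics.mean(values), 2),
        # Sample standard deviation (n - 1); an assumption, see lead-in.
        "std": round(statistics.stdev(values), 2),
        "max": max(values),
        "min": min(values),
    }


def majority_month(months):
    """Most frequent month of release."""
    return statistics.mode(months)


def top_genres(genre_fields, n=2):
    """The n most frequent genres, given comma-separated genre strings."""
    counts = Counter(
        genre.strip()
        for field in genre_fields
        for genre in field.split(",")
    )
    return [genre for genre, _ in counts.most_common(n)]


# Toy inputs, for illustration only.
print(summarize([90, 120, 150]))
print(majority_month(["December", "March", "December"]))
print(top_genres(["Drama, Romance", "Drama, Biography", "Drama, Romance"]))
```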

* Sentiment indicates the valence or pleasure of the text synopsis. It is calculated as the sum of the valence values of the words in the synopsis, based on the word list available here.
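
The sentiment score described above can be sketched as follows. This is a Python illustration under stated assumptions, not the original R code: the small `VALENCE` dictionary is a made-up stand-in for the actual word list, and the tokenization (lowercasing, splitting on non-letters) is a guess at a reasonable preprocessing step.

```python
import re

# Hypothetical mini word list: word -> integer valence.
# The real analysis uses the word list referenced in the text.
VALENCE = {"love": 3, "happy": 3, "war": -2, "death": -2, "murder": -2}


def synopsis_sentiment(text, valence=VALENCE):
    """Sum the valence of every word in the synopsis; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(valence.get(word, 0) for word in words)


print(synopsis_sentiment("A happy love story shattered by war and murder."))
```

Because the score is a plain sum, longer synopses with many emotionally charged words can reach large magnitudes, which is consistent with the wide range of sentiment values in the table.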


Improvements

The quality of this data can be improved by