Gender Bias in Movie Ratings · Matteo Bernardi

Research questions

The study investigates whether systematic biases exist in how male and female viewers rate films, and whether those biases interact with the gender of the film’s lead star. Four central questions framed the analysis:

Is there an occupation bias? Do rating patterns vary systematically across voter occupations?
Are there genre differences? Do average ratings diverge across film genres when broken down by voter gender?
Is there a gender bias? Do male and female voters rate films differently in general?
Is there a gender star bias? Do voter gender and star gender interact — and does the gender of the lead star change how male and female audiences rate a film?

Data pipeline

The project combined three major public datasets, processed through a multi-stage pipeline: get raw data → clean data → extract useful features → improve organization → save refined tables.

MovieLens 1M (2003) — voter ratings with demographic metadata: gender, age, occupation, zip code
MovieLens 32M (2024) — 32 million ratings with links to IMDb identifiers
IMDb Non-Commercial Datasets — cast lists, credit order, actor metadata

The pipeline produced four refined tables: Actors (actor_id, name, gender, birth year, death year), Movies (movie_id, title, year, genres), Cast (movie_id, actor_id, credit ordering), and Ratings (voter_id, movie_id, rating), with voter demographics attached.
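The source does not say how MovieLens films were matched to IMDb credits. The sketch below shows one plausible route, under stated assumptions. It goes through the MovieLens 32M `links.csv` file (columns `movieId,imdbId,tmdbId`, where `imdbId` holds only the numeric part of the IMDb identifier) and IMDb's `title.principals.tsv` (columns `tconst`, `ordering`, `nconst`, `category`, `job`, `characters`). The function name `build_cast_table` is hypothetical. Inferring gender from the `actor`/`actress` category is also an assumption of this sketch, not the project's documented method.

```python
import csv
import io


def build_cast_table(links_csv: str, principals_tsv: str):
    """Hypothetical helper: join MovieLens links to IMDb principals.

    Returns (cast_rows, actor_gender), where cast_rows holds
    (movie_id, actor_id, ordering) tuples for acting credits only.
    """
    # MovieLens stores the numeric part of the IMDb id ("0114709");
    # IMDb title ids are "tt" plus that number zero-padded to at least 7 digits.
    tconst_to_movie = {}
    for row in csv.DictReader(io.StringIO(links_csv)):
        tconst_to_movie[f"tt{int(row['imdbId']):07d}"] = int(row["movieId"])

    cast_rows = []
    actor_gender = {}  # assumption: gender inferred from the credit category
    # IMDb TSVs are unquoted and use a backslash-N token for missing values.
    reader = csv.DictReader(io.StringIO(principals_tsv), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        movie_id = tconst_to_movie.get(row["tconst"])
        if movie_id is None or row["category"] not in ("actor", "actress"):
            continue  # skip unlinked titles and non-acting credits
        actor_id = row["nconst"]
        cast_rows.append((movie_id, actor_id, int(row["ordering"])))
        actor_gender[actor_id] = "M" if row["category"] == "actor" else "F"
    return cast_rows, actor_gender
```

On the full files you would stream the gzipped TSV rather than hold it in a string. The join logic stays the same.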

Credit distribution and the star heuristic

Before measuring bias, the project examined how actors and actresses are distributed across credit positions. The distribution of average credit ordering differs by gender. Female actors tend to appear at lower credit positions (higher ordering numbers) than male actors. This structural asymmetry is visible in both the kernel density estimate (KDE) and the discrete cumulative distribution function (CDF) of average ordering by gender.
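Both plots start from each actor's mean credit position, grouped by gender. The minimal sketch below computes those per-actor averages and the discrete empirical CDF. It assumes a hypothetical in-memory layout: Cast rows as `(movie_id, actor_id, ordering)` tuples and gender as a dict keyed by `actor_id`. The function names are illustrative. The KDE itself would normally come from a plotting or statistics library and is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def average_ordering_by_gender(cast_rows, actor_gender):
    """Return {gender: [mean credit ordering of each actor of that gender]}."""
    orderings = defaultdict(list)
    for _movie_id, actor_id, ordering in cast_rows:
        orderings[actor_id].append(ordering)
    by_gender = defaultdict(list)
    for actor_id, values in orderings.items():
        by_gender[actor_gender[actor_id]].append(mean(values))
    return dict(by_gender)


def ecdf(values):
    """Discrete empirical CDF as sorted (x, P(X <= x)) pairs, one per distinct x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        # Emit a point only after the last occurrence of each distinct value.
        if i == n or xs[i] != x:
            points.append((x, i / n))
    return points
```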

To identify “stars” in a data-driven way — without relying on subjective lists — a custom scoring heuristic was applied:

$$\text{score}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \text{ordering}_j^{(i)} - \frac{\log n_i}{n_i}$$

where $n_i$ is the number of films in which actor $i$ appears and $\text{ordering}_j^{(i)}$ is the actor's credit position in film $j$. Lower scores indicate consistently high billing across many films. Actors scoring below the 25th percentile of the score distribution were classified as stars, yielding 3209 male stars and 1999 female stars.
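The score and the 25th-percentile cut can be sketched in a few lines. This is a minimal version assuming the natural logarithm and Cast rows stored as `(movie_id, actor_id, ordering)` tuples. The function names `star_scores` and `classify_stars` are hypothetical. Percentile conventions vary between tools, and this sketch uses the Python standard library's default quartile method.

```python
import math
from collections import defaultdict
from statistics import mean, quantiles


def star_scores(cast_rows):
    """score_i = mean credit ordering - log(n_i) / n_i (natural log assumed)."""
    orderings = defaultdict(list)
    for _movie_id, actor_id, ordering in cast_rows:
        orderings[actor_id].append(ordering)
    return {
        actor_id: mean(vals) - math.log(len(vals)) / len(vals)
        for actor_id, vals in orderings.items()
    }


def classify_stars(scores):
    """Stars are actors whose score falls below the 25th percentile (lower is better)."""
    q1 = quantiles(scores.values(), n=4)[0]  # first quartile cut point
    return {actor_id for actor_id, s in scores.items() if s < q1}
```

The correction term $\log(n_i)/n_i$ is largest near $n_i = e$ (about three films) and decays toward zero as $n_i$ grows. For actors with long filmographies, the score is therefore driven almost entirely by their mean billing position.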

Examples of identified male stars include John Belushi, Humphrey Bogart, Elvis Presley, and Michael Douglas. Female stars include Judy Garland, Shirley Temple, Pamela Anderson, and Barbra Streisand.

Geographic segmentation

As a preliminary exploratory step, ratings were mapped to US states. California dominates by volume (18.5% of all ratings), followed by New York (7.8%) and Texas (7.3%). Average ratings by state range from approximately 3.39 to 3.81. These regional differences in taste serve as a useful baseline for the main bias analysis.
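The state-level figures (share of rating volume and mean rating) reduce to a grouped count and average. The sketch below assumes each voter has already been assigned a two-letter state code upstream, for example from the MovieLens 1M zip code. That mapping and the name `state_summary` are hypothetical. Shares are computed over ratings whose voter has a resolvable state.

```python
from collections import defaultdict
from statistics import mean


def state_summary(ratings, voter_state):
    """Share of ratings and mean rating per state.

    ratings: iterable of (voter_id, movie_id, rating)
    voter_state: voter_id -> state code (hypothetical upstream mapping)
    """
    per_state = defaultdict(list)
    for voter_id, _movie_id, rating in ratings:
        state = voter_state.get(voter_id)
        if state is not None:  # skip voters without a resolvable US state
            per_state[state].append(rating)
    total = sum(len(vals) for vals in per_state.values())
    return {
        state: {"share": len(vals) / total, "mean": mean(vals)}
        for state, vals in per_state.items()
    }
```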

Genre and occupation breakdowns

Average ratings were computed for every combination of genre, voter occupation, and voter gender, producing a three-way breakdown. Some occupation-genre combinations show substantial divergence between male and female voters, while others are near-uniform. The heatmap of average rating by gender and occupation, filtered to differences of at least 0.5, isolates the divergent combinations. Writers, scientists, and retired voters tend toward higher average ratings, while unemployed voters tend toward lower ones. At the genre level, Documentary and Film-Noir receive the highest average ratings overall, while Comedy and Horror receive the lowest.
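The three-way breakdown and the 0.5 filter can be sketched as a grouped mean followed by a per-cell gender gap. This sketch makes three assumptions. Each row is an `(occupation, genre, gender, rating)` tuple, with a multi-genre film contributing one row per genre. The gap is defined as female mean minus male mean. The filter applies to its absolute value. The function name `gender_gaps` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gender_gaps(rows, min_gap=0.5):
    """Female-minus-male mean rating per (occupation, genre), kept if |gap| >= min_gap.

    rows: iterable of (occupation, genre, gender, rating), gender in {"F", "M"}.
    """
    cells = defaultdict(list)
    for occupation, genre, gender, rating in rows:
        cells[(occupation, genre, gender)].append(rating)
    means = {key: mean(vals) for key, vals in cells.items()}

    gaps = {}
    for (occupation, genre, gender), f_mean in means.items():
        if gender != "F":
            continue
        m_mean = means.get((occupation, genre, "M"))
        if m_mean is None:
            continue  # both genders are needed to compute a gap
        gap = f_mean - m_mean
        if abs(gap) >= min_gap:
            gaps[(occupation, genre)] = gap
    return gaps
```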

Measuring the star gender bias

Films were grouped into four overlapping conditions: films with at least one male star, films with at least one female star, films with only male stars, and films with only female stars. Within each condition, the rating distributions of male and female voters were compared using Welch's two-sample t-test, with Cohen's d as the effect-size measure.
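The sketch below computes both statistics for one condition using only the standard library. It returns Welch's t statistic, the Welch-Satterthwaite degrees of freedom, and Cohen's d. The function name is hypothetical. To avoid a t-distribution dependency, the p-value uses a normal approximation, which is adequate only when the degrees of freedom are very large, as they are with rating counts in the millions. Cohen's d uses the pooled standard deviation. That is one common convention and may differ from the variant used in the report.

```python
import math
from statistics import mean, variance


def welch_and_cohens_d(a, b):
    """Welch's t, Welch-Satterthwaite df, approximate two-sided p, and Cohen's d.

    a, b: sequences of ratings (e.g. female and male voters in one condition).
    """
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)

    se2_1, se2_2 = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(se2_1 + se2_2)
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))

    # Two-sided p-value from the normal approximation to the t distribution;
    # use an exact t CDF for small samples.
    p_approx = math.erfc(abs(t) / math.sqrt(2))

    # Cohen's d with the pooled standard deviation.
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    return t, df, p_approx, d
```

With millions of ratings, even a tiny d produces a large |t| and a vanishing p-value. This is why significance and effect size have to be read separately in the results.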

The full results for all four conditions are tabulated in the report.

All effects are statistically significant (p < 2.2e-16 for most conditions), as expected with such large samples, but the effect sizes are negligible to small by conventional standards (d < 0.2). The largest effect appears in the “only female stars” condition, where female voters rate films with exclusively female stars higher than male voters do. This directional pattern is consistent across all star-count thresholds considered (from 3 to 21 stars).

The trend chart of average ratings by voter gender and star gender, plotted as the number of top-N stars considered varies, shows four stable lines. Female voters rate female-star films highest, and male voters rating male-star films rank second. Male voters on female-star films and female voters on male-star films occupy lower, equally stable bands.
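The source does not define exactly how the top-N cutoff is applied. The sketch below assumes one possible reading: a film's star genders are taken from the stars among its first N credited cast members. It then computes the mean rating for each (voter gender, star gender) cell under the overlapping "at least one star" conditions. The layouts (Cast tuples `(movie_id, actor_id, ordering)`, Ratings tuples `(voter_id, movie_id, rating)`) and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def star_genders(cast_rows, stars, actor_gender, top_n):
    """Map movie_id -> set of star genders among its top_n credited cast members."""
    by_movie = defaultdict(list)
    for movie_id, actor_id, ordering in cast_rows:
        by_movie[movie_id].append((ordering, actor_id))
    result = {}
    for movie_id, credits in by_movie.items():
        top = [actor_id for _, actor_id in sorted(credits)[:top_n]]
        result[movie_id] = {actor_gender[a] for a in top if a in stars}
    return result


def rating_grid(ratings, voter_gender, film_star_genders):
    """Mean rating per (voter gender, star gender).

    A film with both male and female stars counts in both star-gender cells,
    mirroring the overlapping "at least one star" conditions.
    """
    cells = defaultdict(list)
    for voter_id, movie_id, rating in ratings:
        for star_gender in film_star_genders.get(movie_id, ()):
            cells[(voter_gender[voter_id], star_gender)].append(rating)
    return {key: mean(vals) for key, vals in cells.items()}
```

Under this reading, sweeping `top_n` over `range(3, 22)` would trace the 3-to-21 range of the trend chart.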

Network analysis

Two networks were constructed from random samples of 500 actors and 500 voters respectively.

The actor network connects actors who appeared in the same film. Fast-greedy community detection reveals distinct clusters. Centrality rankings across four metrics produce different top-10 lists, illustrating that “importance” in a network depends on what you measure:

Betweenness (actors who bridge communities): Jack Nicholson, Robert De Niro, Sean Connery, Jack Lemmon, Gregory Peck. Most are classified as stars by the heuristic.
Degree (actors with the most direct connections): Robert De Niro (35 co-appearances), Frank Oz (27), Samuel L. Jackson (27), Robin Williams (26).
Closeness (actors structurally close to all others in the sample, not necessarily famous): Louise Smith, Judith Fyfe, David Riva.
Eigenvector (actors embedded in a dense, internally connected cluster): Frank Oz, Jerry Nelson, Dave Goelz. The Muppet performers dominate this metric because they co-appeared in many films together.
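The project's graph analysis is listed as Python and R, but its code is not shown. The minimal standard-library sketch below builds the actor co-appearance network and computes two of the four metrics. It assumes Cast rows as `(movie_id, actor_id, ordering)` tuples and a seeded random sample. The function names are hypothetical. Closeness here is computed over reachable nodes only. As a result, in a disconnected sample a member of a two-actor component gets the maximum score of 1. Whether the project's computation behaves the same way is not stated in the source.

```python
import random
from collections import defaultdict, deque
from itertools import combinations


def coappearance_graph(cast_rows, sample_size, seed=0):
    """Undirected graph over a random actor sample; an edge means a shared film."""
    actors = sorted({actor for _, actor, _ in cast_rows})
    sample = set(random.Random(seed).sample(actors, min(sample_size, len(actors))))
    films = defaultdict(set)
    for movie_id, actor, _ in cast_rows:
        if actor in sample:
            films[movie_id].add(actor)
    graph = {actor: set() for actor in sample}
    for members in films.values():
        for u, v in combinations(sorted(members), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def degree(graph):
    """Number of distinct neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness(graph):
    """(reachable nodes) / (sum of BFS distances to them); 0.0 for isolated nodes."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores
```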

The divergence between betweenness and eigenvector centrality is the most instructive finding: betweenness identifies bridges between communities (typically major stars of mainstream cinema), while eigenvector identifies actors densely embedded in a single tightly-knit cluster (in this sample, the Muppet performers). Star status by the heuristic is strongly correlated with betweenness and degree, less so with eigenvector centrality.
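A small hypothetical graph reproduces the same contrast. The graph has a dense five-node clique (`a1` to `a5`, playing the role of the Muppet cluster), a three-node clique (`b1` to `b3`), and a single bridge node `x` linking them. The sketch uses Brandes' algorithm for betweenness and power iteration for eigenvector centrality. The iteration runs on A + I, which has the same leading eigenvector as A but avoids oscillation on bipartite graphs. All node names are illustrative, and this is not the sampled data.

```python
import math
from collections import defaultdict, deque
from itertools import combinations


def toy_graph():
    """Hypothetical graph: a K5 clique and a K3 clique joined through a bridge x."""
    edges = list(combinations(["a1", "a2", "a3", "a4", "a5"], 2))
    edges += [("b1", "b2"), ("b1", "b3"), ("b2", "b3")]
    edges += [("x", "a1"), ("x", "a2"), ("x", "b1")]
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each unordered pair was counted twice


def eigenvector(graph, iterations=200):
    """Power iteration on (A + I), normalized to unit length each step."""
    x = dict.fromkeys(graph, 1.0)
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x
```

On this graph, `x` has the highest betweenness because every clique-to-clique shortest path passes through it. The eigenvector maximum, by contrast, lands inside the large clique.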

The voter network connects viewers who highly rated the same films. Fast-greedy community detection on the 500-voter sample reveals distinct taste-based clusters, confirming that the audience is not homogeneous and that meaningful market segmentation exists.
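The threshold for "highly rated" is not given in the source. The sketch below assumes a cutoff of 4 stars, links two voters when both gave the same film a rating at or above that cutoff, and then groups voters by greedy modularity agglomeration. It works in the spirit of fast-greedy, but it is a naive version for small graphs, not the Clauset-Newman-Moore implementation found in graph libraries. It recomputes every gain each round, merges the pair with the largest gain $\Delta Q = L_{AB}/m - D_A D_B / (2m^2)$, and stops when no merge increases modularity. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def voter_graph(ratings, min_rating=4):
    """Edge set linking voters who both rated a film >= min_rating (assumed cutoff)."""
    fans = defaultdict(set)
    for voter, movie, rating in ratings:
        if rating >= min_rating:
            fans[movie].add(voter)
    edges = set()
    for voters in fans.values():
        edges.update(combinations(sorted(voters), 2))
    return edges


def greedy_modularity(edges):
    """Naive agglomerative modularity maximisation; returns sorted communities."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    community = {v: v for v in degree}      # node -> community label
    members = {v: {v} for v in degree}      # community label -> nodes
    deg_sum = dict(degree)                  # community label -> total degree
    while True:
        between = defaultdict(int)          # (c1, c2) -> number of edges between them
        for u, v in edges:
            cu, cv = community[u], community[v]
            if cu != cv:
                between[tuple(sorted((cu, cv)))] += 1
        best, best_gain = None, 0.0
        for (c1, c2), links in between.items():
            gain = links / m - deg_sum[c1] * deg_sum[c2] / (2 * m * m)
            if gain > best_gain:
                best, best_gain = (c1, c2), gain
        if best is None:
            break                           # no merge increases modularity
        keep, drop = best
        for node in members.pop(drop):
            community[node] = keep
            members[keep].add(node)
        deg_sum[keep] += deg_sum.pop(drop)
    return sorted(sorted(group) for group in members.values())
```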

MovieLens 32M + IMDb · Cohen’s d · Welch t-test · Network centrality · Python · R · Graph analysis

The analysis was carried out jointly with Barbara and presented at Sapienza on 16 July 2025.
