950 episodes. 959 people. 1 package!
survivoR is a collection of data sets detailing events across 60 seasons of Survivor US, Survivor Australia, Survivor South Africa and Survivor New Zealand. It includes castaway information, vote history, immunity and reward challenge winners, jury votes, advantage details and heaps more!
Now on CRAN (v2.0.4) or Git (v2.0.7).
If Git > CRAN I’d suggest install from Git. We are constantly improving the data sets so the github version is likely to be slightly improved.
install.packages("survivoR")
::install_github("doehm/survivoR") devtools
sit_out
to challenge_results
Confessional counts from myself, Carly Levitz, Sam, Grace and others
A table containing summary details of each season of Survivor, including the winner, runner ups and location.
season_summary#> # A tibble: 60 × 22
#> version versi…¹ seaso…² season locat…³ country tribe…⁴ full_…⁵ winne…⁶ winner
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 AU AU01 Surviv… 1 Upolu Samoa "The 2… Kristi… AU0024 Krist…
#> 2 AU AU02 Surviv… 2 Upolu Samoa "The 2… Jerich… AU0048 Jeric…
#> 3 AU AU03 Surviv… 3 Savusa… Fiji "The 2… Shane … AU0071 Shane
#> 4 AU AU04 Surviv… 4 Savusa… Fiji "Two t… Pia Mi… AU0094 Pia
#> 5 AU AU05 Surviv… 5 Savusa… Fiji "Two t… David … AU0086 David
#> 6 AU AU06 Surviv… 6 Cloncu… Austra… "The 2… Hayley… AU0119 Hayley
#> 7 AU AU07 Surviv… 7 Charte… Austra… "Blood… Mark W… AU0031 Mark
#> 8 NZ NZ01 Surviv… 1 San Ju… Nicara… "Two t… Avi Du… NZ0016 Avi
#> 9 SA SA01 Surviv… 1 Pearl … Panama "The 1… Vaness… SA0010 Vanes…
#> 10 SA SA02 Surviv… 2 Johor Malays… "Two t… Lorett… SA0030 Loret…
#> # … with 50 more rows, 12 more variables: runner_ups <chr>, final_vote <chr>,
#> # timeslot <chr>, premiered <date>, ended <date>, filming_started <date>,
#> # filming_ended <date>, viewers_premiere <dbl>, viewers_finale <dbl>,
#> # viewers_reunion <dbl>, viewers_mean <dbl>, rank <dbl>, and abbreviated
#> # variable names ¹version_season, ²season_name, ³location, ⁴tribe_setup,
#> # ⁵full_name, ⁶winner_id
This data set contains season and demographic information about each castaway. It is structured to view their results for each season. Castaways that have played in multiple seasons will feature more than once with the age and location representing that point in time. Castaways that re-entered the game will feature more than once in the same season as they technically have more than one boot order e.g. Natalie Anderson - Winners at War.
Each castaway has a unique castaway_id
which links the
individual across all data sets and seasons. It also links to the
following ID’s found on the vote_history
,
jury_votes
and challenges
data sets.
vote_id
voted_out_id
finalist_id
|>
castaways filter(season == 42)
#> # A tibble: 18 × 16
#> version version_se…¹ seaso…² season full_…³ casta…⁴ casta…⁵ age city state
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 US US42 Surviv… 42 Jackso… US0613 Jackson 47 Hous… Texas
#> 2 US US42 Surviv… 42 Zach W… US0626 Zach 21 St. … Miss…
#> 3 US US42 Surviv… 42 Marya … US0618 Marya 47 Nobl… Indi…
#> 4 US US42 Surviv… 42 Jenny … US0614 Jenny 43 Broo… New …
#> 5 US US42 Surviv… 42 Swati … US0624 Swati 19 Palo… Cali…
#> 6 US US42 Surviv… 42 Daniel… US0610 Daniel 30 New … Conn…
#> 7 US US42 Surviv… 42 Lydia … US0617 Lydia 22 Sant… Cali…
#> 8 US US42 Surviv… 42 Chanel… US0609 Chanel… 28 New … New …
#> 9 US US42 Surviv… 42 Rocksr… US0622 Rocksr… 43 Las … Neva…
#> 10 US US42 Surviv… 42 Tori M… US0625 Tori 24 Roge… Ariz…
#> 11 US US42 Surviv… 42 Hai Gi… US0612 Hai 29 New … Loui…
#> 12 US US42 Surviv… 42 Drea W… US0611 Drea 34 Mont… Queb…
#> 13 US US42 Surviv… 42 Omar Z… US0621 Omar 31 Whit… Onta…
#> 14 US US42 Surviv… 42 Lindsa… US0616 Lindsay 30 Asbu… New …
#> 15 US US42 Surviv… 42 Jonath… US0615 Jonath… 28 Gulf… Alab…
#> 16 US US42 Surviv… 42 Romeo … US0623 Romeo 37 Norw… Cali…
#> 17 US US42 Surviv… 42 Mike T… US0620 Mike 57 Hobo… New …
#> 18 US US42 Surviv… 42 Maryan… US0619 Maryan… 24 Ajax Onta…
#> # … with 6 more variables: episode <dbl>, day <dbl>, order <dbl>, result <chr>,
#> # jury_status <chr>, original_tribe <chr>, and abbreviated variable names
#> # ¹version_season, ²season_name, ³full_name, ⁴castaway_id, ⁵castaway
A few castaways have changed their name from season to season or have
been referred to by a different name during the season e.g. Amber
Mariano; in season 8 Survivor All-Stars there was Rob C and Rob M. That
information has been retained here in the castaways
data
set.
castaway_details
contains unique information for each
castaway. It takes the full name from their most current season and
their most verbose short name which is handy for labelling.
It also includes gender, date of birth, occupation, race and ethnicity data. If no source was found to determine a castaways race and ethnicity, the data is kept as missing rather than making an assumption.
castaway_details#> # A tibble: 959 × 11
#> castaway_id full_n…¹ casta…² date_of_…³ date_of_…⁴ gender race ethni…⁵ poc
#> <chr> <chr> <chr> <date> <date> <chr> <chr> <chr> <chr>
#> 1 AU0001 Des Qui… Des NA NA Male <NA> <NA> <NA>
#> 2 AU0002 Bianca … Bianca NA NA Female <NA> <NA> <NA>
#> 3 AU0003 Evan Jo… Evan NA NA Male <NA> <NA> <NA>
#> 4 AU0004 Peter F… Peter NA NA Male <NA> <NA> <NA>
#> 5 AU0005 Barry L… Barry NA NA Male <NA> Aborig… <NA>
#> 6 AU0006 Tegan H… Tegan NA NA Female <NA> <NA> <NA>
#> 7 AU0007 Rohan M… Rohan NA NA Male <NA> <NA> <NA>
#> 8 AU0008 Kat Dum… Katinka 1989-09-21 NA Female <NA> <NA> <NA>
#> 9 AU0009 Andrew … Andrew NA NA Male <NA> <NA> <NA>
#> 10 AU0010 Craig I… Craig NA NA Male <NA> <NA> <NA>
#> # … with 949 more rows, 2 more variables: occupation <chr>,
#> # personality_type <chr>, and abbreviated variable names ¹full_name,
#> # ²castaway, ³date_of_birth, ⁴date_of_death, ⁵ethnicity
This data frame contains a complete history of votes cast across all seasons of Survivor. This allows you to see who who voted for who at which Tribal Council. It also includes details on who had individual immunity as well as who had their votes nullified by a hidden immunity idol. This details the key events for the season.
<- vote_history |>
vh filter(
== 42,
season == 9
episode
)
vh#> # A tibble: 10 × 22
#> version version_…¹ seaso…² season episode day tribe…³ tribe casta…⁴ immun…⁵
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
#> 1 US US42 Surviv… 42 9 17 Merged Kula… Hai Indivi…
#> 2 US US42 Surviv… 42 9 17 Merged Kula… Mike <NA>
#> 3 US US42 Surviv… 42 9 17 Merged Kula… Omar <NA>
#> 4 US US42 Surviv… 42 9 17 Merged Kula… Rocksr… <NA>
#> 5 US US42 Surviv… 42 9 17 Merged Kula… Romeo <NA>
#> 6 US US42 Surviv… 42 9 17 Merged Kula… Drea Hidden
#> 7 US US42 Surviv… 42 9 17 Merged Kula… Jonath… Indivi…
#> 8 US US42 Surviv… 42 9 17 Merged Kula… Lindsay <NA>
#> 9 US US42 Surviv… 42 9 17 Merged Kula… Maryan… Hidden
#> 10 US US42 Surviv… 42 9 17 Merged Kula… Tori <NA>
#> # … with 12 more variables: vote <chr>, vote_event <chr>,
#> # vote_event_outcome <chr>, split_vote <chr>, nullified <lgl>, tie <lgl>,
#> # voted_out <chr>, order <dbl>, vote_order <dbl>, castaway_id <chr>,
#> # vote_id <chr>, voted_out_id <chr>, and abbreviated variable names
#> # ¹version_season, ²season_name, ³tribe_status, ⁴castaway, ⁵immunity
|>
vh count(vote)
#> # A tibble: 4 × 2
#> vote n
#> <chr> <int>
#> 1 Rocksroy 4
#> 2 Romeo 1
#> 3 Tori 4
#> 4 <NA> 1
Note: From v1.1 the challenge_results
dataset has been
improved but could break existing code. The old table is maintained at
challenge_results_dep
There are two tables challenge_results
and
challenge_description
.
A tidy data frame of immunity and reward challenge results. The winners and losers of the challenges are found recorded here.
|>
challenge_results filter(season == 42) |>
group_by(castaway) |>
summarise(
won = sum(result == "Won"),
Lost = sum(result == "Lost"),
total_challenges = n(),
chose_for_reward = sum(chosen_for_reward)
)#> # A tibble: 18 × 5
#> castaway won Lost total_challenges chose_for_reward
#> <chr> <int> <int> <int> <int>
#> 1 Chanelle 4 7 11 0
#> 2 Daniel 3 4 7 0
#> 3 Drea 5 11 16 0
#> 4 Hai 5 10 15 0
#> 5 Jackson 0 1 1 0
#> 6 Jenny 2 2 4 0
#> 7 Jonathan 10 10 20 1
#> 8 Lindsay 9 10 19 1
#> 9 Lydia 4 5 9 0
#> 10 Marya 1 2 3 0
#> 11 Maryanne 7 13 20 1
#> 12 Mike 5 15 20 2
#> 13 Omar 6 12 18 1
#> 14 Rocksroy 5 8 13 0
#> 15 Romeo 5 15 20 1
#> 16 Swati 3 3 6 0
#> 17 Tori 9 4 13 0
#> 18 Zach 1 1 2 0
The challenge_id
is the primary key for the
challenge_description
data set. The
challange_id
will change as the data or descriptions
change.
TODO: Each challenge must have an ID and link to challenge description
This data set contains descriptive binary fields for each challenge. Challenges can go by different names but where possible recurring challenges are kept consistent. While there are tweaks to the challenges, where the main components of the challenge is consistent, they share the same name.
The features of each challenge have been determined largely through string searches of key words that describe the challenge. It may not capture the full essence of the challenge but on the whole will provide a good basis for analysis. Since the description is simply a short paragraph or sentence it may not flag all appropriate features. If any descriptive features need altering please let me know in the issues.
Features:
puzzle
: If the challenge contains a puzzle
element.race
: If the challenge is a race between tribes, teams
or individuals.precision
: If the challenge contains a precision
element e.g. shooting an arrow, hitting a target, etc.endurance
: If the challenge is an endurance event
e.g. last tribe, team, individual standing.strength
: If the challenge is largerly strength based
e.g. Shoulder the Load.turn_based
: If the challenge is conducted in a series
of rounds until a certain amount of points are scored or there is one
player remaining.balance
: If the challenge contains a balancing
element.food
: If the challenge contains a food element e.g. the
food challenge, biting off chunks of meat.knowledge
: If the challenge contains a knowledge
component e.g. Q and A about the location.memory
: If the challenge contains a memory element
e.g. memorising a sequence of items.fire
: If the challenge contains an element of fire
making / maintaining.water
: If the challenge is held, in part, in the
water.
challenge_description#> # A tibble: 1,024 × 14
#> challeng…¹ chall…² puzzle race preci…³ endur…⁴ stren…⁵ turn_…⁶ balance food
#> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 CC0053 Barrel… FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> 2 CC0079 Blue L… TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> 3 CC0114 By the… FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
#> 4 CC0138 Choose… FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE
#> 5 CC0232 Flashb… FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
#> 6 CC0305 Home S… TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
#> 7 CC0334 Kenny … TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> 8 CC0358 Log Jam FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
#> 9 CC0371 Maroon… FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
#> 10 CC0408 O-Blac… FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> # … with 1,014 more rows, 4 more variables: knowledge <lgl>, memory <lgl>,
#> # fire <lgl>, water <lgl>, and abbreviated variable names ¹challenge_id,
#> # ²challenge_name, ³precision, ⁴endurance, ⁵strength, ⁶turn_based
|>
challenge_description summarise_if(is_logical, sum)
#> # A tibble: 1 × 12
#> puzzle race precision endurance strength turn_…¹ balance food knowl…² memory
#> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 277 810 224 148 109 155 185 24 56 26
#> # … with 2 more variables: fire <int>, water <int>, and abbreviated variable
#> # names ¹turn_based, ²knowledge
History of jury votes. It is more verbose than it needs to be, however having a 0-1 column indicating if a vote was placed or not makes it easier to summarise castaways that received no votes.
|>
jury_votes filter(season == 42)
#> # A tibble: 24 × 9
#> version version_season season_…¹ season casta…² final…³ vote casta…⁴ final…⁵
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <chr>
#> 1 US US42 Survivor… 42 Jonath… Romeo 0 US0615 US0623
#> 2 US US42 Survivor… 42 Lindsay Romeo 0 US0616 US0623
#> 3 US US42 Survivor… 42 Omar Romeo 0 US0621 US0623
#> 4 US US42 Survivor… 42 Drea Romeo 0 US0611 US0623
#> 5 US US42 Survivor… 42 Hai Romeo 0 US0612 US0623
#> 6 US US42 Survivor… 42 Tori Romeo 0 US0625 US0623
#> 7 US US42 Survivor… 42 Rocksr… Romeo 0 US0622 US0623
#> 8 US US42 Survivor… 42 Chanel… Romeo 0 US0609 US0623
#> 9 US US42 Survivor… 42 Jonath… Mike 1 US0615 US0620
#> 10 US US42 Survivor… 42 Lindsay Mike 0 US0616 US0620
#> # … with 14 more rows, and abbreviated variable names ¹season_name, ²castaway,
#> # ³finalist, ⁴castaway_id, ⁵finalist_id
|>
jury_votes filter(season == 42) |>
group_by(finalist) |>
summarise(votes = sum(vote))
#> # A tibble: 3 × 2
#> finalist votes
#> <chr> <dbl>
#> 1 Maryanne 7
#> 2 Mike 1
#> 3 Romeo 0
This dataset lists the hidden idols and advantages in the game for
all seasons. It details where it was found, if there was a clue to the
advantage, location and other advantage conditions. This maps to the
advantage_movement
table.
|>
advantage_details filter(season == 42)
#> # A tibble: 11 × 9
#> version version_season seaso…¹ season advan…² advan…³ clue_…⁴ locat…⁵ condi…⁶
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 US US42 Surviv… 42 USAM42… Amulet No clu… Found … Amulet…
#> 2 US US42 Surviv… 42 USAM42… Amulet No clu… Found … Amulet…
#> 3 US US42 Surviv… 42 USAM42… Amulet No clu… Found … Amulet…
#> 4 US US42 Surviv… 42 USEV42… Extra … No clu… Shipwh… <NA>
#> 5 US US42 Surviv… 42 USEV42… Extra … No clu… Shipwh… <NA>
#> 6 US US42 Surviv… 42 USHI42… Hidden… Found … Found … Beware…
#> 7 US US42 Surviv… 42 USHI42… Hidden… Found … Found … Beware…
#> 8 US US42 Surviv… 42 USHI42… Hidden… Found … Found … Beware…
#> 9 US US42 Surviv… 42 USKP42… Knowle… Found … Found … Knowle…
#> 10 US US42 Surviv… 42 USHI42… Hidden… Found … Found … Valid …
#> 11 US US42 Surviv… 42 USIN42… Idol n… Found … Found … <NA>
#> # … with abbreviated variable names ¹season_name, ²advantage_id,
#> # ³advantage_type, ⁴clue_details, ⁵location_found, ⁶conditions
The advantage_movement
table tracks who found the
advantage, who they may have handed it to and who the played it for.
Each step is called an event. The sequence_id
tracks the
logical step of the advantage. For example in season 41, JD found an
Extra Vote advantage. JD gave it to Shan in good faith who then voted
him out keeping the Extra Vote. Shan gave it to Ricard in good faith who
eventually gave it back before Shan played it for Naseer. That movement
is recorded in this table.
|>
advantage_movement filter(advantage_id == "USEV4102")
#> # A tibble: 5 × 15
#> version version…¹ seaso…² season casta…³ casta…⁴ advan…⁵ seque…⁶ day episode
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 US US41 Surviv… 41 JD US0603 USEV41… 1 2 1
#> 2 US US41 Surviv… 41 Shan US0606 USEV41… 2 9 4
#> 3 US US41 Surviv… 41 Ricard US0596 USEV41… 3 9 4
#> 4 US US41 Surviv… 41 Shan US0606 USEV41… 4 11 5
#> 5 US US41 Surviv… 41 Shan US0606 USEV41… 5 17 9
#> # … with 5 more variables: event <chr>, played_for <chr>, played_for_id <chr>,
#> # success <chr>, votes_nullified <dbl>, and abbreviated variable names
#> # ¹version_season, ²season_name, ³castaway, ⁴castaway_id, ⁵advantage_id,
#> # ⁶sequence_id
A dataset containing the number of confessionals for each castaway by season and episode. The data has been counted by contributors of the survivoR R package and consolidated with external sources. The aim is to establish consistency in confessional counts in the absence of official sources. Given the subjective nature of the counts and the potential for clerical error no single source is more valid than another. Therefore, it is reasonable to average across all sources.
|>
confessionals filter(season == 42) |>
group_by(castaway) |>
summarise(n_confessionals = sum(confessional_count))
#> # A tibble: 18 × 2
#> castaway n_confessionals
#> <chr> <dbl>
#> 1 Chanelle 18
#> 2 Daniel 15
#> 3 Drea 34
#> 4 Hai 37
#> 5 Jackson 2
#> 6 Jenny 6
#> 7 Jonathan 31
#> 8 Lindsay 45
#> 9 Lydia 14
#> 10 Marya 6
#> 11 Maryanne 43
#> 12 Mike 58
#> 13 Omar 41
#> 14 Rocksroy 21
#> 15 Romeo 33
#> 16 Swati 7
#> 17 Tori 18
#> 18 Zach 7
This dataset contains the estimated screen time for each castaway during an episode. Please note that this is still in the early days of development. There is likely to be misclassifcation and other sources of error. The model will be refined over time.
An individuals’ screen time is calculated, at a high-level, via the following process:
Frames are sampled from episodes on a 1 second time interval
MTCNN detects the human faces within each frame
VGGFace2 converts each detected face into a 512d vector space
A training set of labelled images (1 for each contestant + 3 for Jeff Probst) is processed in the same way to determine where they sit in the vector space. TODO: This could be made more accurate by increasing the number of training images per contestant.
The Euclidean distance is calculated for the faces detected in the frame to each of the contestants in the season (+Jeff). If the minimum distance is greater than 1.2 the face is labelled as “unknown”. TODO: Review how robust this distance cutoff truly is - currently based on manual review of Season 42.
A multi-class SVM is trained on the training set to label faces. For any face not identified as “unknown”, the vector embedding is run into this model and a label is generated.
All labelled faces are aggregated together, with an assumption of 1-5 full second of screen time each time a face is seen and factoring in time between detection capping at a max of 5 seconds.
|>
screen_time filter(version_season == "US42") |>
group_by(castaway_id) |>
summarise(total_mins = sum(screen_time)/60) |>
left_join(
|>
castaway_details select(castaway_id, castaway = short_name),
by = "castaway_id"
|>
) arrange(desc(total_mins))
#> Error in `select()`:
#> ! Can't subset columns that don't exist.
#> ✖ Column `short_name` doesn't exist.
Currently it only includes data for season 42. More seasons will be added as they are completed.
A mapping for castaways to tribes for each day (day being the day of the tribal council). This is useful for observing who is on what tribe throughout the game. Each season by day holds a complete list of castaways still in the game and which tribe they are on. Moving through each day you can observe the changes in the tribe. For example the first day (usual day 2) has all castaways mapped to their original tribe. The next day has the same minus the castaway just voted out. This is useful for observing the changes in tribe make either due to castaways being voted off the island, tribe swaps, who is on Redemption Island and Edge of Extinction.
|>
tribe_mapping filter(season == 42)
#> # A tibble: 177 × 10
#> version version_…¹ seaso…² season episode day casta…³ casta…⁴ tribe tribe…⁵
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
#> 1 US US42 Surviv… 42 1 2 US0609 Chanel… Vati Origin…
#> 2 US US42 Surviv… 42 1 2 US0610 Daniel Vati Origin…
#> 3 US US42 Surviv… 42 1 2 US0611 Drea Ika Origin…
#> 4 US US42 Surviv… 42 1 2 US0612 Hai Vati Origin…
#> 5 US US42 Surviv… 42 1 2 US0613 Jackson Taku Origin…
#> 6 US US42 Surviv… 42 1 2 US0614 Jenny Vati Origin…
#> 7 US US42 Surviv… 42 1 2 US0615 Jonath… Taku Origin…
#> 8 US US42 Surviv… 42 1 2 US0616 Lindsay Taku Origin…
#> 9 US US42 Surviv… 42 1 2 US0617 Lydia Vati Origin…
#> 10 US US42 Surviv… 42 1 2 US0618 Marya Taku Origin…
#> # … with 167 more rows, and abbreviated variable names ¹version_season,
#> # ²season_name, ³castaway_id, ⁴castaway, ⁵tribe_status
A mapping table for easily filtering to the set of castaways that are still in the game after a specified number of boots. How this differs from the tribe mapping is that rather than being focused on an episode, it is focused on the boot which is often more useful. This is useful for filtering to who is still alive in the game for a given episode and number of boots. When someone quits the game or is medically evacuated it is considered a boot. This table tracks multiple boots per episode.
# filter to season 42 and when there are 6 people left
# 18 people in the season, therefore 12 boots
<- function(.version, .season, .n_boots) {
still_alive ::boot_mapping |>
survivoRfilter(
== .version,
version == .season,
season == .n_boots,
order %in% c("In the game", "Returned")
game_status
)
}
still_alive("US", 42, 12)
#> # A tibble: 6 × 11
#> version version_s…¹ seaso…² season episode order casta…³ casta…⁴ tribe tribe…⁵
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
#> 1 US US42 Surviv… 42 12 12 US0615 Jonath… Kula… Merged
#> 2 US US42 Surviv… 42 12 12 US0616 Lindsay Kula… Merged
#> 3 US US42 Surviv… 42 12 12 US0619 Maryan… Kula… Merged
#> 4 US US42 Surviv… 42 12 12 US0620 Mike Kula… Merged
#> 5 US US42 Surviv… 42 12 12 US0621 Omar Kula… Merged
#> 6 US US42 Surviv… 42 12 12 US0623 Romeo Kula… Merged
#> # … with 1 more variable: game_status <chr>, and abbreviated variable names
#> # ¹version_season, ²season_name, ³castaway_id, ⁴castaway, ⁵tribe_status
A data frame containing the viewer information for every episode across all seasons. It also includes the rating and viewer share information for viewers aged 18 to 49 years of age.
|>
viewers filter(season == 42)
#> # A tibble: 13 × 12
#> # Groups: version [1]
#> version version_s…¹ seaso…² season episo…³ episode episo…⁴ episode_…⁵ episo…⁶
#> <chr> <chr> <chr> <dbl> <int> <dbl> <chr> <date> <dbl>
#> 1 US US42 Surviv… 42 611 1 Feels … 2022-03-09 86
#> 2 US US42 Surviv… 42 612 2 Good a… 2022-03-16 43
#> 3 US US42 Surviv… 42 613 3 Go for… 2022-03-23 43
#> 4 US US42 Surviv… 42 614 4 Vibe o… 2022-03-30 43
#> 5 US US42 Surviv… 42 615 5 I'm Su… 2022-04-06 43
#> 6 US US42 Surviv… 42 616 6 You Ca… 2022-04-13 43
#> 7 US US42 Surviv… 42 617 7 The De… 2022-04-13 43
#> 8 US US42 Surviv… 42 618 8 You Be… 2022-04-20 43
#> 9 US US42 Surviv… 42 619 9 Game o… 2022-04-27 43
#> 10 US US42 Surviv… 42 620 10 Tell a… 2022-05-04 43
#> 11 US US42 Surviv… 42 621 11 Battle… 2022-05-11 43
#> 12 US US42 Surviv… 42 622 12 Caterp… 2022-05-18 43
#> 13 US US42 Surviv… 42 623 13 It Com… 2022-05-25 129
#> # … with 3 more variables: viewers <dbl>, imdb_rating <dbl>, n_ratings <dbl>,
#> # and abbreviated variable names ¹version_season, ²season_name,
#> # ³episode_number_overall, ⁴episode_title, ⁵episode_date, ⁶episode_length
This data frame contains the tribe names and colours for each season, including the RGB values. These colours can be joined with the other data frames to customise colours for plots. Another option is to add tribal colours to ggplots with the scale functions.
tribe_colours#> # A tibble: 225 × 7
#> version version_season season_name season tribe tribe…¹ tribe…²
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 AU AU01 Survivor Australia: 2016 1 Agan… #FF0000 Origin…
#> 2 AU AU01 Survivor Australia: 2016 1 Saan… #0000FF Origin…
#> 3 AU AU01 Survivor Australia: 2016 1 Vavau #FFFF00 Origin…
#> 4 AU AU01 Survivor Australia: 2016 1 Fia … #000000 Merged
#> 5 AU AU02 Survivor Australia: 2017 2 Sama… #A51A84 Origin…
#> 6 AU AU02 Survivor Australia: 2017 2 Asaga #00A19C Origin…
#> 7 AU AU02 Survivor Australia: 2017 2 Asat… #000000 Merged
#> 8 AU AU03 Survivor Australia: Cham… 3 Cham… #0000FF Origin…
#> 9 AU AU03 Survivor Australia: Cham… 3 Cont… #FF0000 Origin…
#> 10 AU AU03 Survivor Australia: Cham… 3 Koro… #000000 Merged
#> # … with 215 more rows, and abbreviated variable names ¹tribe_colour,
#> # ²tribe_status
Included are ggplot2 scale functions of the form
scale_fill_survivor()
and scale_fill_tribes()
to add season and tribe colours to ggplot. The
scale_fill_survivor()
scales uses a colour palette
extracted from the season logo and scale_fill_tribes()
scales uses the tribal colours of the specified season as a colour
palette.
All that is required for the ‘survivor’ palettes is the desired season as input. If not season is provided it will default to season 40.
|>
castaways distinct(season, castaway_id) |>
left_join(
|>
castaway_details select(castaway_id, personality_type),
by = "castaway_id"
|>
) ggplot(aes(x = season, y = n, fill = personality_type)) +
geom_bar(stat = "identity") +
scale_fill_survivor(40) +
theme_minimal()
Below are the palettes for all seasons.
To use the tribe scales, simply input the season number desired to
use those tribe colours. If the fill or colour aesthetic is the tribe
name, this needs to be passed to the scale function as
scale_fill_tribes(season, tribe = tribe)
(for now) where
tribe
is on the input data frame. If the fill or colour
aesthetic is independent from the actual tribe names, like gender for
example, tribe
does not need to be specified and will
simply use the tribe colours as a colour palette, such as the viewers
line graph above.
<- 35
ssn <- castaways |>
labels filter(
== ssn,
season str_detect(result, "Sole|unner")
|>
) mutate(label = glue("{castaway} ({original_tribe})")) |>
select(label, castaway)
|>
jury_votes filter(season == ssn) |>
left_join(
|>
castaways filter(season == ssn) |>
select(castaway, original_tribe),
by = "castaway"
|>
) group_by(finalist, original_tribe) |>
summarise(votes = sum(vote)) |>
left_join(labels, by = c("finalist" = "castaway")) |>
{ggplot(., aes(x = label, y = votes, fill = original_tribe)) +
geom_bar(stat = "identity", width = 0.5) +
scale_fill_tribes(ssn, tribe = .$original_tribe) +
theme_minimal() +
labs(
x = "Finalist (original tribe)",
y = "Votes",
fill = "Original\ntribe",
title = "Votes received by each finalist"
) }
Given the variable nature of the game of Survivor and changing of the rules, there are bound to be edges cases where the data is not quite right. Before logging an issue please install the git version to see if it has already been corrected. If not, please log an issue and I will correct the datasets.
New features will be added, such as details on exiled castaways across the seasons. If you have a request for specific data let me know in the issues and I’ll see what I can do. Also, if you’d like to contribute by adding to existing datasets or contribute a new dataset, please contact me directly.
Carly Levitz has developed a fantastic dashboard showcasing the data and allowing you to drill down into seasons, castaways, voting history and challenges.
This looks at the number of immunity idols won and votes received for each winner.
A big thank you to:
Data was sourced from Wikipedia and the Survivor Wiki. Other data, such as the tribe colours, was manually recorded and entered by myself and contributors.
Hex graphic by CBS