survivoR

950 episodes. 959 people. 1 package!

survivoR is a collection of data sets detailing events across 60 seasons of Survivor US, Survivor Australia, Survivor South Africa and Survivor New Zealand. It includes castaway information, vote history, immunity and reward challenge winners, jury votes, advantage details and heaps more!

Installation

Now on CRAN (v2.0.4) or Git (v2.0.7).

If the GitHub version is newer than the CRAN version, I’d suggest installing from GitHub. We are constantly improving the data sets, so the GitHub version is likely to be slightly more up to date.

install.packages("survivoR")
devtools::install_github("doehm/survivoR")
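
A minimal sketch of the "is GitHub ahead of CRAN" check, using base R's package_version() to compare version strings. The version numbers are the ones quoted above, hard-coded for illustration, and git_is_newer() is a hypothetical helper, not part of survivoR.

```r
# Hypothetical helper: TRUE when the GitHub version is ahead of the CRAN version
git_is_newer <- function(cran_version, git_version) {
  package_version(git_version) > package_version(cran_version)
}

# Versions quoted in this README, hard-coded for illustration
if (git_is_newer("2.0.4", "2.0.7")) {
  message("GitHub is ahead: devtools::install_github(\"doehm/survivoR\")")
} else {
  message("CRAN is current: install.packages(\"survivoR\")")
}
```

Comparing with package_version() rather than plain strings means "2.0.10" is correctly treated as newer than "2.0.9".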

News: survivoR 2.0.4

Confessionals

    Confessional tables

Confessional counts from myself, Carly Levitz, Sam, Grace and others

Dataset overview

Season summary

A table containing summary details of each season of Survivor, including the winner, runners-up and location.

season_summary
#> # A tibble: 60 × 22
#>    version versi…¹ seaso…² season locat…³ country tribe…⁴ full_…⁵ winne…⁶ winner
#>    <chr>   <chr>   <chr>    <dbl> <chr>   <chr>   <chr>   <chr>   <chr>   <chr> 
#>  1 AU      AU01    Surviv…      1 Upolu   Samoa   "The 2… Kristi… AU0024  Krist…
#>  2 AU      AU02    Surviv…      2 Upolu   Samoa   "The 2… Jerich… AU0048  Jeric…
#>  3 AU      AU03    Surviv…      3 Savusa… Fiji    "The 2… Shane … AU0071  Shane 
#>  4 AU      AU04    Surviv…      4 Savusa… Fiji    "Two t… Pia Mi… AU0094  Pia   
#>  5 AU      AU05    Surviv…      5 Savusa… Fiji    "Two t… David … AU0086  David 
#>  6 AU      AU06    Surviv…      6 Cloncu… Austra… "The 2… Hayley… AU0119  Hayley
#>  7 AU      AU07    Surviv…      7 Charte… Austra… "Blood… Mark W… AU0031  Mark  
#>  8 NZ      NZ01    Surviv…      1 San Ju… Nicara… "Two t… Avi Du… NZ0016  Avi   
#>  9 SA      SA01    Surviv…      1 Pearl … Panama  "The 1… Vaness… SA0010  Vanes…
#> 10 SA      SA02    Surviv…      2 Johor   Malays… "Two t… Lorett… SA0030  Loret…
#> # … with 50 more rows, 12 more variables: runner_ups <chr>, final_vote <chr>,
#> #   timeslot <chr>, premiered <date>, ended <date>, filming_started <date>,
#> #   filming_ended <date>, viewers_premiere <dbl>, viewers_finale <dbl>,
#> #   viewers_reunion <dbl>, viewers_mean <dbl>, rank <dbl>, and abbreviated
#> #   variable names ¹​version_season, ²​season_name, ³​location, ⁴​tribe_setup,
#> #   ⁵​full_name, ⁶​winner_id

Castaways

This data set contains season and demographic information about each castaway. It is structured to view their results for each season. Castaways that have played in multiple seasons will feature more than once with the age and location representing that point in time. Castaways that re-entered the game will feature more than once in the same season as they technically have more than one boot order e.g. Natalie Anderson - Winners at War.

Each castaway has a unique castaway_id which links the individual across all data sets and seasons. It also links to the other ID columns found on the vote_history, jury_votes and challenges data sets, such as vote_id, voted_out_id and finalist_id.

castaways |> 
  filter(season == 42)
#> # A tibble: 18 × 16
#>    version version_se…¹ seaso…² season full_…³ casta…⁴ casta…⁵   age city  state
#>    <chr>   <chr>        <chr>    <dbl> <chr>   <chr>   <chr>   <dbl> <chr> <chr>
#>  1 US      US42         Surviv…     42 Jackso… US0613  Jackson    47 Hous… Texas
#>  2 US      US42         Surviv…     42 Zach W… US0626  Zach       21 St. … Miss…
#>  3 US      US42         Surviv…     42 Marya … US0618  Marya      47 Nobl… Indi…
#>  4 US      US42         Surviv…     42 Jenny … US0614  Jenny      43 Broo… New …
#>  5 US      US42         Surviv…     42 Swati … US0624  Swati      19 Palo… Cali…
#>  6 US      US42         Surviv…     42 Daniel… US0610  Daniel     30 New … Conn…
#>  7 US      US42         Surviv…     42 Lydia … US0617  Lydia      22 Sant… Cali…
#>  8 US      US42         Surviv…     42 Chanel… US0609  Chanel…    28 New … New …
#>  9 US      US42         Surviv…     42 Rocksr… US0622  Rocksr…    43 Las … Neva…
#> 10 US      US42         Surviv…     42 Tori M… US0625  Tori       24 Roge… Ariz…
#> 11 US      US42         Surviv…     42 Hai Gi… US0612  Hai        29 New … Loui…
#> 12 US      US42         Surviv…     42 Drea W… US0611  Drea       34 Mont… Queb…
#> 13 US      US42         Surviv…     42 Omar Z… US0621  Omar       31 Whit… Onta…
#> 14 US      US42         Surviv…     42 Lindsa… US0616  Lindsay    30 Asbu… New …
#> 15 US      US42         Surviv…     42 Jonath… US0615  Jonath…    28 Gulf… Alab…
#> 16 US      US42         Surviv…     42 Romeo … US0623  Romeo      37 Norw… Cali…
#> 17 US      US42         Surviv…     42 Mike T… US0620  Mike       57 Hobo… New …
#> 18 US      US42         Surviv…     42 Maryan… US0619  Maryan…    24 Ajax  Onta…
#> # … with 6 more variables: episode <dbl>, day <dbl>, order <dbl>, result <chr>,
#> #   jury_status <chr>, original_tribe <chr>, and abbreviated variable names
#> #   ¹​version_season, ²​season_name, ³​full_name, ⁴​castaway_id, ⁵​castaway
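
As a rough illustration of how castaway_id ties the tables together, the sketch below joins a jury-vote summary to castaway information through finalist_id. The rows are small toy copies of values shown in this README (season 42), built by hand so the example runs without the package, and base R's merge() stands in for a dplyr join.

```r
# Toy rows copied from the season 42 output shown in this README
castaways_toy <- data.frame(
  castaway_id = c("US0619", "US0620", "US0623"),
  castaway    = c("Maryanne", "Mike", "Romeo"),
  age         = c(24, 57, 37)
)
jury_toy <- data.frame(
  finalist_id = c("US0619", "US0620", "US0623"),
  votes       = c(7, 1, 0)
)

# finalist_id holds castaway_id values, so it can be used as the join key
finalists <- merge(
  jury_toy, castaways_toy,
  by.x = "finalist_id", by.y = "castaway_id"
)
finalists[order(-finalists$votes), c("castaway", "age", "votes")]
```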

Castaway details

A few castaways have changed their name from season to season or have been referred to by a different name during the season, e.g. Amber Mariano; in season 8, Survivor All-Stars, there were Rob C and Rob M. That information has been retained here in the castaways data set.

castaway_details contains unique information for each castaway. It takes the full name from their most current season and their most verbose short name which is handy for labelling.

It also includes gender, date of birth, occupation, race and ethnicity data. If no source was found to determine a castaway’s race and ethnicity, the data is kept as missing rather than making an assumption.

castaway_details
#> # A tibble: 959 × 11
#>    castaway_id full_n…¹ casta…² date_of_…³ date_of_…⁴ gender race  ethni…⁵ poc  
#>    <chr>       <chr>    <chr>   <date>     <date>     <chr>  <chr> <chr>   <chr>
#>  1 AU0001      Des Qui… Des     NA         NA         Male   <NA>  <NA>    <NA> 
#>  2 AU0002      Bianca … Bianca  NA         NA         Female <NA>  <NA>    <NA> 
#>  3 AU0003      Evan Jo… Evan    NA         NA         Male   <NA>  <NA>    <NA> 
#>  4 AU0004      Peter F… Peter   NA         NA         Male   <NA>  <NA>    <NA> 
#>  5 AU0005      Barry L… Barry   NA         NA         Male   <NA>  Aborig… <NA> 
#>  6 AU0006      Tegan H… Tegan   NA         NA         Female <NA>  <NA>    <NA> 
#>  7 AU0007      Rohan M… Rohan   NA         NA         Male   <NA>  <NA>    <NA> 
#>  8 AU0008      Kat Dum… Katinka 1989-09-21 NA         Female <NA>  <NA>    <NA> 
#>  9 AU0009      Andrew … Andrew  NA         NA         Male   <NA>  <NA>    <NA> 
#> 10 AU0010      Craig I… Craig   NA         NA         Male   <NA>  <NA>    <NA> 
#> # … with 949 more rows, 2 more variables: occupation <chr>,
#> #   personality_type <chr>, and abbreviated variable names ¹​full_name,
#> #   ²​castaway, ³​date_of_birth, ⁴​date_of_death, ⁵​ethnicity

Vote history

This data frame contains a complete history of votes cast across all seasons of Survivor. This allows you to see who voted for whom at each Tribal Council. It also includes details on who had individual immunity as well as who had their votes nullified by a hidden immunity idol. Together these capture the key events of the season.

vh <- vote_history |> 
  filter(
    season == 42,
    episode == 9
  ) 
vh
#> # A tibble: 10 × 22
#>    version version_…¹ seaso…² season episode   day tribe…³ tribe casta…⁴ immun…⁵
#>    <chr>   <chr>      <chr>    <dbl>   <dbl> <dbl> <chr>   <chr> <chr>   <chr>  
#>  1 US      US42       Surviv…     42       9    17 Merged  Kula… Hai     Indivi…
#>  2 US      US42       Surviv…     42       9    17 Merged  Kula… Mike    <NA>   
#>  3 US      US42       Surviv…     42       9    17 Merged  Kula… Omar    <NA>   
#>  4 US      US42       Surviv…     42       9    17 Merged  Kula… Rocksr… <NA>   
#>  5 US      US42       Surviv…     42       9    17 Merged  Kula… Romeo   <NA>   
#>  6 US      US42       Surviv…     42       9    17 Merged  Kula… Drea    Hidden 
#>  7 US      US42       Surviv…     42       9    17 Merged  Kula… Jonath… Indivi…
#>  8 US      US42       Surviv…     42       9    17 Merged  Kula… Lindsay <NA>   
#>  9 US      US42       Surviv…     42       9    17 Merged  Kula… Maryan… Hidden 
#> 10 US      US42       Surviv…     42       9    17 Merged  Kula… Tori    <NA>   
#> # … with 12 more variables: vote <chr>, vote_event <chr>,
#> #   vote_event_outcome <chr>, split_vote <chr>, nullified <lgl>, tie <lgl>,
#> #   voted_out <chr>, order <dbl>, vote_order <dbl>, castaway_id <chr>,
#> #   vote_id <chr>, voted_out_id <chr>, and abbreviated variable names
#> #   ¹​version_season, ²​season_name, ³​tribe_status, ⁴​castaway, ⁵​immunity
vh |> 
  count(vote)
#> # A tibble: 4 × 2
#>   vote         n
#>   <chr>    <int>
#> 1 Rocksroy     4
#> 2 Romeo        1
#> 3 Tori         4
#> 4 <NA>         1
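
Because nullified votes are flagged, the votes that actually counted can be tallied separately. The sketch below uses made-up votes with the same column names as vote_history; it assumes nullified is FALSE (rather than NA) for votes that stood, which may not hold for every row of the real table.

```r
# Made-up votes for one Tribal Council; column names follow vote_history
vh_toy <- data.frame(
  castaway  = c("A", "B", "C", "D", "E", "F"),
  vote      = c("E", "E", "F", "F", "F", NA),   # NA: no vote cast
  nullified = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Keep votes that were cast and not nullified by an idol
counted <- vh_toy[!is.na(vh_toy$vote) & !vh_toy$nullified, ]

# Tally per target, most votes first
tally <- sort(table(counted$vote), decreasing = TRUE)
tally
```

In this toy council F receives three votes but two are nullified, so E goes home with two counted votes.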

Challenges

Note: From v1.1 the challenge_results dataset has been improved, but the changes could break existing code. The old table is maintained at challenge_results_dep.

There are two tables: challenge_results and challenge_description.

Challenge results

A tidy data frame of immunity and reward challenge results. The winners and losers of each challenge are recorded here.

challenge_results |> 
  filter(season == 42) |> 
  group_by(castaway) |> 
  summarise(
    won = sum(result == "Won"),
    Lost = sum(result == "Lost"),
    total_challenges = n(),
    chose_for_reward = sum(chosen_for_reward)
  )
#> # A tibble: 18 × 5
#>    castaway   won  Lost total_challenges chose_for_reward
#>    <chr>    <int> <int>            <int>            <int>
#>  1 Chanelle     4     7               11                0
#>  2 Daniel       3     4                7                0
#>  3 Drea         5    11               16                0
#>  4 Hai          5    10               15                0
#>  5 Jackson      0     1                1                0
#>  6 Jenny        2     2                4                0
#>  7 Jonathan    10    10               20                1
#>  8 Lindsay      9    10               19                1
#>  9 Lydia        4     5                9                0
#> 10 Marya        1     2                3                0
#> 11 Maryanne     7    13               20                1
#> 12 Mike         5    15               20                2
#> 13 Omar         6    12               18                1
#> 14 Rocksroy     5     8               13                0
#> 15 Romeo        5    15               20                1
#> 16 Swati        3     3                6                0
#> 17 Tori         9     4               13                0
#> 18 Zach         1     1                2                0

The challenge_id is the primary key for the challenge_description data set. The challenge_id will change as the data or descriptions change.

TODO: Each challenge must have an ID and link to challenge description

Challenge description

This data set contains descriptive binary fields for each challenge. Challenges can go by different names but, where possible, recurring challenges are kept consistent. While there are tweaks to the challenges, where the main components of the challenge are consistent, they share the same name.

The features of each challenge have been determined largely through string searches of key words that describe the challenge. It may not capture the full essence of the challenge but on the whole will provide a good basis for analysis. Since the description is simply a short paragraph or sentence it may not flag all appropriate features. If any descriptive features need altering please let me know in the issues.

Features:

challenge_description
#> # A tibble: 1,024 × 14
#>    challeng…¹ chall…² puzzle race  preci…³ endur…⁴ stren…⁵ turn_…⁶ balance food 
#>    <chr>      <chr>   <lgl>  <lgl> <lgl>   <lgl>   <lgl>   <lgl>   <lgl>   <lgl>
#>  1 CC0053     Barrel… FALSE  TRUE  TRUE    FALSE   FALSE   FALSE   FALSE   FALSE
#>  2 CC0079     Blue L… TRUE   TRUE  TRUE    FALSE   FALSE   FALSE   FALSE   FALSE
#>  3 CC0114     By the… FALSE  TRUE  FALSE   FALSE   FALSE   FALSE   TRUE    FALSE
#>  4 CC0138     Choose… FALSE  TRUE  TRUE    FALSE   FALSE   TRUE    FALSE   FALSE
#>  5 CC0232     Flashb… FALSE  FALSE FALSE   TRUE    FALSE   FALSE   FALSE   FALSE
#>  6 CC0305     Home S… TRUE   TRUE  TRUE    FALSE   FALSE   FALSE   TRUE    FALSE
#>  7 CC0334     Kenny … TRUE   TRUE  TRUE    FALSE   FALSE   FALSE   FALSE   FALSE
#>  8 CC0358     Log Jam FALSE  TRUE  FALSE   TRUE    FALSE   TRUE    FALSE   FALSE
#>  9 CC0371     Maroon… FALSE  TRUE  FALSE   FALSE   FALSE   TRUE    FALSE   FALSE
#> 10 CC0408     O-Blac… FALSE  TRUE  TRUE    FALSE   FALSE   FALSE   FALSE   FALSE
#> # … with 1,014 more rows, 4 more variables: knowledge <lgl>, memory <lgl>,
#> #   fire <lgl>, water <lgl>, and abbreviated variable names ¹​challenge_id,
#> #   ²​challenge_name, ³​precision, ⁴​endurance, ⁵​strength, ⁶​turn_based

challenge_description |> 
  summarise_if(is_logical, sum)
#> # A tibble: 1 × 12
#>   puzzle  race precision endurance strength turn_…¹ balance  food knowl…² memory
#>    <int> <int>     <int>     <int>    <int>   <int>   <int> <int>   <int>  <int>
#> 1    277   810       224       148      109     155     185    24      56     26
#> # … with 2 more variables: fire <int>, water <int>, and abbreviated variable
#> #   names ¹​turn_based, ²​knowledge
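
The keyword-based flagging described above could look roughly like the sketch below. The patterns and the flag_features() helper are hypothetical; the package's actual search terms are not shown in this README.

```r
# Hypothetical keyword patterns, one regular expression per feature
patterns <- c(
  puzzle  = "puzzle",
  balance = "balanc",
  water   = "swim|water|ocean"
)

# Hypothetical helper: one TRUE/FALSE flag per feature for a description
flag_features <- function(description) {
  vapply(
    patterns,
    function(p) grepl(p, description, ignore.case = TRUE),
    logical(1)
  )
}

flags <- flag_features(
  "Castaways swim to a platform, then balance a ball while solving a puzzle."
)
flags
```

A short description that never mentions a keyword leaves the corresponding flag FALSE, which is why some features may be missed.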

Jury votes

History of jury votes. It is more verbose than it needs to be, however having a 0-1 column indicating if a vote was placed or not makes it easier to summarise castaways that received no votes.

jury_votes |> 
  filter(season == 42)
#> # A tibble: 24 × 9
#>    version version_season season_…¹ season casta…² final…³  vote casta…⁴ final…⁵
#>    <chr>   <chr>          <chr>      <dbl> <chr>   <chr>   <dbl> <chr>   <chr>  
#>  1 US      US42           Survivor…     42 Jonath… Romeo       0 US0615  US0623 
#>  2 US      US42           Survivor…     42 Lindsay Romeo       0 US0616  US0623 
#>  3 US      US42           Survivor…     42 Omar    Romeo       0 US0621  US0623 
#>  4 US      US42           Survivor…     42 Drea    Romeo       0 US0611  US0623 
#>  5 US      US42           Survivor…     42 Hai     Romeo       0 US0612  US0623 
#>  6 US      US42           Survivor…     42 Tori    Romeo       0 US0625  US0623 
#>  7 US      US42           Survivor…     42 Rocksr… Romeo       0 US0622  US0623 
#>  8 US      US42           Survivor…     42 Chanel… Romeo       0 US0609  US0623 
#>  9 US      US42           Survivor…     42 Jonath… Mike        1 US0615  US0620 
#> 10 US      US42           Survivor…     42 Lindsay Mike        0 US0616  US0620 
#> # … with 14 more rows, and abbreviated variable names ¹​season_name, ²​castaway,
#> #   ³​finalist, ⁴​castaway_id, ⁵​finalist_id
jury_votes |> 
  filter(season == 42) |> 
  group_by(finalist) |> 
  summarise(votes = sum(vote))
#> # A tibble: 3 × 2
#>   finalist votes
#>   <chr>    <dbl>
#> 1 Maryanne     7
#> 2 Mike         1
#> 3 Romeo        0

Advantages

Advantage Details

This dataset lists the hidden idols and advantages in the game for all seasons. It details where each one was found, whether there was a clue to it, and any other conditions attached to the advantage. This maps to the advantage_movement table.

advantage_details |> 
  filter(season == 42)
#> # A tibble: 11 × 9
#>    version version_season seaso…¹ season advan…² advan…³ clue_…⁴ locat…⁵ condi…⁶
#>    <chr>   <chr>          <chr>    <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 US      US42           Surviv…     42 USAM42… Amulet  No clu… Found … Amulet…
#>  2 US      US42           Surviv…     42 USAM42… Amulet  No clu… Found … Amulet…
#>  3 US      US42           Surviv…     42 USAM42… Amulet  No clu… Found … Amulet…
#>  4 US      US42           Surviv…     42 USEV42… Extra … No clu… Shipwh… <NA>   
#>  5 US      US42           Surviv…     42 USEV42… Extra … No clu… Shipwh… <NA>   
#>  6 US      US42           Surviv…     42 USHI42… Hidden… Found … Found … Beware…
#>  7 US      US42           Surviv…     42 USHI42… Hidden… Found … Found … Beware…
#>  8 US      US42           Surviv…     42 USHI42… Hidden… Found … Found … Beware…
#>  9 US      US42           Surviv…     42 USKP42… Knowle… Found … Found … Knowle…
#> 10 US      US42           Surviv…     42 USHI42… Hidden… Found … Found … Valid …
#> 11 US      US42           Surviv…     42 USIN42… Idol n… Found … Found … <NA>   
#> # … with abbreviated variable names ¹​season_name, ²​advantage_id,
#> #   ³​advantage_type, ⁴​clue_details, ⁵​location_found, ⁶​conditions

Advantage Movement

The advantage_movement table tracks who found the advantage, who they may have handed it to and who they played it for. Each step is called an event. The sequence_id tracks the logical order of the steps for each advantage. For example, in season 41 JD found an Extra Vote advantage. JD gave it to Shan in good faith, who then voted him out and kept the Extra Vote. Shan gave it to Ricard in good faith, who eventually gave it back before Shan played it for Naseer. That movement is recorded in this table.

advantage_movement |> 
  filter(advantage_id == "USEV4102")
#> # A tibble: 5 × 15
#>   version version…¹ seaso…² season casta…³ casta…⁴ advan…⁵ seque…⁶   day episode
#>   <chr>   <chr>     <chr>    <dbl> <chr>   <chr>   <chr>     <dbl> <dbl>   <dbl>
#> 1 US      US41      Surviv…     41 JD      US0603  USEV41…       1     2       1
#> 2 US      US41      Surviv…     41 Shan    US0606  USEV41…       2     9       4
#> 3 US      US41      Surviv…     41 Ricard  US0596  USEV41…       3     9       4
#> 4 US      US41      Surviv…     41 Shan    US0606  USEV41…       4    11       5
#> 5 US      US41      Surviv…     41 Shan    US0606  USEV41…       5    17       9
#> # … with 5 more variables: event <chr>, played_for <chr>, played_for_id <chr>,
#> #   success <chr>, votes_nullified <dbl>, and abbreviated variable names
#> #   ¹​version_season, ²​season_name, ³​castaway, ⁴​castaway_id, ⁵​advantage_id,
#> #   ⁶​sequence_id
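
To show what sequence_id is for, the sketch below rebuilds the chain of holders for USEV4102. The rows are copied by hand from the output above (castaway, sequence_id and episode only) and deliberately shuffled, so sorting by sequence_id is what restores the order.

```r
# Rows copied from the USEV4102 output above, shuffled on purpose
adv_toy <- data.frame(
  castaway    = c("Ricard", "JD", "Shan", "Shan", "Shan"),
  sequence_id = c(3, 1, 5, 2, 4),
  episode     = c(4, 1, 9, 4, 5)
)

# sequence_id gives the logical order of events for one advantage
ordered <- adv_toy[order(adv_toy$sequence_id), ]
holder_path <- paste(ordered$castaway, collapse = " -> ")
holder_path
```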

Confessionals

A dataset containing the number of confessionals for each castaway by season and episode. The data has been counted by contributors of the survivoR R package and consolidated with external sources. The aim is to establish consistency in confessional counts in the absence of official sources. Given the subjective nature of the counts and the potential for clerical error no single source is more valid than another. Therefore, it is reasonable to average across all sources.

confessionals |> 
  filter(season == 42) |> 
  group_by(castaway) |> 
  summarise(n_confessionals = sum(confessional_count))
#> # A tibble: 18 × 2
#>    castaway n_confessionals
#>    <chr>              <dbl>
#>  1 Chanelle              18
#>  2 Daniel                15
#>  3 Drea                  34
#>  4 Hai                   37
#>  5 Jackson                2
#>  6 Jenny                  6
#>  7 Jonathan              31
#>  8 Lindsay               45
#>  9 Lydia                 14
#> 10 Marya                  6
#> 11 Maryanne              43
#> 12 Mike                  58
#> 13 Omar                  41
#> 14 Rocksroy              21
#> 15 Romeo                 33
#> 16 Swati                  7
#> 17 Tori                  18
#> 18 Zach                   7
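
Averaging across sources, as suggested above, could be sketched as follows. The counts and source names are made up for illustration; only confessional_count and castaway are column names taken from this README.

```r
# Made-up counts from three hypothetical sources for one episode
counts_toy <- data.frame(
  castaway = rep(c("A", "B"), each = 3),
  source   = rep(c("source_1", "source_2", "source_3"), times = 2),
  confessional_count = c(4, 5, 6, 2, 2, 3)
)

# Mean count per castaway across all sources
avg <- aggregate(confessional_count ~ castaway, data = counts_toy, FUN = mean)
avg
```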

Screen time [EXPERIMENTAL]

This dataset contains the estimated screen time for each castaway during an episode. Please note that this is still in the early days of development. There is likely to be misclassification and other sources of error. The model will be refined over time.

An individual’s screen time is calculated, at a high level, via the following process:

  1. Frames are sampled from episodes on a 1 second time interval

  2. MTCNN detects the human faces within each frame

  3. VGGFace2 converts each detected face into a 512d vector space

  4. A training set of labelled images (1 for each contestant + 3 for Jeff Probst) is processed in the same way to determine where they sit in the vector space. TODO: This could be made more accurate by increasing the number of training images per contestant.

  5. The Euclidean distance is calculated for the faces detected in the frame to each of the contestants in the season (+Jeff). If the minimum distance is greater than 1.2 the face is labelled as “unknown”. TODO: Review how robust this distance cutoff truly is - currently based on manual review of Season 42.

  6. A multi-class SVM is trained on the training set to label faces. For any face not identified as “unknown”, the vector embedding is run into this model and a label is generated.

  7. All labelled faces are aggregated together, with an assumption of 1-5 full second of screen time each time a face is seen and factoring in time between detection capping at a max of 5 seconds.
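
The distance check in step 5 can be sketched in a few lines of R. This is an illustration only: label_face() is a hypothetical helper, the embeddings are made-up 3-d vectors rather than 512-d ones, and returning the nearest reference label is a simple stand-in for the SVM in step 6, not the method used to build the dataset.

```r
# Hypothetical helper: label one face embedding against reference embeddings.
# `refs` is a matrix with one row per known person (rownames are the labels).
label_face <- function(face, refs, cutoff = 1.2) {
  # Euclidean distance from the face to every reference embedding
  dists <- sqrt(rowSums(sweep(refs, 2, face)^2))
  if (min(dists) > cutoff) {
    return("unknown")
  }
  # Stand-in for the SVM in step 6: return the nearest reference label
  names(dists)[which.min(dists)]
}

# Made-up 3-d embeddings (the real ones are 512-d)
refs <- rbind(
  Jeff     = c(0, 0, 0),
  Maryanne = c(2, 0, 0)
)

label_face(c(0.1, 0.2, 0), refs)  # close to the "Jeff" reference
label_face(c(5, 5, 5), refs)      # further than the cutoff from every reference
```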

screen_time |> 
  filter(version_season == "US42") |> 
  group_by(castaway_id) |> 
  summarise(total_mins = sum(screen_time)/60) |> 
  left_join(
    castaway_details |> 
      # castaway_details stores the short name in `castaway`; there is no `short_name` column
      select(castaway_id, castaway),
    by = "castaway_id"
  ) |> 
  arrange(desc(total_mins))

Currently it only includes data for season 42. More seasons will be added as they are completed.

Tribe mapping

A mapping of castaways to tribes for each day (day being the day of the Tribal Council). This is useful for observing who is on what tribe throughout the game. Each season by day holds a complete list of castaways still in the game and which tribe they are on. Moving through each day you can observe the changes in the tribe. For example, the first day (usually day 2) has all castaways mapped to their original tribe. The next day has the same list minus the castaway just voted out. This makes it easy to observe changes in tribe makeup, whether due to castaways being voted off the island, tribe swaps, or castaways being on Redemption Island or Edge of Extinction.

tribe_mapping |> 
  filter(season == 42)
#> # A tibble: 177 × 10
#>    version version_…¹ seaso…² season episode   day casta…³ casta…⁴ tribe tribe…⁵
#>    <chr>   <chr>      <chr>    <dbl>   <dbl> <dbl> <chr>   <chr>   <chr> <chr>  
#>  1 US      US42       Surviv…     42       1     2 US0609  Chanel… Vati  Origin…
#>  2 US      US42       Surviv…     42       1     2 US0610  Daniel  Vati  Origin…
#>  3 US      US42       Surviv…     42       1     2 US0611  Drea    Ika   Origin…
#>  4 US      US42       Surviv…     42       1     2 US0612  Hai     Vati  Origin…
#>  5 US      US42       Surviv…     42       1     2 US0613  Jackson Taku  Origin…
#>  6 US      US42       Surviv…     42       1     2 US0614  Jenny   Vati  Origin…
#>  7 US      US42       Surviv…     42       1     2 US0615  Jonath… Taku  Origin…
#>  8 US      US42       Surviv…     42       1     2 US0616  Lindsay Taku  Origin…
#>  9 US      US42       Surviv…     42       1     2 US0617  Lydia   Vati  Origin…
#> 10 US      US42       Surviv…     42       1     2 US0618  Marya   Taku  Origin…
#> # … with 167 more rows, and abbreviated variable names ¹​version_season,
#> #   ²​season_name, ³​castaway_id, ⁴​castaway, ⁵​tribe_status
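
A small sketch of how a day-by-day mapping can be used to track tribe sizes and spot who has left. The data is entirely made up (four castaways, two tribes) and only borrows the day, castaway and tribe column names from tribe_mapping.

```r
# Made-up mapping: four castaways, two tribes, one boot after day 2
tm_toy <- data.frame(
  day      = c(2, 2, 2, 2, 5, 5, 5),
  castaway = c("A", "B", "C", "D", "A", "B", "C"),
  tribe    = c("Red", "Red", "Blue", "Blue", "Red", "Red", "Blue")
)

# Castaways per tribe on each day
size_by_day <- table(tm_toy$day, tm_toy$tribe)
size_by_day

# Who is mapped on day 2 but no longer mapped on day 5
gone <- setdiff(
  as.character(tm_toy$castaway[tm_toy$day == 2]),
  as.character(tm_toy$castaway[tm_toy$day == 5])
)
gone
```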

Boot Mapping

A mapping table for easily filtering to the set of castaways that are still in the game after a specified number of boots. How this differs from the tribe mapping is that rather than being focused on an episode, it is focused on the boot which is often more useful. This is useful for filtering to who is still alive in the game for a given episode and number of boots. When someone quits the game or is medically evacuated it is considered a boot. This table tracks multiple boots per episode.

# filter to season 42 and when there are 6 people left
# 18 people in the season, therefore 12 boots

still_alive <- function(.version, .season, .n_boots) {
  survivoR::boot_mapping |>
    filter(
      version == .version,
      season == .season,
      order == .n_boots,
      game_status %in% c("In the game", "Returned")
    )
}

still_alive("US", 42, 12)
#> # A tibble: 6 × 11
#>   version version_s…¹ seaso…² season episode order casta…³ casta…⁴ tribe tribe…⁵
#>   <chr>   <chr>       <chr>    <dbl>   <dbl> <dbl> <chr>   <chr>   <chr> <chr>  
#> 1 US      US42        Surviv…     42      12    12 US0615  Jonath… Kula… Merged 
#> 2 US      US42        Surviv…     42      12    12 US0616  Lindsay Kula… Merged 
#> 3 US      US42        Surviv…     42      12    12 US0619  Maryan… Kula… Merged 
#> 4 US      US42        Surviv…     42      12    12 US0620  Mike    Kula… Merged 
#> 5 US      US42        Surviv…     42      12    12 US0621  Omar    Kula… Merged 
#> 6 US      US42        Surviv…     42      12    12 US0623  Romeo   Kula… Merged 
#> # … with 1 more variable: game_status <chr>, and abbreviated variable names
#> #   ¹​version_season, ²​season_name, ³​castaway_id, ⁴​castaway, ⁵​tribe_status

Viewers

A data frame containing the viewer information for every episode across all seasons. It also includes the rating and viewer share information for viewers aged 18 to 49.

viewers |> 
  filter(season == 42)
#> # A tibble: 13 × 12
#> # Groups:   version [1]
#>    version version_s…¹ seaso…² season episo…³ episode episo…⁴ episode_…⁵ episo…⁶
#>    <chr>   <chr>       <chr>    <dbl>   <int>   <dbl> <chr>   <date>       <dbl>
#>  1 US      US42        Surviv…     42     611       1 Feels … 2022-03-09      86
#>  2 US      US42        Surviv…     42     612       2 Good a… 2022-03-16      43
#>  3 US      US42        Surviv…     42     613       3 Go for… 2022-03-23      43
#>  4 US      US42        Surviv…     42     614       4 Vibe o… 2022-03-30      43
#>  5 US      US42        Surviv…     42     615       5 I'm Su… 2022-04-06      43
#>  6 US      US42        Surviv…     42     616       6 You Ca… 2022-04-13      43
#>  7 US      US42        Surviv…     42     617       7 The De… 2022-04-13      43
#>  8 US      US42        Surviv…     42     618       8 You Be… 2022-04-20      43
#>  9 US      US42        Surviv…     42     619       9 Game o… 2022-04-27      43
#> 10 US      US42        Surviv…     42     620      10 Tell a… 2022-05-04      43
#> 11 US      US42        Surviv…     42     621      11 Battle… 2022-05-11      43
#> 12 US      US42        Surviv…     42     622      12 Caterp… 2022-05-18      43
#> 13 US      US42        Surviv…     42     623      13 It Com… 2022-05-25     129
#> # … with 3 more variables: viewers <dbl>, imdb_rating <dbl>, n_ratings <dbl>,
#> #   and abbreviated variable names ¹​version_season, ²​season_name,
#> #   ³​episode_number_overall, ⁴​episode_title, ⁵​episode_date, ⁶​episode_length

Tribe colours

This data frame contains the tribe names and colours for each season, including the RGB values. These colours can be joined with the other data frames to customise colours for plots. Another option is to add tribal colours to ggplots with the scale functions.

tribe_colours
#> # A tibble: 225 × 7
#>    version version_season season_name               season tribe tribe…¹ tribe…²
#>    <chr>   <chr>          <chr>                      <dbl> <chr> <chr>   <chr>  
#>  1 AU      AU01           Survivor Australia: 2016       1 Agan… #FF0000 Origin…
#>  2 AU      AU01           Survivor Australia: 2016       1 Saan… #0000FF Origin…
#>  3 AU      AU01           Survivor Australia: 2016       1 Vavau #FFFF00 Origin…
#>  4 AU      AU01           Survivor Australia: 2016       1 Fia … #000000 Merged 
#>  5 AU      AU02           Survivor Australia: 2017       2 Sama… #A51A84 Origin…
#>  6 AU      AU02           Survivor Australia: 2017       2 Asaga #00A19C Origin…
#>  7 AU      AU02           Survivor Australia: 2017       2 Asat… #000000 Merged 
#>  8 AU      AU03           Survivor Australia: Cham…      3 Cham… #0000FF Origin…
#>  9 AU      AU03           Survivor Australia: Cham…      3 Cont… #FF0000 Origin…
#> 10 AU      AU03           Survivor Australia: Cham…      3 Koro… #000000 Merged 
#> # … with 215 more rows, and abbreviated variable names ¹​tribe_colour,
#> #   ²​tribe_status
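
One way to use these colours in a plot is to turn them into a named vector, which is the shape ggplot2::scale_fill_manual(values = ...) accepts. The sketch below uses two rows copied from the output above (they come from different seasons and are paired here only for illustration) and base R's col2rgb() to recover the RGB components from the hex codes.

```r
# Two rows copied from the tribe_colours output above
tc_toy <- data.frame(
  tribe        = c("Vavau", "Asaga"),
  tribe_colour = c("#FFFF00", "#00A19C"),
  stringsAsFactors = FALSE
)

# Named vector: tribe name -> hex colour
tribe_palette <- setNames(tc_toy$tribe_colour, tc_toy$tribe)

# RGB components of each hex code (one column per tribe)
rgb_values <- grDevices::col2rgb(tribe_palette)
rgb_values
```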

Scale functions

Included are ggplot2 scale functions of the form scale_fill_survivor() and scale_fill_tribes() to add season and tribe colours to ggplot. The scale_fill_survivor() scale uses a colour palette extracted from the season logo and the scale_fill_tribes() scale uses the tribal colours of the specified season as a colour palette.

All that is required for the ‘survivor’ palettes is the desired season as input. If no season is provided it will default to season 40.

castaways |> 
  distinct(season, castaway_id) |> 
  left_join(
    castaway_details |> 
      select(castaway_id, personality_type),
    by = "castaway_id"
  ) |> 
  # count castaways per season and personality type; this creates the `n` used below
  count(season, personality_type) |> 
  ggplot(aes(x = season, y = n, fill = personality_type)) +
  geom_bar(stat = "identity") +
  scale_fill_survivor(40) +
  theme_minimal()

Below are the palettes for all seasons.

To use the tribe scales, simply input the season number desired to use those tribe colours. If the fill or colour aesthetic is the tribe name, this needs to be passed to the scale function as scale_fill_tribes(season, tribe = tribe) (for now) where tribe is on the input data frame. If the fill or colour aesthetic is independent from the actual tribe names, like gender for example, tribe does not need to be specified and will simply use the tribe colours as a colour palette, such as the viewers line graph above.

ssn <- 35
labels <- castaways |>
  filter(
    season == ssn,
    str_detect(result, "Sole|unner")
  ) |>
  mutate(label = glue("{castaway} ({original_tribe})")) |>
  select(label, castaway)

jury_votes |>
  filter(season == ssn) |>
  left_join(
    castaways |>
      filter(season == ssn) |>
      select(castaway, original_tribe),
    by = "castaway"
  ) |>
  group_by(finalist, original_tribe) |>
  summarise(votes = sum(vote)) |>
  left_join(labels, by = c("finalist" = "castaway")) |>
  {
    ggplot(., aes(x = label, y = votes, fill = original_tribe)) +
      geom_bar(stat = "identity", width = 0.5) +
      scale_fill_tribes(ssn, tribe = .$original_tribe) +
      theme_minimal() +
      labs(
        x = "Finalist (original tribe)",
        y = "Votes",
        fill = "Original\ntribe",
        title = "Votes received by each finalist"
      )
  }

Issues

Given the variable nature of the game of Survivor and the changing of the rules, there are bound to be edge cases where the data is not quite right. Before logging an issue, please install the GitHub version to see if it has already been corrected. If not, please log an issue and I will correct the datasets.

New features will be added, such as details on exiled castaways across the seasons. If you have a request for specific data let me know in the issues and I’ll see what I can do. Also, if you’d like to contribute by adding to existing datasets or contribute a new dataset, please contact me directly.

Showcase

Survivor Dashboard

Carly Levitz has developed a fantastic dashboard showcasing the data and allowing you to drill down into seasons, castaways, voting history and challenges.

Data viz

This looks at the number of immunity idols won and votes received for each winner.

Contributors

A big thank you to:

Package contributor and maintainers

Data contributors

References

Data was sourced from Wikipedia and the Survivor Wiki. Other data, such as the tribe colours, was manually recorded and entered by myself and contributors.

Hex graphic by CBS