Introduction

About

The Assessment, Total Maximum Daily Load (TMDL) Tracking and Implementation System (ATTAINS) is the U.S. Environmental Protection Agency (EPA) database used to track information provided by states about water quality assessments conducted under the Clean Water Act. The assessments are conducted every two years to evaluate whether the nation’s water bodies meet water quality standards. States are required to take actions (TMDLs or other efforts) on water bodies that do not meet standards. Public information in ATTAINS is made available through webservices that return JSON. rATTAINS makes this data easier to access. For each of the ATTAINS webservice endpoints, it provides functions that return either the raw JSON or formatted “tidy” data. More information about Clean Water Act assessment and reporting is available from the EPA. For alternative ways of accessing the same data, see the “How’s My Waterway” website for interactive data exploration or the ArcGIS MapService for spatial data.

Functions

Summary Services

The EPA provides two summary service endpoints that summarize assessed uses, either by organization identifier or by hydrologic unit code (HUC). For example, the following call returns a summary of assessed uses for the state of Tennessee (organization identifier TDECWR) in the 2016 reporting cycle:

library(rATTAINS)
x <- state_summary(organization_id = "TDECWR",
                   reporting_cycle = "2016")
x
#> # A tibble: 22 × 13
#>    organizatio…¹ organ…² organ…³ repor…⁴ combi…⁵ water…⁶ units…⁷ use_n…⁸ fully…⁹
#>    <chr>         <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 TDECWR        Tennes… State   2016    <NA>    LAKE/R… Acres   Fish a… 531027…
#>  2 TDECWR        Tennes… State   2016    <NA>    LAKE/R… Acres   Recrea… 376477…
#>  3 TDECWR        Tennes… State   2016    <NA>    LAKE/R… Acres   Indust… 447537…
#>  4 TDECWR        Tennes… State   2016    <NA>    LAKE/R… Acres   Naviga… 1971   
#>  5 TDECWR        Tennes… State   2016    <NA>    LAKE/R… Acres   Domest… 521638…
#>  6 TDECWR        Tennes… State   2016    <NA>    LAKE/R… Acres   Irriga… 563584…
#>  7 TDECWR        Tennes… State   2016    <NA>    LAKE/R… Acres   Livest… 563584…
#>  8 TDECWR        Tennes… State   2016    <NA>    WETLAN… Acres   Fish a… <NA>   
#>  9 TDECWR        Tennes… State   2016    <NA>    WETLAN… Acres   Livest… <NA>   
#> 10 TDECWR        Tennes… State   2016    <NA>    WETLAN… Acres   Irriga… <NA>   
#> # … with 12 more rows, 4 more variables: fully_supporting_count <chr>,
#> #   not_assessed <chr>, not_assessed_count <chr>, parameters <list>, and
#> #   abbreviated variable names ¹​organization_identifier, ²​organization_name,
#> #   ³​organization_type_text, ⁴​reporting_cycle, ⁵​combined_cycles,
#> #   ⁶​water_type_code, ⁷​units_code, ⁸​use_name, ⁹​fully_supporting

The resulting tibble includes the water type, the designated use, and a summary of how much of each assessed use meets criteria (by count, area, distance, etc.) or was not assessed. Each row also has a list-column called “parameters”. It contains a nested tibble with further information about the use assessment, broken down by the parameters assessed:

x$parameters[[1]]
#> # A tibble: 9 × 7
#>   parameter_group                  cause cause…¹ meeti…² meeti…³ insuf…⁴ insuf…⁵
#>   <chr>                            <chr> <chr>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1 NUTRIENTS                        1289… 5            NA      NA      NA      NA
#> 2 METALS (OTHER THAN MERCURY)      2254  4            NA      NA      NA      NA
#> 3 FLOW ALTERATION(S)               494   1            NA      NA      NA      NA
#> 4 TEMPERATURE                      20459 1            NA      NA      NA      NA
#> 5 AMMONIA                          56.1… 1            NA      NA      NA      NA
#> 6 PH/ACIDITY/CAUSTIC CONDITIONS    56.1… 1            NA      NA      NA      NA
#> 7 SEDIMENT                         3772… 7            NA      NA      NA      NA
#> 8 SALINITY/TOTAL DISSOLVED SOLIDS… 56.1… 1            NA      NA      NA      NA
#> 9 ORGANIC ENRICHMENT/OXYGEN DEPLE… 5269… 5            NA      NA      NA      NA
#> # … with abbreviated variable names ¹​cause_count, ²​meeting_criteria,
#> #   ³​meeting_criteria_count, ⁴​insufficent_information,
#> #   ⁵​insufficient_information_count
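You may want the parameter-level detail for every use at once. In that case, the nested tibbles can be expanded into one long tibble. The sketch below assumes the tidyr package is installed. Its unnest() function is a general tidyverse tool and is not part of rATTAINS. The helper name unnest_parameters() is hypothetical. The commented usage lines repeat the query above and need network access.

```r
library(tidyr)

# Hypothetical helper: expand the nested `parameters` list-column so that each
# row of the result is one use x parameter-group combination. The use-level
# columns (water type, use name, etc.) are repeated for each parameter group.
# keep_empty = TRUE keeps uses whose parameter tibble is empty or NULL as a
# single row of NAs instead of silently dropping them.
unnest_parameters <- function(summary_tbl) {
  tidyr::unnest(summary_tbl, cols = parameters, keep_empty = TRUE)
}

# Usage (requires network access to the ATTAINS webservice):
# x <- rATTAINS::state_summary(organization_id = "TDECWR",
#                              reporting_cycle = "2016")
# params <- unnest_parameters(x)
```

Keeping the use-level columns alongside each parameter group makes it straightforward to filter or group by use name afterwards.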

The HUC12 service operates similarly but summarizes data by watershed, specifically by 12-digit hydrologic unit (HUC12). For example:

x <- huc12_summary("020700100204")
x
#> $huc_summary
#> # A tibble: 1 × 14
#>   huc12  asses…¹ total…² total…³ asses…⁴ asses…⁵ asses…⁶ asses…⁷ asses…⁸ asses…⁹
#>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 02070…      20    46.2    46.2    44.1    95.4    1.77    3.83       0       0
#> # … with 4 more variables: contain_impaired_waters_catchment_area_sq_mi <dbl>,
#> #   contain_impaired_waters_catchment_area_percent <dbl>,
#> #   contain_restoration_catchment_area_sq_mi <dbl>,
#> #   contain_restoration_catchment_area_percent <dbl>, and abbreviated variable
#> #   names ¹​assessment_unit_count, ²​total_catchment_area_sq_mi,
#> #   ³​total_huc_area_sq_mi, ⁴​assessed_catchment_area_sq_mi,
#> #   ⁵​assessed_catchment_area_percent, ⁶​assessed_good_catchment_area_sq_mi, …
#> 
#> $au_summary
#> # A tibble: 20 × 1
#>    assessment_unit_id                      
#>    <chr>                                   
#>  1 MD-ANATF-02140205                       
#>  2 MD-02140205-Northwest_Branch            
#>  3 MD-02140205                             
#>  4 DCTFD01R_00                             
#>  5 MD-ANATF                                
#>  6 DCTFS01R_00                             
#>  7 DCTNA01R_00                             
#>  8 DCTTX27R_00                             
#>  9 DCTFC01R_00                             
#> 10 MD-02140205-Mainstem                    
#> 11 MD-02140205-Mainstem2                   
#> 12 MD-02140205-Northeast_Northwest_Branches
#> 13 DCTWB00R_02                             
#> 14 DCTWB00R_01                             
#> 15 DCANA00E_02                             
#> 16 DCTHR01R_00                             
#> 17 DCTPB01R_00                             
#> 18 DCTDU01R_00                             
#> 19 DCANA00E_01                             
#> 20 DCAKL00L_00                             
#> 
#> $ir_summary
#> # A tibble: 3 × 4
#>   epa_ir_category_name catchment_size_sq_mi catchment_size_percent assessment_…¹
#>   <chr>                               <dbl>                  <dbl>         <dbl>
#> 1 1                                    1.77                   3.83             2
#> 2 4A                                  25.3                   54.8             11
#> 3 5                                   37.9                   81.9              7
#> # … with abbreviated variable name ¹​assessment_unit_count
#> 
#> $use_summary
#> # A tibble: 6 × 5
#>   use_group_name      use_attainment           catchment_size_…¹ catch…² asses…³
#>   <chr>               <chr>                                <dbl>   <dbl>   <dbl>
#> 1 ECOLOGICAL_USE      Not Supporting                       19.5    42.1       15
#> 2 FISHCONSUMPTION_USE Fully Supporting                      1.77    3.83       2
#> 3 FISHCONSUMPTION_USE Insufficient Information              1.91    4.14       1
#> 4 FISHCONSUMPTION_USE Not Supporting                       22.8    49.3       16
#> 5 OTHER_USE           Fully Supporting                      1.91    4.13       3
#> 6 RECREATION_USE      Not Supporting                       24.5    53.0       15
#> # … with abbreviated variable names ¹​catchment_size_sq_mi,
#> #   ²​catchment_size_percent, ³​assessment_unit_count
#> 
#> $param_summary
#> # A tibble: 17 × 4
#>    parameter_group_name                               catchmen…¹ catch…² asses…³
#>    <chr>                                                   <dbl>   <dbl>   <dbl>
#>  1 ALGAL GROWTH                                            22.8    49.3        2
#>  2 CHLORINE                                                10.7    23.2        1
#>  3 HABITAT ALTERATIONS                                     25.3    54.7        3
#>  4 HYDROLOGIC ALTERATION                                   36.5    79.0        6
#>  5 METALS (OTHER THAN MERCURY)                             22.8    49.3        9
#>  6 NUTRIENTS                                               42.4    91.7        4
#>  7 OIL AND GREASE                                          22.8    49.3        3
#>  8 ORGANIC ENRICHMENT/OXYGEN DEPLETION                     42.4    91.7        8
#>  9 PATHOGENS                                               44.1    95.4       15
#> 10 PESTICIDES                                              26.4    57.1       11
#> 11 PH/ACIDITY/CAUSTIC CONDITIONS                            1.72    3.71       1
#> 12 POLYCHLORINATED BIPHENYLS (PCBS)                        26.4    57.1       12
#> 13 SALINITY/TOTAL DISSOLVED SOLIDS/CHLORIDES/SULFATES      19.5    42.1        1
#> 14 SEDIMENT                                                 3.88    8.39       1
#> 15 TOXIC ORGANICS                                          22.8    49.3        8
#> 16 TRASH                                                   42.4    91.7        4
#> 17 TURBIDITY                                               44.1    95.4       15
#> # … with abbreviated variable names ¹​catchment_size_sq_mi,
#> #   ²​catchment_size_percent, ³​assessment_unit_count
#> 
#> $res_plan_summary
#> # A tibble: 1 × 4
#>   summary_type_name catchment_size_sq_mi catchment_size_percent assessment_uni…¹
#>   <chr>                            <dbl>                  <dbl>            <dbl>
#> 1 TMDL                              26.4                   57.1               15
#> # … with abbreviated variable name ¹​assessment_unit_count
#> 
#> $vision_plan_summary
#> # A tibble: 1 × 4
#>   summary_type_name catchment_size_sq_mi catchment_size_percent assessment_uni…¹
#>   <chr>                            <dbl>                  <dbl>            <dbl>
#> 1 TMDL                              26.4                   57.1               15
#> # … with abbreviated variable name ¹​assessment_unit_count

huc12_summary() returns a list of tibbles, each holding a different summary. Using the above example:

- x$huc_summary summarizes the HUC area, plus the area and percentage of the catchment assessed as good, unknown, or impaired.
- x$au_summary provides a tibble with the unique identifiers of the assessment units (distinct sections of waterbodies) within the queried HUC12.
- x$ir_summary summarizes the catchment area classified under each Integrated Report category.
- x$use_summary summarizes use attainment within the HUC12.
- x$param_summary provides the same information for parameter groups.
- x$res_plan_summary and x$vision_plan_summary summarize how much of the watershed is covered by particular types of restoration plans or vision plans, such as TMDLs.
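Each call returns the same set of named tibbles, so results for several watersheds can be combined component by component. The following is a minimal sketch that assumes the dplyr package is installed. bind_huc12_component() is a hypothetical helper. Replace the HUC12 vector in the commented usage with the watersheds of interest.

```r
library(dplyr)

# Hypothetical helper: take a *named* list of huc12_summary() results (names
# are the HUC12 codes that were queried) and stack one component, such as
# "use_summary", into a single tibble. bind_rows(.id = ...) stores each list
# element's name in a new first column. It is called "query_huc12" so that it
# does not clash with the huc12 column already present in huc_summary.
bind_huc12_component <- function(results, component = "use_summary") {
  pieces <- lapply(results, `[[`, component)
  dplyr::bind_rows(pieces, .id = "query_huc12")
}

# Usage (requires network access; extend `hucs` with more HUC12 codes):
# hucs <- c("020700100204")
# results <- setNames(lapply(hucs, rATTAINS::huc12_summary), hucs)
# bind_huc12_component(results, "param_summary")
```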

Domains

Each function has a number of allowable arguments and associated values. To explore which values you might want to query, use the Domain Values service, which provides information about the allowable options. It is mapped to the domain_values() function. When called without any arguments, domain_values() returns the full list of possible “domains.” These are typically searchable parameters used across the functions in rATTAINS. The domain names returned by this service do not match the argument names used in rATTAINS one to one. In most cases, though, the corresponding argument is easy to identify.

For example, to find the possible organization identifiers to query by:

x <- domain_values(domain_name = "OrgStateCode")
x
#> # A tibble: 146 × 4
#>    domain       name  code  context        
#>    <chr>        <chr> <chr> <chr>          
#>  1 OrgStateCode AK    AK    EPA            
#>  2 OrgStateCode FL    FL    21FL303D       
#>  3 OrgStateCode PA    PA    EPA            
#>  4 OrgStateCode CC    CC    TEST_ORG_C     
#>  5 OrgStateCode AZ    AZ    TEST_TRIBE_B   
#>  6 OrgStateCode MS    MS    21MSWQ         
#>  7 OrgStateCode CT    CT    CT_DEP01       
#>  8 OrgStateCode ND    ND    21NDHDWQ       
#>  9 OrgStateCode MN    MN    REDLAKE        
#> 10 OrgStateCode NM    NM    PUEBLO_POJOAQUE
#> # … with 136 more rows

The result lists state codes, with the corresponding organization identifiers (the possible parameter values) in the context column. Similarly, to look up the use names used by the Texas Commission on Environmental Quality (organization identifier TCEQMAIN):

x <- domain_values(domain_name = "UseName", context = "TCEQMAIN")
x
#> # A tibble: 17 × 4
#>    domain  name                                        code              context
#>    <chr>   <chr>                                       <chr>             <chr>  
#>  1 UseName Recreation Use                              Recreation Use    TCEQMA…
#>  2 UseName Fish Consumption Use                        Fish Consumption… TCEQMA…
#>  3 UseName INTERMEDIATE AQUATIC LIFE                   INTERMEDIATE AQU… TCEQMA…
#>  4 UseName OVERALL USE SUPPORT                         OVERALL USE SUPP… TCEQMA…
#>  5 UseName Aquatic Life Use                            Aquatic Life Use  TCEQMA…
#>  6 UseName Oyster Waters Use                           Oyster Waters Use TCEQMA…
#>  7 UseName FISH CONSUMPTION                            FISH CONSUMPTION  TCEQMA…
#>  8 UseName OYSTER AQUATIC LIFE                         OYSTER AQUATIC L… TCEQMA…
#>  9 UseName NON-CONTACT RECREATION                      NON-CONTACT RECR… TCEQMA…
#> 10 UseName CONTACT RECREATION USE                      CONTACT RECREATI… TCEQMA…
#> 11 UseName DOMESTIC WATER SUPPLY - PUBLIC WATER SUPPLY DOMESTIC WATER S… TCEQMA…
#> 12 UseName Public Water Supply Use                     Public Water Sup… TCEQMA…
#> 13 UseName General Use                                 General Use       TCEQMA…
#> 14 UseName PRIMARY RECREATION/SWIMMING                 PRIMARY RECREATI… TCEQMA…
#> 15 UseName CONTACT RECREATION                          CONTACT RECREATI… TCEQMA…
#> 16 UseName NONCONTACT RECREATION USE                   NONCONTACT RECRE… TCEQMA…
#> 17 UseName Recreational Beaches                        Recreational Bea… TCEQMA…
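The OrgStateCode table lists organization identifiers in its context column. It can therefore be filtered to find the identifiers to pass as organization_id for a given state. The following is a minimal base R sketch in which org_ids_for_state() is a hypothetical helper. The commented usage requires network access, and its result depends on what the service currently returns.

```r
# Hypothetical helper: given the tibble returned by
# domain_values(domain_name = "OrgStateCode"), return the distinct
# organization identifiers (context column) listed for one state code.
# which() drops rows where `code` is NA instead of returning NA entries.
org_ids_for_state <- function(org_codes, state) {
  unique(org_codes$context[which(org_codes$code == state)])
}

# Usage (requires network access to the ATTAINS webservice):
# org_codes <- rATTAINS::domain_values(domain_name = "OrgStateCode")
# org_ids_for_state(org_codes, "TN")
```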

Other Services

JSON Files

By default, all functions in rATTAINS return one or more “tidy” data frames. These data frames are created by attempting to flatten the nested JSON data returned by the webservice. This requires some opinionated decisions about what constitutes flat data and to what level the data should be flattened, so the data frame output might not meet every user's needs. If you would prefer to parse the JSON data yourself, use the tidy = FALSE argument to return the unparsed JSON string. A number of R packages are available to parse and flatten JSON data to prepare it for analysis.
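As one option, the jsonlite package can parse the string returned with tidy = FALSE. The sketch below assumes jsonlite is installed. parse_attains_json() is a hypothetical wrapper with two modes. It can keep the response's nested list structure as-is. Alternatively, it can let jsonlite simplify arrays of records into data frames and flatten nested fields into dotted column names.

```r
library(jsonlite)

# Hypothetical wrapper around jsonlite::fromJSON() for raw ATTAINS responses.
parse_attains_json <- function(json_string, simplify = FALSE) {
  if (simplify) {
    # Coerce arrays of records into data frames and flatten nested
    # data frames into columns such as "parent.child".
    jsonlite::fromJSON(json_string, simplifyVector = TRUE, flatten = TRUE)
  } else {
    # Keep the exact nested structure of the response as R lists.
    jsonlite::fromJSON(json_string, simplifyVector = FALSE)
  }
}

# Usage (requires network access to the ATTAINS webservice):
# raw <- rATTAINS::state_summary(organization_id = "TDECWR",
#                                reporting_cycle = "2016", tidy = FALSE)
# parsed <- parse_attains_json(raw)
# str(parsed, max.level = 2)
```

The list form is the safer starting point when the response structure is unfamiliar. The simplified form is convenient once you know which arrays hold the records you need.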

Notes

The U.S. EPA is the data provider for this public information. Neither rATTAINS nor its author is affiliated with the EPA. Questions about package functionality should be directed to the package author. Questions about the webservice or the underlying data should be directed to the U.S. EPA. Please do not abuse the webservice when using this package.