CADStat: Statistical Tools for Causal Analysis
Loading and merging data
User data can be loaded by selecting File -> Load Data.
Selecting this option brings up a dialog box that allows the user to navigate to and select the file to be loaded.
Data must be in a simple tabular form with delimited fields, and the first row must contain the column names. The default delimiter is a tab (\t), but other delimiters (comma, semicolon, etc.) can be selected. The loaded data set is stored in CADStat under the name specified in the read.table(...) -> box.
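The file format described above can be illustrated with a minimal R sketch. It assumes the Load Data dialog wraps R's read.table(), as the read.table(...) -> label suggests; the file contents and the object name mydata are made up for illustration.

```r
# Write a small tab-delimited file with a header row (illustrative data only).
tmp <- tempfile(fileext = ".txt")
writeLines(c("SITE.ID\tTEMP\tPH",
             "S1\t12.5\t7.1",
             "S2\t14.0\t6.8"), tmp)

# header = TRUE takes the column names from the first row;
# sep = "\t" sets the field delimiter to a tab.
mydata <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(mydata)
```

For a comma- or semicolon-delimited file, only the sep argument would change (sep = "," or sep = ";").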
The dialog box for loading data can also be accessed by selecting the Browse option from within any of the analytical tools.
To load example data provided with CADStat, launch the Load Data dialog (File -> Load Data) and navigate to CADStat's R directory, for example: C:/Program Files/R/R-3.3.0/library/CADStat/extdata.
Loading delimited text files may prompt you to confirm the Record Separator and Quote symbols.
If you would like to try the merging data example (see below), load datasets envdata.or.txt and refids.or.txt.
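If you prefer the R console, the example directory can probably be located without typing the full path. The sketch below assumes that CADStat is installed as a regular R package and that the example files are tab-delimited with a header row; the object names envdata and refids are chosen here for illustration.

```r
# system.file() returns "" when the package or directory cannot be found.
extdata_dir <- system.file("extdata", package = "CADStat")

# File names are taken from the text above.
env_path <- file.path(extdata_dir, "envdata.or.txt")
ref_path <- file.path(extdata_dir, "refids.or.txt")

if (file.exists(env_path) && file.exists(ref_path)) {
  envdata <- read.table(env_path, header = TRUE, sep = "\t")
  refids  <- read.table(ref_path, header = TRUE, sep = "\t")
} else {
  message("Example files not found; use File -> Load Data instead.")
}
```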
The Data Merge module is a utility for combining two datasets into one by matching them on one or more unique sample identifiers.
The data merge module is initiated by choosing Tools -> Data Merge from the menus.
The Data Merge dialog box should appear.
Two data sets must be selected. If the data have already been loaded, select each one from the pull-down menu next to Dataset 1 or Dataset 2; otherwise, click the Browse button to select a tab-delimited text file to read.
Variable selection drop-down menus are used to specify variables on which the merge will be performed. If no variables are selected, data sets will be merged based on columns that have the same variable name. However, if the two datasets have different names for the same sample identifier, you can select the variable name from dataset 1 that corresponds to the variable name in dataset 2.
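The two matching modes can be sketched in R, assuming the module behaves like base R's merge() (an assumption; the help text does not name the underlying function). The data frames and column names below are hypothetical.

```r
# Two toy tables whose identifier columns have different names.
d1 <- data.frame(SAMPLE = c("A", "B", "C"), TEMP = c(12.5, 14.0, 11.2))
d2 <- data.frame(SAMP.ID = c("A", "C"), DEPTH = c(0.4, 0.9))

# Explicit matching: SAMPLE in d1 corresponds to SAMP.ID in d2.
m_explicit <- merge(d1, d2, by.x = "SAMPLE", by.y = "SAMP.ID")

# Default matching: with no variables given, merge() matches on the
# columns whose names appear in both tables (SAMP.ID after renaming).
names(d1)[1] <- "SAMP.ID"
m_default <- merge(d1, d2)
```

Both calls return the two matched rows; only the name of the identifier column in the result differs.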
If the By All check box is selected for dataset 1, then any row in dataset 1 that does not have a corresponding row in dataset 2 will still be included, with NA (missing value) used for all columns of dataset 2.
Likewise, if the By All check box is selected for dataset 2, then any row in dataset 2 that does not have a corresponding row in dataset 1 will still be included, with NA (missing value) used for all columns of dataset 1.
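The effect of the By All options can be sketched with the all.x and all.y arguments of base R's merge(). The mapping from check boxes to arguments is an assumption, and the data are hypothetical.

```r
d1 <- data.frame(ID = c("A", "B", "C"), X = 1:3)
d2 <- data.frame(ID = c("B", "C", "D"), Y = c(10, 20, 30))

inner   <- merge(d1, d2, by = "ID")                # neither By All box checked
keep_d1 <- merge(d1, d2, by = "ID", all.x = TRUE)  # By All for dataset 1
keep_d2 <- merge(d1, d2, by = "ID", all.y = TRUE)  # By All for dataset 2
both    <- merge(d1, d2, by = "ID", all = TRUE)    # By All for both datasets

# Row "A" exists only in d1, so its Y value is NA in keep_d1.
keep_d1
```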
The Suffixes are used only when matching variables are selected explicitly and a column that is not used for matching appears under the same name in both dataset 1 and dataset 2. The suffixes are then appended to that column name to indicate whether the variable came from dataset 1 or dataset 2.
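A short R sketch of this behavior, again assuming base R's merge() underlies the module. The suffix values ".1" and ".2" and the column names are illustrative, not CADStat's documented defaults.

```r
# Both tables contain a DATE column that is not used for matching.
d1 <- data.frame(ID1 = c("A", "B"), DATE = c("2001-06-01", "2001-06-02"))
d2 <- data.frame(ID2 = c("A", "B"), DATE = c("2002-07-01", "2002-07-02"))

# The suffixes are appended to the clashing column name only;
# the matching column keeps the name it has in the first table.
m <- merge(d1, d2, by.x = "ID1", by.y = "ID2", suffixes = c(".1", ".2"))
names(m)
```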
Selecting the Sort Output Data check box will sort the rows of the merged dataset according to the columns on which the datasets were matched. If this option is not checked, then the row order of dataset 1 is preserved, with matched rows coming first, followed by unmatched rows of dataset 1, followed by unmatched rows of dataset 2.
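The sorting option can be sketched with the sort argument of base R's merge(), assuming the check box maps to that argument. The data are hypothetical.

```r
d1 <- data.frame(ID = c("S3", "S1", "S2"), X = c(3, 1, 2))
d2 <- data.frame(ID = c("S2", "S3", "S1"), Y = c(20, 30, 10))

# sort = TRUE orders the result by the matching column(s).
sorted <- merge(d1, d2, by = "ID", sort = TRUE)
sorted$ID

# sort = FALSE skips that step; the same rows are returned.
unsorted <- merge(d1, d2, by = "ID", sort = FALSE)
```

R's own documentation for merge() describes the row order under sort = FALSE as unspecified, so code that depends on a particular order is safer sorting explicitly.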
The merged data set will be stored in CADStat under the name specified in Local Name. The merged data set can also be written out as a tab-delimited text file if the Save to File button is selected.
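Writing a merged data set to a tab-delimited file can be sketched in R as follows. The use of write.table() and the specific arguments are assumptions about what Save to File does; the data and file name are hypothetical.

```r
merged <- data.frame(ID = c("S1", "S2"), X = c(1.5, NA))

# Tab-delimited output with a header row and no row names.
out <- tempfile(fileext = ".txt")
write.table(merged, file = out, sep = "\t", row.names = FALSE, quote = FALSE)

# Reading the file back recovers the table, including the missing value.
back <- read.table(out, header = TRUE, sep = "\t")
```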
The following example merges environmental data (envdata) and reference site identifiers (refids). Load both of these files from the CADStat data directory (see above). Then select Tools -> Data Merge.
The sample identifier for envdata is STRM.ID, while the identifier for refids is SITE.ID. Specify each of these in the drop-down boxes. By selecting By All under envdata, we ensure that all of the original data records are carried forward into the merged data set. Records that do not match a record in refids are marked with an NA in the fields contained in refids.
The resulting data set is saved as mergedData, which will be used in subsequent examples in these help pages.
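For readers working at the R console, the example above corresponds roughly to the following sketch. It assumes the module behaves like base R's merge(). The small data frames stand in for the real envdata and refids files, and every column other than STRM.ID and SITE.ID is hypothetical.

```r
# Stand-ins for the example data (values are made up).
envdata <- data.frame(STRM.ID = c("S1", "S2", "S3", "S4"),
                      TEMP    = c(12.5, 14.0, 11.2, 13.1))
refids  <- data.frame(SITE.ID = c("S2", "S4"),
                      REF     = c("yes", "yes"))

# By All under envdata corresponds to all.x = TRUE.
mergedData <- merge(envdata, refids,
                    by.x = "STRM.ID", by.y = "SITE.ID", all.x = TRUE)

# Every envdata record is carried forward; unmatched ones have NA in REF.
nrow(mergedData) == nrow(envdata)
sum(is.na(mergedData$REF))
```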