As of version 1.1.0, RaMS also has functions for removing irrelevant data from MS files to reduce their size. Like grabMSdata, there’s one wrapper function, minifyMSdata, that accepts mzML or mzXML files plus a vector of m/z values that should either be kept (mz_include) or removed (mz_exclude). The function then opens each MS file and removes the data points in the MS1 and MS2 spectra that fall outside the accepted bounds, which are set by the ppm tolerance. mz_include is useful when only a few masses are of interest, as in targeted metabolomics. mz_exclude is useful when many masses are known to be contaminants or interfere with peakpicking or plotting. This minification can shrink a file by more than three orders of magnitude, which reduces both processing time and memory use later in the pipeline.
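To make the keep/drop rule concrete, here is a conceptual sketch on a plain data frame. It is not RaMS’s internal code: it assumes a symmetric relative window of mass * ppm / 1e6 around each target, and the helpers in_any_window and filter_by_mz are hypothetical names used only for illustration. The real minifyMSdata also handles MS2 data and writes a new file, which this sketch does not attempt.

```r
# Conceptual sketch only, NOT RaMS internals.
# Assumption: a symmetric relative ppm window (mass * ppm / 1e6) around each target.
in_any_window <- function(mz, targets, ppm) {
  # TRUE for each mz value within ppm of at least one target m/z
  vapply(mz, function(x) any(abs(x - targets) <= targets * ppm / 1e6),
         logical(1))
}

# Hypothetical helper: keep rows near mz_include and/or drop rows near mz_exclude
filter_by_mz <- function(spectra, mz_include = NULL, mz_exclude = NULL, ppm = 10) {
  keep <- rep(TRUE, nrow(spectra))
  if (!is.null(mz_include)) keep <- keep & in_any_window(spectra$mz, mz_include, ppm)
  if (!is.null(mz_exclude)) keep <- keep & !in_any_window(spectra$mz, mz_exclude, ppm)
  spectra[keep, , drop = FALSE]
}

# Toy MS1 data with made-up values
ms1 <- data.frame(rt  = c(4, 4, 4, 4.1),
                  mz  = c(118.0866, 138.0550, 200.1234, 118.0865),
                  int = c(1e6, 2e5, 5e4, 9e5))

filter_by_mz(ms1, mz_include = c(118.0865, 138.0555), ppm = 10)  # targeted: keep two masses
filter_by_mz(ms1, mz_exclude = 200.1234, ppm = 10)               # drop a "contaminant"
```

Both calls keep the same three rows here, but for opposite reasons: the first keeps only what matches a target, the second keeps everything except what matches the excluded mass.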
This is also very useful for creating demo MS files: RaMS uses these functions to produce the sample data in extdata, with 6 MS files taking up less than 5 megabytes of disk space. Many other programs can shrink files, but none that I know of do so by excluding m/z values; they can only remove certain retention times.
Below, we begin with a large MS file containing both MS1 and MS2 data and extract only the data corresponding to valine/glycine betaine and homarine.
library(RaMS)
msdata_files <- list.files(
  system.file("extdata", package = "RaMS"), full.names = TRUE, pattern = "mzML"
)[1:4]
initial_filename <- msdata_files[1]
output_filename <- gsub(x=paste0("minified_", basename(initial_filename)), "\\.gz", "")
masses_of_interest <- c(118.0865, 138.0555)
minifyMSdata(files = initial_filename, output_files = output_filename,
             mz_include = masses_of_interest, ppm = 10, warn = FALSE)
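As a worked example of what ppm = 10 means for these two masses, the sketch below computes the m/z window around each target. It assumes the conventional relative window, mass * (1 ± ppm / 1e6); RaMS’s own rule may differ in detail, and ppm_window is a hypothetical helper rather than a RaMS function.

```r
# Assumption: conventional relative window, mass * (1 +/- ppm / 1e6).
# RaMS's exact rule may differ in detail; ppm_window is hypothetical.
ppm_window <- function(mass, ppm) {
  cbind(lower = mass * (1 - ppm / 1e6), upper = mass * (1 + ppm / 1e6))
}

masses_of_interest <- c(118.0865, 138.0555)
round(ppm_window(masses_of_interest, ppm = 10), 5)
```

At 10 ppm the window is only about ±0.0012 m/z units around 118.0865, which is why such a small fraction of the original data survives.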
Then, when we open the file (with RaMS or other software), we are left with only the data corresponding to those compounds:
init_msdata <- grabMSdata(initial_filename)
msdata <- grabMSdata(output_filename)
knitr::kable(head(msdata$MS1, 3))
rt | mz | int | filename |
---|---|---|---|
4.00085 | 118.0865 | 15968431.0 | minified_DDApos_2.mzML |
4.00085 | 138.0550 | 174591.6 | minified_DDApos_2.mzML |
4.00085 | 138.0550 | 174591.6 | minified_DDApos_2.mzML |
knitr::kable(head(msdata$MS2, 3))
rt | premz | fragmz | int | voltage | filename |
---|---|---|---|---|---|
4.182333 | 118.0864 | 51.81098 | 3809.649 | 35 | minified_DDApos_2.mzML |
4.182333 | 118.0864 | 58.06422 | 10133.438 | 35 | minified_DDApos_2.mzML |
4.182333 | 118.0864 | 58.06590 | 390179.500 | 35 | minified_DDApos_2.mzML |
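A quick sanity check is to compute the ppm error of the retained m/z values against the targets. The observed values below are copied from the two tables above (where they are rounded to 4 decimals), and the error uses the standard (observed - target) / target * 1e6 definition.

```r
# m/z values copied from the MS1 table and the MS2 precursor column above
observed <- c(118.0865, 138.0550, 118.0864)
targets  <- c(118.0865, 138.0555, 118.0865)

# Standard ppm error: relative deviation scaled to parts per million
ppm_error <- (observed - targets) / targets * 1e6
round(ppm_error, 2)
all(abs(ppm_error) <= 10)  # all inside the ppm = 10 tolerance used above
```

The homarine signal at 138.0550 sits roughly 3.6 ppm below its target, comfortably inside the 10 ppm tolerance.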
Both the TIC and BPC are also updated to reflect the removed data:
par(mfrow=c(2, 1), mar=c(2.1, 2.1, 1.1, 0.1))
plot(init_msdata$BPC$rt, init_msdata$BPC$int, type = "l", main = "Initial BPC")
plot(msdata$BPC$rt, msdata$BPC$int, type = "l", main = "New BPC")
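Conceptually, with the usual definitions, the TIC is the summed intensity of each scan and the BPC is the largest intensity in each scan, so removing m/z values changes both. The sketch below illustrates this on made-up MS1 data using base R’s aggregate; it assumes the chromatograms reflect the retained MS1 points and is not how RaMS computes them internally.

```r
# Toy MS1 data with made-up values
ms1 <- data.frame(rt  = c(1, 1, 1, 2, 2),
                  mz  = c(118.0865, 138.0555, 300.0000, 118.0865, 300.0000),
                  int = c(100, 20, 500, 80, 400))

# TIC: summed intensity per scan; BPC: largest intensity per scan
tic <- aggregate(int ~ rt, data = ms1, FUN = sum)
bpc <- aggregate(int ~ rt, data = ms1, FUN = max)

# After dropping the intense m/z 300 signal, the BPC falls to the remaining peaks
kept     <- ms1[ms1$mz != 300, ]
bpc_kept <- aggregate(int ~ rt, data = kept, FUN = max)
tic_kept <- aggregate(int ~ rt, data = kept, FUN = sum)
```

This is why the "New BPC" panel can look very different from the initial one: the base peak of each scan is now one of the retained compounds.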
The minifyMSdata function is vectorized, so the same syntax works for multiple files:
dir.create("mini_mzMLs/")
output_files <- paste0("mini_mzMLs/", basename(msdata_files))
output_files <- gsub(x=output_files, "\\.gz", "")
minifyMSdata(files = msdata_files, output_files = output_files, verbosity = 0,
             mz_include = masses_of_interest, ppm = 10, warn = FALSE)
mini_msdata <- grabMSdata(output_files, verbosity = 0)
library(ggplot2)
ggplot(mini_msdata$BPC) + geom_line(aes(x=rt, y=int, color=filename)) + theme_bw()
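To see how much smaller the minified files are, one option is to compare on-disk sizes with base R’s file.size. The size_report helper below is hypothetical (not part of RaMS). Because the example inputs are gzipped while the outputs here are not, the ratio understates the reduction in the underlying data. The temporary files are placeholders so the sketch runs on its own.

```r
# Hypothetical helper (not part of RaMS): compare input and output file sizes
size_report <- function(input_files, output_files) {
  in_bytes  <- file.size(input_files)
  out_bytes <- file.size(output_files)
  data.frame(input     = basename(input_files),
             input_kb  = in_bytes / 1024,
             output_kb = out_bytes / 1024,
             ratio     = in_bytes / out_bytes)
}

# Standalone demo with placeholder files (4 KB in, 1 KB out)
tmp_in  <- tempfile(fileext = ".mzML")
tmp_out <- tempfile(fileext = ".mzML")
writeBin(raw(4096), tmp_in)
writeBin(raw(1024), tmp_out)
size_report(tmp_in, tmp_out)

# With the objects created above: size_report(msdata_files, output_files)
```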
These new files are valid according to the validator provided in MSnbase, which means that most programs should be able to open them, but this feature is still experimental and may break on quirky data. If that happens, please feel free to submit a bug report at
As an example of how I use this minification function, here’s the code used to create the minified files in the extdata folder that ships with the package. This was especially helpful because the package can’t be more than 5MB, yet it’s very useful to include some standalone MS data for demos and vignettes like this one. I don’t run this code in the vignette itself, to save compilation time, but it will run if you test it yourself.
These files originate from the Ingalls Lab at the University of Washington, USA and are published in the manuscript “Metabolic consequences of cobalamin scarcity in diatoms as revealed through metabolomics”. Files are downloaded from the corresponding MetaboLights repository (study MTBLS703).
First, we identify the m/z values we’d like to keep in the minified files. For the demo data, I’ll use the Ingalls Lab list of targeted compounds - those we have authentic standards for.
raw_stans <- read.csv(paste0("https://raw.githubusercontent.com/",
                             "IngallsLabUW/Ingalls_Standards/",
                             "b098927ea0089b6e7a31e1758e7c7eaad5408535/",
                             "Ingalls_Lab_Standards_NEW.csv"))
mzs_to_include <- as.numeric(unique(raw_stans[raw_stans$Fraction1=="HILICPos",]$m.z))
# Include glycine betaine isotopes for README demo
mzs_to_include <- c(mzs_to_include, 119.0899, 119.0835)
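The two hard-coded isotope masses line up with single heavy-isotope substitutions of the glycine betaine ion at 118.0865. Reading them as the 13C and 15N isotopologues is my inference from the numbers, not something stated above; the isotope mass differences are standard values.

```r
# Standard heavy-minus-light isotope mass differences (Da)
delta_13C <- 1.0033548  # 13C - 12C
delta_15N <- 0.9970349  # 15N - 14N

gbt_mz <- 118.0865      # glycine betaine [M+H]+ as used earlier
# Assumption: the two added values are the 13C and 15N isotopologues
iso <- round(gbt_mz + c(C13 = delta_13C, N15 = delta_15N), 4)
iso
```

The rounded results reproduce 119.0899 and 119.0835, the values added to mzs_to_include.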
Then, we download the raw MS data from the online repository into which it’s been deposited.
if(!dir.exists("vignettes/data"))dir.create("vignettes/data", recursive = TRUE)
base_url <- "ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS703/"
chosen_files <- paste0(base_url, "170223_Smp_LB12HL_", c("AB", "CD", "EF"), "_pos.mzXML")
new_names <- gsub(x=basename(chosen_files), "170223_Smp_", "")
# mode = "wb" keeps the files byte-for-byte identical on Windows
mapply(download.file, chosen_files, paste0("vignettes/data/", new_names),
       MoreArgs = list(mode = "wb", method = "libcurl"))
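When rerunning this step, it can help to print the remote-to-local name mapping first and only fetch files that aren’t already present. The sketch below is a hypothetical variation of the chunk above: do_download is an illustrative flag, and skipping existing files is my addition rather than part of the original workflow.

```r
# Dry-run sketch: show remote -> local names before downloading anything.
# do_download is a hypothetical flag; set it to TRUE to actually fetch.
base_url <- "ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS703/"
chosen_files <- paste0(base_url, "170223_Smp_LB12HL_", c("AB", "CD", "EF"), "_pos.mzXML")
dest_files <- file.path("vignettes/data", sub("^170223_Smp_", "", basename(chosen_files)))
print(data.frame(remote = basename(chosen_files), local = dest_files))

do_download <- FALSE
if (do_download) {
  dir.create("vignettes/data", recursive = TRUE, showWarnings = FALSE)
  missing <- !file.exists(dest_files)  # skip files downloaded on a previous run
  mapply(download.file, chosen_files[missing], dest_files[missing],
         MoreArgs = list(mode = "wb", method = "libcurl"))
}
```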
The MS/MS data wasn’t uploaded, so we handle it separately by copying it manually from the lab computer into our temporary vignettes/data directory. If you’re following along, you can skip this chunk or use your own DDA data.
file.copy(from = paste0("Z:/1_QEdata/2016/2016_Katherine_1335_LightB12_",
"Experiment/170223_KRH_Rerun_1335_LightB12_Exp_HILIC/",
"positive/",
"170223_Poo_AllCyanoAqExtracts_DDApos_2.mzXML"),
to = "vignettes/data/DDApos_2.mzXML", overwrite = TRUE)
Then we can actually perform the minification:
library(RaMS)
if(!dir.exists("inst/extdata"))dir.create("inst/extdata", recursive = TRUE)
init_files <- list.files("vignettes/data/", full.names = TRUE)
out_files <- paste0("inst/extdata/", basename(init_files))
minifyMSdata(files = init_files, output_files = out_files, warn = FALSE,
             mz_include = mzs_to_include, ppm = 20)
Now we have four minified mzXML files in our inst/extdata folder. However, we’d like to demo the mzML functionality as well as that of mzXMLs, and because RaMS can’t convert between mzML and mzXML, we use ProteoWizard’s msconvert tool. You’ll need to install msconvert and add it to your path for this step. We also use msconvert to trim the files by retention time, keeping data between 4 and 15 minutes (the scanTime filter takes seconds, hence 240 and 900). Finally, we gzip the files with msconvert to make them as small as possible.
# Convert the minified mzXMLs to mzML (msconvert's default) and to mzXML in a temp folder
system("msconvert inst/extdata/*.mzXML -o inst/extdata/temp --noindex")
system("msconvert --mzXML inst/extdata/*.mzXML -o inst/extdata/temp --noindex")
# Keep 240-900 seconds (4-15 minutes) and gzip the output (-g)
system('msconvert inst/extdata/temp/*.mzML --filter "scanTime [240,900]" -o inst/extdata -g')
system('msconvert inst/extdata/temp/*.mzXML --mzXML --filter "scanTime [240,900]" -o inst/extdata -g')
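Because the scanTime filter works in seconds while RaMS reports retention times in minutes (as in the tables earlier, where rt is about 4.0), it is easy to mix up units. The sketch below shows the conversion and a comparable in-memory trim of a RaMS-style table; trim_rt is a hypothetical helper, and unlike msconvert it does not write a smaller file.

```r
# msconvert's scanTime filter takes seconds; RaMS rt values are in minutes
rt_window_min <- c(4, 15)
rt_window_sec <- rt_window_min * 60   # 240 and 900, as in the msconvert calls

# Hypothetical helper: trim an already-loaded table (rt in minutes), inclusive bounds
trim_rt <- function(df, lo, hi) df[df$rt >= lo & df$rt <= hi, , drop = FALSE]

toy <- data.frame(rt = c(3.9, 4, 10, 15, 15.1), int = 1:5)
trim_rt(toy, rt_window_min[1], rt_window_min[2])
```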
And then for the last few steps, we rename the files again (since msconvert expands them to their full .raw names) and remove the ones we don’t need for the demos.
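# Only rename the gzipped msconvert outputs; otherwise the temp folder and the
# uncompressed mzXMLs would also be passed to file.rename and fail with warnings
init_files <- list.files("inst/extdata", pattern = "\\.gz$", full.names = TRUE)
new_names <- paste0("inst/extdata/", gsub(x=init_files, ".*(Smp_|Extracts_)", ""))
file.rename(init_files, new_names)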
unlink("inst/extdata/temp", recursive = TRUE)
file.remove(list.files("inst/extdata", pattern = "mzXML$", full.names = TRUE))
file.remove(paste0("inst/extdata/", c("LB12HL_CD.mzXML.gz", "LB12HL_EF.mzXML.gz")))
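The renaming regex strips everything up to and including "Smp_" or "Extracts_", directory path included, which is why the code above prepends "inst/extdata/" again. The input names below are assumed examples of what msconvert produces from the original .raw names, not a listing of the actual folder.

```r
# Assumed example names in the form msconvert writes them
msconvert_names <- c("inst/extdata/170223_Smp_LB12HL_AB.mzML.gz",
                     "inst/extdata/170223_Poo_AllCyanoAqExtracts_DDApos_2.mzML.gz")

# Drop the path and everything up to "Smp_" or "Extracts_"
short_names <- gsub(x = msconvert_names, ".*(Smp_|Extracts_)", "")
short_names
```

The second result matches the DDApos_2 file name seen in the tables earlier.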
To check that the new files look OK, we can see whether RaMS and MSnbase can read them.
MSnbase::readMSData(list.files("inst/extdata", full.names = TRUE)[1], msLevel. = 1)
RaMS::grabMSdata(new_names[1])
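Since the whole point was to stay under the 5MB package limit, it is also worth checking the total size of the demo folder. dir_size_mb is a hypothetical base R helper built on list.files and file.size; it counts files recursively and reports megabytes (1024^2 bytes).

```r
# Hypothetical helper: total size of all files under a folder, in megabytes
dir_size_mb <- function(path) {
  files <- list.files(path, full.names = TRUE, recursive = TRUE)
  sum(file.size(files)) / 1024^2
}

# Usage after building the demo files:
# dir_size_mb("inst/extdata") < 5
```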
Finally, remember to clean up the original downloads folder:
unlink("vignettes/data", recursive = TRUE)
README last built on 2022-12-14