str()
now works in a colorized context in the
presence of a column of class integer64
, i.e. parsed with
col_big_integer()
(@bart1, #477).
The embedded implementation of the Grisu algorithm for printing
floating point numbers now uses snprintf()
instead of
sprintf()
and likewise for vroom’s own code (@jeroen, #480).
vroom(col_select=)
now handles column selection by
numeric position when an id
column is provided
(#455).
vroom(id = "path", col_select = a:c)
is treated like
vroom(id = "path", col_select = c(path, a:c))
. If an
id
column is provided, it is automatically included in the
output (#416).
vroom_write(append = TRUE)
does not modify an
existing file when appending an empty data frame. In particular, it does
not overwrite (delete) the existing contents of that file
(https://github.com/tidyverse/readr/issues/1408, #451).
vroom::problems()
now defaults to
.Last.value
for its primary input, similar to how
readr::problems()
works (#443).
The warning that indicates the existence of parsing problems has been improved, which should make it easier for the user to follow up (https://github.com/tidyverse/readr/issues/1322).
vroom()
reads more reliably from filepaths
containing non-ASCII characters in a non-UTF-8 locale (#394,
#438).
vroom_format()
and vroom_write()
only
quote values that contain a delimiter, quote, or newline. Specifically,
values that are equal to the na
string (or that start with
it) are no longer quoted (#426).
Fixed segfault when reading in multiple files and the first file has only a header row of column names, but subsequent files have at least one row (#430).
Fixed segfault when vroom_format()
is given an empty
data frame (#425)
Fixed a segfault that could occur when the final field of the final line is missing and the file also does not end in a newline (#429).
Fixed recursive garbage collection error that could occur during
vroom_write()
when output_column()
generates
an ALTREP vector (#389).
vroom_progress()
uses
rlang::is_interactive()
instead of
base::interactive()
.
col_factor(levels = NULL)
honors the na
strings of vroom()
and its own include_na
argument, as described in the docs, and now reproduces the behaviour of
readr’s first edition parser (#396).
Jenny Bryan is now the official maintainer.
Fix uninitialized bool detected by CRAN’s UBSAN check (https://github.com/tidyverse/vroom/pull/386)
Fix buffer overflow when trying to parse an integer field that is over 64 characters long (https://github.com/tidyverse/readr/issues/1326)
Fix subset indexing when indexes span a file boundary multiple times (#383)
vroom(col_select=)
now works if
col_names = FALSE
as intended (#381)
vroom(n_max=)
now correctly handles cases when
reading from a connection and the file does not end with a
newline (https://github.com/tidyverse/readr/issues/1321)
vroom()
no longer issues a spurious warning when the
parsing needs to be restarted due to the presence of embedded newlines
(https://github.com/tidyverse/readr/issues/1313)
Fix performance issue when materializing subsetted vectors (#378)
vroom_format()
now uses the same internal
multi-threaded code as vroom_write()
, improving its
performance in most cases (#377)
vroom_fwf()
no longer omits the last line if it does
not end with a newline
(https://github.com/tidyverse/readr/issues/1293)
Empty files or files with only a header line and no data no longer cause a crash if read with multiple files (https://github.com/tidyverse/readr/issues/1297)
Files with a header but no contents, or an empty file if
col_names = FALSE
no longer cause a hang when
progress = TRUE
(https://github.com/tidyverse/readr/issues/1297)
Commented lines with comments at the end of lines no longer hang R (https://github.com/tidyverse/readr/issues/1309)
Comment lines containing unpaired quotes are no longer treated as unterminated quotations (https://github.com/tidyverse/readr/issues/1307)
Values with only an Inf
or NaN
prefix
but additional data afterwards, like Inform
, are no longer
inappropriately guessed as doubles
(https://github.com/tidyverse/readr/issues/1319)
Time types now support %h
format to denote hour
durations greater than 24, like readr
(https://github.com/tidyverse/readr/issues/1312)
vroom()
now supports files with only carriage return
newlines (\r
). (#360,
https://github.com/tidyverse/readr/issues/1236)
vroom()
now parses single digit datetimes more
consistently as readr has done
(https://github.com/tidyverse/readr/issues/1276)
vroom()
now parses Inf
values as
doubles (https://github.com/tidyverse/readr/issues/1283)
vroom()
now parses NaN
values as
doubles (https://github.com/tidyverse/readr/issues/1277)
VROOM_CONNECTION_SIZE
is now parsed as a double,
which supports scientific notation (#364)
vroom()
now works around specifying a
\n
as the delimiter (#365,
https://github.com/tidyverse/dplyr/issues/5977)
vroom()
no longer crashes if given
col_names
and col_types
that both have fewer entries than the
number of columns
(https://github.com/tidyverse/readr/issues/1271)
vroom()
no longer hangs if given an empty value for
locale(grouping_mark=)
(https://github.com/tidyverse/readr/issues/1241)
Fix performance regression when guessing with large numbers of rows (https://github.com/tidyverse/readr/issues/1267)
vroom(col_types=)
now accepts column type names like
those accepted by utils::read.table, e.g. vroom::vroom(col_types =
list(a = "integer", b = "double", c = "skip"))
vroom()
now respects the quote
parameter properly in the first two lines of the file
(https://github.com/tidyverse/readr/issues/1262)
vroom_write()
now always correctly writes its output
including column names in UTF-8
(https://github.com/tidyverse/readr/issues/1242)
vroom_write()
now creates an empty file when given an
input without any columns
(https://github.com/tidyverse/readr/issues/1234)
vroom(col_types=)
now truncates the column types if
the user passes too many types. (#355)
vroom()
now always includes the last row when
guessing (#352)
vroom(trim_ws = TRUE)
now trims field content within
quotes as well as without (#354). Previously vroom explicitly retained
field content inside quotes regardless of the value of
trim_ws
.
vroom()
now supports inputs with fewer unnamed column
types than the number of columns (#296)
vroom()
now outputs the correct column names even in
the presence of skipped columns (#293, tidyverse/readr#1215)
vroom_fwf(n_max=)
now works as intended when the
input is a connection.
vroom()
and vroom_write()
now
automatically detect the compression format regardless of the file
extension for bzip2, xzip, gzip and zip files (#348)
vroom()
and vroom_write()
now
automatically support many more archive formats thanks to the archive
package. These include new support for writing zip files, reading and
writing 7zip, tar and ISO files.
vroom(num_threads = 1)
will now not spawn any
threads. This can be used as a workaround on systems without full
thread support.
Threads are now automatically disabled on non-macOS systems compiling against clang’s libc++. Most non-macOS systems use the more common gcc libstdc++, so this should not affect most users.
Parsers now treat NA values as NA even if they are valid values for the types (#342)
Element-wise indexing into lazy (ALTREP) vectors now has much less overhead (#344)
New vroom(show_col_types=)
argument to more simply
control when column types are shown.
vroom()
, vroom_fwf()
and
vroom_lines()
now support multi-byte encodings such as
UTF-16 and UTF-32 by converting these files to UTF-8 under the hood
(#138)
vroom()
now supports skipping comments and blank
lines within data, not just at the start of the file (#294,
#302)
vroom()
now uses the tzdb package when parsing
date-times (@DavisVaughan, #273)
vroom()
now emits a warning of class
vroom_parse_issue
if there are non-fatal parsing
issues.
vroom()
now emits a warning of class
vroom_mismatched_column_name
if the user supplies a column
type that does not match the name of a read column (#317).
The vroom package now uses the MIT license, as part of systematic relicensing throughout the r-lib and tidyverse packages (#323)
vroom() correctly reads double values with comma as decimal separator (@kent37 #313)
vroom()
now correctly skips lines with only one
quote if the format doesn’t use quoting
(https://github.com/tidyverse/readr/issues/991#issuecomment-616378446)
vroom()
and vroom_lines()
now handle
files with mixed windows and POSIX line endings
(https://github.com/tidyverse/readr/issues/1210)
vroom()
now outputs a tibble with the expected
number of columns and types based on col_types
and
col_names
even if the file is empty (#297).
vroom()
no longer mis-indexes files read from
connections with windows line endings when the two line-ending characters fall on
separate sides of the read buffer (#331)
vroom()
no longer crashes if n_max = 0
and col_names
is a character (#316)
vroom()
now preserves the spec attribute when vroom
and readr are both loaded (#303)
vroom()
now allows specifying column names in
col_types
that have been repaired (#311)
vroom()
no longer inadvertently calls
.name_repair
functions twice (#310).
vroom()
is now more robust to quoting issues when
tracking the CSV state (#301)
vroom()
now registers the S3 class with
methods::setOldClass()
(r-dbi/DBI#345)
col_datetime()
now supports ‘%s’ format, which
represents decimal seconds since the Unix epoch.
col_numeric()
now supports
grouping_mark
and decimal_mark
that are
unicode characters, such as U+00A0 which is commonly used as the
grouping mark for numbers in France
(https://github.com/tidyverse/readr/issues/796).
vroom_fwf()
gains a skip_empty_rows
argument to skip empty lines
(https://github.com/tidyverse/readr/issues/1211)
vroom_fwf()
now respects n_max
, as
intended (#334)
vroom_lines()
gains a na
argument.
vroom_write_lines()
no longer escapes or quotes
lines.
vroom_write_lines()
now works as intended
(#291).
vroom_write(path=)
has been deprecated, in favor of
file
, to match readr.
vroom_write_lines()
now exposes the
num_threads
argument.
problems()
now prints the correct row number of
parse errors (#326)
problems()
now throws a more informative error if
called on a readr object (#308).
problems()
now de-duplicates identical problems
(#318)
Fix an inadvertent performance regression when reading values (#309)
n_max
argument is correctly respected in edge cases
(#306)
Factors with implicit levels now work when fields are quoted, as intended (#330)
Guessing double types no longer unconditionally ignores leading
whitespace. Now whitespace is only ignored when trim_ws
is
set.
vroom now tracks indexing and parsing errors like readr. The
first time an issue is encountered a warning will be signaled. A tibble
of all found problems can be retrieved with
vroom::problems()
. (#247)
Data with newlines within quoted fields will now automatically revert to using a single thread and be properly read (#282)
NUL values in character data are now permitted, with a warning.
New vroom_write_lines()
function to write a
character vector to a file (#291)
vroom_write()
gains an eol=
parameter to
specify the end of line character(s) to use. Use
vroom_write(eol = "\r\n")
to write a file with Windows
style newlines (#263).
Datetime formats used when guessing now match those used when parsing (#240)
Quotes are now only valid next to newlines or delimiters (#224)
vroom()
now signals an R error for invalid date and
datetime formats, instead of crashing the session (#220).
vroom(comment = )
now accepts multi-character
comments (#286)
vroom_lines()
now works with empty files
(#285)
Vectors are now subset properly when given invalid subscripts (#283)
vroom_write()
now works when the delimiter is empty,
e.g. delim = ""
(#287).
vroom_write()
now works with all ALTREP vectors,
including string vectors (#270)
An internal call to new.env()
now correctly uses the
parent
argument (#281)
Test failures on R 4.1 related to factors with NA values fixed (#262)
vroom()
now works without error with readr versions
of col specs (#256, #264, #266)
Test failures on R 4.1 related to POSIXct classes fixed (#260)
Column subsetting with double indexes now works again (#257)
vroom(n_max=)
now only partially downloads files
from connections, as intended (#259)
The Rcpp dependency has been removed in favor of cpp11.
vroom()
now handles cases when id
is
set and a column is skipped (#237)
vroom()
now supports column selections when there
are some empty column names (#238)
vroom()
argument n_max
now works
properly for files with windows newlines and no final newline
(#244)
Subsetting vectors now works with View()
in RStudio
if there are no rows to subset (#253).
Subsetting datetime columns now works with NA
indices (#236).
vroom()
now writes the column names if given an
input with no rows (#213)
vroom()
columns now support indexing with NA values
(#201)
vroom()
no longer truncates the last value in a file
if the file contains windows newlines but no final newline
(#219).
vroom()
now works when the na
argument
is encoded in non ASCII or UTF-8 locales and the file encoding
is not the same as the native encoding (#233).
vroom_fwf()
now verifies that the positions are
valid, namely that the begin value is always less than the previous end
(#217).
vroom_lines()
gains a locale
argument
so you can control the encoding of the file (#218)
vroom_write()
now supports the append
argument with R connections (#232)
vroom_altrep_opts()
and the argument
vroom(altrep_opts =)
have been renamed to
vroom_altrep()
and altrep
respectively. The
prior names have been deprecated.
vroom()
now supports reading Big Integer values with
the bit64
package. Use col_big_integer()
or
the “I” shortcut to read a column as big integers. (#198)
cols()
gains a .delim
argument and
vroom()
now uses it as the delimiter if it is provided
(#192)
vroom()
now supports reading from
stdin()
directly, interpreted as the C-level standard input
(#106).
col_date
now parses single digit month and day
(@edzer, #123,
#170)
fwf_empty()
now uses the skip
parameter, as intended.
vroom()
can now read single line files without a
terminal newline (#173).
vroom()
can now select the id column if provided
(#110).
vroom()
now correctly copies string data for factor
levels (#184)
vroom()
no longer crashes when files have trailing
fields, windows newlines and the file is not newline or null
terminated.
vroom()
now includes a spec object with the
col_types
class, as intended.
vroom()
now better handles floating point values
with very large exponents (#164).
vroom()
now uses better heuristics to guess the
delimiter and now throws an error if a delimiter cannot be guessed
(#126, #141, #167).
vroom()
now has an improved error message when a
file does not exist (#169).
vroom()
no longer leaks file handles (#177,
#180)
vroom()
now outputs its messages on
stdout()
rather than stderr()
, which avoids
the text being red in RStudio and in the Windows GUI.
vroom()
no longer overflows when reading files with
more than 2B entries (@wlattner, #183).
vroom_fwf()
is now more robust if not all lines are
the expected length (#78)
vroom_fwf()
and fwf_empty()
now support
passing Inf
to guess_max()
.
vroom_str()
now works with S4 objects.
vroom_fwf()
now handles files with dos newlines
properly.
vroom_write()
now does not try to write anything
when given empty inputs (#172).
Dates, times, and datetimes now properly consider the locale when parsing.
Added benchmarks with wide data for both numeric and character data (#87, @R3myG)
The delimiter used for parsing is now shown in the message output (#95 @R3myG)
id
is now stored as a run-length
encoded Altrep vector, which uses less memory and is much faster for
large inputs. (#111)
vroom_lines()
now properly respects the
n_max
parameter (#142)
vroom()
and vroom_lines()
now support
reading files which do not end in newlines by using a file connection
(#40).
vroom_write()
now works with the standard output
connection stdout()
(#106).
vroom_write()
no longer crashes
non-deterministically when used on Altrep vectors.
The integer parser now returns NA values for invalid inputs (#135)
Fix additional UBSAN issue in the mio project reported by CRAN (#97)
Fix indexing into connections with quoted fields (#119)
Move example files for vroom()
out of
\dontshow{}
.
Fix integer overflow with very large files (#116, #119)
Fix missing columns and windows newlines (#114)
Fix encoding of column names (#113, #115)
Throw an error message when writing a zip file, which is not supported (@metaOO, #145)
Default message output from vroom()
now uses
Rows
and Cols
(@meta00, #140)
vroom_lines()
function added, to (lazily) read lines
from a file into a character vector (#90).
Fix for a hang on Windows caused by a race condition in the progress bar (#98)
Remove accidental runtime dependency on testthat (#104)
Fix to actually return non-Altrep character columns on R 3.2, 3.3 and 3.4.
Disable colors in the progress bar when running in RStudio, to work around an issue where the progress bar would be garbled (https://github.com/rstudio/rstudio/issues/4777)
Fix for UBSAN issues reported by CRAN (#97)
Fix for rchk issues reported by CRAN (#94)
The progress bar now only updates every 10 milliseconds.
Getting started vignette index entry now more informative (#92)
Initial release
Added a NEWS.md
file to track changes to the
package.