Explicitly create tokenizer objects. Usually you will not call these
function, but will instead use one of the use friendly wrappers like
read_csv()
.
Usage
tokenizer_delim(
delim,
quote = "\"",
na = "NA",
quoted_na = TRUE,
comment = "",
trim_ws = TRUE,
escape_double = TRUE,
escape_backslash = FALSE,
skip_empty_rows = TRUE
)
tokenizer_csv(
na = "NA",
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip_empty_rows = TRUE
)
tokenizer_tsv(
na = "NA",
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip_empty_rows = TRUE
)
tokenizer_line(na = character(), skip_empty_rows = TRUE)
tokenizer_log(trim_ws)
tokenizer_fwf(
begin,
end,
na = "NA",
comment = "",
trim_ws = TRUE,
skip_empty_rows = TRUE
)
tokenizer_ws(na = "NA", comment = "", skip_empty_rows = TRUE)
Arguments
- delim
Single character used to separate fields within a record.
- quote
Single character used to quote strings.
- na
Character vector of strings to interpret as missing values. Set this option to
character()
to indicate no missing values.- quoted_na
Should missing values inside quotes be treated as missing values (the default) or strings. This parameter is soft deprecated as of readr 2.0.0.
- comment
A string used to identify comments. Any text after the comment characters will be silently ignored.
- trim_ws
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?
- escape_double
Does the file escape quotes by doubling them? i.e. If this option is
TRUE
, the value""""
represents a single quote,\"
.- escape_backslash
Does the file use backslashes to escape special characters? This is more general than
escape_double
as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like\\n
.- skip_empty_rows
Should blank rows be ignored altogether? i.e. If this option is
TRUE
then blank rows will not be represented at all. If it isFALSE
then they will be represented byNA
values in all the columns.- begin, end
Begin and end offsets for each file. These are C++ offsets so the first column is column zero, and the ranges are [begin, end) (i.e inclusive-exclusive).
Examples
tokenizer_csv()
#> $delim
#> [1] ","
#>
#> $quote
#> [1] "\""
#>
#> $na
#> [1] "NA"
#>
#> $quoted_na
#> [1] TRUE
#>
#> $comment
#> [1] ""
#>
#> $trim_ws
#> [1] TRUE
#>
#> $escape_double
#> [1] TRUE
#>
#> $escape_backslash
#> [1] FALSE
#>
#> $skip_empty_rows
#> [1] TRUE
#>
#> attr(,"class")
#> [1] "tokenizer_delim"