Adding or modifying checks
This vignette describes how to modify or add new checks to the existing
suite of checks implemented by pkgcheck. Each of the internal checks
is defined in a separate file in the R directory of this package with
the prefix of check_ (or checks_ for files which define multiple,
related checks). Each check requires two main functions:
- One defining the check itself, which must have a prefix pkgchk_, followed by the name of the check; and
- One defining summary and print methods based on the result of the first function, which must have a prefix output_pkgchk_.
The structures of these two functions are described in the following two sections.
Both of these functions must accept a single input parameter of a
pkgcheck object, by convention named checks. This object is a list
of four main items:
- pkg which summarises data extracted from pkgstats::pkgstats(), and includes essential information on the package being checked.
- info which contains information used in checks, including info$git detailing git repository information, info$pkgstats containing a summary of a few statistics generated from pkgstats::pkgstats(), along with statistical comparisons against distributions from all current CRAN packages, an info$network_file specifying a local directory to a vis.js visualisation of the function call network of the package, and an info$badges item containing information from GitHub workflows and associated badges, where available.
- checks which contains a list of all objects returned from all pkgchk_...() functions, which are used as input to output_pkgchk_...() functions.
- meta containing a named character vector of versions of the core packages used in pkgcheck.
pkgcheck objects generally also include a fifth item, goodpractice,
containing the results of goodpractice
checks. The checks item
passed to each pkgchk_...() function contains all information on the
package, info, meta, and (optionally) goodpractice items. Checks
may use any of this information, or even add additional information as
demonstrated below. The checks$checks list represents the output of
check functions, and may not be used in any way within pkgchk_...()
functions.
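As a rough illustration of which parts of the object a check may read, the following sketch builds a small list by hand in place of a real pkgcheck object. All values are made up, and pkgchk_has_license_sketch() is a hypothetical check which is not part of pkgcheck.

```r
# Hand-built stand-in for a 'pkgcheck' object; every value here is made up.
checks <- list (
    pkg = list (name = "dummypkg", path = tempdir (), license = "GPL-3"),
    info = list (git = list (), badges = list ()),
    checks = list (), # filled later from the results of pkgchk_...() functions
    meta = c (pkgstats = "0.0.2.25", pkgcheck = "0.0.2.96")
)

# Hypothetical check: reads from 'pkg', and never from 'checks$checks'.
pkgchk_has_license_sketch <- function (checks) {
    lic <- checks$pkg$license
    length (lic) == 1L && nzchar (lic)
}

pkgchk_has_license_sketch (checks) # TRUE
```

The function only touches checks$pkg, in line with the rule that the checks$checks list is reserved for the output of check functions.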
pkgcheck object
This is the output of applying pkgcheck to a package generated with
the srr function
srr_stats_pkg_skeleton(),
with goodpractice = FALSE to suppress that part of the results.
#> List of 4
#> $ pkg :List of 8
#> ..$ name : chr "dummypkg"
#> ..$ path : chr "/tmp/RtmpkguwJc/dummypkg"
#> ..$ version : chr "0.0.0.9000"
#> ..$ url : chr(0)
#> ..$ BugReports : chr(0)
#> ..$ license : chr "GPL-3"
#> ..$ summary :List of 12
#> .. ..$ num_authors : int 1
#> .. ..$ num_vignettes : int 0
#> .. ..$ num_data : int 0
#> .. ..$ imported_pkgs : int 1
#> .. ..$ num_exported_fns : int 1
#> .. ..$ num_non_exported_fns: int 2
#> .. ..$ num_src_fns : int 2
#> .. ..$ loc_exported_fns : int 3
#> .. ..$ loc_non_exported_fns: int 3
#> .. ..$ loc_src_fns : int 5
#> .. ..$ num_params_per_fn : int 0
#> .. ..$ languages : chr [1:2] "C++: 72%" "R: 28%"
#> ..$ dependencies:'data.frame': 4 obs. of 2 variables:
#> .. ..$ type : chr [1:4] "depends" "imports" "suggests" "linking_to"
#> .. ..$ package: chr [1:4] "NA" "Rcpp" "testthat" "Rcpp"
#> $ info :List of 5
#> ..$ git : list()
#> ..$ srr :List of 5
#> .. ..$ message : chr [1:108] "This package still has TODO standards and can not be submitted" "Package can not be submitted because the following standards [v0.1.0] are missing from your code:" "" "G1.0" ...
#> .. ..$ categories : chr "Regression and Supervised Learning"
#> .. ..$ missing_stds: chr "G1.0, G1.4a, G1.6, G2.0a, G2.1a, G2.2, G2.3a, G2.3b, G2.4, G2.4a, G2.4b, G2.4c, G2.4d, G2.4e, G2.5, G2.6, G2.7,"| __truncated__
#> .. ..$ report_file : chr "/home/smexus/.cache/pkgcheck/static/dummypkg_srr2021-10-15-16:46:34.html"
#> .. ..$ okay : logi FALSE
#> ..$ pkgstats :'data.frame': 25 obs. of 4 variables:
#> .. ..$ measure : chr [1:25] "files_R" "files_src" "files_vignettes" "files_tests" ...
#> .. ..$ value : num [1:25] 4 2 0 2 10 26 6 0 3 1 ...
#> .. ..$ percentile: num [1:25] 23.284 77.356 0 64.15 0.445 ...
#> .. ..$ noteworthy: chr [1:25] "" "" "TRUE" "" ...
#> .. ..- attr(*, "language")= chr [1:2] "C++: 72%" "R: 28%"
#> .. ..- attr(*, "files")= chr [1:2] "C++: 2" "R: 4"
#> ..$ network_file: chr "/home/smexus/.cache/pkgcheck/static/dummypkg_pkgstats.html"
#> ..$ badges : list()
#> $ checks:List of 12
#> ..$ fns_have_exs : Named logi FALSE
#> .. ..- attr(*, "names")= chr "test_fn.Rd"
#> ..$ has_bugs : logi FALSE
#> ..$ has_citation : logi FALSE
#> ..$ has_codemeta : logi FALSE
#> ..$ has_contrib_md : logi FALSE
#> ..$ has_scrap : chr(0)
#> ..$ has_url : logi FALSE
#> ..$ has_vignette : logi FALSE
#> ..$ left_assign :List of 2
#> .. ..$ global: logi FALSE
#> .. ..$ usage : Named num [1:2] 2 0
#> .. .. ..- attr(*, "names")= chr [1:2] "<-" "="
#> ..$ on_cran : logi FALSE
#> ..$ pkgname_available: logi TRUE
#> ..$ uses_roxygen2 : logi TRUE
#> $ meta : Named chr [1:3] "0.0.2.25" "0.0.2.96" "0.0.1.120"
#> ..- attr(*, "names")= chr [1:3] "pkgstats" "pkgcheck" "srr"
#> - attr(*, "class")= chr [1:2] "pkgcheck" "list"
#> NULL
1. The check function
An example is the check for whether a package has a citation, defined
in
R/check_has_citation.R:
#' Check whether a package has an `inst/CITATION` file.
#'
#' "CITATION" files are required for all rOpenSci packages, as documented [in
#' our *Packaging
#' Guide*](https://devguide.ropensci.org/pkg_building.html#citation-file). This
#' does not check the contents of that file in any way.
#'
#' @param checks A 'pkgcheck' object with full \pkg{pkgstats} summary and
#' \pkg{goodpractice} results.
#' @noRd
pkgchk_has_citation <- function (checks) {
    "CITATION" %in% list.files (fs::path (checks$pkg$path, "inst"))
}
This check is particularly simple, because a "CITATION" file must
have exactly that name, and must be in the inst
sub-directory.
This function returns a simple logical of TRUE if the expected
"CITATION" file is present, otherwise it returns FALSE. This
function, and all functions beginning with the prefix pkgchk_, will be
automatically called by the main pkgcheck() function, and the value
stored in checks$checks$has_citation. The name of the item within the
checks$checks list is the name of the function with the pkgchk_
prefix removed.
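The naming convention can be sketched as follows. This is only an illustration of the idea, and not the code which pkgcheck itself uses: collect_checks_sketch() is a hypothetical helper, the environment and the hand-built checks list are stand-ins, and file.path() is used in place of fs::path() so that the block needs only base R.

```r
# Hypothetical collector; a sketch only, not the code pkgcheck itself uses.
collect_checks_sketch <- function (checks, env) {
    fn_names <- ls (env, pattern = "^pkgchk_")
    res <- lapply (fn_names, function (f) {
        fn <- get (f, envir = env, mode = "function")
        fn (checks)
    })
    # List names are the function names with the "pkgchk_" prefix removed
    names (res) <- sub ("^pkgchk_", "", fn_names)
    res
}

env <- new.env ()
env$pkgchk_has_citation <- function (checks) {
    "CITATION" %in% list.files (file.path (checks$pkg$path, "inst"))
}
# Functions without the prefix are ignored by the collector
env$not_a_check <- function (checks) stop ("never called")

# Temporary stand-in for a package directory with an empty "inst" folder
pkg_path <- tempfile ("dummypkg")
dir.create (file.path (pkg_path, "inst"), recursive = TRUE)
checks <- list (pkg = list (path = pkg_path), checks = list ())

checks$checks <- collect_checks_sketch (checks, env)
names (checks$checks) # "has_citation"
checks$checks$has_citation # FALSE, as no "inst/CITATION" file exists
```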
A more complicated example is the function to check whether a package
contains files which should not be there – internally called “scrap”
files. The check function itself, defined in
R/check-scrap.R,
checks for the presence of files matching an internally-defined list
including files used to locally cache folder thumbnails such as
".DS_Store" or "Thumbs.db". The function returns a character vector
of the names of any “scrap” files which can be used by the print
method to provide details of files which should be removed. This
illustrates the first general principle of these check functions; that,
Any information needed when summarising or printing the check result should be returned from the main check function.
A second important principle is that,
Check functions should never return NULL, rather should always return an empty vector (such as integer(0)).
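One reason for the second principle can be seen in base R itself: assigning NULL to a list item stores nothing, whereas a zero-length vector is stored and can be tested with length(). The sketch below shows that behaviour, followed by a hypothetical pkgchk_has_scrap_sketch() function. The list of file names is an illustrative subset only, and the function is not the one defined in pkgcheck.

```r
# Assigning NULL to a list item stores nothing, so the result would be lost:
res <- list ()
res$has_scrap <- NULL
"has_scrap" %in% names (res) # FALSE

# A zero-length vector is stored, and can be tested with length():
res$has_scrap <- character (0)
"has_scrap" %in% names (res) # TRUE

# Hypothetical scrap check using an illustrative subset of file names
pkgchk_has_scrap_sketch <- function (checks) {
    scrap_names <- c (".DS_Store", "Thumbs.db")
    f <- list.files (checks$pkg$path, all.files = TRUE, recursive = TRUE)
    f [basename (f) %in% scrap_names] # character(0) when nothing matches
}

# Temporary, empty stand-in for a package directory
pkg_path <- tempfile ("dummypkg")
dir.create (pkg_path)
checks <- list (pkg = list (path = pkg_path))
pkgchk_has_scrap_sketch (checks) # character(0)
```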
The following section describes the output_pkgchk_... functions which
convert these return values to summary and print output.
2. The output function
All output_pkgchk_...() functions must also accept the single input
parameter of checks, in which the checks$checks sub-list will
already have been populated by calling all pkgchk_...() functions
described in the previous section. The pkgchk_has_citation() function
will create an entry of checks$checks$has_citation which contains the
binary flag indicating whether or not a "CITATION" file is present.
Similarly, the pkgchk_has_scrap() function will create
checks$checks$has_scrap which will contain names of any scrap files
present, and a length-zero vector otherwise.
The pkgchk_ functions must not use any data in checks$checks, as they create this data.
The output_pkgchk_ functions must use the data from checks$checks to construct summary or print output.
The output_pkgchk_has_citation() function looks like this:
output_pkgchk_has_citation <- function (checks) {
    out <- list (
        check_pass = checks$checks$has_citation,
        summary = "",
        print = ""
    )
    # disabled:
    # https://github.com/ropensci-review-tools/pkgcheck/issues/115
    # out$summary <- paste0 (
    #     ifelse (out$check_pass, "has", "does not have"),
    #     " a 'CITATION' file."
    # )
    return (out)
}
The first lines are common to all output_pkgchk_...() functions, and
define the generic return object. This object must be a list with the
following three items:
- check_pass as binary flag indicating whether or not a check was passed;
- summary containing text used to generate the summary output; and
- print containing information used to generate the print output, itself a list of the following items:
    - A msg_pre to display at the start of the print result;
    - An obj, the object to be printed, such as a vector of values, or a data.frame.
    - A msg_post to display at the end of the print result following the object.
The summary and print methods may be suppressed by assigning values of
"". The above example of output_pkgchk_has_citation() has print = "",
and so no information from this check will appear as output of the
print method. The summary field is commented-out in the current version,
but is left here to illustrate that it has a value specified for both
TRUE and FALSE values of check_pass, via an ifelse statement.
The value of check_pass is determined by the result of the main
pkgchk_has_citation() call, and is converted into a green tick if
TRUE, or a red cross if FALSE.
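To make the effect of the ifelse statement concrete, the following sketch re-enables the commented-out lines in a copy of the function. The name output_has_citation_sketch() is hypothetical, and the inputs are hand-built lists standing in for pkgcheck objects.

```r
# Sketch with the summary re-enabled; the "_sketch" name is hypothetical.
output_has_citation_sketch <- function (checks) {
    out <- list (
        check_pass = checks$checks$has_citation,
        summary = "",
        print = ""
    )
    out$summary <- paste0 (
        ifelse (out$check_pass, "has", "does not have"),
        " a 'CITATION' file."
    )
    return (out)
}

# Hand-built stand-ins for 'pkgcheck' objects
pass <- list (checks = list (has_citation = TRUE))
fail <- list (checks = list (has_citation = FALSE))
output_has_citation_sketch (pass)$summary # "has a 'CITATION' file."
output_has_citation_sketch (fail)$summary # "does not have a 'CITATION' file."
```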
Checks for which print information is desired require a non-empty
print item, as in the output_pkgchk_has_scrap()
function:
output_pkgchk_has_scrap <- function (checks) {
    out <- list (
        check_pass = length (checks$checks$has_scrap) == 0L,
        summary = "",
        print = ""
    )
    if (!out$check_pass) {
        out$summary <- "Package contains unexpected files."
        out$print <- list (
            msg_pre = paste0 (
                "Package contains the ",
                "following unexpected files:"
            ),
            obj = checks$checks$has_scrap,
            msg_post = character (0)
        )
    }
    return (out)
}
In this case, both summary and print methods are only triggered
if (!out$check_pass) – so only if the check fails. The print method
generates the heading specified in out$print$msg_pre, with any
vector-valued objects stored in the corresponding obj list item
displayed as formatted lists. A package with “scrap” files, "a" and
"b", would thus have out$print$obj <- c ("a", "b"), and when printed
would look like this:
#> ✖ Package contains the following unexpected files:
#> • a
#> • b
This formatting is also translated into corresponding markdown and HTML
formatting in the checks_to_markdown()
function.
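The way in which the three print items combine can be sketched with a small renderer. The render_print_sketch() function is hypothetical and uses plain "-" markers; the formatting which pkgcheck actually applies on screen and in checks_to_markdown() is not reproduced here.

```r
# Hypothetical renderer; the real screen and markdown formatting differ.
render_print_sketch <- function (out) {
    if (!is.list (out$print)) {
        return (character (0)) # print = "" suppresses all print output
    }
    items <- out$print$obj
    if (length (items) > 0L) {
        items <- paste0 ("  - ", items)
    }
    c (out$print$msg_pre, items, out$print$msg_post)
}

# Hand-built output list for a package with scrap files "a" and "b"
out <- list (
    check_pass = FALSE,
    summary = "Package contains unexpected files.",
    print = list (
        msg_pre = "Package contains the following unexpected files:",
        obj = c ("a", "b"),
        msg_post = character (0)
    )
)
writeLines (render_print_sketch (out))
```

A msg_post of character(0) adds no lines, which is why the scrap check uses it when nothing should follow the list of files.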
The design of these pkgchk_ and output_pkgchk_ functions aims to
make the package readily extensible, and we welcome discussions about
developing new checks. The primary criterion for new package-internal
checks is that they must be of very general applicability, in that they
should check for a condition that almost every package should or
should not meet.
The package also has a mechanism to easily incorporate more specific, locally-defined checks, as explored in the following section.
3. Creating new checks
3.1 New Local Checks (for package users)
The main pkgcheck() function has an additional parameter, extra_env, which specifies,
Additional environments from which to collate checks. Other package names may be appended using c, as in c(.GlobalEnv, "mypkg").
This allows specific checks to be defined locally, and run by passing
the name of the environment in which those checks are defined in this
parameter. This section illustrates the process using the bundled
“tarball” (that is, .tar.gz file) of one version of the pkgstats
package included with
that package.
f <- system.file ("extdata", "pkgstats_9.9.tar.gz", package = "pkgstats")
path <- pkgstats::extract_tarball (f)
checks <- pkgcheck (path)
summary (checks)
#>
#> ── pkgstats 9.9 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>
#> ✔ Package name is available
#> ✖ does not have a 'contributing' file.
#> ✔ uses 'roxygen2'.
#> ✔ 'DESCRIPTION' has a URL field.
#> ✔ 'DESCRIPTION' has a BugReports field.
#> ✖ Package has no HTML vignettes
#> ✖ These functions do not have examples: [pkgstats_from_archive].
#> ✔ Package has continuous integration checks.
#> ✖ Package coverage failed
#> ✖ R CMD check found 1 error.
#> ✔ R CMD check found no warnings.
#>
#> ℹ Current status:
#> ✖ This package is not ready to be submitted.
Let’s now presume I have a reputation in the R community for all of my
packages starting with “aa”, to ensure they are always listed first.
This section demonstrates how to implement a check that only passes if
the first two letters of the package name are “aa”. The first step
described above is to define the check itself via a function prefixed
with pkgchk_. The easiest approach would be for the pkgchk_
function to directly check the name, and return a logical flag
indicating whether or not the name starts with “aa”. The resultant
summary and print methods can, however, only use the information
provided by the initial pkgchk_ function. That means if we want to
print the actual name in the result of either of those functions, to
show that it indeed does not form the desired pattern, we need to return
that information. The check function is then simply:
pkgchk_starts_with_aa <- function (checks) {
    checks$pkg$name
}
We then need to define the output functions:
output_pkgchk_starts_with_aa <- function (checks) {
    out <- list (
        check_pass = grepl ("^aa",
            checks$checks$starts_with_aa,
            ignore.case = TRUE
        ),
        summary = "",
        print = ""
    )
    # The leading space belongs to " NOT", so that a passing check does not
    # print a double space
    out$summary <- paste0 (
        "Package name [",
        checks$checks$starts_with_aa,
        "] does",
        ifelse (out$check_pass, "", " NOT"),
        " start with 'aa'"
    )
    return (out)
}
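The two steps can be tried without a full pkgcheck() run by applying them to a hand-built list. In the sketch below, both functions are restated in compact form so that the block runs on its own, and run_sketch() is a hypothetical driver which mimics the way the result of the check function is stored in checks$checks before the output function is called.

```r
pkgchk_starts_with_aa <- function (checks) {
    checks$pkg$name
}
output_pkgchk_starts_with_aa <- function (checks) {
    nm <- checks$checks$starts_with_aa
    pass <- grepl ("^aa", nm, ignore.case = TRUE)
    list (
        check_pass = pass,
        summary = paste0 (
            "Package name [", nm, "] does",
            if (pass) "" else " NOT",
            " start with 'aa'"
        ),
        print = ""
    )
}

# Hypothetical driver mimicking the two steps for a hand-built object
run_sketch <- function (pkg_name) {
    checks <- list (pkg = list (name = pkg_name), checks = list ())
    checks$checks$starts_with_aa <- pkgchk_starts_with_aa (checks)
    output_pkgchk_starts_with_aa (checks)
}

run_sketch ("pkgstats")$summary # "Package name [pkgstats] does NOT start with 'aa'"
run_sketch ("aardvark")$summary # "Package name [aardvark] does start with 'aa'"
```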
If we simply define those functions in the global workspace of our
current R session, calling pkgcheck() again will automatically detect
those checks and include them in our output:
#>
#> ── pkgstats 9.9 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>
#> ✔ Package name is available
#> ✖ does not have a 'contributing' file.
#> ✔ uses 'roxygen2'.
#> ✔ 'DESCRIPTION' has a URL field.
#> ✔ 'DESCRIPTION' has a BugReports field.
#> ✖ Package has no HTML vignettes
#> ✖ These functions do not have examples: [pkgstats_from_archive].
#> ✔ Package has continuous integration checks.
#> ✖ Package coverage failed
#> ✖ Package name [pkgstats] does NOT start with 'aa'
#> ✖ R CMD check found 1 error.
#> ✔ R CMD check found no warnings.
#>
#> ℹ Current status:
#> ✖ This package is not ready to be submitted.
Customised personal checks can be incorporated by defining them in a
local package, loading that into the workspace, and passing the name of
the package to the extra_env parameter.
3.2 New pkgcheck Checks (for pkgcheck developers)
New checks can be added to this package by creating new files in the
/R directory prefixed with check_, and including the two functions
described above (a check and an output function). The check name will
then need to be included in the order_checks() function in the
R/summarise-checks.R file, which determines the order of checks in the
summary output. Checks
which are not defined in this ordering, including any defined via
extra_env parameters, appear after all of the standard checks, and
prior to the R CMD check results which always appear last. This order
may only be modified by editing the list in that function. The order of
check results in the print method is also hard-coded, defined in the
main print.pkgcheck
method.
As explicitly stated in that function, any new checks should also be
included in the print method just after the first reference to
"misc_checks",
via an additional line:
print_check_screen (x, "<name-of-new-check>", pkg_env)
The print_check_screen() function will then automatically activate the
print method of any new checks. This line should be added even if a
new check has no print method (as in the starts_with_aa example
above), to provide an explicit record of all internally-defined
miscellaneous checks.
3.2a Check types
Some checks are defined so that failure results in a 👀 symbol, rather than a default ❌ symbol. This 👀 symbol indicates that the failures may be worth examining further, and yet do not cause the overall check report to fail. This sub-section describes how to define such checks.
All checks include a binary flag, check_pass, defined in their
output_pkgchk_... function, as in the examples above, which also
defines their output conditions. If out$summary is defined for
check_pass, then that output will by default be prefixed with ✅,
while if out$summary is defined for !check_pass, then that output
will by default be prefixed with ❌. Any instances of ❌ will cause the
whole check suite to fail, including on
pkgcheck-action.
The out value returned from all output_pkgchk_... functions is a
list that must include check_pass, summary, and print items, like
in the example above. Non-default check types can be defined by an
optional extra list-item named check_type, specified as a string of the
form "<pass>_<fail>", where a value of "watch" will
replace the default ✅ or ❌ symbols with 👀. For example, a check which
should issue ✅ on pass yet 👀 on fail would be specified as,
out$check_type <- "pass_watch"
A check which should only issue 👀 on fail and nothing on success would be specified as,
out$check_type <- "none_watch"
A check which should issue 👀 on success and nothing on failure would be specified as,
out$check_type <- "watch_none"
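The way a check_type string selects a symbol can be sketched as below. This is an illustration only: symbol_sketch() is a hypothetical function, and the default value of "pass_fail" is an assumption made for the sketch, not a documented pkgcheck value.

```r
# Hypothetical mapping from 'check_type' to a symbol. The "pass_fail" default
# is an assumption of this sketch, not a documented pkgcheck value.
symbol_sketch <- function (check_pass, check_type = "pass_fail") {
    types <- strsplit (check_type, "_", fixed = TRUE) [[1]]
    # First part applies when the check passes, second part when it fails
    type <- if (check_pass) types [1] else types [2]
    switch (type,
        pass = "\u2705", # tick
        fail = "\u274c", # cross
        watch = "\U0001f440", # eyes
        none = "" # nothing is issued
    )
}

symbol_sketch (TRUE, "pass_watch") # tick on pass
symbol_sketch (FALSE, "pass_watch") # eyes on fail
symbol_sketch (TRUE, "none_watch") # nothing on success
```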
3.2b Testing new checks
Finally, any new checks also need to be included in tests, with most
checks having a corresponding file in the tests/testthat
directory.
The test suite includes helper functions used to create generic
pkgcheck objects which are then modified for testing within individual
tests. Most tests start with the following lines:
checks <- make_check_data ()
ci_out <- output_pkgchk_<fn-name> (checks)
Data from any newly-added checks will automatically appear in the result
of make_check_data(), and may be tested directly. Most tests
nevertheless only need to test the output of the output_pkgchk_
functions. This is generally done by modifying the checks data
obtained in the initial call to make_check_data(), and then passing
those modified data to the matching output_pkgchk_ function. For
example, in test-check-scrap.R, the default value returned from the
output function is first tested, and then the checks value is modified
by,
checks$checks$has_scrap <- "scrap"
In this way, all possible forms and modes of each output_pkgchk_
function should be extensively tested.
For checks which require either goodpractice or srr (software
review roclets) for statistical software, initial check values should
be constructed with an alternative helper function,
make_check_data_srr(), which accepts an additional parameter,
goodpractice, which can be specified as FALSE (default) or TRUE to
return full goodpractice data. See examples in test-check-covr.R.
Finally, snapshot results need to be updated to reflect any additional
tests, as does the
test-list-checks.R
file,
which tests the total number of internally-defined checks as
expect_length (ncks, ..). The number tested there also needs to be
incremented by one for each new check.
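The testing pattern described in this section can be sketched with base R alone. In the block below, a hand-built list stands in for the result of make_check_data(), the output function is restated so that the block runs on its own, and stopifnot() is used where the package's own tests would use testthat expectations.

```r
output_pkgchk_has_scrap <- function (checks) {
    out <- list (
        check_pass = length (checks$checks$has_scrap) == 0L,
        summary = "",
        print = ""
    )
    if (!out$check_pass) {
        out$summary <- "Package contains unexpected files."
        out$print <- list (
            msg_pre = "Package contains the following unexpected files:",
            obj = checks$checks$has_scrap,
            msg_post = character (0)
        )
    }
    return (out)
}

# Hand-built stand-in for the result of make_check_data()
checks <- list (checks = list (has_scrap = character (0)))

# Default case: no scrap files, so the check passes with no output
ci_out <- output_pkgchk_has_scrap (checks)
stopifnot (
    ci_out$check_pass,
    identical (ci_out$summary, ""),
    identical (ci_out$print, "")
)

# Modified case: one scrap file, so both summary and print are populated
checks$checks$has_scrap <- "scrap"
ci_out <- output_pkgchk_has_scrap (checks)
stopifnot (
    !ci_out$check_pass,
    identical (ci_out$summary, "Package contains unexpected files."),
    identical (ci_out$print$obj, "scrap")
)
```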