| Title: | Medical Statistics & Epidemiological Analysis |
|---|---|
| Description: | A set of tidyverse-friendly functions for data management, calculation of epidemiological measures, statistical analysis, and table creation. |
| Authors: | Myo Minn Oo <[email protected]> |
| Maintainer: | Myo Minn Oo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 3.5.0 |
| Built: | 2026-06-08 06:47:35 UTC |
| Source: | https://github.com/myominnoo/mstats |
    append(...)
| ... | Data frames to combine. |
|---|---|
append stacks multiple datasets.
A data frame
Other Data Management:
codebook(),
count_functions,
cut(),
tag_duplicates()
    append(airquality, mtcars)
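As a rough point of comparison, stacking data frames whose columns differ can be sketched with dplyr::bind_rows(), which fills columns missing from one input with NA. Whether append() is implemented this way, or treats mismatched columns identically, is an assumption and is not stated in the description above.

```r
library(dplyr)

# Assumption: bind_rows() is only a stand-in that illustrates row-wise
# stacking; it is not confirmed to be what append() does internally.
stacked <- bind_rows(airquality, mtcars)

# Row counts add up; the columns are the union of both inputs,
# with NA wherever an input lacked a column.
dim(stacked)
```

Because airquality and mtcars share no column names, every row of the result has NA in the columns that came from the other dataset.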
The codebook function generates a codebook for the given dataset. It provides a summary
of the dataset's structure and characteristics, including variable names, types, missing
values, completeness percentages, unique value counts, and variable labels (if available).
    codebook(data)
| data | The dataset for which the codebook is to be generated. |
|---|---|
The input dataset is returned invisibly,
allowing codebook() to be used within a data pipeline.
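The "returned invisibly" pattern can be illustrated with a small base R sketch. The helper describe_columns() below is hypothetical and is not part of mstats; it only shows how a function can print a per-variable summary and still pass its input along a pipe.

```r
# Hypothetical helper (not the mstats implementation): prints a small
# per-variable summary, then returns the input invisibly.
describe_columns <- function(data) {
  summary_table <- data.frame(
    variable = names(data),
    type     = vapply(data, function(col) class(col)[1], character(1)),
    missing  = vapply(data, function(col) sum(is.na(col)), integer(1)),
    unique   = vapply(data, function(col) length(unique(col)), integer(1)),
    row.names = NULL
  )
  print(summary_table)
  invisible(data)  # the unchanged input continues down the pipeline
}

# The summary is printed as a side effect; subset() receives airquality.
may_only <- airquality |>
  describe_columns() |>
  subset(Month == 5)
```

Returning the data rather than the summary is what lets an inspection step sit in the middle of a pipeline without altering it.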
Other Data Management:
append(),
count_functions,
cut(),
tag_duplicates()
    codebook(mtcars)
    codebook(iris)
    labelled::var_label(iris) <- c(
      "sepal length", "sepal width", "petal length",
      "petal width", "species"
    )
    codebook(iris)
The coder() function applies a set of functions to multiple variables in a data frame using dplyr's mutate() and across() functions.
    coder(.data, .fns)
| .data | The input data frame. |
|---|---|
| .fns | A set of functions to be applied to the variables in the data frame. |
A modified data frame with the functions applied to the variables.
    # Apply the `mean()` function to multiple variables in the `mtcars` data frame
    coder(mtcars, mean)
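The mutate() plus across() combination mentioned in the description can be sketched directly in dplyr. The wrapper apply_to_all() below is hypothetical, and the choice of everything() as the column selection is an assumption; the text does not say which variables coder() targets.

```r
library(dplyr)

# Hypothetical wrapper, assuming the functions are applied to every column.
# This is a sketch of the described pattern, not the coder() source code.
apply_to_all <- function(.data, .fns) {
  mutate(.data, across(everything(), .fns))
}

# Centre every column of mtcars on its own mean.
centered <- apply_to_all(mtcars, function(col) col - mean(col))
head(centered)
```

Because mutate() keeps the original number of rows, a summarising function such as mean() would be recycled down each column rather than collapsing the data frame.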
    n_(...)
    N_(...)
| ... | Columns to pick. You can't pick grouping columns because they are already automatically handled by the verb (i.e. |
|---|---|
These functions are used for indexing observations or generating sequences of numbers.
n_() generates a running counter within a group of variables and
represents the number of the current observation.
N_() provides the total count within each group of variables.
The same results can be obtained with dplyr::n(), as in the following
examples using the iris dataset.
    library(dplyr)
    iris |> mutate(.N_ = n()) |> head()
    iris |> mutate(.n_ = 1:n()) |> head()
    iris |> group_by(Species) |> mutate(.n_ = 1:n()) |> slice(1:5) |> ungroup()
A numeric vector representing the count from n to N.
Other Data Management:
append(),
codebook(),
cut(),
tag_duplicates()
    # Example with a custom dataset
    df <- data.frame(
      x = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4),
      y = letters[1:10]
    )
    library(dplyr)

    # Running counter within each "x" group, using mutate()
    mutate(df, n = n_(x))
    # Running counter within groups defined by all columns, using mutate()
    mutate(df, n = n_(everything()))
    # Running counter within each "x" group, using reframe()
    reframe(df, n = n_(x))

    # Total count within groups defined by all columns, using mutate()
    mutate(df, N = N_(everything()))
    # Total count within each "x" group, using mutate()
    mutate(df, N = N_(x))
    # Total count within each "x" group, using reframe()
    reframe(df, N = N_(x))

    # iris dataset
    mutate(iris, n = n_(everything()))
    mutate(iris, N = N_(everything()))
    cut(x, at, label = NULL, ...)
| x | A numeric vector to be cut into factors. |
|---|---|
| at | A numeric vector specifying the breakpoints or categories for cutting the vector. If a single value is provided, the function will create breaks using the same method as |
| label | Optional labels for the resulting factor levels. If not provided, labels will be automatically generated based on the breaks. |
| ... | Additional arguments to be passed to |
This function cuts a numeric vector into factors
based on specified breaks or categories.
If the input vector is not numeric, the function delegates
to the base R cut function.
A factor representing the cut vector with factor levels assigned based on the breaks or categories.
Other Data Management:
append(),
codebook(),
count_functions,
tag_duplicates()
    x <- c(1, 2, 3, 4, 5)
    cut(x, 2)
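For comparison, the base R function that this one builds on can be called explicitly as base::cut(). The sketch below uses only base R; the break points and labels are illustrative, and how the mstats version turns its at argument into intervals may differ from what is shown here.

```r
values <- 1:10

# Base R cut(): explicit breaks with open-ended outer intervals.
# With the default right = TRUE, each interval includes its upper bound.
groups <- base::cut(
  values,
  breaks = c(-Inf, 3, 7, Inf),
  labels = c("low", "medium", "high")
)

# Cross-tabulate the original values against the assigned levels.
table(values, groups)
```

Calling base::cut() with the namespace prefix avoids ambiguity when a package that masks cut() is attached.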
The decode() function decodes selected variables in a data frame. It converts the variables to character type, sets labels from the original data, and returns the modified data frame.
    decode(.data, ...)
| .data | The input data frame. |
|---|---|
| ... | The variables to be decoded. |
A modified data frame with the selected variables decoded.
    # Decode selected variables in the `mtcars` data frame
    decode(mtcars, mpg, cyl)
This function was deprecated because we realized that it's
a special case of the cut function.
    egen(data, var, at = NULL, label = NULL, new_var = NULL, ...)
| data | data.frame |
|---|---|
| var | existing variable |
| at | either a number or a numeric vector |
| label | Labels for the groups |
| new_var | Name of the new variable |
| ... | Additional arguments to be passed to |
    data <- data.frame(x = 1:10)
    egen(data, x, at = c(3, 7), label = c("low", "medium", "high"))
    egen(data, x, at = c(3, 7), label = c("low", "medium", "high"), new_var = "group")
The encode() function encodes selected variables in a data frame. It converts the variables to numeric type, sets labels from the original data, and returns the modified data frame.
    encode(.data, ...)
| .data | The input data frame. |
|---|---|
| ... | The variables to be encoded. |
A modified data frame with the selected variables encoded.
    # Encode selected variables in the `mtcars` data frame
    encode(mtcars, mpg, cyl)
This function manipulates labels. It supports different classes of objects, including default objects, data frames, and other types.
    label(x, label = NULL)
| x | The object to which the label will be added or modified. |
|---|---|
| label | A character string specifying the label to be assigned to the variable. |
When used with dplyr's mutate() function,
this function allows for easy labeling of variables within a data frame.
If used with a data frame, the function labels the dataset itself, and
the label can be checked using the codebook() function.
The modified object with the updated label.
    library(dplyr)
    iris |>
      mutate(Species = label(Species, 'Species of iris flower')) |>
      codebook()
    iris |>
      label("Iris dataset") |>
      codebook()
    tag_duplicates(..., .add_tags = TRUE)
| ... | Columns to use for identifying duplicates. |
|---|---|
| .add_tags | logical to return three indicator columns: |
This function identifies and tags duplicate observations based on specified variables.
This function mimics the functionality of Stata's duplicates command in R.
It calculates the number of duplicates and provides a report of duplicates
based on the specified variables. The function utilizes the n_ and N_ functions
for counting and grouping the observations.
A tibble with three columns: .n_, .N_, and .dup_.
.n_ represents the running counter within each group of variables,
indicating the number of the current observation.
.N_ represents the total number of observations within each group of variables.
.dup_ is a logical column indicating
whether the observation is a duplicate (TRUE) or not (FALSE).
Other Data Management:
append(),
codebook(),
count_functions,
cut()
    library(dplyr)

    # Example with a custom dataset
    data <- data.frame(
      x = c(1, 1, 2, 2, 3, 4, 4, 5),
      y = letters[1:8]
    )

    # Identify and tag duplicates based on the "x" variable
    data %>% mutate(tag_duplicates(x))

    # Identify and tag duplicates based on multiple variables
    data %>% mutate(tag_duplicates(x, y))

    # Identify and tag duplicates based on all variables
    data %>% mutate(tag_duplicates(everything()))

    ## Not run:
    ## Stata example
    dupxmpl <- haven::read_dta("https://www.stata-press.com/data/r18/dupxmpl.dta")
    dupxmpl |> mutate(tag_duplicates(everything()))
    ## End(Not run)
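The three tag columns can be approximated in plain dplyr, which may help when checking results by hand. This is a sketch under a stated assumption: an observation is treated as a duplicate when it is the second or later occurrence within its group (.n_ > 1). The text above does not specify the exact rule that tag_duplicates() uses, so the .dup_ column here may not match the package's output.

```r
library(dplyr)

records <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

tagged <- records |>
  group_by(x) |>
  mutate(
    .n_   = row_number(),  # running counter within each value of x
    .N_   = n(),           # total observations sharing that value of x
    .dup_ = .n_ > 1        # assumption: first occurrence is not a duplicate
  ) |>
  ungroup()

tagged
```

Under this rule, filter(tagged, !.dup_) would keep one row per distinct value of x.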