Title: | Medical Statistics & Epidemiological Analysis |
---|---|
Description: | A set of tidyverse-friendly functions for data management, calculation of epidemiological measures, statistical analysis, and table creation. |
Authors: | Myo Minn Oo <[email protected]> |
Maintainer: | Myo Minn Oo <[email protected]> |
License: | MIT + file LICENSE |
Version: | 3.5.0 |
Built: | 2025-01-21 03:24:50 UTC |
Source: | https://github.com/myominnoo/mstats |
append(...)
... | Data frames to combine. |
append() stacks multiple datasets.
A data frame
Other Data Management: codebook(), count_functions, cut(), tag_duplicates()
append(airquality, mtcars)
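As a rough illustration of what stacking involves, the base R sketch below pads each data frame with the columns it lacks and then binds the rows. The helper stack_frames() is hypothetical and not part of the package; whether append() fills unmatched columns with NA in the same way is an assumption.

```r
# Hypothetical helper (not part of mstats): stack data frames by row,
# filling columns that a data frame lacks with NA.
stack_frames <- function(...) {
  frames <- list(...)
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    # Add each missing column as NA, then put columns in a common order
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA
    }
    df[all_cols]
  })
  do.call(rbind, padded)
}

stacked <- stack_frames(airquality, mtcars)
dim(stacked)
```

Because airquality and mtcars share no column names, every row of the result has NA in the columns that came from the other dataset.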
The codebook() function generates a codebook for the given dataset. It provides a summary of the dataset's structure and characteristics, including variable names, types, missing values, completeness percentages, unique value counts, and variable labels (if available).
codebook(data)
data | The dataset for which the codebook is to be generated. |
The input dataset is returned invisibly, allowing codebook() to be used within a data pipeline.
Other Data Management: append(), count_functions, cut(), tag_duplicates()
codebook(mtcars)
codebook(iris)
labelled::var_label(iris) <- c(
  "sepal length", "sepal width", "petal length",
  "petal width", "species"
)
codebook(iris)
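To make the described summary concrete, here is a minimal base R sketch that computes the type, missing count, completeness percentage, and unique value count per variable, then returns the input invisibly. Both summarise_columns() and codebook_sketch() are hypothetical names for illustration; the real codebook() output may be laid out differently.

```r
# Hypothetical helpers (not part of mstats) that mimic the described summary.
summarise_columns <- function(data) {
  missing <- vapply(data, function(col) sum(is.na(col)), integer(1))
  data.frame(
    variable = names(data),
    type = vapply(data, function(col) class(col)[1], character(1)),
    missing = missing,
    complete_pct = 100 * (1 - missing / nrow(data)),
    unique = vapply(data, function(col) length(unique(col)), integer(1)),
    row.names = NULL
  )
}

codebook_sketch <- function(data) {
  print(summarise_columns(data))
  invisible(data)  # return the input unchanged so a pipeline can continue
}

airquality |> codebook_sketch() |> nrow()
```

Returning the data invisibly is what lets the call sit in the middle of a pipeline without interrupting it.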
The coder() function applies a set of functions to multiple variables in a data frame using dplyr's mutate() and across() functions.
coder(.data, .fns)
.data | The input data frame. |
.fns | A set of functions to be applied to the variables in the data frame. |
A modified data frame with the functions applied to the variables.
# Apply the `mean()` function to multiple variables in the `mtcars` data frame
coder(mtcars, mean)
n_(...)
N_(...)
... | Columns to pick. You can't pick grouping columns because they are already automatically handled by the verb (i.e. |
These functions are used for indexing observations or generating sequences of numbers. n_() generates a running counter within a group of variables and represents the number of the current observation. N_() provides the total count within each group of variables.
You can also perform these operations with dplyr::n(), as in the following examples using the iris dataset.
library(dplyr)
iris |> mutate(.N_ = n()) |> head()
iris |> mutate(.n_ = 1:n()) |> head()
iris |> group_by(Species) |> mutate(.n_ = 1:n()) |> slice(1:5) |> ungroup()
A numeric vector representing the count from n to N.
Other Data Management: append(), codebook(), cut(), tag_duplicates()
# Example with a custom dataset
df <- data.frame(
  x = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4),
  y = letters[1:10]
)
library(dplyr)

# Generate a running counter for each observation within the "x" group using mutate()
mutate(df, n = n_(x))
# Generate a running counter for each observation for all columns using mutate()
mutate(df, n = n_(everything()))
# Generate the same running counter within the "x" group using reframe()
reframe(df, n = n_(x))
# Generate the total count of observations for all columns and within the "x" group
mutate(df, N = N_(everything()))
mutate(df, N = N_(x))
reframe(df, N = N_(x))
# iris dataset
mutate(iris, n = n_(everything()))
mutate(iris, N = N_(everything()))
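The same grouping logic can be sketched in base R with ave(), which may help when checking results by hand. This only reproduces the roles described for n_() and N_() (running counter and group total); that the package returns identical values for this input is an assumption.

```r
x <- c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4)

# Running counter within each group of x (the role described for n_())
running <- ave(seq_along(x), x, FUN = seq_along)
# Total count within each group of x (the role described for N_())
total <- ave(seq_along(x), x, FUN = length)

data.frame(x, running, total)
```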
cut(x, at, label = NULL, ...)
x | A numeric vector to be cut into factors. |
at | A numeric vector specifying the breakpoints or categories for cutting the vector. If a single value is provided, the function will create breaks using the same method as |
label | Optional labels for the resulting factor levels. If not provided, labels will be automatically generated based on the breaks. |
... | Additional arguments to be passed to |
This function cuts a numeric vector into factors based on specified breaks or categories. If the input vector is not numeric, the function delegates to the base R cut() function.
A factor representing the cut vector with factor levels assigned based on the breaks or categories.
Other Data Management: append(), codebook(), count_functions, tag_duplicates()
x <- c(1, 2, 3, 4, 5)
cut(x, 2)
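For comparison, the sketch below shows cutting at explicit breakpoints with base::cut() only. How the package's at argument maps onto base R's breaks, and whether its intervals are closed on the left or the right, is not stated here, so the open-ended breaks and labels below are illustrative assumptions.

```r
x <- 1:10

# base::cut() uses right-closed intervals by default:
# (-Inf, 3], (3, 7], (7, Inf]
groups <- base::cut(
  x,
  breaks = c(-Inf, 3, 7, Inf),
  labels = c("low", "medium", "high")
)
table(groups)
```

Calling base::cut() with its namespace avoids ambiguity when a package that also exports cut() is attached.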
The decode() function decodes selected variables in a data frame. It converts the variables to character type, sets labels from the original data, and returns the modified data frame.
decode(.data, ...)
.data | The input data frame. |
... | The variables to be decoded. |
A modified data frame with the selected variables decoded.
# Decode selected variables in the `mtcars` data frame
decode(mtcars, mpg, cyl)
This function was deprecated because we realized that it's a special case of the cut() function.
egen(data, var, at = NULL, label = NULL, new_var = NULL, ...)
data | data.frame |
var | existing variable |
at | either a number or a numeric vector |
label | Labels for the groups |
new_var | Name of the new variable |
... | Additional arguments to be passed to |
data <- data.frame(x = 1:10)
egen(data, x, at = c(3, 7), label = c("low", "medium", "high"))
egen(data, x, at = c(3, 7), label = c("low", "medium", "high"), new_var = "group")
The encode() function encodes selected variables in a data frame. It converts the variables to numeric type, sets labels from the original data, and returns the modified data frame.
encode(.data, ...)
.data | The input data frame. |
... | The variables to be encoded. |
A modified data frame with the selected variables encoded.
# Encode selected variables in the `mtcars` data frame
encode(mtcars, mpg, cyl)
This function manipulates labels. It supports different classes of objects, including default objects, data frames, and other types.
label(x, label = NULL)
x | The object to which the label will be added or modified. |
label | A character string specifying the label to be assigned to the variable. |
When used with dplyr's mutate() function, this function allows for easy labeling of variables within a data frame. If used with a data frame, the function labels the dataset itself, and the label can be checked using the codebook() function.
The modified object with the updated label.
library(dplyr)
iris |>
  mutate(Species = label(Species, 'Species of iris flower')) |>
  codebook()
iris |>
  label("Iris dataset") |>
  codebook()
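Variable labels of this kind are commonly stored as a "label" attribute on the vector, which is the attribute that labelled::var_label() reads. The base R sketch below sets and reads that attribute directly; that label() stores its value in the same attribute is an assumption, not something confirmed here.

```r
species <- iris$Species

# Assumption: the label lives in the "label" attribute of the vector
attr(species, "label") <- "Species of iris flower"

# Read the label back; the underlying factor is otherwise unchanged
attr(species, "label")
```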
tag_duplicates(..., .add_tags = TRUE)
... | Columns to use for identifying duplicates. |
.add_tags | logical to return three indicator columns: .n_, .N_, and .dup_ |
This function identifies and tags duplicate observations based on specified variables. It mimics the functionality of Stata's duplicates command in R. It calculates the number of duplicates and provides a report of duplicates based on the specified variables. The function uses the n_() and N_() functions for counting and grouping the observations.
A tibble with three columns: .n_, .N_, and .dup_.
.n_ represents the running counter within each group of variables, indicating the number of the current observation.
.N_ represents the total number of observations within each group of variables.
.dup_ is a logical column indicating whether the observation is a duplicate (TRUE) or not (FALSE).
Other Data Management: append(), codebook(), count_functions, cut()
library(dplyr)

# Example with a custom dataset
data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

# Identify and tag duplicates based on the "x" variable
data %>% mutate(tag_duplicates(x))
# Identify and tag duplicates based on multiple variables
data %>% mutate(tag_duplicates(x, y))
# Identify and tag duplicates based on all variables
data %>% mutate(tag_duplicates(everything()))

## Not run:
## STATA example
dupxmpl <- haven::read_dta("https://www.stata-press.com/data/r18/dupxmpl.dta")
dupxmpl |> mutate(tag_duplicates(everything()))
## End(Not run)
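The three tag columns can be approximated in base R to see how they relate to one another. This is a sketch under one stated assumption: an observation is flagged as a duplicate when it is not the first in its group (.n_ > 1). The package may instead flag every member of a group with more than one observation, so treat the .dup_ rule below as illustrative.

```r
data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

idx <- seq_len(nrow(data))
tags <- data.frame(
  .n_ = ave(idx, data$x, FUN = seq_along),  # running counter within each x
  .N_ = ave(idx, data$x, FUN = length)      # group size for each x
)
# Assumption: every observation after the first in its group is a duplicate
tags$.dup_ <- tags$.n_ > 1

cbind(data, tags)
```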