Package 'mStats' reference manual

Title:	Medical Statistics & Epidemiological Analysis
Description:	A set of tidyverse-friendly functions for data management, calculation of epidemiological measures, statistical analysis, and table creation.
Authors:	Myo Minn Oo <[email protected]>
Maintainer:	Myo Minn Oo <[email protected]>
License:	MIT + file LICENSE
Version:	3.5.0
Built:	2025-03-22 03:44:10 UTC
Source:	https://github.com/myominnoo/mstats

Append datasets

Description

Usage

append(...)

Arguments

...

Data frames to combine.

Details

append stacks multiple datasets.
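
As a rough illustration of what stacking datasets with different columns looks like, the sketch below uses dplyr::bind_rows() as a stand-in. Whether append() is implemented this way, or fills unmatched columns the same way, is an assumption rather than something stated in this manual.

```r
library(dplyr)

# Stand-in for stacking: bind_rows() matches columns by name and
# fills columns that are missing from one input with NA.
stacked <- bind_rows(airquality, mtcars)

# airquality has 153 rows and 6 columns, mtcars has 32 rows and 11 columns,
# and the two share no column names.
dim(stacked)

# Rows that came from airquality have NA in the mtcars columns.
head(stacked[, c("Ozone", "mpg")])
```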

Value

A data frame

Examples

append(airquality, mtcars)

Generate a codebook

Description

The codebook() function generates a codebook for the given dataset. It summarizes the dataset's structure and characteristics: variable names, types, missing values, completeness percentages, unique value counts, and variable labels (if available).

Usage

codebook(data)

Arguments

data

The dataset for which the codebook is to be generated.

Value

The input dataset is returned invisibly, allowing codebook() to be used within a data pipeline.

Examples

codebook(mtcars)

codebook(iris)

labelled::var_label(iris) <- c(
	"sepal length", "sepal width", "petal length",
	"petal width", "species"
)
codebook(iris)

Apply functions to multiple variables in a data frame

Description

The coder() function applies a set of functions to multiple variables in a data frame using dplyr's mutate() and across() functions.
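
The mutate() and across() pattern named above can be written out directly with dplyr. This is a minimal sketch, not coder()'s source: applying the function to every column via everything() is an assumption about its default behaviour.

```r
library(dplyr)

# mutate() + across(): apply one function to each selected column.
# mean() returns a single value per column, which mutate() recycles
# to every row, so the result keeps all 32 rows of mtcars.
res <- mutate(mtcars, across(everything(), mean))

head(res, 3)
```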

Usage

coder(.data, .fns)

Arguments

`.data`	The input data frame.
`.fns`	A set of functions to be applied to the variables in the data frame.

Value

A modified data frame with the functions applied to the variables.

Examples

# Apply the `mean()` function to multiple variables in the `mtcars` data frame
coder(mtcars, mean)

Count from n to N

Description

Usage

n_(...)

N_(...)

Arguments

...

<tidy-select>

Columns to pick.

You can't pick grouping columns because they are already automatically handled by the verb (i.e. summarise() or mutate()).

Details

These functions are used for indexing observations or generating sequences of numbers.

n_() generates a running counter within a group of variables and represents the number of the current observation.
N_() provides the total count within each group of variables.

The same operations can also be carried out with dplyr::n(). See the examples below, which include the iris dataset.
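
For comparison, a running counter and a group total can be built from dplyr::n() alone. The sketch below uses plain dplyr and does not call n_() or N_(); the column names running and total are illustrative.

```r
library(dplyr)

df <- data.frame(
  x = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4),
  y = letters[1:10]
)

counted <- df |>
  group_by(x) |>
  mutate(
    running = seq_len(n()),  # 1, 2, ... within each value of x
    total   = n()            # number of rows sharing that value of x
  ) |>
  ungroup()

counted
```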

Value

A numeric vector representing the count from n to N.

Examples


# Example with a custom dataset
df <- data.frame(
  x = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4),
  y = letters[1:10]
)

library(dplyr)

# Generate a running counter for each observation within the "x" group using mutate()
mutate(df, n = n_(x))

# Generate a running counter for each observation for all columns using mutate()
mutate(df, n = n_(everything()))

# Generate a running counter within the "x" group using reframe()
reframe(df, n = n_(x))

# Generate the total count of observations, across all columns and within the "x" group
mutate(df, N = N_(everything()))
mutate(df, N = N_(x))
reframe(df, N = N_(x))

# iris dataset
mutate(iris, n = n_(everything()))
mutate(iris, N = N_(everything()))

Cut numeric vector into factor vector

Description

Usage

cut(x, at, label = NULL, ...)

Arguments

`x`	A numeric vector to be cut into factors.
`at`	A numeric vector specifying the breakpoints or categories for cutting the vector. If a single value is provided, the function will create breaks using the same method as `base::cut`. If multiple values are provided, they are treated as specific breaks.
`label`	Optional labels for the resulting factor levels. If not provided, labels will be automatically generated based on the breaks.
`...`	Additional arguments to be passed to `base::cut` if `x` is not numeric.

Details

This function cuts a numeric vector into factors based on specified breaks or categories. If the input vector is not numeric, the function delegates to the base R cut function.
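
Since the function builds on base R's cut, it may help to see how base::cut itself handles explicit breakpoints and labels. The sketch below uses only base R; how this package's cut() maps its at and label arguments onto these is not shown here.

```r
x <- c(1, 2, 3, 4, 5)

# Explicit breakpoints: intervals are right-closed by default,
# giving the levels (0,2] and (2,5].
groups <- base::cut(x, breaks = c(0, 2, 5))
levels(groups)

# Custom labels: one label per interval.
named <- base::cut(x, breaks = c(0, 2, 5), labels = c("low", "high"))
table(named)
```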

Value

A factor representing the cut vector with factor levels assigned based on the breaks or categories.

Examples

x <- c(1, 2, 3, 4, 5)
cut(x, 2)

Decode variables in a data frame

Description

The decode() function decodes selected variables in a data frame. It converts the variables to character type, sets labels from the original data, and returns the modified data frame.

Usage

decode(.data, ...)

Arguments

`.data`	The input data frame.
`...`	The variables to be decoded.

Value

A modified data frame with the selected variables decoded.

Examples

# Decode selected variables in the `mtcars` data frame
decode(mtcars, mpg, cyl)

Convert a continuous variable into groups

Description

This function was deprecated because we realized that it's a special case of the cut function.

Usage

egen(data, var, at = NULL, label = NULL, new_var = NULL, ...)

Arguments

`data`	data.frame
`var`	existing variable
`at`	either a number or a numeric vector
`label`	Labels for the groups
`new_var`	Name of the new variable
`...`	Additional arguments to be passed to `cut`

Examples

data <- data.frame(x = 1:10)
egen(data, x, at = c(3, 7), label = c("low", "medium", "high"))
egen(data, x, at = c(3, 7), label = c("low", "medium", "high"), new_var = "group")


Encode variables in a data frame

Description

The encode() function encodes selected variables in a data frame. It converts the variables to numeric type, sets labels from the original data, and returns the modified data frame.

Usage

encode(.data, ...)

Arguments

`.data`	The input data frame.
`...`	The variables to be encoded.

Value

A modified data frame with the selected variables encoded.

Examples

# Encode selected variables in the `mtcars` data frame
encode(mtcars, mpg, cyl)

Attach labels to data and variables

Description

This function manipulates labels. It supports different classes of objects, including default objects, data frames, and other types.

Usage

label(x, label = NULL)

Arguments

`x`	The object to which the label will be added or modified.
`label`	A character string specifying the label to be assigned to the variable.

Details

When used with dplyr's mutate() function, this function allows for easy labeling of variables within a data frame.

If used with a data frame, the function labels the dataset itself, and the label can be checked using the codebook() function.

Value

The modified object with the updated label.

Examples


library(dplyr)

iris |>
	 mutate(Species = label(Species, 'Species of iris flower')) |>
	 codebook()

iris |>
	 label("Iris dataset") |>
	 codebook()

Tag Duplicates

Description

Usage

tag_duplicates(..., .add_tags = TRUE)

Arguments

`...`	Columns to use for identifying duplicates.
`.add_tags`	Logical; whether to return the three indicator columns `.n_`, `.N_`, and `.dup_`.

Details

This function identifies and tags duplicate observations based on specified variables.

This function mimics the functionality of Stata's duplicates command in R. It calculates the number of duplicates and provides a report of duplicates based on the specified variables. The function utilizes the n_ and N_ functions for counting and grouping the observations.

Value

A tibble with three columns: .n_, .N_, and .dup_.

.n_ represents the running counter within each group of variables, indicating the number of the current observation.
.N_ represents the total number of observations within each group of variables.
.dup_ is a logical column indicating whether the observation is a duplicate (TRUE) or not (FALSE).
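
The three columns can be approximated with plain dplyr, which may help when reading the output. This is a sketch, not tag_duplicates()'s source. In particular, treating only the second and later occurrences as duplicates (.n_ > 1) is an assumption; the function may instead flag every row in a repeated group.

```r
library(dplyr)

data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

tagged <- data |>
  group_by(x) |>
  mutate(
    .n_   = seq_len(n()),  # running counter within each value of x
    .N_   = n(),           # total rows sharing that value of x
    .dup_ = .n_ > 1        # assumption: the first occurrence is not a duplicate
  ) |>
  ungroup()

tagged
```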

Examples


library(dplyr)

# Example with a custom dataset
data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

# Identify and tag duplicates based on the "x" variable
data %>% mutate(tag_duplicates(x))

# Identify and tag duplicates based on multiple variables
data %>% mutate(tag_duplicates(x, y))

# Identify and tag duplicates based on all variables
data %>% mutate(tag_duplicates(everything()))

## Not run: 
## Stata example
dupxmpl <- haven::read_dta("https://www.stata-press.com/data/r18/dupxmpl.dta")
dupxmpl |> mutate(tag_duplicates(everything()))

## End(Not run)
