The egen() function converts a continuous variable into groups by discretizing it into intervals. It is deprecated and has been replaced by the cut() function from the mStats package. The main difference between egen() and cut() is the input they accept: egen() works with data frames or tibbles, so the variable is grouped within the context of the entire dataset, whereas cut() operates on a vector and groups that vector directly.

egen() now serves as a wrapper around cut(). It issues a deprecation warning recommending that cut() be used directly.
library(mStats)
data <- data.frame(x = 1:10)
egen(data, x, at = c(3, 7), label = c("low", "medium", "high"))
#> Warning: `egen()` was deprecated in mStats 3.4.0.
#> ℹ Please use `cut()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> x
#> 1 low
#> 2 low
#> 3 medium
#> 4 medium
#> 5 medium
#> 6 medium
#> 7 high
#> 8 high
#> 9 high
#> 10 high
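The data-frame-versus-vector distinction described earlier can be sketched in base R. This is only an illustration of the wrapper idea, not the mStats implementation: group_column() is a hypothetical helper, and it relies on base::cut() (with breaks and labels arguments) rather than the mStats cut(). The left-closed intervals (right = FALSE) are an assumption chosen to reproduce the output shown above.

```r
# Hypothetical helper (not part of mStats): applies a vector-level grouping
# function to one column of a data frame and returns the modified data frame.
group_column <- function(df, var, at, label) {
  # Pad the breakpoints with -Inf and Inf so every value falls in an interval.
  breaks <- c(-Inf, at, Inf)
  # base::cut() operates on a vector; right = FALSE makes intervals
  # left-closed, an assumption made to match the output shown above.
  df[[var]] <- base::cut(df[[var]], breaks = breaks, labels = label,
                         right = FALSE)
  df
}

data <- data.frame(x = 1:10)
grouped <- group_column(data, "x", at = c(3, 7),
                        label = c("low", "medium", "high"))
as.character(grouped$x)
```

The helper only adds the data frame handling; the grouping itself is done by the vector-level function, which is the same division of labor the text describes for egen() and cut().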
egen versus mutate + cut
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Example 1: Using egen() function
data <- data.frame(x = 1:10)
data <- egen(data, var = x, at = c(3, 7), label = c("low", "medium", "high"))
# Example 2: Using mutate() and cut() functions
data2 <- data.frame(x = 1:10)
data2 <- mutate(data2, x = cut(x, at = c(-Inf, 3, 7, Inf), label = c("low", "medium", "high")))
# Check if the results are the same
identical(data, data2) # Should be TRUE
#> [1] TRUE
In both examples, a data frame (data and data2, respectively) with a single variable x is created. The goal is to group the values of x into three categories, “low”, “medium”, and “high”, based on the specified breakpoints.
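To make the role of the breakpoints concrete, here is a minimal base R sketch of how the values 3 and 7 split x into three groups. It uses findInterval() from base R rather than anything from mStats, and it assumes left-closed intervals (a value equal to a breakpoint goes to the higher group), which is inferred from the egen() output shown earlier.

```r
x <- 1:10
breakpoints <- c(3, 7)
labels <- c("low", "medium", "high")

# findInterval() returns how many breakpoints are less than or equal to
# each value: 0 for x < 3, 1 for 3 <= x < 7, and 2 for x >= 7.
idx <- findInterval(x, breakpoints)

# Shift by one to index into the label vector (R indexing starts at 1).
groups <- factor(labels[idx + 1], levels = labels)

data.frame(x = x, group = groups)
```

Two breakpoints always yield three groups, which is why three labels are supplied.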