r - My call to glm in a function does not find the formula in the environment - Stack Overflow

In the following example, I create a function to fit a glm, but the function cannot find the formula defined immediately before. I believe this has to do with the function looking in the wrong environment, but I can't understand why. Here is an example:

n <- 20
ncov <- 3
df <- as.data.frame(replicate(ncov+1, runif(n)))
names(df) <- c(paste0("x", seq(ncov)), "y")
df

fun1 <- function(mod, pTrain = 0.5){
  print(environment())
  data <- mod$data
  y <- mod$y
  train <- sample(nrow(data), size = nrow(data)*pTrain)
  valid <- -train
  modTrain <- update(object = mod, data = data[train,])
  yhat <- predict(modTrain, newdata = data[valid,])
  res <- data.frame(y = y[valid], yhat = yhat) # compare held-out rows only; the full y would be silently recycled
  return(res)
}

fun2 <- function(useCovs = c(1,0,0), data = df){
  print(environment())
  fmla <- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
  # environment(fmla) <- environment() # does not help
  mod <- glm(formula = fmla, data = data)
  res <- fun1(mod, pTrain = 0.5)
  score <- sqrt(mean((res$y - res$yhat)^2))
  return(c(aic = AIC(mod), rmse = score))
}

fmla <- NULL # just to be sure there is no formula left in the global environment
fun2(useCovs = c(1,0,1))
# Error in eval(mf, parent.frame()) : object 'fmla' not found

If I use a <<- assignment for the formula, the function works, but I worry about the potential issues with this:

fun3 <- function(useCovs = c(1,0,0), data = df){
  print(environment())
  fmla <<- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
  mod <- glm(formula = fmla, data = data)
  res <- fun1(mod, pTrain = 0.5)
  score <- sqrt(mean((res$y - res$yhat)^2))
  return(c(aic = AIC(mod), rmse = score))
}

fmla <- NULL # just to be sure there is no formula left in the global environment
fun3(useCovs = c(1,0,1)) # works
fmla # now holds the formula that fun3 assigned with <<-

Asked Nov 16, 2024 at 5:51 by Marc in the box; edited Nov 16, 2024 at 7:06 by Jan.
Comments:

  • I don’t think formula takes a text argument. If you want to build a formula then as.formula is the standard approach. Voting to close as a typo. – IRTFM, Nov 16, 2024 at 6:46
  • Hmmm. It also works if you pass fmla as an argument to fun1... – Limey, Nov 16, 2024 at 6:58

2 Answers

Inspired by this post - in particular, the answer that has not been accepted - this seems to solve the problem.

fun1 <- function(mod, pTrain = 0.5){
  data <- mod$data
  y <- mod$y
  train <- sample(nrow(data), size = nrow(data)*pTrain)
  valid <- -train
  # New code
  ev <- environment()
  parent.env(ev) <- environment(mod$formula)
  environment(mod$formula) <- ev
  # End of new code
  modTrain <- update(object = mod, data = data[train,])
  yhat <- predict(modTrain, newdata = data[valid,])
  res <- data.frame(y = y[valid], yhat = yhat) # compare held-out rows only; the full y would be silently recycled
  return(res)
}

I cannot explain why, though the discussion in the accepted answer above is probably worth some study.

As I mentioned in my comment, amending the signature of fun1 to

fun1 <- function(mod, pTrain = 0.5, fmla)

and the call to it in fun2 to

  res <- fun1(mod, pTrain = 0.5, fmla)

also succeeds.
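
A minimal, self-contained sketch of that variant follows. The helper names fit_split and fit_model are made up for illustration, and reformulate() is swapped in for the paste-based formula construction. The likely reason it works: update() re-evaluates the stored call glm(formula = fmla, ...) in the frame of the function that called it, and with the extra argument the name fmla exists in that frame.

```r
set.seed(1)  # only to make the random split reproducible
df <- as.data.frame(replicate(4, runif(20)))
names(df) <- c("x1", "x2", "x3", "y")

# Hypothetical variant of fun1: `fmla` is accepted only so that the name
# is visible in this frame when update() re-evaluates the stored glm() call.
fit_split <- function(mod, pTrain = 0.5, fmla) {
  data <- mod$data
  train <- sample(nrow(data), size = nrow(data) * pTrain)
  modTrain <- update(object = mod, data = data[train, ])
  yhat <- predict(modTrain, newdata = data[-train, ])
  data.frame(y = data$y[-train], yhat = yhat)
}

# Hypothetical variant of fun2 that hands its local formula to fit_split.
fit_model <- function(useCovs = c(1, 0, 0), data = df) {
  covs <- paste0("x", seq_along(useCovs))[as.logical(useCovs)]
  fmla <- reformulate(covs, response = "y")
  mod <- glm(formula = fmla, data = data)
  res <- fit_split(mod, pTrain = 0.5, fmla = fmla)
  c(aic = AIC(mod), rmse = sqrt(mean((res$y - res$yhat)^2)))
}

out <- fit_model(useCovs = c(1, 0, 1))
out
```

The drawback of this approach is that the argument name in fit_split has to match the variable name used in the original glm() call, which couples the two functions.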

Replace

mod <- glm(formula = fmla, data = data)

with

mod <- do.call("glm", list(formula = fmla, data = data))

or alternately with (less preferred)

mod <- eval(substitute(glm(fmla, data = data), list(fmla = fmla)))

so that the value of fmla is passed to glm rather than the fmla variable. Normally this would not matter but glm uses non-standard evaluation (NSE) so it does.
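
The difference can be inspected on the fitted object itself. The sketch below uses made-up helper names (fit_by_name, fit_by_value, refit) and a toy data frame. It compares what each approach stores in mod$call, which is what update() later re-evaluates.

```r
d <- data.frame(x1 = runif(20), y = runif(20))

# Hypothetical helpers: both build the formula in a local variable.
fit_by_name <- function(data) {
  fmla <- y ~ x1
  glm(formula = fmla, data = data)                   # call records the name `fmla`
}
fit_by_value <- function(data) {
  fmla <- y ~ x1
  do.call("glm", list(formula = fmla, data = data))  # call records the formula itself
}

m_name <- fit_by_name(d)
m_value <- fit_by_value(d)

# What ended up in the stored call?
stored_is_symbol <- is.symbol(m_name$call$formula)
stored_is_formula <- inherits(m_value$call$formula, "formula")

# update() re-evaluates the stored call from inside another function.
refit <- function(mod) update(mod, data = mod$data[1:10, ])
res_name <- try(refit(m_name), silent = TRUE)  # needs a visible `fmla` to succeed
res_value <- refit(m_value)                    # self-contained, refits on 10 rows
```

One side effect to be aware of: with do.call the whole data frame is also embedded in the stored call, so printing the model or its call can be verbose.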
