Custom function calculating bootstrap interval and running a randomization test

Next we’ll introduce a new function that you’ll be seeing a lot more of in the upcoming labs – a custom function that allows you to apply any statistical inference method that you’ll be learning in this course. Since this is a custom function, we need to load it first.

Writing a for loop every time you want to calculate a bootstrap interval or run a randomization test is cumbersome. This function automates the process. By default, the inference function takes 10,000 bootstrap samples (instead of the 100 you’ve taken above), creates a bootstrap distribution, and calculates the confidence interval using the percentile method.

load(url("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/inference.Rdata"))

inference(nc$gained, type="ci", method="simulation", conflevel=0.9, est="mean", boot_method="perc")
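To make the percentile method concrete, here is a minimal base R sketch of the kind of loop that the call above replaces. It is not the code inside the inference function. The vector gained is simulated here as a hypothetical stand-in for nc$gained, so the numbers it produces say nothing about the real data.

```r
# Hypothetical stand-in for nc$gained (simulated, not the real nc data).
set.seed(42)
gained <- rnorm(200, mean = 30, sd = 14)

# Draw 10,000 bootstrap samples (same size as the data, with replacement)
# and record the mean of each one.
n_boot <- 10000
boot_means <- replicate(n_boot, mean(sample(gained, replace = TRUE)))

# Percentile method: a 90% interval runs from the 5th to the 95th
# percentile of the bootstrap distribution.
ci_90 <- quantile(boot_means, probs = c(0.05, 0.95))
ci_90
```

Changing probs to c(0.025, 0.975) would give a 95% interval in the same way.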

 

Next, we’ll use the inference function for evaluating whether there is a difference between the average birth weights of babies born to smoker and non-smoker mothers.

Let’s pause for a moment to go through the arguments of this custom function:

  • The first argument is y, which is the response variable that we are interested in: nc$weight.
  • The second argument is x, the explanatory (grouping) variable. We compare the average value of the response variable across its levels, smokers and non-smokers: nc$habit.
  • The third argument, est, is the parameter we’re interested in: "mean" (other options are "median" or "proportion").
  • Next we decide on the type of inference we want: a hypothesis test ("ht") or a confidence interval ("ci").
  • When performing a hypothesis test, we also need to supply the null value, which in this case is 0, since the null hypothesis sets the two population means equal to each other.
  • The alternative hypothesis can be "less", "greater", or "twosided".
  • Lastly, the method of inference can be "theoretical" or "simulation" based.

 

By default the inference function sets the parameter of interest to be (μ_nonsmoker - μ_smoker). We can easily change this order by using the order argument.

To set the order to μ_first - μ_second use: order = c("first","second").

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical", order = c("smoker", "nonsmoker"))
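For comparison with the theoretical test above, a randomization test for the same difference in means can be sketched by hand in base R. This only illustrates the general technique and is not the inference function's own implementation. The weight and habit vectors and the helper diff_means are hypothetical, simulated stand-ins for nc$weight and nc$habit.

```r
# Hypothetical stand-ins for nc$weight and nc$habit (simulated data).
set.seed(1)
weight <- c(rnorm(100, mean = 7.1, sd = 1.5), rnorm(40, mean = 6.8, sd = 1.5))
habit  <- factor(rep(c("nonsmoker", "smoker"), times = c(100, 40)))

# Difference in group means, ordered as smoker minus nonsmoker.
diff_means <- function(y, g) {
  mean(y[g == "smoker"]) - mean(y[g == "nonsmoker"])
}

observed <- diff_means(weight, habit)

# Under the null hypothesis (difference = 0) the labels are exchangeable,
# so shuffle them many times and recompute the difference each time.
n_sim <- 10000
null_diffs <- replicate(n_sim, diff_means(weight, sample(habit)))

# Two-sided p-value: share of shuffled differences at least as extreme
# as the observed one.
p_value <- mean(abs(null_diffs) >= abs(observed))
p_value
```

For a one-sided alternative, the last step would compare null_diffs with observed directly instead of using absolute values.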

 

Source: https://www.datacamp.com/courses/data-analysis-and-statistical-inference_mine-cetinkaya-rundel-by-datacamp
