Writing a for loop every time you want to calculate a bootstrap interval or run a randomization test is cumbersome. Next we’ll introduce a new function that you’ll be seeing a lot more of in the upcoming labs: a custom function that lets you apply any statistical inference method you’ll be learning in this course, and that automates this process. Since it is a custom function, we need to load it first. By default the `inference` function takes 10,000 bootstrap samples (instead of the 100 you’ve taken above), creates a bootstrap distribution, and calculates the confidence interval using the percentile method.

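The first line below loads the function. The second uses it to build a 90% percentile bootstrap confidence interval for the mean weight gained during pregnancy (`nc$gained`):

`load(url("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/inference.Rdata"))`

`inference(nc$gained, type="ci", method="simulation", conflevel=0.9, est="mean", boot_method="perc")`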
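For comparison, here is a minimal sketch of the kind of for loop that the call above spares you from writing: resample with replacement, record each resample's mean, then take the middle 90% of those means (the percentile method). The helper name `boot_percentile_ci` is hypothetical and is not how `inference` is actually implemented. Because the lab's `nc` data frame has to be downloaded, the built-in `mtcars$mpg` vector stands in for `nc$gained`.

```r
# Percentile bootstrap confidence interval for a mean, written as an
# explicit for loop (the kind of loop inference() saves you from writing).
# boot_percentile_ci is a hypothetical helper, not part of inference.Rdata.
boot_percentile_ci <- function(x, conf_level = 0.9, n_boot = 10000) {
  x <- x[!is.na(x)]                     # drop missing values first
  boot_means <- numeric(n_boot)         # preallocate the bootstrap distribution
  for (i in seq_len(n_boot)) {
    # resample the data with replacement and record the resample's mean
    boot_means[i] <- mean(x[sample.int(length(x), replace = TRUE)])
  }
  alpha <- 1 - conf_level
  # percentile method: cut off alpha / 2 of the bootstrap distribution in each tail
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Stand-in data: the built-in mtcars$mpg vector instead of nc$gained
set.seed(1)
ci <- boot_percentile_ci(mtcars$mpg, conf_level = 0.9)
print(ci)
```

Indexing with `sample.int()` rather than calling `sample(x)` directly avoids R's special case where `sample()` on a single number samples from `1:x`.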

Next, we’ll use the `inference` function to evaluate whether there is a difference between the average birth weights of babies born to smoker and non-smoker mothers. Let’s pause for a moment to go through the arguments of this custom function:

- The first argument is `y`, which is the response variable that we are interested in: `nc$weight`.
- The second argument is the explanatory variable, `x`, which is the grouping variable across the levels of which we’re comparing the average value of the response variable, smokers and non-smokers: `nc$habit`.
- The third argument, `est`, is the parameter we’re interested in: `"mean"` (other options are `"median"` or `"proportion"`).
- Next we decide on the `type` of inference we want: a hypothesis test (`"ht"`) or a confidence interval (`"ci"`).
- When performing a hypothesis test, we also need to supply the `null` value, which in this case is `0`, since the null hypothesis sets the two population means equal to each other.
- The `alternative` hypothesis can be `"less"`, `"greater"`, or `"twosided"`.
- Lastly, the `method` of inference can be theory-based (`"theoretical"`) or simulation-based (`"simulation"`).
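To make the theory-based option more concrete, here is a minimal sketch of a test for a difference in two means that mirrors the `y`, `x`, `null`, and `alternative` arguments listed above, comparing the first level of `x` minus the second. The helper `diff_means_t_test` is hypothetical. It assumes the unpooled standard error and Welch degrees of freedom (what base R's `t.test()` uses); `inference` may use a different degrees-of-freedom rule, so its p-value could differ slightly. The built-in `ToothGrowth` data (`len` by `supp`) stand in for `nc$weight` by `nc$habit`.

```r
# Theory-based test for a difference in two means, written out by hand.
# diff_means_t_test is a hypothetical helper; it uses the unpooled standard
# error and Welch degrees of freedom, as base R's t.test() does, which may
# not match the degrees of freedom inference() uses.
diff_means_t_test <- function(y, x, null = 0,
                              alternative = c("twosided", "less", "greater")) {
  alternative <- match.arg(alternative)
  keep <- !is.na(y) & !is.na(x)           # drop cases missing y or x
  y <- y[keep]
  x <- factor(x[keep])
  grp <- levels(x)                        # compare first level minus second level
  y1 <- y[x == grp[1]]
  y2 <- y[x == grp[2]]
  v1 <- var(y1) / length(y1)              # squared standard error of each mean
  v2 <- var(y2) / length(y2)
  se <- sqrt(v1 + v2)
  estimate <- mean(y1) - mean(y2)
  t_stat <- (estimate - null) / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (length(y1) - 1) + v2^2 / (length(y2) - 1))
  p_value <- switch(alternative,
    less     = pt(t_stat, df),
    greater  = pt(t_stat, df, lower.tail = FALSE),
    twosided = 2 * pt(abs(t_stat), df, lower.tail = FALSE))
  list(estimate = estimate, t = t_stat, df = df, p_value = p_value)
}

# Stand-in data: ToothGrowth$len by ToothGrowth$supp (levels "OJ", "VC")
res <- diff_means_t_test(ToothGrowth$len, ToothGrowth$supp, null = 0,
                         alternative = "twosided")
str(res)
```

With the lab data loaded, the analogous call would be `diff_means_t_test(nc$weight, nc$habit)`, and `t.test(weight ~ habit, data = nc)` gives a base R cross-check.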

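By default the `inference` function sets the parameter of interest to be $\mu_{\text{nonsmoker}} - \mu_{\text{smoker}}$. We can easily change this order by using the `order` argument. To set the order to $\mu_{\text{first}} - \mu_{\text{second}}$, use `order = c("first","second")`. For example, the following call runs a theory-based, two-sided test of $H_0: \mu_{\text{smoker}} - \mu_{\text{nonsmoker}} = 0$:

`inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical", order = c("smoker","nonsmoker"))`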
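The call above uses `method = "theoretical"`. Switching to `method = "simulation"` asks for a simulation-based test instead. A common scheme for that is a randomization (permutation) test, sketched below: shuffle the group labels many times, recompute the difference in means each time (in the order set by `order`), and see how often a shuffled difference is at least as extreme as the observed one. The helper `perm_test_diff_means` is hypothetical, and the exact simulation scheme inside `inference` is an assumption here, not confirmed. `ToothGrowth` again stands in for the `nc` data.

```r
# Randomization (permutation) test for a difference in two means.
# perm_test_diff_means is a hypothetical helper: it shuffles the group labels
# to build a null distribution and reports a two-sided p-value.
perm_test_diff_means <- function(y, x, order, n_sim = 10000) {
  keep <- !is.na(y) & !is.na(x)           # drop cases missing y or x
  y <- y[keep]
  x <- as.character(x[keep])
  # difference in means for a given labelling, as order[1] minus order[2]
  diff_in_means <- function(labels) {
    mean(y[labels == order[1]]) - mean(y[labels == order[2]])
  }
  observed <- diff_in_means(x)
  null_dist <- numeric(n_sim)
  for (i in seq_len(n_sim)) {
    # under H0 the labels are exchangeable, so shuffle them
    null_dist[i] <- diff_in_means(sample(x))
  }
  # share of shuffled differences at least as extreme as the observed one
  p_value <- mean(abs(null_dist) >= abs(observed))
  list(observed = observed, p_value = p_value, null_dist = null_dist)
}

# Stand-in data: ToothGrowth$len by supp, with order = c("VC", "OJ")
# so the statistic is mean(VC) - mean(OJ)
set.seed(42)
res <- perm_test_diff_means(ToothGrowth$len, ToothGrowth$supp,
                            order = c("VC", "OJ"))
cat("observed difference:", res$observed, " p-value:", res$p_value, "\n")
```

With the lab data, the analogous call would be `perm_test_diff_means(nc$weight, nc$habit, order = c("smoker", "nonsmoker"))`. Reversing `order` flips the sign of the observed difference but leaves the two-sided p-value's logic unchanged.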

# Custom function calculating bootstrap interval and running a randomization test
