# R Computing Project

STA2ABS/AMS R Computing Project

Due Friday the 31st of May 2013 no later than 5pm

• You will have to submit a script file containing all your solutions to the project. This includes all the R commands and worded answers. For worded answers, use the # symbol to distinguish them from your R command answers.

• Please save the script using the format: RProject Familyname Studentnumber.R (e.g. RProject Smith 12345678.R). Make sure you end the file name with .R so that the file is associated with the R software.

• Email your script to J.Zhang@latrobe.edu.au with the subject line specifying your computer laboratory time, e.g. R Project submission - Friday 11pm.

• In submitting your work, you are consenting that it may be copied and transmitted by the University for the detection of plagiarism. At the start of your script file please type the following statement of originality: “This is my own work. I did not copy any of it from anyone else”. Also type your name and student number together with your computer lab time underneath the statement of originality.

1. The R package includes many useful functions for generating various types of random data.

(a) Using the help file for the function rnorm, explain in words how you could use rnorm to randomly generate 10 numbers from a normal distribution with mean 25 and standard deviation 3.

(b) What are the optional arguments for rnorm and what are their default values?

(c) Randomly generate 10 numbers from a normal distribution with mean 25 and standard deviation 3 and store these in a vector variable called ten.random.numbers. What commands did you use and what were the numbers generated?
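As a minimal sketch of how the arguments of rnorm fit together, the call below draws five values from a normal distribution with mean 100 and standard deviation 15. The parameter values, the seed and the object names example.draws and standard.draws are illustrative assumptions, not the values asked for above. When mean and sd are omitted, rnorm uses its documented defaults of mean = 0 and sd = 1.

```r
# Fix the seed so the draws are reproducible (the seed value is arbitrary).
set.seed(1)

# n is required; mean and sd are optional and default to 0 and 1.
example.draws <- rnorm(n = 5, mean = 100, sd = 15)
example.draws

# With the defaults: five draws from the standard normal distribution.
standard.draws <- rnorm(5)
standard.draws
```

Typing ?rnorm at the R prompt opens the help file that lists these arguments and their defaults.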

2. Consider the following data that consists of a group classification (Treatment or Control), weight (in kilograms) and a numeric response to a new drug for 7 patients.

| Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Group | Control | Treatment | Treatment | Treatment | Control | Treatment | Control |
| Weight (kg) | 59 | 90 | 47 | 106 | 85 | 73 | 61 |
| Response | 0.0 | 0.8 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |

Table 1: Patient records data.

(a) Provide R commands that create a data frame called patient.records that contains the group classification, weight and drug response information for the seven patients, where each of these variables is named appropriately within the data frame and is of the appropriate type (i.e. numeric, character etc). Carry this out in R.

(b) With reference to patient.records only, give two distinct one-line R commands that will display just the weight of the patients.

(c) With reference to patient.records only, give a simple one-line R command that displays the group, weight and drug response for just the fourth patient.

(d) Create a list called patient.data that includes the patient.records data frame, the average weight of the patients and the average drug response of the patients. Objects within the list should be given appropriate names.
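The data frame and list operations referred to in this question can be sketched on a small made-up data set. Everything in the block below (the plant data and the names plant.records and plant.data) is hypothetical and chosen only for illustration. It shows the general pattern of building a data frame with typed columns, extracting one column or one row, and wrapping results in a named list.

```r
# Hypothetical data (not the patient records above): three plants.
plant.records <- data.frame(
  Species = c("Fern", "Moss", "Fern"),  # character column
  Height  = c(12.5, 3.1, 15.0),         # numeric column (cm)
  stringsAsFactors = FALSE
)

# Two equivalent ways to display a single column.
plant.records$Height
plant.records[["Height"]]

# All columns for the second row only.
plant.records[2, ]

# A named list bundling the data frame with a summary statistic.
plant.data <- list(
  records     = plant.records,
  mean.height = mean(plant.records$Height)
)
plant.data$mean.height
```

Setting stringsAsFactors = FALSE keeps the text column as character in older versions of R, where the default was to convert strings to factors.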

3. A famous formula that can be used to roughly estimate the Blood Alcohol Concentration (BAC) percentage is the Widmark formula given as

$$\text{BAC\%} = \left( \frac{\text{Ounces} \times 5.4 \times \text{ADR}}{2.2 \times \text{Weight}} \right) - 0.015 \times \text{Hours}$$

where

• ‘Ounces’ is the liquid ounces of alcohol consumed.

• ‘ADR’ is the alcohol distribution ratio. This is equal to 0.73 for males and 0.66 for females.

• ‘Weight’ is the weight (kg) of the individual whose BAC is to be estimated.

• ‘Hours’ is the time in hours since the first drink.

(a) Suppose we wish to estimate the BAC% of a male who weighs 85 kg and who has consumed 3.1 liquid ounces of alcohol over the past 1.5 hours. Provide R commands that

i. Assign appropriate values to the R objects Ounces, Weight, ADR and Hours that will be used to estimate the BAC% for this individual.

ii. Use these R objects and the Widmark formula to estimate the BAC%.

What is the estimated BAC% for this person?

(b) Now suppose that this person has consumed the same amount of alcohol over 2 hours. By changing just the value you assigned to Hours in your script, what is the estimated BAC% now?
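One way to see how the terms of the formula interact is to wrap it in a small function and evaluate it on made-up inputs. The sketch below follows the formula exactly as stated in this question. The function name widmark_bac, its argument names and the example individual (a 70 kg female, 2 liquid ounces, 1 hour) are assumptions made for illustration only.

```r
# Widmark formula as stated above; function and argument names are
# illustrative assumptions.
widmark_bac <- function(ounces, weight_kg, adr, hours) {
  (ounces * 5.4 * adr) / (2.2 * weight_kg) - 0.015 * hours
}

# Hypothetical example: 70 kg female (ADR = 0.66), 2 liquid ounces,
# 1 hour since the first drink.
bac_one_hour <- widmark_bac(ounces = 2, weight_kg = 70, adr = 0.66, hours = 1)
bac_one_hour

# Only the Hours term changes when more time has passed, so the
# estimate drops by 0.015 per additional hour.
bac_two_hours <- widmark_bac(ounces = 2, weight_kg = 70, adr = 0.66, hours = 2)
bac_two_hours
```

Because Hours enters only through the subtracted term, changing it leaves the first part of the formula untouched.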

4. An evolutionary biologist examined the relative fitness of Escherichia coli bacteria evolved for 300 days at stressful acidic pH level 5.5 and their parental generation, evolved at neutral pH level 7.2. Both types were later grown together in an acidic medium and their relative fitness was computed. The experiment was replicated with 10 different lines of Escherichia coli giving the following fitness values:

1.08, 0.98, 0.89, 1.22, 1.07, 1.10, 1.15, 1.04, 1.00, 1.09.

We assume these values are sampled from a normal distribution. A relative fitness of 1 indicates that the acidic-evolved and neutrally-evolved lines are equally fit when both are later grown in acidic conditions. A relative fitness larger than 1 indicates that the acidic-evolved line is more fit than the neutrally-evolved line when both are later grown in acidic conditions (that is, the acidic-evolved bacteria grew the most). The evolutionary biologist claims that acidic-evolved bacteria are better adapted to acidic conditions.¹

(a) Let $\mu$ denote the mean relative fitness between acidic-evolved bacteria and neutrally-evolved bacteria when both are later grown in acidic conditions. Suppose we want to test the claim made by the biologist. State the null and alternative hypotheses for this test.

(b) Store the sampled relative fitness values in a numeric vector object named Rel_Fit.

(c) Use R to calculate the sample mean and sample standard deviation of the relative fitness values and assign these to R objects named x_bar and st_d, respectively.

(d) Use the objects that have been assigned the sample mean and sample standard deviation to calculate the observed test statistic for the hypothesis test stated in part (a).

(e) Use R to carry out the hypothesis test stated in part (a). What is the p-value for this test? Also, verify from the R output that your calculation for the observed test statistic calculated in part (d) is correct.

(f) Based on your R output for the hypothesis test you conducted in part (e), make an appropriate conclusion for this test at the 5% level of significance.
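The relationship between a hand-computed test statistic and the output of t.test can be sketched on made-up numbers. The data, the null value mu0 and all object names below are hypothetical. The sketch assumes a one-sided alternative ("greater") purely for illustration, and the one-sample t statistic is the standard $(\bar{x} - \mu_0) / (s / \sqrt{n})$.

```r
# Hypothetical measurements (not the fitness values above).
sample.values <- c(5.1, 4.9, 5.6, 5.3, 5.0, 5.4)
mu0 <- 5  # value of the mean under the null hypothesis

# Test statistic computed by hand: (x_bar - mu0) / (s / sqrt(n)).
n <- length(sample.values)
t.manual <- (mean(sample.values) - mu0) / (sd(sample.values) / sqrt(n))

# The same one-sided test carried out with t.test.
test.result <- t.test(sample.values, mu = mu0, alternative = "greater")

t.manual
test.result$statistic  # should agree with t.manual
test.result$p.value
```

The p-value would then be compared with the chosen significance level to decide whether the null hypothesis is rejected.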

5. Let $x_1, \ldots, x_n$ denote a sample of $n$ observations, where $\bar{x}$ and $s^2$ denote the sample mean and sample variance respectively. Also suppose that we are interested in testing the hypotheses

$$H_0 : \mu = \mu_0 \quad \text{versus} \quad H_1 : \mu \neq \mu_0$$

where $\mu$ is the population mean from which the data is sampled.

If $x_1, \ldots, x_n$ are sampled from a normal distribution, then we may test the hypotheses using a t-test. Furthermore, if we nominate a significance level of $\alpha = 0.05$ then the probability that $H_0$ will be rejected when it is in fact true is 0.05. Consider the following R code:

    n <- 20
    x <- rnorm(n, mean = 25)
    p.value <- t.test(x, mu = 25)$p.value
    p.value

This code randomly generates 20 observations from the $N(25, 1)$ distribution and obtains a p-value for the test of $H_0 : \mu = 25$ versus $H_1 : \mu \neq 25$, where we reject $H_0$ if the p-value is less than 0.05. Note here that the population mean is in fact $\mu = 25$ so that $H_0$ is true. This means that the probability of us rejecting $H_0$ is 0.05.

Another way of looking at this is as follows. If we were to repeat this process many times, then $H_0$ would be rejected around 5% of the time. That is, if we repeated this 1000 times then we would expect $H_0$ to be rejected around $0.05 \times 1000 = 50$ times.

An important question arises. What if the data is not sampled from a normal distribution? For example, suppose that the 20 observations are sampled from the $F_{5,10}$ distribution such that the command

    x <- rf(n, df1 = 5, df2 = 10)

is used instead of `x <- rnorm(n, mean = 25)`. The mean of an $F_{\nu_1,\nu_2}$ random variable is equal to

$$\frac{\nu_2}{\nu_2 - 2}$$

so that the true population mean is $\mu = 10/(10 - 2) = 1.25$ and we are interested in the hypotheses $H_0 : \mu = 1.25$ versus $H_1 : \mu \neq 1.25$. If the t-test is appropriate and we used the commands

    p.value <- t.test(x, mu = 1.25)$p.value
    p.value

to obtain a p-value, then in the long run we would expect to reject $H_0$ around 5% of the time.
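The stated mean of the F distribution can be checked numerically. The sketch below compares $\nu_2 / (\nu_2 - 2)$ with the average of a large simulated sample. The seed, the sample size of 100000 and the object names are arbitrary assumptions. The formula for the mean only applies when $\nu_2 > 2$.

```r
set.seed(42)  # arbitrary seed for reproducibility

df1 <- 5
df2 <- 10

# Theoretical mean of an F(df1, df2) variable, valid when df2 > 2.
theoretical.mean <- df2 / (df2 - 2)

# Empirical mean from a large simulated sample.
simulated.mean <- mean(rf(100000, df1 = df1, df2 = df2))

theoretical.mean
simulated.mean
```

The two values should be close, and the mean does not depend on the numerator degrees of freedom df1.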

If $n$ is large then the t-test may be used regardless of the underlying distribution from which the data was sampled. Another important question now arises. When is $n$ large enough such that we can use the t-test appropriately? For this project you are required to use an R script to simulate the effectiveness of the t-test. A for loop will be used to carry out the following 2000 times:

• Sample $n$ observations from the $F_{5,\nu_2}$ distribution.

• Obtain a p-value for the t-test carried out on the sampled data which tests the hypotheses $H_0 : \mu = \frac{\nu_2}{\nu_2 - 2}$ versus $H_1 : \mu \neq \frac{\nu_2}{\nu_2 - 2}$.

• Check whether or not $H_0$ is to be rejected.

Your script needs to give the proportion of times that $H_0$ was rejected so that you can check to see whether it was close to expected (i.e. close to the nominated significance level of 0.05).
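The three steps listed above could be arranged in a for loop along the following lines. This is a sketch of one possible layout, not a prescribed solution. All object names (n.obs, n.reps, rejected and so on) and the seed are assumptions, and the settings are gathered at the top so that only those lines need editing to try other values.

```r
set.seed(2013)  # arbitrary seed so the run is reproducible

# Settings collected in one place so they are easy to change.
n.obs     <- 5      # sample size n
df1       <- 5      # numerator degrees of freedom (fixed at 5)
df2       <- 10     # denominator degrees of freedom
n.reps    <- 2000   # number of simulated samples
alpha     <- 0.05   # nominated significance level
true.mean <- df2 / (df2 - 2)  # mean of F(df1, df2), requires df2 > 2

# One TRUE/FALSE entry per simulated sample: was H0 rejected?
rejected <- logical(n.reps)

for (i in seq_len(n.reps)) {
  x <- rf(n.obs, df1 = df1, df2 = df2)
  p.value <- t.test(x, mu = true.mean)$p.value
  rejected[i] <- p.value < alpha
}

# Proportion of the simulated samples in which H0 was rejected.
proportion.rejected <- mean(rejected)
proportion.rejected
```

Taking the mean of a logical vector gives the proportion of TRUE entries, which avoids keeping a separate counter inside the loop.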

(a) Write a script file that may be used to carry out this simulation for $n = 5$, $\nu_1 = 5$ and $\nu_2 = 10$. For this question you will be assessed on (i) whether the script can carry out the simulation correctly, (ii) clarity (i.e. whether it is easy to read and follow, including the use of indentation within the code where appropriate, and whether appropriate object names were used) and (iii) how easy it is to change the script file so that the simulation can be carried out for different values of $n$ and $\nu_2$ (the fewer changes the better).

(b) Carry out the simulation for $n = 5$ and $\nu_2 = 10$ and enter the proportion of times $H_0$ was rejected into the appropriate spot in Table 2, which can be found in the Word document named Table 2. Is the proportion of times $H_0$ was rejected close to what we would expect if the t-test is appropriate? Explain.

(c) Complete the rest of Table 2, where you will need to make changes to your script to account for the choices of $n$ and $\nu_2$ required. Remember that (i) $\nu_1 = 5$ throughout, and (ii) the value assigned to the argument mu within the t.test function will change for each choice of $\nu_2$. (Important: The values you place in the table should be given to 4 decimal places. You need to submit the Table 2 Word document at the same time you submit your script file that contains all your solutions to the project.)

(d) By comparing the proportion of times you expected $H_0$ to be rejected (if the t-test is appropriate) with the proportion of times $H_0$ was actually rejected, assess the effectiveness of the t-test (with respect to a level of significance chosen to be 0.05) when data is sampled from an $F_{5,\nu_2}$ distribution. Your answer should discuss the effect of changing $n$ and $\nu_2$ and make use of your findings reported in Table 2.

1. Question 4 is adapted from Example 17.41, pp. 454-455 of Baldi, B., & Moore, D.S. (2009). The Practice of Statistics in the Life Sciences. Freeman, New York.