Assignment 2 - 2012

 

Assignment objectives
=====================



 * Use a table of normal distributions to calculate probabilities
 * Summarize data by means and standard deviations, and their robust equivalents
 * Download data and analyze it

Question 1 [2]
==============

Estimate the following:


 1. Without using tables or a computer: the cumulative area under the normal distribution between 15 and 35, with a mean of 25 and a standard deviation of 5.
 2. The same as part 1, but using a table of normal distributions from the course notes (or another statistics textbook).
 3. Between which lower and upper bounds will we find 60% probability of an event occurring, using the standardized (:math:`z`) normal distribution? Calculate your answer using a printed table, ensuring that the two bounds are symmetrical about zero.
 4. Convert these dimensionless :math:`z`-bounds to real-world bounds for a process with a mean of 100 kg and a standard deviation of 25 kg.
 5. Verify your previous two answers using R, or other computer software.
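
As a rough sketch of the software check in part 5, base R's ``pnorm`` (cumulative probability) and ``qnorm`` (inverse cumulative probability) can reproduce the table lookups. The variable names below are illustrative only, and the sketch assumes the 60% region is split evenly, leaving 20% in each tail.

```r
# Parts 1 and 2: area under N(mean = 25, sd = 5) between 15 and 35
area <- pnorm(35, mean = 25, sd = 5) - pnorm(15, mean = 25, sd = 5)

# Part 3: symmetric z-bounds enclosing 60% probability,
# which leaves 20% of the area in each tail
z.lower <- qnorm(0.20)
z.upper <- qnorm(0.80)

# Part 4: convert to real-world units using x = mean + z * sd
x.lower <- 100 + z.lower * 25
x.upper <- 100 + z.upper * 25

print(c(area = area, z.lower = z.lower, z.upper = z.upper,
        x.lower = x.lower, x.upper = x.upper))
```

Compare the printed values against your hand calculations; small differences are expected because printed tables are rounded.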

Question 2 [3]
==============

A chicken facility produces bags filled with breaded chicken strips. The advertised weight for each package is 750 grams. Each bag contains between 8 and 15 strips, given that each chicken strip weighs between 40 and 80 grams and comes from a uniform distribution. The company sets its target fill weight at 790 grams to avoid breaking regulations that require accurate package labelling.


 1. If we take a large sample of bagged chicken strips and weigh each bag, from which distribution will we expect these weights to come?
 2. Clearly explain why.
 3. If the standard deviation of this large sample of bag weights is 12 grams, out of 10,000 customers, how many will purchase bags below the advertised 750 g weight?
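
One way to explore this question numerically is sketched below. The simulation assumes, purely for illustration, that every bag holds exactly 12 strips (the question does not fix this number), so the simulated bag weights will not match the 790 gram target; the point is only the shape of the resulting distribution. The last two lines use ``pnorm`` with the stated target of 790 grams and standard deviation of 12 grams.

```r
set.seed(1)

# ASSUMPTION: 12 strips per bag, chosen only for illustration
n.strips <- 12
n.bags <- 5000

# Each row is one bag; each entry is one strip weight from U(40, 80)
strips <- matrix(runif(n.bags * n.strips, min = 40, max = 80),
                 nrow = n.bags)
bag.weights <- rowSums(strips)

# The sum of many independent strip weights looks close to normal
hist(bag.weights, breaks = 30, main = "Simulated bag weights",
     xlab = "Weight [g]")

# Part 3: fraction of bags below 750 g for N(mean = 790, sd = 12),
# scaled to 10,000 customers
p.under <- pnorm(750, mean = 790, sd = 12)
n.under <- 10000 * p.under
print(n.under)
```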

Question 3 [3]
==============


 1. Compute the mean, median, standard deviation and MAD for the salt content of various potato chips `in this report `_ (page 22) as described in the article from the `Globe and Mail `_ on 24 September 2009.
 2. Plot a boxplot of the data and report the interquartile range (IQR). Comment on the 3 measures of spread you have calculated: standard deviation, MAD, and interquartile range.
 3. Comment on the effectiveness of the visualization plots used in the PDF report.
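
The summary statistics above map directly onto base R functions. The sketch below uses made-up numbers, not values from the report, simply to show the calls; note that R's ``mad`` scales the raw median absolute deviation by 1.4826 by default so that it is comparable to the standard deviation for normally distributed data.

```r
# HYPOTHETICAL salt values, NOT taken from the report;
# replace with the data from page 22
salt <- c(150, 170, 180, 190, 200, 210, 230, 380)

# Measures of location: the median resists the outlier at 380
location <- c(mean = mean(salt), median = median(salt))

# Measures of spread: sd is inflated by the outlier, while
# MAD and IQR are robust to it
spread <- c(sd = sd(salt), mad = mad(salt), iqr = IQR(salt))

print(location)
print(spread)

boxplot(salt, ylab = "Salt content (hypothetical units)")
```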

Question 4 [4]
==============

Data `characterizing 200 commuting trips of your instructor `_ was visualized in the previous assignment.


 1. Plot a histogram of the ``TotalTime`` variable (the total time for the commute) to confirm the variable is not normally distributed.
 2. How would you characterize the distribution of the ``TotalTime`` variable? Give reasons *why* the variable is not normally distributed.
 3. Confirm the variable is not normally distributed by using a suitable, visual statistical test.
 4. The 407 highway speeds are almost always much faster than the 403. Does the ``MaxSpeed`` variable (the maximum speed recorded during the entire trip, usually while travelling the 407) follow a normal distribution? Plot both a histogram and a q-q plot to check.
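
The plotting calls for this question might look like the sketch below. It uses simulated, right-skewed numbers as a stand-in for ``TotalTime`` (the vector ``total.time`` is hypothetical); with the real data you would read the CSV file with ``read.csv`` and use the corresponding column instead.

```r
set.seed(42)

# STAND-IN data: 200 simulated, right-skewed commute times.
# Replace with the TotalTime column from the real data set.
total.time <- 30 + rlnorm(200, meanlog = 2, sdlog = 0.5)

# Histogram: look for skewness, multiple peaks, or heavy tails
hist(total.time, breaks = 20, main = "Commute time",
     xlab = "Total time [minutes]")

# q-q plot: normally distributed data fall close to the reference line
qq <- qqnorm(total.time)
qqline(total.time)
```

The same two calls apply to ``MaxSpeed`` in part 4.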

Question 5 [3]
==============

In this question we investigate the stock prices for the Canadian National Railway Company (ticker ``CNR`` on the Toronto Stock Exchange).


 * Visit http://finance.yahoo.com/
 * Type in ``CNR.TO`` in the symbol (ticker) box
 * Click **Historical Prices** in the left column
 * Change the date range from 01 March 2011 to 01 January 2012
 * Click **Get Prices** to get the "Daily" prices of the stock
 * Scroll to the bottom of the page and click "Download to spreadsheet" to download a CSV file

Once you have loaded the CSV file into R, answer the following questions regarding the ``Adj.Close`` column (the price at which the stock closes at the end of the trading day, after adjusting it for stock splits and dividends paid):


 1. Are these closing prices from a normal distribution? Test your answer with a q-q plot.
 2. Estimate the distribution's location and spread, assuming the data are from a normal distribution. 600-level students must use the ``fitdistr`` function in R from the MASS package.
 3. Are these data points independent?
 4. What is the probability of observing a stock value above $77.00?


**Note**: the purpose of this exercise is mainly for you to become comfortable with web-based data retrieval, which is common in most companies.
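
A possible workflow for this question is sketched below. Since the downloaded file is not available here, the sketch writes simulated prices to a temporary CSV file first; the file path, the simulated values and all variable names are assumptions for illustration. It assumes the CSV header is ``Adj Close``, which ``read.csv`` converts to ``Adj.Close``.

```r
library(MASS)  # provides fitdistr

# STAND-IN for the downloaded file: simulated prices written to a
# temporary CSV. Point csv.file at your real download instead.
set.seed(7)
csv.file <- tempfile(fileext = ".csv")
fake.dates <- format(as.Date("2011-03-01") + 0:209)
fake.prices <- round(70 + cumsum(rnorm(210, sd = 0.5)), 2)
writeLines(c("Date,Adj Close",
             paste(fake.dates, fake.prices, sep = ",")), csv.file)

# read.csv turns the header "Adj Close" into the name Adj.Close
stock <- read.csv(csv.file)
price <- stock$Adj.Close

# Part 1: visual check for normality
qqnorm(price)
qqline(price)

# Part 2: maximum likelihood estimates of the mean and sd
fit <- fitdistr(price, densfun = "normal")
print(fit$estimate)

# Part 3: strong autocorrelation at small lags suggests dependence
acf(price)

# Part 4: probability above $77.00 under the fitted normal distribution
p.above <- pnorm(77, mean = fit$estimate[["mean"]],
                 sd = fit$estimate[["sd"]], lower.tail = FALSE)
print(p.above)
```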

.. raw:: latex

   \vspace{0.5cm} \hrule \begin{center}END\end{center}