# Probabilities and Combinatorics

## Combinatorics

• Q**

(a) You analyze a small protein and find that it consists of 5 different amino acids, but you do not know in which order they appear. How many possible ways are there to order this 5-amino-acid protein?

(b) You analyze a small protein and find that it consists of 10 amino acids, 5 of which are cysteine and the other 5 are all distinct. How many possible ways are there to build up this protein?

(c) How many distinct (order counts) 5-amino-acid proteins can be produced when choosing from the 20 standard amino acids?

(d) How many distinct (order counts) 5-amino-acid proteins can be produced when choosing from the 20 standard amino acids if all 5 must be distinct?

• A**

(a) Label the five positions a, b, c, d, and e. Position a can hold any of the 5 amino acids. Once one is chosen for position a, 4 choices remain for position b. Once a and b are filled, 3 remain for position c, and so on. The answer is $5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$.

(b) This part is analogous to part (a), except for the repeated amino acid. If all ten amino acids were distinct, there would be $10!$ orderings. Because five of them are identical, swapping those five among themselves does not change the protein. The five cysteines can be permuted among their positions in $5!$ ways, so each distinct protein is counted $5!$ times among the $10!$ orderings. The answer is therefore $10!/5! = 30240$, which equals $P(10, 5)$.

(c) $20^5 = 3{,}200{,}000$, because each of the 5 positions can independently hold any of the 20 amino acids.

(d) This is the permutation $P(20, 5) = 20!/(20-5)! = 20 \cdot 19 \cdot 18 \cdot 17 \cdot 16 = 1{,}860{,}480$.
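
The four counts above can be checked numerically. The following is a minimal sketch using Python's standard `math` module; the variable names are illustrative and not part of the original problem.

```python
import math

# (a) orderings of 5 distinct amino acids: 5!
orderings_a = math.factorial(5)

# (b) 10 positions, 5 of them filled by identical cysteines: 10!/5!
orderings_b = math.factorial(10) // math.factorial(5)

# (c) 5 positions, each filled independently from 20 amino acids
sequences_c = 20 ** 5

# (d) 5 positions, 20 amino acids, no repeats: P(20, 5) = 20!/15!
sequences_d = math.perm(20, 5)

print(orderings_a, orderings_b, sequences_c, sequences_d)
```

`math.perm` requires Python 3.8 or later; on older versions `math.factorial(20) // math.factorial(15)` gives the same count.
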

• Q** A random walker, who takes a step to the right with probability p and a step to the left with probability q (p + q = 1), takes n steps. What are the variance and standard deviation of the position of the random walker after n steps?

(Hint: Find the second moment first. Use the fact that the variance of the binomial distribution is npq and the mean is np.)

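• A** Each step goes either right or left, which is a binary outcome, so the number of right steps $k$ follows a binomial distribution with mean $np$ and variance $npq$. Following the hint, its second moment is $\langle k^2 \rangle = npq + (np)^2$. If each step has unit length, the position after $n$ steps is $x = k - (n - k) = 2k - n$, so $\langle x \rangle = 2np - n = n(p - q)$ and $\langle x^2 \rangle = 4\langle k^2 \rangle - 4n\langle k \rangle + n^2$. The variance of the position is $\langle x^2 \rangle - \langle x \rangle^2 = 4(\langle k^2 \rangle - \langle k \rangle^2) = 4npq$, and the standard deviation is $2\sqrt{npq}$.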
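
For a walker taking unit steps, the position after $n$ steps is $x = 2k - n$, where $k$ is the binomially distributed number of right steps, so $\mathrm{Var}(x) = 4\,\mathrm{Var}(k) = 4npq$. The sketch below checks this by computing the first and second moments directly from the binomial pmf. The unit step length and the values `n = 10`, `p = 0.3` are assumptions chosen for illustration.

```python
import math

def walker_variance(n, p):
    """Exact variance of the walker's position after n unit steps."""
    q = 1.0 - p
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):              # k = number of steps to the right
        prob = math.comb(n, k) * p**k * q**(n - k)
        x = 2 * k - n                   # k steps right, n - k steps left
        mean += prob * x
        second_moment += prob * x * x
    return second_moment - mean**2

n, p = 10, 0.3                          # illustrative values only
variance = walker_variance(n, p)
std_dev = math.sqrt(variance)
print(variance, 4 * n * p * (1 - p))    # the two numbers should agree
```
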

• Q** Each day, the price of a stock either rises by one dollar with probability p or falls by one dollar with probability q (p + q = 1). At the beginning, the stock price is 100 dollars.

A) What is the probability that the stock price will be 100 dollars after 95 days?

B) What is the probability that the stock price will be 100 dollars after 8 days?

C) Suppose I have one share of this stock. What is the probability that my share is worth more than the original price after 97 days?

• A** A) Each day the price moves up or down by one dollar, so after an odd number of days the numbers of increase-days and decrease-days cannot be equal, and the price cannot return to its original value. The probability is 0.

B) To return to the original stock value in 8 days, we need exactly 4 increase-days and 4 decrease-days. Using the formula for the binomial distribution, $P(x = k) = C(n, k)\, p^k (1-p)^{n-k}$, we have $P(x = 4) = C(8, 4)\, p^4 (1-p)^4 = 70\, p^4 q^4$.

C) The share is worth more than its original price when there are more increase-days than decrease-days, that is, at least 49 increase-days out of 97. A long approach is to calculate the probability of having 49 increase-days, 50 increase-days, and so on up to 97 increase-days using the formula above, and then add up the probabilities. Another approach is to approximate with a normal distribution. The cdf of a normal distribution $N(\mu, \sigma^2)$ is determined by its mean $\mu$ and variance $\sigma^2$. For the binomial, the mean is $\mu = np$ and the variance is $\sigma^2 = npq$. With these defined, we can use the normal cdf to find the probability of more than 48 increase-days. Let $k$ be the number of increase-days; then $P(k > 48) = 1 - P(k \le 48)$, where $P(k \le 48)$ is approximated by the cdf of $N(\mu, \sigma^2)$. The formula for this cdf is readily available online.
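
The three parts can be sketched in code as follows. Since the problem leaves $p$ unspecified, the values `p = 0.5` and `p = 0.6` below are assumptions for illustration. The normal approximation evaluates the cdf at 48.5 rather than 48 (a continuity correction), which is a common refinement and not something the answer above requires.

```python
import math

def binom_pmf(n, k, p):
    """P(exactly k increase-days out of n)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def prob_back_to_start(n, p):
    """Probability that the price equals its starting value after n days."""
    if n % 2 == 1:                 # odd n: ups and downs cannot balance
        return 0.0
    return binom_pmf(n, n // 2, p)

def prob_above_start_exact(n, p):
    """P(more increase-days than decrease-days), summing the binomial pmf."""
    return sum(binom_pmf(n, k, p) for k in range(n // 2 + 1, n + 1))

def prob_above_start_normal(n, p):
    """Normal approximation to the same probability."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    threshold = n // 2 + 0.5       # continuity correction for P(k <= n // 2)
    z = (threshold - mu) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 1 - cdf

print(prob_back_to_start(95, 0.5))       # part A
print(prob_back_to_start(8, 0.5))        # part B with p = 0.5 (assumed)
print(prob_above_start_exact(97, 0.6))   # part C with p = 0.6 (assumed)
print(prob_above_start_normal(97, 0.6))
```
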

## Conditional Probabilities

• Q** Consider a country-wide screening of all women above a certain age for breast cancer. The following facts are known: Roughly 1% of the women have breast cancer (p(bc) = 0.01). If a woman has breast cancer, the screening will detect the cancer in 90% of the cases (p(test pos|bc) = 0.9). If a woman does not have breast cancer, the screening still has a chance of 9% to produce a positive result (the false positive rate p(test pos|NOT bc) = 0.09). You are the doctor conducting the screening and for one of your patients the test turns out positive. What do you tell her about how likely it is that she has breast cancer?
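
• A** We are asked to find the conditional probability $P(\text{bc} \mid \text{pos})$. By the definition of conditional probability and the law of total probability, $P(\text{bc} \mid \text{pos}) = P(\text{bc and pos})/P(\text{pos}) = P(\text{bc and pos})/[P(\text{bc and pos}) + P(\lnot\text{bc and pos})] = 0.01 \cdot 0.9/[0.01 \cdot 0.9 + 0.99 \cdot 0.09] = 0.009/0.0981 \approx 0.0917$. Even after a positive test, the probability that she has breast cancer is only about 9%.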
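
The calculation above can be reproduced with a short function. This is a minimal sketch; the function and parameter names are illustrative.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) from Bayes' rule."""
    true_pos = prior * sensitivity                   # P(bc and pos)
    false_pos = (1 - prior) * false_positive_rate    # P(not bc and pos)
    return true_pos / (true_pos + false_pos)

p_bc_given_pos = posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.09)
print(round(p_bc_given_pos, 4))
```
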

• Q** A company A that produces transistors reaches the following agreement with company B, which produces TVs: a given shipment will be accepted by B if less than 10 percent of the transistors are broken. To test this, B takes out and tests a control sample of 10 transistors from every shipment. Assuming that 5% of the transistors are broken, what is the probability that company B rejects a shipment based on the control sample?
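
• A** The control sample contains 10 transistors, so "less than 10 percent broken" means fewer than 1 broken, that is, none. Assuming each transistor is broken independently with probability 0.05, the probability of acceptance is $(1 - 0.05)^{10}$ (the probability that all ten work). The probability of rejection is therefore $1 - 0.95^{10} \approx 0.401$.
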
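
As a check on the acceptance rule, the sketch below sums the binomial probabilities over every broken count that is below 10 percent of the sample. It assumes transistors fail independently; the function name and defaults are illustrative.

```python
import math

def prob_reject(sample_size=10, defect_rate=0.05):
    """Probability that the control sample leads to rejection."""
    p_accept = sum(
        math.comb(sample_size, k)
        * defect_rate**k
        * (1 - defect_rate)**(sample_size - k)
        for k in range(sample_size + 1)
        if 10 * k < sample_size        # fewer than 10% of the sample broken
    )
    return 1 - p_accept

p_reject = prob_reject()
print(round(p_reject, 4))
```

For a sample of 10 only `k = 0` passes the filter, so the sum reduces to the single term $0.95^{10}$.
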
• Q** A couple is trying to have a child. Their probability of conception in any given monthly cycle is 1/8.

a) What is the mean number of cycles until conception occurs?

b) What is the smallest n such that the probability that it takes them more than n cycles is less than 1/2?

• A** A. The number of cycles until conception follows a geometric distribution with success probability 1/8, so the mean is 1/(1/8) = 8.

B. The probability that conception occurs in the first month is $1/8$. The probability that it occurs in the second month is $(7/8)(1/8)$, because it must not have occurred in the first month, which has probability $7/8$. By the same token, $P(\text{3rd month}) = (7/8)^{2}(1/8)$, and $P(n\text{th month}) = (7/8)^{n-1}(1/8)$. The probability that it takes more than $n$ months is therefore $P(x > n) = 1 - P(x \le n) = 1 - \{P(\text{1st month}) + P(\text{2nd month}) + \dots + P(n\text{th month})\} = 1 - \{1/8 + (7/8)(1/8) + (7/8)^2(1/8) + \dots + (7/8)^{n-1}(1/8)\}$ (if this is confusing, take $n = 3$ and look only at the first 3 terms).

The expression inside the braces is a geometric series with ratio $7/8$ once the common factor $1/8$ is taken out. In other words, $1/8 + (7/8)(1/8) + (7/8)^2(1/8) + \dots + (7/8)^{n-1}(1/8) = (1/8)\,[1 - (7/8)^n]/(1 - 7/8) = 1 - (7/8)^n$. We need $P(x > n) < 1/2$, so we solve $1 - [1 - (7/8)^n] = (7/8)^n < 1/2$, which gives $n > \ln(1/2)/\ln(7/8) \approx 5.19$ (note that $\ln(7/8)$ is negative, so dividing by it reverses the inequality). The smallest such integer is $n = 6$.
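
Both parts can be checked numerically. The sketch below uses the closed form $P(x > n) = (7/8)^n$ and searches for the smallest $n$ that brings it below 1/2; the variable names are illustrative.

```python
import math

p = 1 / 8                       # probability of conception per cycle

mean_cycles = 1 / p             # mean of a geometric distribution

def prob_more_than(n, p):
    """P(it takes more than n cycles) = (1 - p)^n."""
    return (1 - p) ** n

# Smallest n with P(x > n) < 1/2, found by direct search.
smallest_n = 1
while prob_more_than(smallest_n, p) >= 0.5:
    smallest_n += 1

# The same bound from the logarithm formula, for comparison.
bound = math.log(0.5) / math.log(1 - p)

print(mean_cycles, smallest_n, bound)
```
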

• Q** A machine starts operating at $t = 0$ (in years), and its time of failure has the probability density $p(t) = C t e^{-2t}$.

a) Find the normalization constant C. b) Find the mean time to failure.

• A** A. Since $p(t)$ is a pdf, the cdf must approach 1 as $t \to \infty$, so we integrate the pdf over $[0, \infty)$ and set the result equal to 1.

$\int_0^\infty p(t)\,dt = \int_0^\infty C t e^{-2t}\,dt = 1$. Integrating by parts, the left side equals $\left[-\frac{Ct}{2} e^{-2t}\right]_0^\infty + \frac{C}{2}\int_0^\infty e^{-2t}\,dt = 0 + \left[-\frac{C}{4} e^{-2t}\right]_0^\infty = \frac{C}{4}$. Solving $C/4 = 1$ gives $C = 4$.

B. The mean is given by $\int_0^\infty t\,p(t)\,dt$. Again, we use integration by parts (with $C = 4$): $\int_0^\infty C t^2 e^{-2t}\,dt = \left[-\frac{C t^2}{2} e^{-2t}\right]_0^\infty + C\int_0^\infty t e^{-2t}\,dt$. The first term evaluates to 0, and the second term is the same as the integral in part A, meaning that it evaluates to 1. The mean time to failure is 1 year.
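
The two integrals can be checked numerically without any symbolic work. The sketch below uses a simple midpoint rule and truncates the infinite range at `UPPER = 40`, an assumption that is safe here because $e^{-80}$ is negligible. The helper `integrate` is illustrative, not a library function.

```python
import math

def integrate(f, a, b, steps=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

UPPER = 40.0   # stands in for infinity; the integrand is negligible beyond this

# Part A: C is the reciprocal of the integral of t * exp(-2t).
norm_integral = integrate(lambda t: t * math.exp(-2 * t), 0.0, UPPER)
C = 1 / norm_integral

# Part B: mean time to failure, the integral of t * p(t).
mean_time = integrate(lambda t: t * C * t * math.exp(-2 * t), 0.0, UPPER)

print(C, mean_time)
```
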

• Q** You are the boss of a factory that produces tortilla chips. The chips are all perfect circles but their radii vary. Over the summer you hire an undergraduate student who measures the chips' radii and comes to the conclusion that the probability density function for the radius of a chip is $p_r(r) = C r^2$ with $0 \le r \le 4$.

(a) Find the normalization constant $C$.

(b) What is the mean radius of the chips?

(c) What is the probability density function $p_A(A)$ for the area of a chip?

(d) What is the mean area of the chips? Compute the median of the area of the chips.

(e) Assume that the chips are all 1 mm thick and that their mass density is $\rho = 1/2$ g/cm$^3$. What is the probability density function for the mass of a chip?

(f) How thick would the chips have to be to raise the average mass of the chips to $\bar{m} = 14.4$ g?

• A** (a) The pdf must integrate to 1 over its range: $\int_0^4 p_r(r)\,dr = \left[\frac{C}{3} r^3\right]_0^4 = \frac{64C}{3} = 1$. Solving gives $C = 3/64$.

(b) The mean radius is $\int_0^4 r\,p_r(r)\,dr = \int_0^4 C r^3\,dr = \left[\frac{C}{4} r^4\right]_0^4 = 64C = 3$.

(c) From the given pdf, we can find the cdf of the radius: $\int_0^R p_r(r)\,dr = C R^3/3$, which is interpreted as $P(r < R) = C R^3/3$. For the area $a = \pi r^2$, which increases with $r$, we have $P(a < A) = P(\pi r^2 < A) = P(r < \sqrt{A/\pi}) = \frac{C}{3}(A/\pi)^{3/2}$. The last expression comes from plugging $R = \sqrt{A/\pi}$ into $P(r < R)$. Differentiating $P(a < A)$ with respect to $A$ gives the pdf, $p_A(A) = \frac{C}{2\pi}(A/\pi)^{1/2}$ for $0 \le A \le 16\pi$.

Another way of thinking about this question is as follows. Let $f(r)$ and $F(r)$ be the pdf and cdf of the radius, respectively, so that $dF/dr = f(r)$, and define $F(a)$ and $f(a)$ similarly for the area. By the chain rule, $f(a) = dF/da = (dF/dr)(dr/da) = f(r)\,(dr/da)$. Since $a = \pi r^2$, we have $r = \sqrt{a/\pi}$ and $dr/da = \frac{1}{2}\left(\frac{1}{a\pi}\right)^{1/2}$. Some algebra then gives $f(a) = C r^2 \cdot \frac{1}{2}\left(\frac{1}{a\pi}\right)^{1/2} = \frac{C}{2\pi}(a/\pi)^{1/2}$.

(d) Since $r$ lies in $[0, 4]$, the area $a = \pi r^2$ ranges over $[0, 16\pi]$. The mean is given by $\int_0^{16\pi} a\,p(a)\,da = \int_0^{16\pi} \frac{C}{2}(a/\pi)^{3/2}\,da = \left[\frac{C}{5}\,\pi^{-3/2} a^{5/2}\right]_0^{16\pi} = 48\pi/5$. The median $m$ satisfies $\int_0^m p(a)\,da = \int_0^m \frac{C}{2\pi}(a/\pi)^{1/2}\,da = 0.5$. Integrating gives $\frac{C}{3}(m/\pi)^{3/2} = 0.5$, or $m = \pi \cdot 32^{2/3} \approx 31.7$.

(e) As mass = density × thickness × area, the mass is $m = \rho t \pi r^2$, where $t$ is the thickness. Then $P(m < M) = P(\rho t \pi r^2 < M) = P(r < \sqrt{M/(\rho t \pi)}) = \frac{C}{3}\left(\frac{M}{\rho t \pi}\right)^{3/2}$. Differentiating gives $p(m) = \frac{C}{2\rho t \pi}\left[\frac{m}{\rho t \pi}\right]^{1/2}$.

(f) Again, since $r$ is in the range $[0, 4]$, the mass is in the range $[0, 16\rho t \pi]$. The average mass is thus $\int_0^{16\rho t\pi} \frac{C}{2}\left[\frac{m}{\rho t \pi}\right]^{3/2} dm = \left[\frac{C}{5}\left(\frac{1}{\rho t \pi}\right)^{3/2} m^{5/2}\right]_0^{16\rho t\pi} = \frac{48\pi\rho t}{5}$. Setting this equal to 14.4 and plugging in $\rho = 1/2$ gives $t = 3/\pi \approx 0.95$ (in cm, if $r$ is measured in cm).
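
The chip results can be checked numerically. The sketch below integrates the radius pdf and the derived area pdf with a simple midpoint rule, verifies the median through the area cdf, and solves for the thickness in part (f). The helper `integrate` and all variable names are illustrative, and the thickness is expressed in whatever length unit is used for $r$ (cm is assumed so that it matches $\rho$ in g/cm$^3$).

```python
import math

C = 3 / 64                      # normalization constant from part (a)
R_MAX = 4.0
A_MAX = math.pi * R_MAX**2      # largest possible area, 16*pi

def integrate(f, a, b, steps=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def p_r(r):
    """pdf of the radius."""
    return C * r**2

def p_A(a):
    """pdf of the area, from the change of variables a = pi * r^2."""
    return C / (2 * math.pi) * math.sqrt(a / math.pi)

norm_r = integrate(p_r, 0.0, R_MAX)                        # should be 1
mean_radius = integrate(lambda r: r * p_r(r), 0.0, R_MAX)  # part (b)
norm_A = integrate(p_A, 0.0, A_MAX)                        # should be 1
mean_area = integrate(lambda a: a * p_A(a), 0.0, A_MAX)    # part (d)

median_area = math.pi * 32 ** (2 / 3)                      # part (d)
cdf_at_median = C / 3 * (median_area / math.pi) ** 1.5     # should be 0.5

rho = 0.5                                                  # g/cm^3
thickness = 14.4 / (rho * mean_area)                       # part (f)

print(mean_radius, mean_area, median_area, thickness)
```
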