Hank Anderson
hank@statease.com
https://github.com/hpanderson/dexpy-pymntos
We'll use dexpy to improve the office coffee, along with statsmodels, seaborn and matplotlib, to find the best pot of coffee.

Design of Experiments (DOE): a systematic series of tests, in which purposeful changes are made to input factors, so that you may identify causes for significant changes in the output responses.
NIST has a nice primer on DOE.
# process diagram: controllable and noise factors feed into the process, which produces responses
from graphviz import Digraph
dot = Digraph(comment='Design of Experiments')
dot.body.extend(['rankdir=LR', 'size="10,10"'])
dot.node_attr.update(shape='rectangle', style='filled', fontsize='20', fontname="helvetica")
dot.node('X', 'Controllable Factors', color='mediumseagreen', width='3')
dot.node('Z', 'Noise Factors', color='indianred2', width='3')
dot.node('P', 'Process', color='lightblue', height='1.25', width='3')
dot.node('Y', 'Responses', color='lightblue')
dot.edges(['XP', 'ZP', 'PY'] * 3)
dot
The examples use numpy, scipy, pandas and patsy, plus statsmodels for analysis and matplotlib and seaborn for visualization.
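For reference, here is a minimal setup cell of the kind the code below assumes (a sketch, not the original notebook cell; helper functions such as coded_to_actual and get_tick_labels, and the actual_lows / actual_highs / units lookups, are defined elsewhere in the notebook):

# assumed setup cell (sketch)
import numpy as np
import pandas as pd
import scipy.stats
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from mpl_toolkits.mplot3d import Axes3D  # enables projection='3d'
import seaborn as sns
import statsmodels.formula.api
import dexpy.factorial
import dexpy.power
from IPython.display import display, Markdown
from prettypandas import PrettyPandas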
# a 2-factor, 4-run full factorial in coded (-1/+1) units
df = dexpy.factorial.build_factorial(2, 4)
df.columns = ['amount', 'grind_size']
fg = sns.lmplot('amount', 'grind_size', data=df, fit_reg=False)
ax = fg.axes[0, 0]
ax.set_xticks([-1, 1])
ax.set_xticklabels(get_tick_labels('amount', actual_lows, actual_highs, units))
ax.set_yticks([-1, 1])
ax.set_yticklabels(get_tick_labels('grind_size', actual_lows, actual_highs, units))
p = ax.add_patch(patches.Rectangle((-1, -1), 2, 2, color="navy", alpha=0.3, lw=2))
# a 3-factor, 8-run full factorial sits on the corners of the coded cube
cube_design = dexpy.factorial.build_factorial(3, 8)
points = np.array(cube_design)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d', axisbg='w')
ax.view_init(30, -60) # rotate plot
X, Y = np.meshgrid([-1, 1], [-1, 1])
ones = np.ones_like(X)
cube_alpha = 0.1
# shade the six faces of the cube
ax.plot_surface(X, Y, ones, alpha=cube_alpha)
ax.plot_surface(X, Y, -ones, alpha=cube_alpha)
ax.plot_surface(X, -ones, Y, alpha=cube_alpha)
ax.plot_surface(X, ones, Y, alpha=cube_alpha)
ax.plot_surface(ones, X, Y, alpha=cube_alpha)
ax.plot_surface(-ones, X, Y, alpha=cube_alpha)
ax.scatter3D(points[:, 0], points[:, 1], points[:, 2], c="b")
ax.set_xticks([-1, 1])
ax.set_xticklabels(["8mm", "10mm"])
ax.set_yticks([-1, 1])
ax.set_yticklabels(["2.5oz", "4oz"])
ax.set_zticks([-1, 1])
ax.set_zticklabels(["light", "dark"])
plt.show()
Power: the probability that a design will detect an active effect.
Effect? | Retain H0 | Reject H0 |
---|---|---|
No | OK | Type I Error |
Yes | Type II Error | OK |
Power is the probability of detecting an effect of size Δ given noise σ, typically expressed through the signal to noise ratio Δ/σ. Power is a function of that ratio, as well as the number and layout of runs in the design.
runs = 50
delta = 0.5
sigma = 1.5
alpha = 0.05
one_factor = pd.DataFrame([ -1, 1 ] * runs, columns=['beans'])
one_factor_power = dexpy.power.f_power('beans', one_factor, delta/sigma, alpha)
display(Markdown('''
# Power Example
{} pots of coffee are tested with light beans, then {} pots with dark beans.
There is a standard deviation of {} in the taste rating from pot to pot. If we expect a {} change
in the taste rating when going from light to dark, what is the likelihood we would detect it?
(Answer: **{:.2f}%**)
Note: this assumes that we reject H<sub>0</sub> at p <= {}
'''.format(int(runs / 2), int(runs / 2), sigma, delta, one_factor_power[1]*100, alpha)
))
def plot_shift(runs, delta, sigma, annotate=False):
"""Plots two sets of random normal data, one shifted up delta units."""
mean = 5
low = sigma*np.random.randn(int(runs/2),1)+mean
high = sigma*np.random.randn(int(runs/2),1)+mean+delta
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_ylabel("taste")
ax.set_xlabel("runs")
ax.set_ylim([0, 11])
plt.plot(np.concatenate([low, high]))
plt.plot([0, (runs/2)], [low.mean()] * 2, color='firebrick', lw=2)
plt.plot([(runs/2), runs-1], [high.mean()] * 2, color='g', lw=2)
p_value = scipy.stats.f_oneway(low, high).pvalue[0]
if annotate:
plt.annotate("p: {:.5f}".format(p_value),
xy=(runs / 2, (low.mean() + high.mean()) / 2), xycoords='data',
xytext=(.8, .9), textcoords='axes fraction',
bbox=dict(boxstyle="round4", fc="w"),
arrowprops=dict(arrowstyle='-[', linewidth=2, color='black', connectionstyle="angle3")
)
plt.show()
return [low, high]
low, high = plot_shift(runs, delta, sigma)
25 pots of coffee are tested with light beans, then 25 pots with dark beans. There is a standard deviation of 1.5 in the taste rating from pot to pot. If we expect a 0.5 change in the taste rating when going from light to dark, what is the likelihood we would detect it? (Answer: 37.86%)
Note: this assumes that we reject H0 at p <= 0.05
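As a sanity check, power can also be estimated by brute force: simulate the experiment many times with the assumed shift and noise, analyze each simulated data set, and count how often the effect comes out significant. This is an illustrative sketch (not part of dexpy) that reuses the one_factor design built above; the empirical fraction should land in the same neighborhood as the analytic answer, up to simulation error.

# Monte Carlo estimate of power for the one-factor design (illustrative only)
np.random.seed(42)
n_sim = 2000
x = one_factor['beans'].values          # coded -1/+1 bean column
rejections = 0
for _ in range(n_sim):
    # light-to-dark swing of delta, pot-to-pot noise of sigma
    taste = 5 + (delta / 2) * x + sigma * np.random.randn(len(x))
    p_value = scipy.stats.f_oneway(taste[x == -1], taste[x == 1]).pvalue
    if p_value <= alpha:
        rejections += 1
print("simulated power: {:.1f}%".format(100 * rejections / n_sim))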
increased_delta = delta*4
increased_delta_power = dexpy.power.f_power('beans', one_factor, increased_delta/sigma, alpha)
display(Markdown('''
# Power Example - Increase Delta
What if we don't care about a taste increase of 0.5? That's not that much better
than the current coffee, after all. Instead, if we say we only care about a change
in rating above {}, what is the likelihood we would detect such a change?
(Answer: **{:.5f}%**)
'''.format(increased_delta, increased_delta_power[1]*100)
))
_ = plot_shift(runs, increased_delta, sigma)
What if we don't care about a taste increase of 0.5? That's not that much better than the current coffee, after all. Instead, if we say we only care about a change in rating above 2.0, what is the likelihood we would detect such a change? (Answer: 99.99983%)
decreased_sigma = sigma*0.5
decreased_sigma_power = dexpy.power.f_power('beans', one_factor, delta/decreased_sigma, alpha)
display(Markdown('''
# Power Example - Decrease Noise
Instead of lowering our standards for our noisy taste ratings,
we could bring in expert testers who have a much more accurate palate.
If we assume a decrease in noise from {} to {}, then we can detect a
change in rating of {} with {:.2f}% probability.
'''.format(sigma, decreased_sigma, delta, decreased_sigma_power[1]*100)
))
_ = plot_shift(runs, delta, decreased_sigma)
Instead of lowering our standards for our noisy taste ratings, we could bring in expert testers who have a much more accurate palate. If we assume a decrease in noise from 1.5 to 0.75, then we can detect a change in rating of 0.5 with 91.00% probability.
increased_runs = runs * 4
one_factor = pd.DataFrame([ -1, 1 ] * increased_runs, columns=['beans'])
increased_runs_power = dexpy.power.f_power('beans', one_factor, delta/sigma, alpha)
display(Markdown('''
# Power Example - Increase Runs
If expert testers are too expensive, and we are unwilling to compromise
our standards, then the only remaining option is to create a more powerful
design. In this toy example, there isn't much we can do, since it's
just one factor. Increasing the runs from {} to {} gives a power of
{:.2f}%. This may be a more acceptable success rate than the original power
of {:.2f}%, however... that is a lot of coffee to drink.
For more complicated designs, changing the structure of the design
can also increase power.
'''.format(runs, increased_runs, increased_runs_power[1]*100, one_factor_power[1]*100)
))
_ = plot_shift(increased_runs, delta, sigma)
If expert testers are too expensive, and we are unwilling to compromise our standards, then the only remaining option is to create a more powerful design. In this toy example, there isn't much we can do, since it's just one factor. Increasing the runs from 50 to 200 gives a power of 91.39%. This may be a more acceptable success rate than the original power of 37.86%, however... that is a lot of coffee to drink.
For more complicated designs, changing the structure of the design can also increase power.
https://statease.github.io/dexpy/evaluation.html#statistical-power
help(dexpy.power.f_power)
Help on function f_power in module dexpy.power:

f_power(model, design, effect_size, alpha)
    Calculates the power of an F test.

    This calculates the probability that the F-statistic is above its
    critical value (alpha) given an effect of some size.

    :param model: A patsy formula for which to calculate power.
    :type model: patsy.formula
    :param design: A pandas.DataFrame representing a design.
    :type design: pandas.DataFrame
    :param effect_size: The size of the effect that the test should be able
        to detect (also called a signal to noise ratio).
    :type effect_size: float
    :param alpha: The critical value that we want the test to be above.
    :type alpha: float between 0 and 1
    :returns: A list of percentage probabilities that an F-test could detect
        an effect of the given size at the given alpha value for a
        particular column.

    Usage:
      >>> design = dexpy.factorial.build_factorial(4, 8)
      >>> print(dexpy.power.f_power("1 + A + B + C + D", design, 2.0, 0.05))
      [ 95.016, 49.003, 49.003, 49.003, 49.003 ]
full_design = dexpy.factorial.build_factorial(5, 2**5)
full_design.columns = ['amount', 'grind_size', 'brew_time', 'grind_type', 'beans']
model = ' + '.join(full_design.columns)  # main-effects model; defined here for completeness
sn = 2.0  # signal to noise ratio (delta/sigma) of 2, per the slide text
factorial_power = dexpy.power.f_power(model, full_design, sn, alpha)
factorial_power.pop(0)  # drop the intercept term
factorial_power = ['{0:.2f}%'.format(i*100) for i in factorial_power] # convert to %
factorial_power = pd.DataFrame(factorial_power, columns=['Power'], index=full_design.columns)
display(Markdown('''
# Calculating Power with dexpy: Factorial
* {} total runs, with a signal to noise ratio of 2.
* Model (`patsy` formula): `{}`
'''.format(len(full_design), model)))
display(PrettyPandas(factorial_power))
* 32 total runs, with a signal to noise ratio of 2.
* Model (`patsy` formula): `amount + grind_size + brew_time + grind_type + beans`
  | Power |
---|---|
amount | 99.97% |
grind_size | 99.97% |
brew_time | 99.97% |
grind_type | 99.97% |
beans | 99.97% |
https://statease.github.io/dexpy/design-build.html#module-dexpy.factorial
help(dexpy.factorial.build_factorial)
Help on function build_factorial in module dexpy.factorial:

build_factorial(factor_count, run_count)
    Builds a regular two-level design based on a number of factors and runs.

    Full two-level factorial designs may be run for up to 9 factors. These
    designs permit estimation of all main effects and all interaction
    effects. If the number of runs requested is a 2^factor_count, the design
    will be a full factorial.

    If the number of runs is less than 2^factor_count (it still must be a
    power of two) a fractional design will be created. Not all combinations
    of runs and factor counts will result in a design. Use the
    :ref:`alias list<alias-list>` method to see what terms are estimable in
    the resulting design.

    :param factor_count: The number of factors to build for.
    :type factor_count: int
    :param run_count: The number of runs in the resulting design. Must be a
        power of 2.
    :type run_count: int
    :returns: A pandas.DataFrame object containing the requested design.
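As a quick illustration of the full-versus-fractional distinction in the docstring, here is a small check (not from the original talk; column names follow dexpy's defaults):

# illustrative only: full 3-factor factorial vs. a half fraction
full = dexpy.factorial.build_factorial(3, 2**3)   # 8 runs: every combination of A, B, C
half = dexpy.factorial.build_factorial(3, 2**2)   # 4 runs: a regular 2^(3-1) half fraction
print(len(full), len(half))
print(half)   # in a regular half fraction, C is aliased with the A*B interaction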
coffee_design = dexpy.factorial.build_factorial(5, 2**(5-1))
coffee_design.columns = ['amount', 'grind_size', 'brew_time', 'grind_type', 'beans']
# center points on the three numeric factors, crossed with both categorical factors (replicated twice below)
center_points = [
[0, 0, 0, -1, -1],
[0, 0, 0, -1, 1],
[0, 0, 0, 1, -1],
[0, 0, 0, 1, 1]
]
coffee_design = coffee_design.append(pd.DataFrame(center_points * 2, columns=coffee_design.columns))
coffee_design.index = np.arange(0, len(coffee_design))
actual_design = coded_to_actual(coffee_design, actual_lows, actual_highs)
display(Markdown("## 2<sup>(5-1)</sup> Factorial Design"))
display(PrettyPandas(actual_design))
  | amount | grind_size | brew_time | grind_type | beans |
---|---|---|---|---|---|
0 | 2.5 | 8 | 3.5 | burr | dark |
1 | 2.5 | 8 | 3.5 | blade | light |
2 | 2.5 | 8 | 4.5 | burr | light |
3 | 2.5 | 8 | 4.5 | blade | dark |
4 | 2.5 | 10 | 3.5 | burr | light |
5 | 2.5 | 10 | 3.5 | blade | dark |
6 | 2.5 | 10 | 4.5 | burr | dark |
7 | 2.5 | 10 | 4.5 | blade | light |
8 | 4 | 8 | 3.5 | burr | light |
9 | 4 | 8 | 3.5 | blade | dark |
10 | 4 | 8 | 4.5 | burr | dark |
11 | 4 | 8 | 4.5 | blade | light |
12 | 4 | 10 | 3.5 | burr | dark |
13 | 4 | 10 | 3.5 | blade | light |
14 | 4 | 10 | 4.5 | burr | light |
15 | 4 | 10 | 4.5 | blade | dark |
16 | 3.25 | 9 | 4 | burr | light |
17 | 3.25 | 9 | 4 | burr | dark |
18 | 3.25 | 9 | 4 | blade | light |
19 | 3.25 | 9 | 4 | blade | dark |
20 | 3.25 | 9 | 4 | burr | light |
21 | 3.25 | 9 | 4 | burr | dark |
22 | 3.25 | 9 | 4 | blade | light |
23 | 3.25 | 9 | 4 | blade | dark |
model = ' + '.join(coffee_design.columns)
factorial_power = dexpy.power.f_power(model, coffee_design, sn, alpha)
factorial_power.pop(0)
factorial_power = ['{0:.2f}%'.format(i*100) for i in factorial_power] # convert to %
factorial_power = pd.DataFrame(factorial_power, columns=['Power'], index=coffee_design.columns)
display(Markdown('''
## 2<sup>(5-1)</sup> Factorial Power
* Power for {} total runs
* Signal to noise ratio of 2
* Model: `{}`
'''.format(len(coffee_design), model)))
display(PrettyPandas(factorial_power))
Model: `amount + grind_size + brew_time + grind_type + beans`
  | Power |
---|---|
amount | 96.55% |
grind_size | 96.55% |
brew_time | 96.55% |
grind_type | 99.61% |
beans | 99.61% |
statsmodels typically takes numpy arrays for X and y data, but it also accepts a patsy formula:
reduced_model = "amount + grind_size + brew_time + grind_type + beans + amount:beans + grind_size:brew_time + grind_size:grind_type"
# coffee_design['taste_rating'] holds the ratings observed during the experiment
lm = statsmodels.formula.api.ols("taste_rating ~ " + reduced_model, data=coffee_design).fit()
print(lm.summary2())
                 Results: Ordinary least squares
=====================================================================
Model:              OLS              Adj. R-squared:     0.746
Dependent Variable: taste_rating     AIC:                79.5691
Date:               2016-11-10 19:52 BIC:                90.1715
No. Observations:   24               Log-Likelihood:     -30.785
Df Model:           8                F-statistic:        9.438
Df Residuals:       15               Prob (F-statistic): 0.000123
R-squared:          0.834            Scale:              1.2184
---------------------------------------------------------------------
                       Coef.  Std.Err.    t    P>|t|   [0.025  0.975]
---------------------------------------------------------------------
Intercept              5.0318   0.2253 22.3328 0.0000  4.5516  5.5121
amount                 0.9731   0.2759  3.5266 0.0031  0.3850  1.5613
grind_size             0.0022   0.2759  0.0078 0.9939 -0.5860  0.5903
brew_time              1.2061   0.2759  4.3709 0.0005  0.6180  1.7943
grind_type            -0.0974   0.2253 -0.4324 0.6716 -0.5777  0.3828
beans                  0.5774   0.2253  2.5628 0.0216  0.0972  1.0577
amount:beans          -1.4820   0.2759 -5.3707 0.0001 -2.0702 -0.8939
grind_size:brew_time   0.3961   0.2759  1.4354 0.1717 -0.1921  0.9843
grind_size:grind_type -0.6927   0.2759 -2.5103 0.0240 -1.2809 -0.1046
---------------------------------------------------------------------
Omnibus:               4.208        Durbin-Watson:          2.190
Prob(Omnibus):         0.122        Jarque-Bera (JB):       1.550
Skew:                  -0.116       Prob(JB):               0.461
Kurtosis:              1.777        Condition No.:          1
=====================================================================
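For comparison with the formula interface above, here is a sketch of the array-style route that statsmodels more typically takes, using patsy to build the design matrices explicitly (this assumes coffee_design still carries the taste_rating column used in the fit above):

# equivalent fit via explicit design matrices (illustrative sketch)
import patsy
import statsmodels.api as sm

y, X = patsy.dmatrices("taste_rating ~ " + reduced_model, coffee_design,
                       return_type='dataframe')
lm_arrays = sm.OLS(y, X).fit()
print(lm_arrays.params)   # same coefficients as lm.summary2() above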
display(Markdown('''
If we fit our new model to the experiment data from the design, then plot the predicted values against
the observed values, we can get an idea of how well our model predicts. Points above the 45 degree line
overpredict for that combination of inputs. Points below the line predict a lower taste rating than
we actually measured during the experiment.'''))
actual_predicted = pd.DataFrame({ 'actual': coffee_design['taste_rating'],
'predicted': lm.fittedvalues
}, index=np.arange(len(coffee_design['taste_rating'])))
fg = sns.FacetGrid(actual_predicted, size=5)
fg.map(plt.scatter, 'actual', 'predicted')
ax = fg.axes[0, 0]
ax.plot([1, 10], [1, 10], color='g', lw=2)
ax.set_xticks(np.arange(1, 11))
ax.set_xlim([0, 11])
ax.set_yticks(np.arange(1, 11))
ax.set_title('Actual vs Predicted')
_ = ax.set_ylim([0, 11])
If we fit our new model to the experiment data from the design, then plot the predicted values against the observed values, we can get an idea of how well our model predicts. Points above the 45 degree line overpredict for that combination of inputs. Points below the line predict a lower taste rating than we actually measured during the experiment.
display(Markdown('''
Plotting the prediction for two factors at once shows how they interact with each other.
In this graph you can see that at the low brew time the larger grind size results in
a poor taste rating, likely because the coffee is too weak.'''))
# predict at the four corners of the grind_size x brew_time square, other factors held at their high (+1) levels
pred_points = pd.DataFrame(1, columns = coffee_design.columns, index=np.arange(4))
pred_points.loc[1,'grind_size'] = -1
pred_points.loc[3,'grind_size'] = -1
pred_points.loc[2,'brew_time'] = -1
pred_points.loc[3,'brew_time'] = -1
pred_points['taste_rating'] = lm.predict(pred_points)
pred_points = coded_to_actual(pred_points, actual_lows, actual_highs)
fg = sns.factorplot('grind_size', 'taste_rating', hue='brew_time', kind='point', data=pred_points)
ax = fg.axes[0, 0]
ax.set_xticklabels(get_tick_labels('grind_size', actual_lows, actual_highs, units))
_ = ax.set_title('Grind Size/Brew Time Interaction')
Plotting the prediction for two factors at once shows how they interact with each other. In this graph you can see that at the low brew time the larger grind size results in a poor taste rating, likely because the coffee is too weak.
display(Markdown('''
This graph contains the prediction with the highest taste rating, 7.72.
However, if you look at the dark bean line there is a point where we can get
a rating of 6.93 with 2.5oz of grounds.
'''))
# predict at the four corners of the amount x beans square, other factors held at their high (+1) levels
pred_points = pd.DataFrame(1, columns = coffee_design.columns, index=np.arange(4))
pred_points.loc[1,'amount'] = -1
pred_points.loc[3,'amount'] = -1
pred_points.loc[2,'beans'] = -1
pred_points.loc[3,'beans'] = -1
pred_points['taste_rating'] = lm.predict(pred_points)
pred_points = coded_to_actual(pred_points, actual_lows, actual_highs)
fg = sns.factorplot('amount', 'taste_rating', hue='beans', kind='point', palette={'dark': 'maroon', 'light': 'peru'}, data=pred_points)
ax = fg.axes[0, 0]
ax.set_xticklabels(get_tick_labels('amount', actual_lows, actual_highs, units))
ax.set_title('Amount/Beans Interaction')
plt.show()
display(PrettyPandas(pred_points))
display(Markdown('''That savings of 1.5oz per pot would create a nice surplus in the coffee budget at the end of the year.'''))
This graph contains the prediction with the highest taste rating, 7.72. However, if you look at the dark bean line there is a point where we can get a rating of 6.93 with 2.5oz of grounds.
  | amount | grind_size | brew_time | grind_type | beans | taste_rating |
---|---|---|---|---|---|---|
0 | 4 | 10 | 4.5 | blade | dark | 5.91462 |
1 | 2.5 | 10 | 4.5 | blade | dark | 6.93237 |
2 | 4 | 10 | 4.5 | blade | light | 7.7238 |
3 | 2.5 | 10 | 4.5 | blade | light | 2.81345 |
That savings of 1.5oz per pot would create a nice surplus in the coffee budget at the end of the year.
The dexpy docs are at: https://statease.github.io/dexpy/

dexpy is only at version 0.1; we plan on greatly expanding its design and analysis capabilities.