Wednesday, May 1, 2013

Power Plots in statsmodels

I just want to show another two plots for the statistical power of a test, since I didn't have time for this earlier

The code to produce them is just calling the methods of the power classes, for example for the one sample t-test.

import numpy as np
import statsmodels.stats.power as smpwr
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(2,1,1)
fig=smpwr.TTestPower().plot_power(dep_var='nobs',
                                  nobs= np.arange(2, 200),
                                  effect_size=np.array([0.1, 0.2, 0.3, 0.5, 1, 1.5, 2]),
                                  alternative='larger',
                                  ax=ax, title='Power of t-Test')
ax = fig.add_subplot(2,1,2)
fig=smpwr.TTestPower().plot_power(dep_var='es',
                                  nobs=np.array([10, 20, 30, 50, 70, 100]),
                                  effect_size=np.linspace(0.01, 2, 51),
                                  alternative='larger',
                                  ax=ax, title='') #supress title

(This won't run with statsmodels until I merge my branch)

dep_var is the variable on the x-axis of the plot, which currently can be either the number of observations, the effect size, or the significance level. The different lines in the plot are assumed to be either the effect size or the number of observations. The very useful pattern that we follow in statsmodels is that we can either attach a plot to a given axis ax, or if it is None, then it creates a figure. This provides a flexibility to either use the pre-fabricated plot, which in this case doesn't have many option yet, or to embed the axis into another plot. Manipulating display elements with matplotlib after the plot is created, is also relatively easy.

I was struggling a bit with the colors for the lines in the plot, and then just went with matplotlib's Dark2 colormap. The standard color cycling of matplotlib does not have enough colors for this case.

The main part of the plotting code for the plot with the number of observations on the x-axis, is just

        colormap = plt.cm.Dark2
        plt_alpha = 1 #0.75
        lw = 2
        if dep_var == 'nobs':
            colors = [colormap(i) for i in np.linspace(0, 0.9, len(effect_size))]
            for ii, es in enumerate(effect_size):
                power = self.power(es, nobs, alpha, **kwds)
                ax.plot(nobs, power, lw=lw, alpha=plt_alpha, 
                        color=colors[ii], label='es=%4.2F' % es)
                xlabel = 'Number of Observations'

For similar plots see this stackoverflow question or this blog for plots in G-Power 3.

I have not tried any fine tuning or fancy options for the plot style yet.

The first plot is a one sample t-test, the second is the one sample normal test, i.e. z-test. The two plots look almost the same since I wanted to see whether there is a visible difference between tests based on t and normal distribution. Each of the top graphs shows the power (1 - type II error) as a function of the number of observations, the bottom plots show the power as a function of the effect size.

One sample t-Test

One sample z-Test

No comments:

Post a Comment