Python two sample t-test confidence interval
Not sure about SciPy. Maybe there's a SciPy help site that will show the code. [Perhaps this.]
In R, a 95% CI is part of
Because the 95% CI includes $0$, the 2-sided test does not reject $H_0: \mu_1=\mu_2$ at the 5% level. The 95% margin of error is $t^*\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}},$ where $t^*$ cuts probability $0.025=2.5\%$ from the upper tail of Student's t distribution with degrees of freedom $\nu^\prime$ as found from the Welch formula involving sample variances and sample sizes. [Here, $\nu^\prime = 17.9,$ in some software rounded down to an integer. One always has $\min(n_1-1,n_2-1) \le \nu^\prime \le n_1+n_2-2.$]
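For readers who want the formula above spelled out in Python, here is a minimal sketch that computes the Welch degrees of freedom and the interval directly. The function name `welch_ci` and the two samples are made up for illustration (they are not the data behind the R output); only `numpy` and `scipy.stats.t.ppf` are used.

```python
import numpy as np
from scipy import stats


def welch_ci(x, y, confidence=0.95):
    """Welch confidence interval for mean(x) - mean(y) (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    # Per-sample contributions S_i^2 / n_i to the squared standard error
    v1 = x.var(ddof=1) / n1
    v2 = y.var(ddof=1) / n2
    se = np.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (nu' in the text)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # t* cuts (1 - confidence) / 2 from the upper tail
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df)
    diff = x.mean() - y.mean()
    margin = t_star * se
    return diff - margin, diff + margin, df


# Made-up samples, for illustration only
x = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 11.8, 12.6, 9.5, 11.0]
y = [10.4, 11.9, 9.2, 12.8, 10.7, 8.9, 11.3, 13.5, 9.9, 10.2, 12.0, 11.1]

lo, hi, df = welch_ci(x, y)
print(f"95% CI: [{lo:.3f}, {hi:.3f}], Welch df = {df:.1f}")
```

The degrees of freedom returned here are not rounded, so the interval may differ slightly from software that rounds $\nu^\prime$ down to an integer.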
It's always a good idea to keep the actual formulas in mind, even if one hopes to use them only rarely. I am looking for a quick way to get the t-test confidence interval in Python for the difference between means. Similar to this in R:
Out:
Next:
I am not really finding anything similar in either statsmodels or scipy, which is strange, considering the importance of confidence intervals in hypothesis testing (and how much criticism the practice of reporting only the p-values recently got).

In this Python data analysis tutorial, you will learn how to perform a two-sample t-test with Python. First, you will learn about the t-test including the assumptions of the statistical test. Following this, you will learn how to check whether your data follow the assumptions. After this, you will learn how to perform a two-sample t-test using the following Python packages:
Finally, you will also learn how to interpret the results and, then, how to report the results (including data visualization).
Prerequisites
Obviously, before learning how to calculate an independent t-test in Python, you will need to have at least one of the packages installed. Make sure that you have the following Python packages installed:
Scipy
SciPy is an essential package for data analysis in Python and is, in fact, a dependency of all of the other packages used in this tutorial. In this post, we will use it to test one of the assumptions using the Shapiro-Wilk test. Thus, you will need SciPy even if you use one of the other packages to calculate the t-test. Now, you might wonder why you should bother using any of the other packages for your analysis. Well, the ttest_ind function will return the t- and p-value, whereas some of the other packages will return more values (e.g., the degrees of freedom, confidence interval, effect sizes) as well.

Pandas
Pandas will be used to import data into a dataframe and to calculate summary statistics. Thus, you will need this package to follow this tutorial.

Seaborn
If you want to visualize the different means and learn how to plot the p-values and effect sizes, Seaborn is a very easy data visualization package.

Pingouin
This is the second package used in this tutorial to calculate the t-test. One neat thing with the ttest function of the Pingouin package is that it returns a lot of the information we need when reporting the results from the statistical analysis. For example, using Pingouin we also get the degrees of freedom, Bayes Factor, power, effect size (Cohen’s d), and confidence interval.

Statsmodels
Statsmodels is the third, and last, package used to carry out the independent samples t-test. You do not have to use it and, thus, this package is not required for the post. It does, however, contrary to SciPy, also return the degrees of freedom in addition to the t- and p-values.

Installing the Needed Python Packages
Now, if you don’t have the required packages, they can be installed using either pip or conda (if you are using Anaconda). Here’s how to install Python packages with pip:
If pip is telling you that there is a newer version, you can learn how to upgrade pip.

If you are using Anaconda, here’s how to create a virtual environment and install the needed packages:
Obviously, you don’t have to install all the prerequisites of this post, and you can refer to the post about installing Python packages if you need more information about the installation process. Another option is to check the YouTube video explaining how to install statsmodels in a virtual environment. Note, if needed you can use pip to install a specific version of a package, as well.

Two Sample T-test
The two-sample t-test is also known as the independent samples, independent, and unpaired t-test. This type of statistical test compares two averages (means) and tells you whether these two means are statistically different from each other. In other words, it lets you know if the difference could have happened by chance. Example: clinical psychologists may want to test a treatment for depression to find out if the treatment will change the quality of life. In an experiment, a control group (e.g., a group who are given a placebo, or “sugar pill”, or in this case no treatment) is always used. The control group may report that their average quality of life is 3, while the group getting the new treatment might report a quality of life of 5. It would seem that the new treatment might work. However, it could be due to a fluke. In order to test this, the clinical researchers can use the two-sample t-test.

Hypotheses
Now, when performing t-tests you typically have the following two hypotheses:
Now, sometimes we may also have a specific idea about the direction of the effect. That is, we may, based on theory, assume that the condition one group is exposed to will lead to better performance (or worse). In these cases, the alternative hypothesis will be something like: the mean of one group is either greater or less than the mean of the other group (one-tailed).

Assumptions
Before we go on and import data so that we can practice carrying out a t-test in Python, we’ll briefly have a look at the assumptions of this parametric test. Now, besides that the dependent variables are interval/ratio, and are continuous, there are three assumptions that need to be met.
Note, do not worry if your data don’t follow the 3 assumptions above. For example, it is possible to carry out the Mann-Whitney U test in Python if your data is not normally distributed. Another option is to transform your dependent variable using square root, log, or Box-Cox in Python.

Example Data
First, before going on to the two-sample t-test in Python examples, we need some data to work with. In this blog post, we are going to work with data that can be found here. Furthermore, here we will import data from a CSV (.csv) file directly from the URL.

Importing Data from CSV
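As a rough sketch of the import step: the URL of the data file is not reproduced here, so the example below reads a small in-memory CSV instead. The column names `grouping` and `height` follow the text, while the group labels and the height values are invented for illustration. With a real URL, the string could be passed straight to `pd.read_csv`.

```python
import io

import pandas as pd

# Stand-in for the remote .csv file (hypothetical values and group labels)
csv_text = """grouping,height
Male,181.2
Male,176.4
Male,188.0
Female,170.1
Female,165.8
Female,174.2
"""

# pd.read_csv accepts a URL string, a path, or a file-like object
data = pd.read_csv(io.StringIO(csv_text))

# Print the first five rows
print(data.head())
```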
In the code chunk above, we first imported pandas as pd. Second, we created a string with the URL to the .csv file. In the fourth row, we used Pandas read_csv to load the .csv file into a dataframe. Finally, we used the .head() method to print the first five rows. The dataframe has two columns (grouping and height). Luckily, the column names are easy to work with when we, later, are going to subset the data. If we, on the other hand, had long column names, renaming columns in the Pandas dataframe would be wise.

Subsetting the Data
Finally, before calculating some descriptive statistics, we will subset the data. In the code below, we use the query method to create two Pandas series objects:
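A minimal sketch of this subsetting step might look as follows. The dataframe is built inline with made-up heights, and the labels `Male` and `Female` are assumptions about how the groups are coded in the `grouping` column.

```python
import pandas as pd

# Hypothetical data with the two columns described in the text
data = pd.DataFrame({
    "grouping": ["Male"] * 4 + ["Female"] * 4,
    "height": [178.0, 182.0, 180.0, 184.0, 168.0, 172.0, 170.0, 166.0],
})

# query() filters the rows; the brackets then select the 'height' column,
# which gives one Pandas Series per group
male = data.query('grouping == "Male"')["height"]
female = data.query('grouping == "Female"')["height"]

print(male.mean(), female.mean())
```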
In the code chunk above, we first subset the rows containing men, in the column grouping. Subsequently, we do the exact same thing for the rows containing women. Note that we are also selecting only the column named ‘height’ (i.e., the string within the brackets). Now, using the brackets and the column name as a string is one way to select columns in a Pandas dataframe. Finally, if you don’t know the variable names, see the post “How to Get the Column Names from a Pandas Dataframe – Print and List“ for more information on how to get this information.

Descriptive Statistics
Now, we are going to use the groupby method together with the describe method to calculate summary statistics. Note that here we use the complete dataframe:
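The summary step can be sketched like this, again with an invented dataframe and assumed group labels rather than the tutorial's actual data:

```python
import pandas as pd

# Hypothetical data with the two columns described in the text
data = pd.DataFrame({
    "grouping": ["Male"] * 4 + ["Female"] * 4,
    "height": [178.0, 182.0, 180.0, 184.0, 168.0, 172.0, 170.0, 166.0],
})

# One row of summary statistics (count, mean, std, quartiles, ...) per group
summary = data.groupby("grouping").describe()
print(summary)
```

Because describe is applied to the whole grouped dataframe, the result has two-level column labels such as `("height", "mean")`.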
As we are interested in the difference between ‘A’ and ‘B’, in the dataset, we used ‘grouping’ as input to the groupby method. If you are interested in learning more about grouping data and calculating descriptive statistics in Python, see the following two posts:
Here’s a quick note: if you are working with NumPy you can convert an array to integer. In the next section, you will finally learn how to carry out a two-sample t-test with Python. Note, if you by now know that your groups are not independent (i.e., they are the same individuals measured under two different conditions) you can instead use Python to do a paired sample t-test.

How to Check the Assumptions of the Two-Sample T-test in Python
In this section, we will cover how to check the assumptions of the independent samples t-test. Of course, we are only going to check assumptions 2 and 3. That is, we will start by checking whether the data from the two groups follow a normal distribution (assumption 2). Second, we will check whether the two populations have the same variance.

Checking the Normality of Data
There are several methods to check whether our data is normally distributed. Here, we will use the Shapiro-Wilk test. Here’s how to examine if the data follow the normal distribution in Python:

    from scipy import stats

    stats.shapiro(male)
    # Output: (0.9550848603248596, 0.7756242156028748)
    stats.shapiro(female)
    # Output: (0.9197608828544617, 0.467536598443985)

In the code chunk above, we performed the Shapiro-Wilk test on both Pandas series (i.e., for each group separately). Consequently, we get a tuple each time we use the shapiro function. This tuple contains the test statistic and the p-value. Here, the null hypothesis is that the data follows a normal distribution. Because both p-values (0.78 and 0.47) are well above 0.05, we do not reject it, and we therefore treat the data from both groups as normally distributed.

Now, there are of course other tests; see this excellent overview for more information. Finally, it is also worth noting that most statistical tests for normality are sensitive to sample size: with large samples, even small departures from normality can give a significant result. Normality can also be explored visually using, for example, histograms and Q-Q plots. See the post How to Plot a Histogram with Pandas in 3 Simple Steps.

Checking the Homogeneity of Variances Assumption in Python
Remember, before carrying out a t-test in Python, we also need to make sure that the variances in the two groups are equal. Here we’ll use Levene’s test to test for homogeneity of variances (equal variances), and this can be performed with the function levene as follows:

Again, the p-value suggests that the data follows the assumption of equal variances. See this article for more information. Here are some alternatives to Levene’s test of homogeneity:
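As a rough illustration of the variance checks discussed in this section, the sketch below runs Levene's test and, as one alternative, Bartlett's test with SciPy. The two lists of heights are made up, so the printed p-values are not the ones reported in this post.

```python
from scipy import stats

# Made-up heights, for illustration only
male = [181.2, 176.4, 188.0, 172.9, 185.3, 179.6, 175.8]
female = [170.1, 165.8, 174.2, 168.5, 177.0, 163.9, 172.6]

# Levene's test; SciPy centers on the median by default
lev_stat, lev_p = stats.levene(male, female)

# Bartlett's test, which leans more heavily on normality
bart_stat, bart_p = stats.bartlett(male, female)

print(f"Levene:   W = {lev_stat:.3f}, p = {lev_p:.3f}")
print(f"Bartlett: T = {bart_stat:.3f}, p = {bart_p:.3f}")
```

In both tests the null hypothesis is that the groups have equal variances, so a large p-value gives no reason to doubt the assumption.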
It is worth noting here that if our data does not fulfill the assumption of equal variances, we can use Welch’s t-test instead of Student’s t-test. See the references at the end of the post. Luckily, both Levene’s test and Bartlett’s test can be carried out in Python with SciPy (e.g. see above).

How to Carry Out a Two-Sample T-test in Python in 3 Ways
In this section, we are going to learn how to perform an independent samples t-test with Python. To be more exact, we will cover three methods: using SciPy, Pingouin, and Statsmodels. First, we will use SciPy:

1) T-test with SciPy
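A minimal sketch of the SciPy approach is given below. The heights are made up for illustration, so the output will not match the numbers reported later in this post. Setting `equal_var=False` switches `ttest_ind` from Student's t-test to Welch's t-test.

```python
from scipy.stats import ttest_ind

# Made-up heights, for illustration only
male = [181.2, 176.4, 188.0, 172.9, 185.3, 179.6, 175.8]
female = [170.1, 165.8, 174.2, 168.5, 177.0, 163.9, 172.6]

# Student's t-test (assumes equal variances; this is the default)
student = ttest_ind(male, female)

# Welch's t-test (does not assume equal variances)
welch = ttest_ind(male, female, equal_var=False)

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```

With equal group sizes, as here, the two t statistics coincide and only the degrees of freedom (and hence the p-values) can differ.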
In the code chunk above, we imported the ttest_ind method to carry out our data analysis. All three methods described in this post require that you have already imported Pandas and used it to load your dataset.

How to Interpret the Results from a T-test
In this section, you are briefly going to learn how to interpret the results from the two-sample t-test carried out with Python. Furthermore, this section will focus on the results from Pingouin and Statsmodels as they give us a richer output (e.g., degrees of freedom, effect size). Finally, following this section, you will further learn how to report the t-test according to the guidelines of the American Psychological Association.

Interpreting the P-value
Now, the p-value of the test is 0.017106, which is less than the significance level alpha (e.g., 0.05). This means that we can conclude that the men’s average height is statistically different from the women’s average height.

Specifically, a p-value is the probability of obtaining an effect at least as extreme as the one in the data you have obtained (i.e., your sample), assuming that the null hypothesis is true. Moreover, p-values address only one question: how likely is your collected data, assuming a true null hypothesis? Importantly, a p-value cannot be used as support for the alternative hypothesis.

Interpreting the Effect Size (Cohen’s D)
One common way to interpret Cohen’s D that is obtained in a t-test is in terms of the relative strength of e.g. the condition. Cohen (1988) suggested that d=0.2 should be considered a ‘small’ effect size, 0.5 is a ‘medium’ effect size, and that 0.8 is a ‘large’ effect size. This means that if two groups’ means don’t differ by 0.2 standard deviations or more, the difference is trivial, even if it is statistically significant.

Interpreting the Bayes Factor from Pingouin
Now, if you used Pingouin to carry out the two-sample t-test you might have noticed that we also get the Bayes Factor. See this post (https://www.statisticshowto.com/bayes-factor-definition/) for more information.

[Figure: Results from a two-samples t-test Python]

Reporting the Results
In this section, you will learn how to report the results according to the APA guidelines. In our case, we can report the results from the t-test like this: There was a significant difference in height for men (M = 179.87, SD = 6.21) and women (M = 171.05, SD = 5.69); t(12) = 2.77, p = .017, 95% CI [1.87, 15.76], d = 1.48. In the next section, you will also quickly learn how to visualize the data in two different ways: boxplots and violin plots.

Visualize the Data using Boxplots
One way to visualize data from two groups is using the box plot: