Python two sample t-test confidence interval

Not sure about SciPy. Maybe there's a SciPy help site that will show the code. [Perhaps this.]

In R, a 95% CI is part of t.test output, where the Welch version of the 2-sample t test is the default (and argument var.eq=T gets you the pooled test).

ts1 = c(11,9,10,11,10,12,9,11,12,9)
ts2 = c(11,13,10,13,12,9,11,12,12,11)
t.test(ts1, ts2)

        Welch Two Sample t-test

data:  ts1 and ts2
t = -1.8325, df = 17.9, p-value = 0.08356
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.1469104  0.1469104
sample estimates:
mean of x mean of y 
     10.4      11.4 

Because the 95% CI includes $0$ the 2-sided test does not reject $H_0: \mu_1=\mu_2$ at the 5% level.

The 95% margin of error is $t^*\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}},$ where $t^*$ cuts probability $0.025=2.5\%$ from the upper tail of Student's t distribution with degrees of freedom $\nu^\prime$ as found from the Welch formula involving sample variances and sample sizes. [Here, $\nu^\prime = 17.9,$ in some software rounded down to an integer. One always has $\min(n_1-1,n_2-1) \le \nu^\prime \le n_1+n_2-2.$]

me = qt(.975, 17.9)*sqrt(var(ts1)/10+var(ts2)/10); me
[1] 1.146912
pm=c(-1,1)
-1 + pm*me
[1] -2.1469118  0.1469118

It's always a good idea to keep the actual formulas in mind, even if one hopes to use them only rarely.
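For a Python version of the same arithmetic, here is a minimal sketch using NumPy and the t quantile function from SciPy. The helper name `welch_ci` is my own invention for illustration, not a function from either library; it just spells out the margin-of-error formula and the Welch degrees of freedom described above.

```python
import numpy as np
from scipy import stats


def welch_ci(x, y, confidence=0.95):
    """Welch confidence interval for mean(x) - mean(y) (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    # Per-sample variance of the mean: S^2 / n
    v1 = x.var(ddof=1) / n1
    v2 = y.var(ddof=1) / n2
    se = np.sqrt(v1 + v2)
    # Welch (Satterthwaite) approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # t* cuts (1 - confidence) / 2 from the upper tail
    t_star = stats.t.ppf(0.5 + confidence / 2, df)
    diff = x.mean() - y.mean()
    return diff - t_star * se, diff + t_star * se, df


ts1 = [11, 9, 10, 11, 10, 12, 9, 11, 12, 9]
ts2 = [11, 13, 10, 13, 12, 9, 11, 12, 12, 11]
low, high, df = welch_ci(ts1, ts2)
# Should agree with the R interval above (about -2.147 to 0.147, df about 17.9)
print(low, high, df)
```

Because the degrees of freedom are not rounded here, the endpoints follow `t.test` rather than the hand calculation with 17.9.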

I am looking for a quick way to get the t-test confidence interval in Python for the difference between means. Similar to this in R:

X1 <- rnorm(n = 10, mean = 50, sd = 10)
X2 <- rnorm(n = 200, mean = 35, sd = 14)
# the scenario is similar to my data

t_res <- t.test(X1, X2, alternative = 'two.sided', var.equal = FALSE)    
t_res

Out:

    Welch Two Sample t-test

data:  X1 and X2
t = 1.6585, df = 10.036, p-value = 0.1281
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.539749 17.355816
sample estimates:
mean of x mean of y 
 43.20514  35.79711 

Next:

>> print(c(t_res$conf.int[1], t_res$conf.int[2]))
[1] -2.539749 17.355816

I am not really finding anything similar in either statsmodels or scipy, which is strange, considering the importance of confidence intervals in hypothesis testing (and how much criticism the practice of reporting only the p-values recently got).

In this Python data analysis tutorial, you will learn how to perform a two-sample t-test with Python. First, you will learn about the t-test including the assumptions of the statistical test. Following this, you will learn how to check whether your data follow the assumptions. 

After this, you will learn how to perform a two-sample t-test using the following Python packages:

  • Scipy (scipy.stats.ttest_ind)
  • Pingouin (pingouin.ttest)
  • Statsmodels (statsmodels.stats.weightstats.ttest_ind)
  • Interpret and report the two-sample t-test
    • Including effect sizes

Finally, you will also learn how to interpret the results and, then, how to report the results (including data visualization). 

  • Prerequisites
  • Installing the Needed Python Packages
  • Two Sample T-test
    • Hypotheses
    • Assumptions
  • Example Data
    • Importing Data from CSV
    • Subsetting the Data
  • Descriptive Statistics
  • How to Check the Assumptions of the Two-Sample T-test in Python
    • Checking the Normality of Data
    • Checking the Homogeneity of Variances Assumption in Python
  • How to Carry Out a Two-Sample T-test in Python in 3 Ways
    • 1) T-test with SciPy 
    • 2) Two-Sample T-Test with Pingouin
    • 3) T-test with Statsmodels
  • How to Interpret the Results from a T-test
    • Interpreting the P-value
    • Interpreting the Effect Size (Cohen’s D)
    • Interpreting the Bayes Factor from Pingouin
  • Reporting the Results
    • Visualize the Data using Boxplots:
    • Visualize the Data using Violin Plots:
  • Summary
  • Additional Resources and References

Prerequisites

Obviously, before learning how to calculate an independent t-test in Python, you will need to have at least one of the packages installed. Make sure that you have the following Python packages installed:

  • Scipy
  • Pandas
  • Seaborn
  • Pingouin (if using pingouin.ttest)
  • Statsmodels (if using statsmodels.stats.weightstats.ttest_ind)

Scipy

Scipy is an essential package for data analysis in Python and is, in fact, a dependency of several of the other packages used in this tutorial (e.g., Statsmodels and Pingouin). In this post, we will use it to test one of the assumptions using the Shapiro-Wilk test. Thus, you will need Scipy even if you use one of the other packages to calculate the t-test. Now, you might wonder why you should bother using any of the other packages for your analysis. Well, the ttest_ind function will return the t- and p-value whereas some of the other packages will return more values (e.g., the degrees of freedom, confidence interval, effect sizes) as well. 

Pandas

Pandas will be used to import data into a dataframe and to calculate summary statistics. Thus, you will need this package to follow this tutorial.

Seaborn

If you want to visualize the group means and learn how to plot the p-values and effect sizes, Seaborn is a very easy-to-use data visualization package.

Pingouin

This is the second package used, in this tutorial, to calculate the t-test. One neat thing with the ttest function, of the Pingouin package, is that it returns a lot of information we need when reporting the results from the statistical analysis. For example, using Pingouin we also get the degrees of freedom, Bayes Factor, power, effect size (Cohen’s d), and confidence interval.

Statsmodels

Statsmodels is the third, and last, package used to carry out the independent samples t-test. You do not have to use it and, thus, this package is not required for the post. It does, however, contrary to Scipy, also return the degrees of freedom in addition to the t- and p-values.

Installing the Needed Python Packages

Now, if you don’t have the required packages they can be installed using either pip or conda (if you are using Anaconda). Here’s how to install Python packages with pip:

pip install scipy numpy seaborn pandas statsmodels pingouin


If pip is telling you that there is a newer version, you can learn how to upgrade pip.

Using Pip to Install all Packages

If you are using Anaconda here’s how to create a virtual environment and install the needed packages:

conda create -n 2sampttest
conda activate 2sampttest
conda install scipy numpy pandas seaborn statsmodels pingouin

Obviously, you don’t have to install all the prerequisites of this post and you can refer to the post about installing Python packages if you need more information about the installation process. Another option is to check the YouTube video explaining how to install statsmodels in a virtual environment. Note, if needed you can use pip to install a specific version of a package, as well.

Two Sample T-test

The two-sample t-test is also known as the independent samples, independent, or unpaired t-test. This type of statistical test compares two averages (means) and tells you whether the difference between them is statistically significant. In other words, it lets you know how plausible it is that a difference this large happened by chance alone.

Example: clinical psychologists may want to test a treatment for depression to find out if the treatment will change the quality of life. In an experiment, a control group (e.g., a group who are given a placebo, or “sugar pill”, or in this case no treatment) is always used. The control group may report that their average quality of life is 3, while the group getting the new treatment might report a quality of life of 5. It would seem that the new treatment might work. However, it could be due to a fluke. In order to test this, the clinical researchers can use the two-sample t-test.

Hypotheses

Now, when performing t-tests you typically have the following two hypotheses:

  1. Null hypothesis: the two group means are equal
  2. Alternative hypothesis: the two group means are different (two-tailed)

Now, sometimes we may also have a specific idea about the direction of the effect. That is, we may, based on theory, assume that the condition one group is exposed to will lead to better (or worse) performance. In these cases, the alternative hypothesis will be something like: the mean of one group is greater (or less) than the mean of the other group (one-tailed).
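To make the difference between the two-tailed and one-tailed hypotheses concrete, here is a small sketch using the `alternative` argument of `scipy.stats.ttest_ind`. The control and treatment scores are simulated and purely hypothetical (loosely following the quality-of-life example above); they are not data from this post.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
# Hypothetical quality-of-life scores, made up for illustration
control = rng.normal(loc=3.0, scale=1.5, size=30)
treatment = rng.normal(loc=5.0, scale=1.5, size=30)

# Two-tailed: H1 is "the means differ"
two_sided = stats.ttest_ind(treatment, control, alternative="two-sided")

# One-tailed: H1 is "the treatment mean is greater than the control mean"
greater = stats.ttest_ind(treatment, control, alternative="greater")

print("two-tailed p:", two_sided.pvalue)
print("one-tailed p:", greater.pvalue)
```

When the observed difference points in the hypothesized direction, the one-tailed p-value is half of the two-tailed one, which is why the direction should be chosen before looking at the data.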

Assumptions

Before we go on and import data so that we can practice carrying out a t-test in Python, we’ll briefly have a look at the assumptions of this parametric test. Now, besides the requirement that the dependent variable is continuous and measured on an interval/ratio scale, there are three assumptions that need to be met.

  • Assumption 1: Are the two samples independent?
  • Assumption 2: Are the data from each of the 2 groups following a normal distribution?
  • Assumption 3: Do the two samples have the same variances (Homogeneity of Variance)?

Note, do not worry if your data don’t follow the 3 assumptions above. For example, it is possible to carry out the Mann-Whitney U test in Python if your data is not normally distributed. Another option is to transform your dependent variable using square root, log, or Box-Cox in Python.
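As a rough illustration of these two fallbacks, the sketch below runs the Mann-Whitney U test and a t-test on log-transformed values. The skewed samples are simulated and hypothetical, and whether a log transform is appropriate depends on your data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed
# Hypothetical right-skewed (non-normal) samples, made up for illustration
group_a = rng.exponential(scale=1.0, size=25)
group_b = rng.exponential(scale=2.0, size=25)

# Option 1: a rank-based test that does not assume normality
u_res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print("Mann-Whitney U:", u_res.statistic, "p =", u_res.pvalue)

# Option 2: transform the dependent variable, then run the t-test
# (the log requires strictly positive values)
log_res = stats.ttest_ind(np.log(group_a), np.log(group_b), equal_var=False)
print("t-test on logs:", log_res.statistic, "p =", log_res.pvalue)
```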

Example Data

First, before going on to the two-sample t-test in Python examples, we need some data to work with. In this blog post, we are going to work with data that can be found here. Furthermore, here we will import data from a CSV (.csv) file directly from the URL.

Importing Data from CSV

import pandas as pd

data = 'https://gist.githubusercontent.com/baskaufs/1a7a995c1b25d6e88b45/raw/4bb17ccc5c1e62c27627833a4f25380f27d30b35/t-test.csv'
df = pd.read_csv(data)
df.head()

In the code chunk above, we first imported pandas as pd. Second, we created a string with the URL to the .csv file. In the fourth row, we used Pandas read_csv to load the .csv file into a dataframe. Finally, we used the .head() method to print the first five rows:

Example Dataframe

As can be seen in the image above, we have two columns (grouping and height). Luckily, the column names are easy to work with when we, later, are going to subset the data. If we, on the other hand, had long column names, renaming columns in the Pandas dataframe would be wise.

Subsetting the Data

Finally, before calculating some descriptive statistics, we will subset the data. In the code below, we use the query method to create two Pandas series objects:

# Subset data
male = df.query('grouping == "men"')['height']
female = df.query('grouping == "women"')['height']

In the code chunk above, we first subset the rows containing men, in the column grouping. Subsequently, we do the exact same thing for the rows containing women. Note, that we are also selecting only the column named ‘height’ (i.e., the string within the brackets). Now, using the brackets and the column name as a string is one way to select columns in Pandas dataframe. Finally, if you don’t know the variable names, see the post “How to Get the Column Names from a Pandas Dataframe – Print and List“, for more information on how to get this information.

Descriptive Statistics

Now, we are going to use the groupby method together with the describe method to calculate summary statistics. Note, that here we use the complete dataframe:

df.groupby('grouping').describe()


As we are interested in the difference between men and women in the dataset, we used ‘grouping’ as input to the groupby method. If you are interested in learning more about grouping data and calculating descriptive statistics in Python, see the following two posts:

  • Descriptive Statistics in Python using Pandas
  • Python Pandas Groupby Tutorial

Here’s a quick note: if you are working with NumPy you can convert an array to integer. In the next sections, you will learn how to check the assumptions and then, finally, how to carry out a two-sample t-test with Python. Note, if you by now know that your groups are not independent (i.e., they are the same individuals measured under two different conditions) you can instead use Python to do a paired sample t-test.

How to Check the Assumptions of the Two-Sample T-test in Python

In this section, we will cover how to check the assumptions of the independent samples t-test. Here, we are only going to check assumptions 2 and 3. That is, we will start by checking whether the data from the two groups follow a normal distribution (assumption 2). Second, we will check whether the two populations have the same variance (assumption 3).

Checking the Normality of Data

There are several methods to check whether our data is normally distributed. Here, we will use the Shapiro-Wilk test. Here’s how to examine if the data follow the normal distribution in Python:

from scipy import stats

stats.shapiro(male)    # Output: (0.9550848603248596, 0.7756242156028748)
stats.shapiro(female)  # Output: (0.9197608828544617, 0.467536598443985)

In the code chunk above, we performed the Shapiro-Wilk test on both Pandas series (i.e., for each group separately). Consequently, we get a tuple each time we use the shapiro function. This tuple contains the test statistic and the p-value. Here, the null hypothesis is that the data follow a normal distribution. Because both p-values are well above 0.05, we do not reject the null hypothesis; that is, the test gives no evidence against normality in either group.

Now, there are of course other tests, see this excellent overview, for information. Finally, it is also worth noting that, with large samples, most statistical tests for normality will flag even trivial departures from normality. Normality can also be explored visually using histograms and q-q plots, to name a few. See the post How to Plot a Histogram with Pandas in 3 Simple Steps.

Checking the Homogeneity of Variances Assumption in Python

Remember, before carrying out a t-test in Python, we also need to check whether the variances in the two groups are equal. Here we’ll use Levene’s test to test for homogeneity of variances (equal variances), and this can be performed with the function levene as follows:

stats.levene(male, female) # Output: LeveneResult(statistic=0.026695150465104206, pvalue=0.8729335280501348)

Again, the p-value is well above 0.05, so the test gives no evidence against the assumption of equal variances. See this article for more information. Here are some alternatives to Levene’s test of homogeneity:

  • Bartlett’s test of homogeneity of variances

It is worth noting here, that if our data does not fulfill the assumption of equal variances, we can use Welch’s t-test instead of Student’s t-test. See the references at the end of the post. Luckily, both Levene’s test and Bartlett’s test can be carried out in Python with SciPy (e.g. see above).
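A brief sketch of both points, on simulated and purely hypothetical data with deliberately unequal spreads: `scipy.stats.bartlett` runs Bartlett’s test, and passing `equal_var=False` to `scipy.stats.ttest_ind` switches from Student’s to Welch’s t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # arbitrary seed
# Hypothetical samples with different standard deviations (made up)
a = rng.normal(loc=175, scale=6, size=20)
b = rng.normal(loc=170, scale=12, size=20)

# Two tests of the equal-variance assumption
bart = stats.bartlett(a, b)
lev = stats.levene(a, b)
print("Bartlett p =", bart.pvalue, "| Levene p =", lev.pvalue)

# Student's t-test (pooled variance) versus Welch's t-test
student = stats.ttest_ind(a, b, equal_var=True)
welch = stats.ttest_ind(a, b, equal_var=False)
print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

With equal group sizes, as here, the two t statistics coincide and only the degrees of freedom (and hence the p-value) differ; with unequal group sizes the statistics themselves differ as well.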

How to Carry Out a Two-Sample T-test in Python in 3 Ways

In this section, we are going to learn how to perform an independent samples t-test with Python. To be more exact, we will cover three methods: using SciPy, Pingouin, and Statsmodels. First, we will use SciPy:

1) T-test with SciPy 


In the code chunk above, we imported the ttest_ind method to carry out our data analysis. All three methods described in this post require that you have already imported Pandas and used it to load your dataset.
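For reference, a call to `ttest_ind` along these lines might look like the sketch below. The two arrays are hypothetical stand-ins for the `male` and `female` series created earlier; the values are made up for illustration and are not the tutorial’s data, so the output will not match the results reported later.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical stand-ins for the `male` and `female` series (made-up values)
male = np.array([181.0, 176.5, 172.0, 185.5, 179.0, 188.0, 174.5])
female = np.array([170.0, 165.5, 174.0, 168.5, 177.0, 163.0, 171.5])

# Student's two-sample t-test; equal_var=True is the default
res = ttest_ind(male, female)
t_stat, p_value = res.statistic, res.pvalue

# Degrees of freedom for the pooled test: n1 + n2 - 2
df_pooled = len(male) + len(female) - 2

print(f"t({df_pooled}) = {t_stat:.2f}, p = {p_value:.4f}")
```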

How to Interpret the Results from a T-test

In this section, you are briefly going to learn how to interpret the results from the two-sample t-test carried out with Python. Furthermore, this section will focus on the results from Pingouin and Statsmodels as they give us richer output (e.g., degrees of freedom, effect size). Finally, following this section, you will further learn how to report the t-test according to the guidelines of the American Psychological Association. 

Interpreting the P-value

Now, the p-value of the test is 0.017106, which is less than the significance level alpha (e.g., 0.05). This means that we can conclude that the men’s average height is statistically different from the women’s average height.

Specifically, a p-value is the probability of obtaining an effect at least as extreme as the one in the data you have obtained (i.e., your sample), assuming that the null hypothesis is true. Moreover, p-values address only one question: how likely is your collected data, assuming a true null hypothesis? Importantly, a p-value cannot be used as support for the alternative hypothesis.

Interpreting the Effect Size (Cohen’s D)

One common way to interpret Cohen’s D that is obtained in a t-test is in terms of the relative strength of e.g. the condition. Cohen (1988) suggested that d=0.2 should be considered a ‘small’ effect size, 0.5 is a ‘medium’ effect size, and that 0.8 is a ‘large’ effect size. This means that if two groups’ means don’t differ by 0.2 standard deviations or more, the difference is trivial, even if it is statistically significant.

Interpreting the Bayes Factor from Pingouin

Now, if you used Pingouin to carry out the two-sample t-test you might have noticed that we also get the Bayes Factor. See this post (https://www.statisticshowto.com/bayes-factor-definition/) for more information.

Results from a two-samples t-test Python

Reporting the Results

In this section, you will learn how to report the results according to the APA guidelines. In our case, we can report the results from the t-test like this:

There was a significant difference in height for men (M = 179.87, SD = 6.21) and women (M = 171.05, SD = 5.69); t(12) = 2.77, p = .017, 95% CI [1.87, 15.76], d = 1.48.
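As a sanity check on a report like this, the reported numbers can be approximately reproduced from the summary statistics alone. The sketch below uses `scipy.stats.ttest_ind_from_stats` plus the pooled standard deviation for Cohen’s d and the confidence interval. It assumes equal group sizes of 7, which is inferred from the 12 degrees of freedom and not stated in the post, and small discrepancies are expected because the means and SDs are rounded.

```python
import numpy as np
from scipy import stats

# Reported summary statistics (rounded to two decimals in the report)
m1, sd1 = 179.87, 6.21  # men
m2, sd2 = 171.05, 5.69  # women
n1 = n2 = 7  # ASSUMPTION: equal group sizes, inferred from df = 12

# Student's t-test computed from summary statistics
res = stats.ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2, equal_var=True)

df = n1 + n2 - 2
# Pooled standard deviation
sp = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
# Cohen's d: mean difference in units of the pooled SD
d = (m1 - m2) / sp
# 95% CI for the difference in means
se = sp * np.sqrt(1 / n1 + 1 / n2)
t_star = stats.t.ppf(0.975, df)
ci = (m1 - m2 - t_star * se, m1 - m2 + t_star * se)

print(f"t({df}) = {res.statistic:.2f}, p = {res.pvalue:.3f}")
print(f"95% CI [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}")
```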

In the next section, you will also quickly learn how to visualize the data in two different ways: boxplots and violin plots.

Visualize the Data using Boxplots:

One way to visualize data from two groups is using the box plot:

import seaborn as sns

sns.boxplot(x='grouping', y='height', data=df)