Python goodness of fit test

    In this article, we are going to see how to perform a Chi-Square goodness of fit test in Python.

    The Chi-Square goodness of fit test is a non-parametric statistical hypothesis test used to determine how much the observed frequencies of a categorical variable differ from the expected frequencies. It helps us check whether a variable comes from a certain distribution or whether a sample represents a population. The observed frequency distribution is compared with the expected frequency distribution.

    Null hypothesis: The variable follows the predetermined (expected) distribution.

    Alternative hypothesis: The variable deviates from the expected distribution.

    Example 1: Using stats.chisquare() function

    In this approach, we use the stats.chisquare() function from the scipy.stats module, which returns the chi-square goodness of fit statistic and the p-value.

    Syntax: stats.chisquare(f_obs, f_exp)

    Parameters:

    • f_obs : an array of observed frequencies, one per category.
    • f_exp : an array of expected frequencies, one per category. It is optional; if omitted, all categories are assumed to be equally likely.
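As a small illustration of the two parameters, the sketch below calls stats.chisquare() once without f_exp and once with equal expected frequencies passed explicitly. The counts are made-up illustration values, not data from this article.

```python
from scipy import stats

# Hypothetical observed counts for six categories (illustration only).
observed_counts = [16, 18, 16, 14, 12, 12]

# Without f_exp, every category is assumed to have the same expected frequency.
default_result = stats.chisquare(observed_counts)

# Passing the equal expected frequencies explicitly gives the same answer.
n_categories = len(observed_counts)
equal_expected = [sum(observed_counts) / n_categories] * n_categories
explicit_result = stats.chisquare(observed_counts, f_exp=equal_expected)

print(default_result.statistic, default_result.pvalue)
print(explicit_result.statistic, explicit_result.pvalue)
```

Both calls should report the same statistic and p-value, which shows that f_exp only needs to be supplied when the expected distribution is not uniform.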

    The example below also uses stats.chi2.ppf(), which takes a cumulative probability (1 minus the level of significance) and the degrees of freedom (number of categories minus 1) and returns the chi-square critical value. If the chi-square statistic is greater than the critical value, the null hypothesis is rejected; otherwise, we fail to reject it. In the example, the chi-square statistic is 5.0127344877344875 and the critical value is 12.591587243743977. Since the statistic does not exceed the critical value, we fail to reject the null hypothesis: the data give no evidence that the variable deviates from the expected distribution.

    Python3

    import scipy.stats as stats

    # Observed and expected frequencies for the seven categories
    observed_data = [8, 6, 10, 7, 8, 11, 9]
    expected_data = [9, 8, 11, 8, 10, 7, 6]

    # Chi-square goodness of fit statistic and p-value
    chi_square_test_statistic, p_value = stats.chisquare(
        observed_data, expected_data)

    print('chi_square_test_statistic is : ' +
          str(chi_square_test_statistic))
    print('p_value : ' + str(p_value))

    # Critical value at the 5% significance level, df = 7 categories - 1
    print(stats.chi2.ppf(1-0.05, df=6))

    Output:

    chi_square_test_statistic is : 5.0127344877344875
    p_value : 0.542180861413329
    12.591587243743977
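The decision rule described above can also be written out in code. The following is a minimal sketch that reuses the article's data and compares the two equivalent ways of deciding: statistic against critical value, and p-value against the significance level. The variable names alpha, reject_by_critical_value, and reject_by_p_value are chosen here for illustration.

```python
from scipy import stats

observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

alpha = 0.05                          # level of significance
df = len(observed_data) - 1           # degrees of freedom: categories - 1

statistic, p_value = stats.chisquare(observed_data, expected_data)
critical_value = stats.chi2.ppf(1 - alpha, df=df)

# Two equivalent decision rules for the same test.
reject_by_critical_value = bool(statistic > critical_value)
reject_by_p_value = bool(p_value < alpha)

if reject_by_p_value:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

For this data, both rules lead to the same conclusion, so checking the p-value alone is usually enough in practice.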

    Example 2: Determining the chi-square test statistic by implementing the formula

    In this approach, we implement the formula directly. The result is the same chi-square value that stats.chisquare() returned in Example 1.
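For reference, the formula in question is the Pearson chi-square statistic. The symbols are introduced here for illustration: O_i and E_i are the observed and expected frequencies of category i, and k is the number of categories. Applied to the data of this example, it works out as follows:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

\chi^2 = \frac{(8-9)^2}{9} + \frac{(6-8)^2}{8} + \frac{(10-11)^2}{11}
       + \frac{(7-8)^2}{8} + \frac{(8-10)^2}{10} + \frac{(11-7)^2}{7}
       + \frac{(9-6)^2}{6} \approx 5.0127
```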

    Python3

    import scipy.stats as stats

    # Observed and expected frequencies for the seven categories
    observed_data = [8, 6, 10, 7, 8, 11, 9]
    expected_data = [9, 8, 11, 8, 10, 7, 6]

    # Sum (observed - expected)^2 / expected over all categories
    chi_square_test_statistic1 = 0
    for observed, expected in zip(observed_data, expected_data):
        chi_square_test_statistic1 += (observed - expected) ** 2 / expected

    print('chi square value determined by formula : ' +
          str(chi_square_test_statistic1))

    # Critical value at the 5% significance level, df = 7 categories - 1
    print(stats.chi2.ppf(1-0.05, df=6))

    Output:

    chi square value determined by formula : 5.0127344877344875
    12.591587243743977
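The same calculation can be done without an explicit loop. Below is a minimal sketch that uses NumPy arrays for the statistic and the chi-square survival function, stats.chi2.sf(), to recover the p-value that Example 1 obtained from stats.chisquare().

```python
import numpy as np
from scipy import stats

observed = np.array([8, 6, 10, 7, 8, 11, 9], dtype=float)
expected = np.array([9, 8, 11, 8, 10, 7, 6], dtype=float)

# Element-wise (O - E)^2 / E, summed over all categories.
statistic = np.sum((observed - expected) ** 2 / expected)

# The p-value is the upper-tail probability of the chi-square distribution.
df = len(observed) - 1
p_value = stats.chi2.sf(statistic, df=df)

print(statistic, p_value)
```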

    How do you do a goodness of fit test in Python?

    For continuous data, one approach is to first create a data frame with one row per interval (for example, 8 intervals) and two columns, one for the observed and one for the expected frequency. Use pandas' apply method to count the observations that fall into each interval. The goodness of fit test can then be run on the two columns.
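A rough sketch of this interval-based approach follows. Everything in it is an assumption made for illustration: the sample is randomly generated, the 8 intervals are the unit-width bins on [0, 8), and the expected distribution is taken to be uniform over those bins.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample: 200 values drawn uniformly from [0, 8).
rng = np.random.default_rng(0)
sample = rng.uniform(0, 8, size=200)

# One row per interval [lower, upper).
edges = np.arange(0, 9)
intervals = pd.DataFrame({"lower": edges[:-1], "upper": edges[1:]})

# Count the sample values that fall into each interval.
intervals["observed"] = intervals.apply(
    lambda row: int(((sample >= row["lower"]) & (sample < row["upper"])).sum()),
    axis=1,
)

# Under a uniform distribution, every interval expects the same count.
intervals["expected"] = len(sample) / len(intervals)

statistic, p_value = stats.chisquare(
    intervals["observed"].to_numpy(), intervals["expected"].to_numpy()
)
print(intervals)
print(statistic, p_value)
```

For a different hypothesized distribution, the expected column would instead be the sample size multiplied by the probability that the distribution assigns to each interval.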

    How do you test for goodness of fit?

    There are multiple types of goodness-of-fit tests, but the most common is the chi-square test. The chi-square goodness of fit test determines whether the observed frequencies of a categorical variable match the frequencies expected under a hypothesized distribution. The Kolmogorov-Smirnov test determines whether a sample of continuous data comes from a specific distribution.
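For comparison with the chi-square examples, here is a minimal sketch of a Kolmogorov-Smirnov test using stats.kstest() from scipy.stats. The sample is randomly generated for illustration and is tested against the standard normal distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical sample drawn from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=100)

# Null hypothesis: the sample comes from the standard normal distribution.
ks_result = stats.kstest(sample, "norm")

print(ks_result.statistic, ks_result.pvalue)
```

Unlike the chi-square test, this works on the raw continuous values and does not require grouping them into categories first.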

    What is goodness of fit test used for?

    The Chi-square goodness of fit test is a statistical hypothesis test used to determine whether a variable is likely to come from a specified distribution or not. It is often used to evaluate whether sample data is representative of the full population.

    How do you do a chi-square test?

    To use the chi-square test, we can take the following steps:

    • Define the null (H0) and alternative (H1) hypotheses.
    • Determine the value of alpha (𝞪) according to the domain you are working in. ...
    • Check the data for NaNs or other kinds of errors.
    • Check the assumptions for the test.
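The steps above can be sketched as a small helper. The function name run_chi_square_gof and its parameters are hypothetical, and the assumption check uses the common rule of thumb that every expected frequency should be at least 5, which is not stated in this article.

```python
import numpy as np
from scipy import stats


def run_chi_square_gof(observed, expected, alpha=0.05, min_expected=5):
    """Hypothetical helper: validate the inputs, then run the test."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)

    # Check the data for NaNs.
    if np.isnan(observed).any() or np.isnan(expected).any():
        raise ValueError("Input contains NaN values")

    # Check an assumption (rule of thumb, assumed here): expected counts >= 5.
    if (expected < min_expected).any():
        raise ValueError("Some expected frequencies are below the minimum")

    statistic, p_value = stats.chisquare(observed, expected)

    # Reject H0 when the p-value falls below the chosen alpha.
    return statistic, p_value, bool(p_value < alpha)


statistic, p_value, reject_h0 = run_chi_square_gof(
    [8, 6, 10, 7, 8, 11, 9], [9, 8, 11, 8, 10, 7, 6]
)
print(statistic, p_value, reject_h0)
```

With the data from the examples, the helper reports that the null hypothesis is not rejected at alpha = 0.05.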