How to find variance in Python without a built-in function

    When working with Python, we sometimes need to find the variance of a list of numbers. This problem is common in the data science domain. Let’s discuss some ways in which it can be solved.

    Method #1 : Using loop + formula
    The simplest way to approach this problem is to apply the variance formula directly, using a generator expression as loop shorthand: compute the mean, then average the squared differences from it. This gives the population variance (dividing by n).

    test_list = [6, 7, 3, 9, 10, 15]

    print("The original list is : " + str(test_list))

    mean = sum(test_list) / len(test_list)

    res = sum((i - mean) ** 2 for i in test_list) / len(test_list)

    print("The variance of list is : " + str(res))

    Output :

    The original list is : [6, 7, 3, 9, 10, 15]
    The variance of list is : 13.888888888888891
    

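    Method #2 : Using statistics.variance()
    This task can also be performed with the built-in variance() function of the statistics module. Note that variance() computes the sample variance (dividing by n - 1), so its result differs from Method #1, which divides by n; statistics.pvariance() is the counterpart that divides by n.

    import statistics

    test_list = [6, 7, 3, 9, 10, 15]

    print("The original list is : " + str(test_list))

    res = statistics.variance(test_list)

    print("The variance of list is : " + str(res))

    Output :

    The original list is : [6, 7, 3, 9, 10, 15]
    The variance of list is : 16.666666666666668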
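To make the relationship between the two methods concrete, here is a minimal sketch on the same test_list. It uses only the standard statistics module; the variable names pop_var and sample_var are illustrative and not part of the original article.

```python
import statistics

test_list = [6, 7, 3, 9, 10, 15]
n = len(test_list)

# Population variance: sum of squared deviations divided by n (what Method #1 computes).
pop_var = statistics.pvariance(test_list)

# Sample variance: sum of squared deviations divided by n - 1.
sample_var = statistics.variance(test_list)

print(round(pop_var, 4))     # 13.8889
print(round(sample_var, 4))  # 16.6667

# The two values differ only by the factor n / (n - 1).
print(abs(sample_var - pop_var * n / (n - 1)) < 1e-9)  # True
```

Which one is appropriate depends on whether the list is the whole population or a sample drawn from it.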
    


    If I have a list like this:

    results=[-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
              0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
    

    I want to calculate the variance of this list in Python, i.e. the average of the squared differences from the mean.

    How can I go about this? What confuses me is how to access the elements of the list to compute the squared differences.

    Cleb

    asked Feb 23, 2016 at 16:47

    You can use numpy's built-in function var:

    import numpy as np
    
    results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
              0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
    
    print(np.var(results))
    

    This gives you 28.822364260579157

    If - for whatever reason - you cannot use numpy and/or you don't want to use a built-in function for it, you can also calculate it "by hand" using e.g. a list comprehension:

    # calculate mean
    m = sum(results) / len(results)

    # calculate variance using a list comprehension
    var_res = sum((xi - m) ** 2 for xi in results) / len(results)
    

    which gives you the identical result.

    If you are interested in the standard deviation, you can use numpy.std:

    print(np.std(results))
    5.36864640860051
    

    @Serge Ballesta explained very well the difference between variance n and n-1. In numpy you can easily set this parameter using the option ddof; its default is 0, so for the n-1 case you can simply do:

    np.var(results, ddof=1)
    

    The "by hand" solution is given in @Serge Ballesta's answer.

    Both approaches yield 32.024849178421285.

    You can set the parameter also for std:

    np.std(results, ddof=1)
    5.659050201086865
    

    answered Feb 23, 2016 at 16:55

    Cleb

    Starting with Python 3.4, the standard library comes with the variance function (sample variance, or variance n-1) as part of the statistics module:

    from statistics import variance
    data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
            0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
    variance(data)
    # 32.024849178421285

    The population variance (or variance n) can be obtained using the pvariance function:

    from statistics import pvariance
    # data as defined above
    pvariance(data)
    # 28.822364260579157

    Also note that if you already know the mean of your list, the variance and pvariance functions take a second argument (respectively xbar and mu) in order to spare recomputing the mean of the sample (which is part of the variance computation).
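As a small illustration of that second argument, the sketch below computes the mean once and passes it to both functions. The variable names m, sample_var and population_var are made up for this example; xbar and mu are the documented parameter names.

```python
from statistics import mean, pvariance, variance

data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
        0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

# Compute the mean once and reuse it for both calls.
m = mean(data)

sample_var = variance(data, xbar=m)     # divides by n - 1
population_var = pvariance(data, mu=m)  # divides by n

print(sample_var)
print(population_var)
```

The value passed in must really be the mean of the data; the functions do not check it, so a wrong value silently gives a wrong variance.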

    answered Feb 28, 2019 at 21:34

    Xavier Guihot

    Well, there are two ways of defining the variance. There is the variance n, which you use when you have the full set, and the variance n-1, which you use when you have a sample.

    The difference between the two is whether the value m = sum(xi) / n is the real average or just an approximation of what the average should be.

    Example 1: you want to know the average height of the students in a class and its variance. Here the value m = sum(xi) / n is the real average, and the formulas given by Cleb are fine (variance n).

    Example 2: you want to know the average hour at which a bus passes the bus stop and its variance. You note the hour for a month and get 30 values. Here the value m = sum(xi) / n is only an approximation of the real average, and that approximation will be more accurate with more values. In that case the best approximation of the actual variance is the variance n-1:

    varRes = sum([(xi - m)**2 for xi in results]) / (len(results) - 1)
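Both definitions can be folded into one small helper. The sketch below is only an illustration: the function name variance and its ddof parameter are hypothetical (the name ddof is borrowed from numpy's option of the same meaning), and no imports are needed.

```python
def variance(values, ddof=0):
    """Variance of a sequence: ddof=0 divides by n, ddof=1 divides by n - 1."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - ddof)


results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

print(variance(results))          # variance n, about 28.82
print(variance(results, ddof=1))  # variance n-1, about 32.02
```

Defaulting to ddof=0 mirrors numpy; flip the default if your data is usually a sample.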
    

    OK, this has nothing to do with Python, but it does have an impact on statistical analysis, and the question is tagged statistics and variance.

    Note: libraries differ in their defaults. numpy uses the variance n (ddof=0) by default for both var and std, whereas the variance and stdev functions in Python's statistics module use the variance n-1.

    answered Feb 23, 2016 at 17:35

    Serge Ballesta

    Numpy is indeed the most elegant and fast way to do it.

    I think the actual question was about how to access the individual elements of a list to do such a calculation yourself, so here is an example:

    results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
               0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

    import numpy as np
    print('numpy variance: ', np.var(results))


    # without numpy, by hand

    # there are two ways of calculating the variance
    #   - 1. directly, as the central 2nd order moment (//en.wikipedia.org/wiki/Moment_(mathematics)) divided by the length of the vector
    #   - 2. "mean of square minus square of mean" (see //en.wikipedia.org/wiki/Variance)

    # calculate mean
    n = len(results)
    total = 0  # named total so that the built-in sum() is not shadowed
    for i in range(n):
        total = total + results[i]

    mean = total / n
    print('mean: ', mean)

    # calculate the central moment
    sum2 = 0
    for i in range(n):
        sum2 = sum2 + (results[i] - mean)**2

    myvar1 = sum2 / n
    print("my variance1: ", myvar1)

    # calculate the mean of square minus square of mean
    sum3 = 0
    for i in range(n):
        sum3 = sum3 + results[i]**2

    myvar2 = sum3 / n - mean**2
    print("my variance2: ", myvar2)
    

    gives you (values shown to 12 significant digits):

    numpy variance:  28.8223642606
    mean:  -3.731599805
    my variance1:  28.8223642606
    my variance2:  28.8223642606
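The two ways agree on this data, but the "mean of square minus square of mean" shortcut can lose accuracy when the values are large compared with their spread. The sketch below illustrates this with made-up numbers (a spread of a few units on top of an offset of 1e9); the variable names are illustrative only.

```python
# Small spread on top of a large offset. Floats on purpose: with Python ints
# the arithmetic would be exact and the effect would not show.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
n = len(data)

total = 0.0
for x in data:
    total += x
mean = total / n

# Way 1: average of the squared deviations from the mean.
sq_dev = 0.0
for x in data:
    sq_dev += (x - mean) * (x - mean)
var_two_pass = sq_dev / n

# Way 2: mean of squares minus square of the mean.
sq_sum = 0.0
for x in data:
    sq_sum += x * x
var_shortcut = sq_sum / n - mean * mean

print(var_two_pass)  # 22.5
print(var_shortcut)  # far from 22.5: rounding in the ~1e18 squares swamps the result
```

For well-scaled data either way is fine; for data like this, the deviation-based way is the safer default.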
    

    answered Feb 23, 2016 at 19:49

    roadrunner66

    import numpy as np

    def get_variance(xs):
        mean = np.mean(xs)
        summed = 0
        for x in xs:
            summed += (x - mean)**2
        return summed / len(xs)

    print(get_variance([1, 2, 3, 4, 5]))

    Output: 2.0

    For the variance n-1, numpy can be used directly:

    a = [1, 2, 3, 4, 5]
    variance = np.var(a, ddof=1)
    print(variance)  # 2.5
    

    answered Aug 26, 2019 at 7:47

    The correct answer is to use one of the packages like NumPy, but if you want to roll your own and compute the variance incrementally, there is a good algorithm that has higher accuracy. See this link: //www.johndcook.com/blog/standard_deviation/

    I ported my Perl implementation to Python. Please point out issues in the comments.

    Mklast = 0  # running mean after the previous value
    Mk = 0      # running mean after the current value
    Sk = 0      # running sum of squared deviations
    k = 0       # number of values seen so far

    for xi in results:
        k = k + 1
        Mk = Mklast + (xi - Mklast) / k
        Sk = Sk + (xi - Mklast) * (xi - Mk)
        Mklast = Mk

    var = Sk / (k - 1)
    print(var)

    The answer is (to 12 significant digits):

    32.0248491784
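If the values arrive one at a time, the same update rule can be wrapped in a small object so that the variance can be read at any point. This is only a sketch of that idea: the class name RunningVariance, its method names and the ddof parameter are hypothetical and not taken from the linked post.

```python
class RunningVariance:
    """Running mean and variance using the incremental update shown above."""

    def __init__(self):
        self.k = 0       # number of values seen so far
        self.mean = 0.0  # running mean (Mk)
        self.s = 0.0     # running sum of squared deviations (Sk)

    def push(self, x):
        """Fold one more value into the running statistics."""
        self.k += 1
        old_mean = self.mean
        self.mean = old_mean + (x - old_mean) / self.k
        self.s += (x - old_mean) * (x - self.mean)

    def variance(self, ddof=1):
        """Variance n-1 by default; ddof=0 gives the variance n."""
        if self.k - ddof <= 0:
            raise ValueError("not enough data points")
        return self.s / (self.k - ddof)


results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

rv = RunningVariance()
for xi in results:
    rv.push(xi)

print(rv.variance())        # variance n-1, about 32.02
print(rv.variance(ddof=0))  # variance n, about 28.82
```

Only three numbers are stored, so the data itself never has to be kept in memory.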
    

    answered Jul 22, 2019 at 20:37

    Mark Lakata

    Without imports, I would use the following Python 3 script:

    #!/usr/bin/env python3

    def createData():
        data1 = [12, 54, 60, 3, 15, 6, 36]
        data2 = [1, 2, 3, 4, 5]
        data3 = [100, 30000, 1567, 3467, 20000, 23457, 400, 1, 15]

        dataset = []
        dataset.append(data1)
        dataset.append(data2)
        dataset.append(data3)

        return dataset

    def calculateMean(data):
        means = []
        # one list of the nested list
        for oneDataset in data:
            total = 0
            # one datapoint in one inner list
            for number in oneDataset:
                # summing up
                total += number
            # mean for one inner list
            mean = total / len(oneDataset)
            # adding a tuple of the original data and its mean to
            # a list of tuples
            item = (oneDataset, mean)
            means.append(item)

        return means

    # subtract the mean from each element and square the result,
    # sum up the squared results and divide by the number of elements
    def calculateVariance(meanData):
        variances = []
        # meanData is the list of tuples
        # pair is one tuple
        for pair in meanData:
            # pair[0] is the original data, pair[1] is its mean
            squareSum = 0
            for element in pair[0]:
                interResult = (element - pair[1])**2
                squareSum += interResult
            variance = squareSum / len(pair[0])
            variances.append((pair[0], pair[1], variance))

        return variances

    def main():
        my_data = createData()
        my_means = calculateMean(my_data)
        my_variances = calculateVariance(my_means)
        print(my_variances)

    if __name__ == "__main__":
        main()
    

    Here you get a printout of the original data, their mean and their variance. I know this approach covers a list of several datasets, yet I think you can adapt it quickly for your purpose ;)

    answered Jan 6, 2020 at 10:45

    Shushiro

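    Here's my solution:

    vac_nums = [0,0,0,0,0, 1,1,1,1,1,1,1,1, 2,2,2,2, 3,3,3]  # your data goes here

    mean = sum(vac_nums) / len(vac_nums)

    # count accumulates the sum of squared differences from the mean
    count = 0

    for i in range(len(vac_nums)):
        variance = (vac_nums[i] - mean)**2  # squared difference for one element
        count += variance

    # population variance: divide by the number of elements
    print(count / len(vac_nums))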
    

    answered Feb 4 at 20:18

    Sometimes all I wanna do is shut my brain off and COPY PASTE. Note that the second value returned is the standard deviation (the square root of the variance), rounded to two decimals:

    import math

    def get_mean_std(results):
        # calculate mean
        mean = round(sum(results) / len(results), 2)

        # calculate the variance using a generator expression, then take its square root
        std = round(math.sqrt(sum((xi - mean) ** 2 for xi in results) / len(results)), 2)
        return mean, std

    USAGE

    get_mean_std([1, 3, 34])

    (12.67, 15.11)

    answered Jul 13 at 4:31

    gndps

    How do you manually calculate variance in Python?

    Coding a stdev() function in Python: our stdev() function takes some data and returns the population standard deviation. To do that, we rely on our previous variance() function to calculate the variance, and then we use math.sqrt() to take the square root of the variance.
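The "previous variance() function" mentioned in that snippet is not shown on this page, so the version below is an assumption: a minimal population-variance helper, with stdev() built on top of it as described.

```python
import math


def variance(data):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def stdev(data):
    """Population standard deviation: square root of the population variance."""
    return math.sqrt(variance(data))


print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
print(stdev([2, 4, 4, 4, 5, 5, 7, 9]))     # 2.0
```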

    How do you find the mode without inbuilt function in Python?

    The best way to find the mode is to use a dict: the keys are the input values and the values are their frequencies.
    First, get the unique elements from the input. ...
    Make a new, empty dictionary.
    This dictionary stores the unique elements as keys and, as values, how many times each element is repeated in the original input.
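The dictionary approach described above might look like the following sketch. The function name find_modes is made up for this example, and returning a list (so that ties are reported) is a design choice rather than something the snippet specifies.

```python
def find_modes(values):
    """Return the most frequent value(s) of a non-empty list, using a plain dict."""
    # Keys are the unique elements, values are how often each one occurs.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1

    # Find the highest frequency without calling max().
    highest = 0
    for c in counts.values():
        if c > highest:
            highest = c

    # Every element that reaches the highest frequency is a mode.
    return [v for v, c in counts.items() if c == highest]


print(find_modes([1, 3, 3, 2, 5, 3, 2]))  # [3]
print(find_modes([1, 1, 2, 2, 3]))        # [1, 2]
```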

    How do you find the variance of data in Python?

    Steps to finding the variance:
    Find the mean of the set of data.
    Subtract the mean from each number.
    Square each result.
    Add the results together.
    Divide the sum by the total number of numbers in the data set.

    How do you find standard deviation in Python without library?

    “standard deviation in python without numpy” code answer:
    import math
    xs = [0.5, 0.7, 0.3, 0.2]  # values (must be floats!)
    mean = sum(xs) / len(xs)  # mean
    var = sum(pow(x - mean, 2) for x in xs) / len(xs)  # variance
    std = math.sqrt(var)  # standard deviation
