Guide to z-score normalization in Python with NumPy
In statistics, a z-score tells us how many standard deviations a value lies from the mean. We use the following formula to calculate a z-score:

z = (X - μ) / σ

where:

X: a single raw data value
μ: the population mean
σ: the population standard deviation
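As a minimal sketch of the formula itself, the NumPy code below computes z-scores by hand. It assumes, as scipy.stats.zscore does by default, that σ is the population standard deviation, which is what np.std returns with its default ddof=0. The names mu, sigma and z are illustrative choices.

```python
import numpy as np

# Raw data values (the same array used in the one-dimensional example)
data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

mu = data.mean()     # population mean, μ
sigma = data.std()   # population standard deviation, σ (np.std defaults to ddof=0)

# Apply z = (X - μ) / σ to every element at once via broadcasting
z = (data - mu) / sigma

print(mu)
print(np.round(z, 3))
```

By construction the resulting z-scores have mean 0 and standard deviation 1, which is why this transformation is also called standardization.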
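This tutorial explains how to calculate z-scores for raw data values in Python.

How to Calculate Z-Scores in Python

We can calculate z-scores in Python using scipy.stats.zscore, which uses the following syntax:

scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')

where:

a: an array-like object containing the data
axis: the axis along which to calculate the z-scores (default 0; None computes over the whole array)
ddof: the delta degrees of freedom used in the standard deviation calculation (default 0)
nan_policy: how to handle NaN values in the input: 'propagate' returns NaN, 'raise' throws an error, and 'omit' ignores NaN values (default 'propagate')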
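To make the ddof and nan_policy parameters concrete, here is a short sketch. The array with_nan is a hypothetical variant of the example data with its last value replaced by NaN; the 'omit' behavior shown follows the SciPy documentation for zscore.

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

# ddof=0 (default): divide by the population standard deviation
z_pop = stats.zscore(data)

# ddof=1: divide by the sample standard deviation instead; it is
# slightly larger, so every z-score moves a little toward 0
z_sample = stats.zscore(data, ddof=1)

# Hypothetical copy of the data with one missing value
with_nan = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, np.nan])

# 'propagate' (default): the NaN poisons the mean, so every z-score is NaN
z_propagate = stats.zscore(with_nan)

# 'omit': the mean and std are computed from the non-NaN values only;
# the NaN position itself stays NaN in the output
z_omit = stats.zscore(with_nan, nan_policy='omit')

print(np.round(z_sample, 3))
print(z_propagate)
print(np.round(z_omit, 3))
```

Passing ddof=1 is the usual choice when the data are a sample and σ has to be estimated from it.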
The following examples illustrate how to use this function to calculate z-scores for one-dimensional NumPy arrays, multi-dimensional NumPy arrays, and pandas DataFrames.

NumPy One-Dimensional Arrays

Step 1: Import modules.

    import pandas as pd
    import numpy as np
    import scipy.stats as stats

Step 2: Create an array of values.

    data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

Step 3: Calculate the z-scores for each value in the array.

    stats.zscore(data)

This returns the following z-scores (rounded to three decimal places):

    [-1.394, -1.195, -1.195, -0.199, 0, 0, 0.398, 0.598, 1.195, 1.793]
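Each z-score tells us how many standard deviations an individual value lies from the mean. For example:

The first value, 6, is 1.394 standard deviations below the mean.
The fifth value, 13, is 0 standard deviations from the mean (it is equal to the mean).
The last value, 22, is 1.793 standard deviations above the mean.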
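Rearranging the formula gives X = μ + z·σ, so a z-score together with the mean and standard deviation pins down the original value. The sketch below checks that round trip and turns each z-score into a sentence like the ones above; the descriptions list is an illustrative helper, not part of SciPy.

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])
z = stats.zscore(data)

# Rearranging z = (X - μ) / σ gives X = μ + z * σ,
# so each raw value can be recovered from its z-score
recovered = data.mean() + z * data.std()

# The sign of each z-score says which side of the mean the value lies on
descriptions = []
for value, score in zip(data, z):
    if np.isclose(score, 0):
        descriptions.append(f"{value} equals the mean")
    else:
        side = "above" if score > 0 else "below"
        descriptions.append(f"{value} is {abs(score):.3f} standard deviations {side} the mean")

for line in descriptions:
    print(line)
```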
NumPy Multi-Dimensional Arrays

If we have a multi-dimensional array, we can use the axis parameter to calculate each z-score relative to its own row (axis=1) rather than its own column (the default, axis=0). For example, suppose we have the following multi-dimensional array:

    data = np.array([[5, 6, 7, 7, 8],
                     [8, 8, 8, 9, 9],
                     [2, 2, 4, 4, 5]])

We can use the following syntax to calculate the z-scores for each row:

    stats.zscore(data, axis=1)

    [[-1.569 -0.588  0.392  0.392  1.373]
     [-0.816 -0.816 -0.816  1.225  1.225]
     [-1.167 -1.167  0.5    0.5    1.333]]

The z-score of each individual value is shown relative to the row it is in. For example:

The first value in the first row (5) is 1.569 standard deviations below the mean of its row.
The first value in the second row (8) is 0.816 standard deviations below the mean of its row.
The first value in the third row (2) is 1.167 standard deviations below the mean of its row.
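The axis parameter only changes which values go into each mean and standard deviation. The following sketch reproduces axis=1 by hand using NumPy's keepdims=True broadcasting and, for comparison, also computes z-scores with axis=0 (per column, the default) and axis=None (over all values in the array).

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

# axis=1 by hand: one mean and one std per row.
# keepdims=True keeps them as a (3, 1) column so they broadcast across each row.
row_mean = data.mean(axis=1, keepdims=True)
row_std = data.std(axis=1, keepdims=True)
z_rows = (data - row_mean) / row_std

# axis=0 (the default): one mean and std per column instead
z_cols = stats.zscore(data, axis=0)

# axis=None: a single mean and std computed over all 15 values
z_all = stats.zscore(data, axis=None)

print(np.round(z_rows, 3))
```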
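Pandas DataFrames

Suppose we instead have a pandas DataFrame:

    data = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])
    data

       A  B  C
    0  8  0  9
    1  4  0  7
    2  9  6  8
    3  1  8  1
    4  8  0  8

Because the values are generated randomly, your DataFrame (and therefore the z-scores below) will differ unless you set a random seed first.

We can use the apply function to calculate the z-score of each value relative to its column:

    data.apply(stats.zscore)

              A         B         C
    0  0.659380 -0.802955  0.836080
    1 -0.659380 -0.802955  0.139347
    2  0.989071  0.917663  0.487713
    3 -1.648451  1.491202 -1.950852
    4  0.659380 -0.802955  0.487713

The z-score of each individual value is shown relative to the column it is in. For example:

The first value in column A (8) is 0.659 standard deviations above the mean of column A.
The first value in column B (0) is 0.803 standard deviations below the mean of column B.
The first value in column C (9) is 0.836 standard deviations above the mean of column C.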
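The same column-wise z-scores can be computed with pandas alone. One detail worth knowing, sketched below: DataFrame.std() uses the sample standard deviation (ddof=1) by default, while scipy.stats.zscore uses ddof=0, so ddof=0 has to be passed explicitly for the two to agree. The seeded generator, its seed value, and the names df, z_scipy and z_pandas are assumptions made for reproducibility, not part of the original example.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Seeded generator so the example is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])

# Column-wise z-scores with SciPy (population std, ddof=0)
z_scipy = df.apply(stats.zscore)

# The same thing with pandas alone. DataFrame.std() defaults to ddof=1
# (sample std), so ddof=0 must be passed to match SciPy's default.
z_pandas = (df - df.mean()) / df.std(ddof=0)

print(z_pandas.round(3))
```

Leaving out ddof=0 would scale every z-score by a factor of sqrt((n - 1) / n), which for these five rows is sqrt(4/5), or about 0.894.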