How to remove outliers using z-score in Python

In machine learning, we often come across datasets that contain outliers. Outliers are extreme values that do not follow the overall pattern of the data; they diverge markedly from all the other values.

Outliers can arise for many reasons, such as human error while preparing the data or outliers put into the data intentionally to test a model. But are they useful when building predictive models? It depends: sometimes we have to drop them, and sometimes we retain them because they carry meaningful information.

In this article, we will discuss how to detect outliers in a dataset and remove them in two ways. We will use a weight-height dataset that is publicly available on Kaggle. The dataset contains weight and height values; we will search for outliers in the weight column.

What will you learn from this article?

What are Outliers? How to find them?
What are Z-score and Standard deviation?
How to remove Outliers using Z-score and Standard deviation?

An outlier is an extreme value in the dataset, one that is very unusual compared with the rest of the data, as explained earlier. Let us find the outliers in the weight column of the dataset. We will first import the libraries and load the data, using the code below.

import pandas as pd

import matplotlib.pyplot as plt

df = pd.read_csv("weight.csv")

df.Weight

Now we will plot a histogram to check the distribution of this column, using the code below.

plt.hist(df.Weight, bins=20, rwidth=0.8)

plt.xlabel('Weight')

plt.ylabel('Count')

plt.show()

From the above graph, we can see that the data is centred around the mean and roughly follows a normal distribution: the counts peak near the mean and fall off as we move away from it in either direction. Let us look at the descriptive statistics of this column, such as the mean, standard deviation, minimum, and maximum, using the code below.

df.Weight.describe()

The mean of the weight column is 161.44 and the standard deviation is 32.108. The minimum and maximum values in the column are 64 and 269, respectively. We will now treat everything lying more than 3 standard deviations from the mean as an outlier. We compute an upper limit and a lower limit at 3 standard deviations above and below the mean; every data point beyond either limit is an outlier. Use the code below.

upper = df.Weight.mean() + 3*df.Weight.std()

lower = df.Weight.mean() - 3*df.Weight.std()

print(upper)

print(lower)
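As a quick sanity check, the limits can be recomputed by hand from the mean and standard deviation quoted above. This is only a sketch using the rounded figures from the text (161.44, 32.108, 64 and 269), so the full-precision values pandas prints may differ slightly in the last digits.

```python
# Rounded summary statistics quoted in the article (not full precision).
mean = 161.44
std = 32.108

# Limits at 3 standard deviations above and below the mean.
upper = mean + 3 * std
lower = mean - 3 * std

print(round(upper, 3))  # 257.764
print(round(lower, 3))  # 65.116

# The reported minimum (64) and maximum (269) both sit outside the limits.
print(64 < lower, 269 > upper)  # True True
```

With these rounded numbers, both the smallest and the largest weight fall outside the 3 standard deviation band.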

Now we will see which data points fall beyond these limits.
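The filter for this step can be sketched as a boolean mask that keeps rows above the upper limit or below the lower limit. The example below is a self-contained illustration on a small synthetic column, not the Kaggle data: the name demo_df and the values in it are assumptions made up for this sketch.

```python
import pandas as pd

# Synthetic stand-in for the weight column (NOT the Kaggle data):
# 998 ordinary values between 150 and 170, plus two injected extremes.
weights = [150 + (i % 21) for i in range(998)] + [20, 300]
demo_df = pd.DataFrame({"Weight": weights})

upper = demo_df.Weight.mean() + 3 * demo_df.Weight.std()
lower = demo_df.Weight.mean() - 3 * demo_df.Weight.std()

# Rows above the upper limit OR below the lower limit are the outliers.
outliers = demo_df[(demo_df.Weight > upper) | (demo_df.Weight < lower)]
print(outliers)
```

On this synthetic column only the two injected values (20 and 300) are returned. The same mask applied to df, with the limits computed earlier, lists the outliers in the real weight column.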

On the weight dataset, two data points fall outside these limits and are treated as outliers. To remove them, we simply keep the data points that lie within the limits. Use the code below.

new_df = df[(df.Weight < upper) & (df.Weight > lower)]

new_df.head()

new_df.shape

The original data had 10,000 rows and the new data frame has 9,998; the 2 rows that were treated as outliers have been removed. Now we will do the same thing using the z-score, which tells us how many standard deviations a data point lies from the mean. It is calculated by subtracting the mean from the data point and dividing the result by the standard deviation. Let us see how this is done in practice.

df['zscore'] = (df.Weight - df.Weight.mean()) / df.Weight.std()

df.head(5)
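To make the formula concrete, here is a small sketch that applies it to the rounded figures quoted earlier (mean 161.44, standard deviation 32.108, minimum 64, maximum 269). The helper name z_score is hypothetical and exists only for this illustration.

```python
# Rounded summary statistics quoted in the article (not full precision).
mean = 161.44
std = 32.108

def z_score(x):
    # Distance from the mean, measured in standard deviations.
    return (x - mean) / std

print(round(z_score(269), 2))  # 3.35
print(round(z_score(64), 2))   # -3.03
print(z_score(mean))           # 0.0
```

A value equal to the mean has a z-score of 0, while the reported maximum and minimum land just beyond +3 and -3, which is why they are flagged.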

We can see that the z-score is computed for each row. Now we will look only at the rows that have a z-score greater than 3 or less than -3, using the code below.

df[df['zscore'] > 3]

df[df['zscore'] < -3]

# keep only the rows whose z-score lies between -3 and 3
df[(df.zscore > -3) & (df.zscore < 3)]
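The z-score steps above can be wrapped in a small reusable helper. This is a minimal sketch, assuming a numeric column and the same threshold of 3; the function name remove_outliers_zscore, the threshold parameter and the synthetic demo_df are all assumptions introduced for illustration, not part of the original walkthrough.

```python
import pandas as pd

def remove_outliers_zscore(frame, column, threshold=3.0):
    """Return the rows whose z-score in `column` lies strictly within +/- threshold."""
    # pandas' std() uses the sample standard deviation (ddof=1) by default.
    z = (frame[column] - frame[column].mean()) / frame[column].std()
    return frame[z.abs() < threshold]

# Synthetic data (NOT the Kaggle dataset): 998 ordinary values plus two extremes.
demo_df = pd.DataFrame({"Weight": [150 + (i % 21) for i in range(998)] + [20, 300]})

cleaned = remove_outliers_zscore(demo_df, "Weight")
print(demo_df.shape, cleaned.shape)  # (1000, 1) (998, 1)
```

Filtering on the absolute z-score is equivalent to keeping values between the lower and upper limits computed from 3 standard deviations, so both approaches drop the same rows.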


				
					

                 