How to use merge in Python
It doesn’t matter whether you’re a data scientist, data analyst, business analyst, or data engineer. If you’ve been using Python in your work, especially for data preprocessing and cleaning, you’ve probably used Pandas in some way.
Why “Merge”?
You’ve probably encountered multiple data tables that each hold different bits of information you would like to see in one place (one dataframe, in this case). This is where merge comes in: it efficiently combines multiple data tables, in an orderly fashion, into a single dataframe for further analysis.
The words “merge” and “join” are used fairly interchangeably in Pandas and other languages. Although Pandas has both “merge” and “join” functions, they essentially do similar things. That said, we’ll focus solely on the Pandas “merge” function throughout this article.
What exactly is “Merge”?
The documentation of the merge function (pandas version = 0.24.1) looks foreign and is not easy to understand at first glance (at least for me). After seeking more explanation from some of my friends and online resources, I began to understand how merge could be explained in a much simpler way, and to appreciate the beauty of this function in Pandas. To understand …
Since the method … After that, we’ll explain the method …
The explanation below comes largely from the great tutorial by Shane Lynn, and the data used (3 CSV files) comes from the KillBiller application. You can get the data here. Let’s get started!
A Quick Look at the Data
Let’s first understand the data sets used, with the following explanation of each dataframe.
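The three CSV files aren’t reproduced here, so the following is a minimal sketch on made-up stand-ins for user_usage, user_device and the Android device table. The frame names follow the article, but the columns (use_id, monthly_mb, device, Model, Retail Branding) and all values are assumptions chosen only to make each merge type visible. It runs the left, right, inner and outer merges described in this article, plus a merge on differently named key columns using left_on/right_on.

```python
import pandas as pd

# Toy stand-ins for the article's tables. Values and columns are invented
# for illustration; only the frame names follow the text.
user_usage = pd.DataFrame({
    "use_id": [1, 2, 3, 4],
    "monthly_mb": [1500.0, 300.0, 2200.0, 800.0],
})
user_device = pd.DataFrame({
    "use_id": [1, 2, 5, 6],
    "device": ["GT-I9505", "SM-G930F", "D2303", "A0001"],
})
android_devices = pd.DataFrame({
    "Model": ["GT-I9505", "SM-G930F"],
    "Retail Branding": ["Samsung", "Samsung"],
})

# 1. LEFT: every user_usage row is kept; unmatched rows get NaN in "device".
left_merge = user_usage.merge(user_device, on="use_id", how="left")

# 2. RIGHT: every user_device row is kept; unmatched rows get NaN in "monthly_mb".
right_merge = user_usage.merge(user_device, on="use_id", how="right")

# 3. INNER (the default how): only use_id values present in both frames survive.
inner_merge = user_usage.merge(user_device, on="use_id")

# 4. OUTER: union of keys; indicator=True adds a "_merge" column recording
# whether each row came from both frames, the left only, or the right only.
outer_merge = user_usage.merge(user_device, on="use_id", how="outer", indicator=True)

# Different key names: match user_device.device against android_devices.Model.
device_info = user_device.merge(
    android_devices, left_on="device", right_on="Model", how="left"
)

print(left_merge, right_merge, inner_merge, outer_merge, device_info, sep="\n\n")
```

On this toy data, the outer merge’s "_merge" column shows the same pattern the article describes: two rows from both frames, then two left-only rows, then two right-only rows. Note that with left_on/right_on both key columns (device and Model) are kept in the result.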
It’s important to note here that:
1. LEFT Merge
Keep every row in the left dataframe. Where a value of the “on” variable has no match in the right dataframe, fill the right dataframe’s columns with empty / NaN values in the result.
With the operation above, left_merge has the same size as user_usage, as we keep all the rows in the left dataframe using the … As expected, the column …
2. RIGHT Merge
To perform the right merge, we just repeat the code above, simply changing the parameter of … With the operation above, right_merge has the same size as
user_device, as we keep all the rows in the right dataframe using the … This time, the empty values appear as NaN in the columns that come from the left dataframe, user_usage.
3. INNER Merge
Pandas uses the “inner” merge by default. This keeps only the values common to both the left and right dataframes in the merged data. In our case, only the rows that contain … Although Pandas uses the “inner” merge by default, the parameter … With the operation above, the merged data, inner_merge, differs in size from the original left and right dataframes (user_usage & user_device), as only common values are merged.
4. OUTER Merge
Finally, we have the “outer” merge. The “outer” merge combines all the rows of the left and right dataframes, filling in NaN where there are no matching values.
Notice that the method … To further illustrate how the “outer” merge works, we purposely select certain rows of outer_merge to see where they originate from. For the 1st and 2nd
rows, the rows come from both dataframes, as they have the same values of … For the 3rd and 4th rows, the rows come from the left dataframe, as the right dataframe doesn’t have the common values of … For the 5th and 6th rows, the rows come from the right dataframe, as the left dataframe doesn’t have the common values of …
💡 Merge Dataframes with Different Column Names
So we’ve talked about how to merge data in different ways: left, right, inner, and outer. But the method … Therefore, we use … Here we’ve merged user_device with android_device since they both contain common codes in their columns: …
Final Thoughts
Thank you for reading. Merging data from different data tables is one of the most common tasks that we as data professionals have to do. I hope this has been useful to you in some way. As always, if you have any questions or comments, feel free to leave your feedback below, or you can always reach me on LinkedIn. Till then, see you in the next post! 😄
About the Author
Admond Lee is currently the Co-Founder/CTO of Staq, the #1 business banking API platform for Southeast Asia. Want to get free weekly data science and startup insights? Join Admond’s email newsletter, Hustle Hub, where every week he shares actionable data science career tips, mistakes & learnings from building his startup, Staq. You can connect with him on LinkedIn, Medium, Twitter, and Facebook.
How do you use the merge function in Python?
… Must be found in both the left and right DataFrame objects.
left_on − Columns from the left DataFrame to use as keys.
...
Merge Using 'how' Argument.
How do you merge rows in Python?
To merge rows within a group in Pandas, we can use the agg(~) method together with the join(~) method to concatenate the row values.
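A minimal sketch of the agg + join approach just described, on an invented orders table (the order_id and item columns are assumptions for illustration). Passing the bound string method ", ".join to agg makes each group’s values collapse into a single comma-separated string.

```python
import pandas as pd

# Hypothetical data: several rows per order whose item names we want in one row.
orders = pd.DataFrame({
    "order_id": [101, 101, 102, 102, 102],
    "item": ["apple", "pear", "milk", "bread", "eggs"],
})

# Group by the key, then aggregate the text column with str.join so that
# each group's values are concatenated into one string.
combined = orders.groupby("order_id")["item"].agg(", ".join).reset_index()
print(combined)
```

This only works on string values; a numeric column would need .astype(str) first, since str.join rejects non-string items.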
How do you merge series in Python?
With the concat() method, you can combine two or more Series into a DataFrame (that is, create a DataFrame from multiple Series). Besides this, you can also use Series.append() (deprecated in pandas 1.4 and removed in pandas 2.0), pandas. … Using pandas.concat() to Combine Two Series … Combine Two Series Using pandas.merge() … Combine Two Series Using DataFrame.join() … Using Series. …
How do you merge two columns in Python?
By using the + operator, you can combine two or more text/string columns in a pandas DataFrame. Note that when you apply the + operator to numeric columns, it performs addition instead of concatenation.
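A minimal sketch of the last two answers, using invented data (the first, last, a and b names are assumptions). pd.concat with axis=1 places each Series side by side as a DataFrame column, labelled by the Series name, and + concatenates string columns element-wise but adds numeric ones.

```python
import pandas as pd

# Two made-up Series sharing the same default index.
names = pd.Series(["Ann", "Bob"], name="first")
surnames = pd.Series(["Lee", "Tan"], name="last")

# axis=1 turns each Series into a column; the Series names become labels.
people = pd.concat([names, surnames], axis=1)

# "+" on string columns concatenates element-wise.
people["full"] = people["first"] + " " + people["last"]

# On numeric columns "+" adds; cast to str first to concatenate instead.
nums = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
sums = nums["a"] + nums["b"]
as_text = nums["a"].astype(str) + nums["b"].astype(str)

print(people)
print(sums.tolist(), as_text.tolist())
```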