In last week’s blog, I wrote and did a video about how to remove
duplicate records and keep the most recent entry as long as a date column was
part of the data source. I came across
the scenario while giving training on Power BI with my company Pragmatic Works. See the video below:
This week, while doing another two-day training I came
across a different scenario from a follow-up conversation from day 1. I had explained how to remove duplicate
records and one of the students started working on a Power BI project she
has for her company. On day 2 the student
informed me that her remove duplicates step was not working. I said that is odd and I asked to see the
data. In one of her table visuals, I
could see that it appeared that a few of the records had
duplicates based on the name column.
After further investigation though, we figured out the culprit.
She had done all the steps correctly, but it was a data
integrity issue. In her data source, the
person in charge of entering the data had mistakenly typed a space for a few of
the records before entering the name value.
Thus, when Power BI looked for a duplicate in the name column it did its
job correctly because “Jamie” is unique compared to “ Jamie”. That leading white space was the culprit. Not only did it affect her Remove
Duplicates step, but it would also have affected a merging of two queries
if she had included that as part of her report.
For this week’s video, I want to show how to use the Trim
function in the Power Query Editor as a best practice to avoid these unwanted
outcomes for any future merges or removing of duplicates.
Comments
Post a Comment