In this video, we're going to cover how and when to use transformation and cleanse when you're in Toad workbook. OK so I've got example workbook open right now. And I've got a workbook that's about customers and orders. And it has two workflows in it. One is a basic query about customers. And one is about orders. Let's look at this customer query that I've got here.
When I look at this customer query I can see that I've been joining address, contact, and customer data. And brought back a series of results. Now I can start making pivot grids like I've done so far against this result or doing other analysis. But what if I want to embellish, or clean, or filter these results? What's the best way for me to do that?
Well, one of the options I have is to do transformation and cleanse. So I'm going to walk you through doing that now. So transformation and cleanse is a step that you add after you've done the query. So I've gone to my query step here. I'm going to add a child step. It's a work step. And we're to choose transformation and cleanse.
And this will open up my data within a transformation and cleanse window. I'm going to get rid of my workbook window by just unpinning it. So that hides it. So we can take a look at this transformation and cleanse window.
Now keep in mind I'm just looking at the first 1,000 rows of records. Because just revisit, that's what I'm fetching within my workbook. But within this 1,000 rows of records. Put that back. I'm looking at some sample data within my workbook.
OK, so over here on the left-hand side I have all of the columns in my query results. Over here I've got my 1,000 rows of data, of actual data in those columns. And then over here on the right, I've got a little bit of profile information about each column as I visit it.
So let's visit this sex column here. When I click into the sex column I can see I have two values within the sex column. I've got an M-- it's very small on my screen-- and an F.
This represents the number of values that I have for each one of those. Or a number of records I have for each one of those values. I'm just going to minimize this graphic.
I could also look at the tree map here. And again, this is just for the 1,000 records that I've brought up. And it will show me how many I've got that are distinct, how many are populated, how many are repeated, and how many nulls I have. So 27% of the values are nulls.
If I want to look at this information in a little more tabular format, it's right here in this summary next to me. How many rows I have null, how many rows where it's missing but not null, how many rows where it's populated, and how many distinct values I have in this column. So again, we know it's M and F for here.
And then it tells me the actual string type. And then it tells me the top patterns within that data. So that's this here. And then this steps tab here we're going to walk through.
So as I look at this data and I decide to make adjustments to the data. Like for example let's go ahead and in this column here, for sex, let's go ahead make an adjustment by removing our nulls.
So let's replace the null value. So I'm going to come here, find nulls and replace with unknown. And it gives me a little data on here as well-- found in 268 rows, replacing it with unknown. I'm going to hit Apply Rule.
OK. So now I've replaced all of my gender nulls with unknowns. Let's do some more transforming of this data. Let's isolate the home or the business. Let's start with business area code. So let's, first of all, take this column and split it. And we're going to have it-- we'll have it split this column down here at a separator. And the separator we're going to split it at is the ending parentheses. And we'll split it into two columns.
OK. Looks good. Apply that rule. Here's the new two columns that are created. We'll now take this new column here, and let's replace this parentheses that ended up in that value. As you can see, I split it here, so I grabbed that first parentheses. And let's replace that with nothing and apply that rule. Great. So now I've got this area code here.
Now let's do some renaming of these columns. First of all, let's go ahead and rename this column-- this one out here, this Business Phone Split 1-- let's rename this column Area Code. So we have the area code. And let me actually be a little bit more specific. We'll write business area code since it's the business phone number area code.
And then this second split here-- let's decide that I don't need this output in my actual end data results. And so I'm going to pick this column here and say let's remove this column. And so we can see I've just unchecked the ones I don't want. I don't want this one to apply that rule.
What other things can we do inside transformation and cleanse? Well, let's take a look at the birth date. Let's extract the birth year. A lot of times, people don't want their actual birth date. Let's just pick the birth year. And so let's go ahead and extract from this Date column. We're just going to take the year and apply this rule.
And again, I've got a fun column here called Birth Date Year. Let's change the name of this column as well. So let's rename this