A future post is going to cover comparing two lists of data. But before tackling that problem, it is important that we understand the tools we may need to get our data ready to compare.
It is likely that we will need to either break up values, or combine them. In Excel and database speak these are actions are parse and concatenate respectively.
An example of concatentate would be if first and last names were in separate columns. It is easier to compare the names if they are combined into one value such as First Last or Last, First. Both of these are simple to accomplish. The formula method is to use =concatentate(A2,B2) which yields "Jack Smith" with a space between the names. See below:
Obviously if we wanted the last name first, we would have reversed the two entries. To accomplish the result 'Smith, Jack' We need to tell Excel to add a comma and a space. Add a third argument to the formula ", " quotations included.
You should know that there is a bit of a shortcut for concatenating values. You can simply use the & symbol as in =A2&B2. Note that in order to have spaces or commas these will require addtional arguments: =B2&", "&A2.
The opposite condition would be if you needed to parse a field into separate values. Say for example one list is stored as Last, First and a second list is First MI Last. In order to compare these two lists as we will in our next post, we need to make sure we are comparing apples to apples. This is easier than you might think. Using separate formulas, or a series of formulas, it is possible to break up even very complicated values. Then you can put them back together however you want by concatenating. Another example would be a list of driver license numbers that start with a two letter state designation such as NY-555-55-5555.
The formulas we need to know about are
Len, Find, Left, Mid, and Right. These can be found under the text category. For all of these functions think of the characters in the cell as being numbered from left to right. Use Left to grab the first two characters of the driver's license:
=left(A2,2).

This would return NY. To truly parse the data into separate fields, we would want a third column that gave us the DL# without the leading state letters or the hyphen. To accomplish this, we can use the function
Len to determine how many characters are in the field (this many vary down the list) and then grab everything to the right of a particular character. We can do this in two steps in separate columns, or combine these steps into one expression. If you find this process confusing, use separate columns for each step. The first step is to determine how many characters are in the field:
=Len(A2) which returns 14. We want the number without the first 3 characters, two for the state, and the hyphen. So we could use =right(A2,11) to extract the number. But think it through first. We know that different states use different number formats and lengths. So, in order to have the formula apply to all conditions we should let the number of characters needed be calculated instead of assumed to be 11. The following modification will take care of it in one step:
=right(A2,len(A2)-3). Again, if you are uncomfortable nesting functions then simply break these out in different columns and reference the other columns to get your desired result. This function
right starts, not surprisingly, from the right side. If we wanted the last four digits off a social security number for example, we could use
right(A2,4).
The names are a little more complicated to parse. But if you think about how you would do it you would find the comma, figure out what digit it is, and then go left for the last name or right for the first. Step one is to use Find: =find(",",A2) which returns 6 for 'Smith, Jack.' To parse the last name we would use this result with function left. Combined into one step it would be =left(A2,len(A2)-find(",",A2)). Once we have successfully parsed the last name, we know how many characters we DON'T want in the first name. So we can reference the results of this previous formula. Remembering that the first name would be the length of the combined name minus the characters of the last name, minus the two characters of the comma and the space. So to parse the first name we use =right(A2,len(A2)-B2-1) to get the value 'Jack.' Notice that since the value in B2 = 6 we reference the cell directly instead of recalculating this value. This is a good example of when it is conceptually easier to break it between two columns. You may have expected that we would be subtracting 2 characters for the comma and the space, but since the find function returned the position of the comma, the length of the first name would be one less character.
The only function we have not yet used is MID. We used the Right function since the name lengths vary. If for example we had a product code that was made up of Vendor#-Category- Part#, and each part was 4 digits such as KLaA-0123-4567, then we could parse the category using =mid(A2,6,4).
I should mention that there are many different ways to accomplish each task. For the relatively simple examples above, you could use mid instead of right. Then, instead of determining length by subtracting the partial characters from the total characters, you could instead use a number that is larger than you would need, like 25. Conveniently, Excel does not add spaces if this number exceeds the number of characters present. I generally use the slightly more complicated method above because I respond to its logical and precise methodology, conceptually.
If all of this seems like way too much work just split out some names or numbers, then you are probably not thinking of a list of 50,000 names. Obviously if you only had 15 names in the list, it would probably be just as quick to copy and paste. Next time when we compare data lists, you will need the skills we have developed here. Although we will use lists that are already in the same format, in real projects most lists require some reformatting.
Comments