Combining if-statements (nesting)
Is there a possibility of nesting if-statements whereby the criteria is the same but the data may vary? I prefer to segregate a full name string into parts of the full string and place them into individual respective fields.
Example - splitting first name and last name from a combined name field:
Expected outcome: First Name 1: John ; Middle Name 1: M. ; Last Name 1: Doe ; Suffix 1: (null) ; First Name 2: Jane ; Middle Name 2: (null) ; Last Name 2: Wright ; Suffix 1: (null)
1) Original data: John M. Doe And Jane Wright
2) Original data: John M. Doe & Jane Wright
3) Original data: John M. Doe In C/O Jane Wright
I was only able to use RegexIsMatch() and LSplit() functions for one criteria. I prefer to code several combinations of the search criteria into one long statement if possible and split the names accordingly.
Below is code I started with while removing any names containing a company name:
If((RegexIsMatch(LTrim(Full Name)," [&] ") .Or. RegexIsMatch(LTrim(Full Name)," [Aa][Nn][Dd] ") .Or. RegexIsMatch(LTrim(Full Name)," [Cc]/[Oo] ") <>0) .And. ((RegexIsMatch(LTrim(Full Name),"[^Ll][^Ll][^Cc]")) .Or. RegexIsMatch(LTrim(F5),"[^Ii][^Nn][^Cc]") .Or. RegexIsMatch(LTrim(F5),"[^Cc][^Oo][^Rr][^Pp]*"))<>0 , LSplit(LTrim(F5),5,"&",1) ,'')
This only works for one match, but it does not split them into individual fields as the example above. I am only able to pull just the first section of the full name string. I need it to work for a variety of matches into a single-coded statement. Please advise. Thanks.
Answers
-
You run the risk of a very complex formula to both read and maintain.
If you existing formula successfully split s the two names field into 2 separate names in one field each just take those fields and split them again. There is a name splitting function (NSplit) which may work if the names are consistent enough. Otherwise you will need to work put the option possible in the database and take it from there. Other tools are available. If the data is really inconsistent, decide how far accuracy is important.
One tip for complex stuff is to break it down step by step and output field by output field. Make it easy to deal with at each data extraction and transition step.
I would tend t leave it like that unless you are running out of fields or record characters.
If you really need to combine things come up with the formulas based on the calculated field Names and then substitute the field names with the formulas that create those fields. You will end up with one incredibly complex formula but could then delete the intermediate fields used to get there. (If you really must ....)
Bear in mind you are going to need 8 or these somewhat complex formulas - although To get the First Name (John) and the last name (Wright) can be done much more easily, hopefully.
HTH.
Grant
0 -
Altair Forum User said:
You run the risk of a very complex formula to both read and maintain.
If you existing formula successfully split s the two names field into 2 separate names in one field each just take those fields and split them again. There is a name splitting function (NSplit) which may work if the names are consistent enough. Otherwise you will need to work put the option possible in the database and take it from there. Other tools are available. If the data is really inconsistent, decide how far accuracy is important.
One tip for complex stuff is to break it down step by step and output field by output field. Make it easy to deal with at each data extraction and transition step.
I would tend t leave it like that unless you are running out of fields or record characters.
If you really need to combine things come up with the formulas based on the calculated field Names and then substitute the field names with the formulas that create those fields. You will end up with one incredibly complex formula but could then delete the intermediate fields used to get there. (If you really must ....)
Bear in mind you are going to need 8 or these somewhat complex formulas - although To get the First Name (John) and the last name (Wright) can be done much more easily, hopefully.
HTH.
Grant
Thanks, Grant. I totally agree with you that it is best to manipulate a complex full name like this straight in a database, if I understood you correctly.
Henry
0 -
Altair Forum User said:
Thanks, Grant. I totally agree with you that it is best to manipulate a complex full name like this straight in a database, if I understood you correctly.
Henry
Hi Henry,
Yes a lot of the main DB tools these days have such facilities built on or available alongside and data cleansing may help establish consistent formats for things like names as well.
However is you are running a one-off exercise to combine some data sources or simply need to regularly cross check inputs from different sources, then the Monarch tools have a lot to offer.
In your input files how many possible "&" or "And/and" or "in C/O" "name linking" options might you find? (Indeed how many names might you find in a single field?)
Are they controlled in any way or simply random keyboard inputs at the original source point?
If you can identify a limited number of possible linking "texts" (as you are setting out to do) you have an option to make things easier later by making as a first step the conversion of "the linking texts" to a single (obscure?) character. Or lets use "|" for a small example: (Or for your purposes you might simply be able to make everything "&".)
John M. Doe | Jane Wright
That can now be split into 2 name fields using the | .
Then you can see how each name field works with the NSplit function.
It should be possible to build in sanity checks at each stage to identify any records that might need reviewing either to influence the rules you are using or, being practical if you end up with 5 problematic records in 5 million, passing them off for manual attention.
Actually I think it is still reasonably true that if you find a database that is reasonable consistent for more than 80% of its Name records you will be doing well. To find 2 data sources that are equally reasonably consistent on the paired records would be miraculous (under normal circumstances.)
Add addresses into the mix and things usually get even more interesting! But I am drifting off topic slightly.
You can investigate the "linking" text by prototyping and looking for any records where the split seems to fail.
If only one name - fine, nothing to be alarmed about.
Two names but no split - why? Can the formula be adapted or is it an example of "refer to a human"?
Split seems OK but the NSplit struggles with unusual name formats? Alternative "exceptions" processing?, possibility to ignore that record?, refer to a human? Other?
Keeping the logic simple at each step wherever possible is something I find really helps at the prototyping level. By all means choose to make it more complex later if you wish (although I am not sure I would so choose unless I really had to) but simple and visible works well whenever deployed. IMO.
Hope that's useful and perhaps suggests some ideas to you on the basis of knowledge of the data you are dealing with.
Grant
0