How to parse and move vendor contact data into the correct columns when some fields are missing
Dear RapidMiner Support,
I have vendor data with missing cells because it was taken from a JavaScript-rendered records page (knockout.js).
The spreadsheet has multiple vendors for each manufacturer's product. The issue was not discovered until after the website was taken offline.
The order of the data will always be the same column-wise (an eight-column contact-detail format per vendor):
vendor_name_1, vendor_address_1, vendor_phone_1, vendor_fax_1, vendor_contact_mobile_1, vendor_contact_email_1, etc.
When there is more than one vendor for that product (almost all do), the columns repeat in the same order, left to right:
vendor_name_2, vendor_address_2, vendor_phone_2, vendor_fax_2, vendor_contact_mobile_2, vendor_contact_email_2, etc.
At this point the sets of columns repeat as long as there are more vendors for the product on that row.
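To illustrate the repeating layout described above, here is a minimal Python sketch (not a RapidMiner process) that slices a well-formed row into per-vendor groups. The function name `split_vendor_groups` and the placeholder cell values are hypothetical; the width of eight comes from the description above.

```python
def split_vendor_groups(row, width=8):
    """Slice a flat row into consecutive per-vendor groups of `width` cells."""
    return [row[i:i + width] for i in range(0, len(row), width)]

# Placeholder row with two vendors of eight cells each (made-up cell values).
row = [f"v{v}_c{c}" for v in (1, 2) for c in range(1, 9)]
groups = split_vendor_groups(row)
print(len(groups))  # number of vendor sets found on this row
```

This only works on a row where nothing is missing; once a cell is dropped, the fixed-width slices no longer line up with vendors.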
A "good" row will have all of the available data in the correct column:
Motion Distributors; 3231 Apex Drive; Dulles, Ohio 45321; (321) 542-6422(p); (321) 542-6428(f); (321) 542-6680(m); alan@motiondist.com; etc. etc.
A "bad" row will have one or more missing items for at least one vendor on the row, which of course affects everything to the right of that missing cell, so all of the data becomes "shifted".
Since some of the cells are missing, my problem is getting the data in each row back into the correct cells.
For example, if the vendor_fax number is missing, all of the cells to the right of that missing cell do not go into the correct column and are shifted.
To make things worse, because there are multiple vendors for the same product, the more missing cells per row, the more shifting occurs on that row.
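As a rough way to spot shifted rows, the sketch below (Python, with a hypothetical helper `looks_shifted`) flags any row whose count of filled cells is not a multiple of eight. It assumes that missing cells were dropped rather than left blank and that a complete vendor always fills all eight cells; a row missing exactly eight cells in total would slip through.

```python
def looks_shifted(row, width=8):
    """Return True if the filled-cell count is not a whole number of vendor sets."""
    filled = [cell for cell in row if cell and cell.strip()]
    return len(filled) % width != 0

good_row = ["x"] * 16  # two complete vendors (placeholder values)
bad_row = ["x"] * 15   # one cell missing somewhere on the row
print(looks_shifted(good_row), looks_shifted(bad_row))
```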
Some manufacturer products have more than a dozen vendors, so would it be better to read the entire row into memory first?
Is there a way to fix this, given that each vendor's data set always uses the same arrangement of eight columns and I only need to know how many sets of eight columns to prepare before parsing the row?
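For illustration, one possible approach is to read the whole row into memory and place each cell by what it looks like rather than where it sits. The Python sketch below (not a RapidMiner process) assumes that phone, fax and mobile cells carry the "(p)", "(f)" and "(m)" suffixes shown in the "good" row, that email cells contain "@", and that a vendor name is never the missing item. Only the six field names listed above are used; the columns hidden behind "etc." are unknown and left out. The names `classify` and `realign` are hypothetical.

```python
import re

# Field order taken from the post; the columns behind "etc." are not known.
FIELDS = ["vendor_name", "vendor_address", "vendor_phone",
          "vendor_fax", "vendor_contact_mobile", "vendor_contact_email"]
SUFFIXES = {"(p)": "vendor_phone", "(f)": "vendor_fax",
            "(m)": "vendor_contact_mobile"}

def classify(text):
    """Guess a cell's field from its content; None means free text."""
    if re.fullmatch(r"[^@\s]+@[^@\s]+", text):
        return "vendor_contact_email"
    return SUFFIXES.get(text[-3:])

def realign(cells):
    """Walk a row left to right and rebuild one record per vendor."""
    vendors, current, last_rank = [], None, -1
    for raw in cells:
        text = raw.strip()
        if not text:
            continue
        field = classify(text)
        if field is None:
            if current is None or last_rank >= 2:
                # Free text after a contact field: assume a new vendor name.
                current = dict.fromkeys(FIELDS, "")
                vendors.append(current)
                current["vendor_name"] = text
                last_rank = 0
            else:
                # Further free text right after the name: an address part.
                parts = [current["vendor_address"], text]
                current["vendor_address"] = "; ".join(p for p in parts if p)
                last_rank = 1
        else:
            rank = FIELDS.index(field)
            if current is None or rank <= last_rank:
                # Field order went backwards: start a vendor with no name.
                current = dict.fromkeys(FIELDS, "")
                vendors.append(current)
            current[field] = text
            last_rank = rank
    return vendors

# First vendor is the "good" row from the post with the fax removed;
# the second vendor is made up for the example.
row = ["Motion Distributors", "3231 Apex Drive", "Dulles, Ohio 45321",
       "(321) 542-6422(p)", "(321) 542-6680(m)", "alan@motiondist.com",
       "Acme Supply", "1 Main St", "(555) 000-1111(p)", "bob@acme.example"]
vendors = realign(row)
print(len(vendors))  # how many column sets this row needs
```

The length of the returned list is the number of column sets to prepare for that row, and each record can then be written back out in the fixed field order with blanks where data was missing.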
Thanks in advance...