New Extension for Applied Onomastics (name recognition) on GitHub + help needed
Hi,
Last month we prototyped a RapidMiner integration with the NamSor GendRE API, to recognize the gender of names, using 'Enrich Data by Webservice':
http://namesorts.com/2014/04/23/rapidminer-to-enrich-gender-data/
We've since started building a custom extension to offer more functionality, but we're running into problems:
https://github.com/namsor/rapidminer-onomastics-extension
1) The firstName in the CSV output doesn't correspond to the input
2) The REAL value shows a rounded value instead of full precision (ignore the value itself, it is randomly generated)
3) We had to create a 'DummyOperator' with 'name generate_extract', otherwise RM complains that the documentation is missing
Otherwise, the integration seems to work with RM 5.3.015, and the operator appears under /Onomastics/Name2Gender
Any help welcome!
Thanks,
Elian
Input file:
firstName;lastName;countryIso2
Blas;PEREZ+HENRIQUEZ;
A.+Craig;COPETAS;
Abdel;AISSOU;
Abderrahman;BEDDI;
Achmad+Danny;GAZALI;
Ada;COLAU;
Adam;GREEN;
Adam+S.;POSEN;
Adeline;BRAESCU+KERLAN;
Aditya;GARG;
Adnan;BALI;
Adnane;EL+FASSI;
Adriaan;SMIT;
Adrian;MCGINN;
Adrián;MICHEL+ESPINO;
Adriana;VERDIER;
Adrien;REGNIER+LAURENT;fr
Adrien;SURU;
Илья;Ковальчук;ru
What we get in the output (genderScale is a random number):
"firstName";"lastName";"countryIso2";"genderScale";"gender"
"Blas";"PEREZ+HENRIQUEZ";;0.0;"Male"
"A.+Craig";"COPETAS";;1.0;"Female"
"Abdel";"AISSOU";;2.0;"Unknown"
"Blas";"BEDDI";;0.0;"Male"
"A.+Craig";"GAZALI";;1.0;"Female"
"Blas";"COLAU";;0.0;"Male"
"Abdel";"GREEN";;2.0;"Unknown"
"Blas";"POSEN";;0.0;"Male"
"Blas";"BRAESCU+KERLAN";;0.0;"Male"
"Blas";"GARG";;0.0;"Male"
"Abdel";"BALI";;2.0;"Unknown"
"A.+Craig";"EL+FASSI";;1.0;"Female"
"Blas";"SMIT";;0.0;"Male"
"A.+Craig";"MCGINN";;1.0;"Female"
"Abdel";"MICHEL+ESPINO";;2.0;"Unknown"
"Abdel";"VERDIER";;2.0;"Unknown"
"A.+Craig";"REGNIER+LAURENT";"fr";1.0;"Female"
"A.+Craig";"SURU";;1.0;"Female"
"Blas";"Ковальчук";"ru";0.0;"Male"
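One pattern in the output above may help narrow down problem 1: every output firstName is one of the first three input names (Blas, A.+Craig, Abdel), and it always matches genderScale (0.0, 1.0, 2.0) and gender (Male, Female, Unknown). The sketch below is only a way to check that pattern on the first few rows. It is Python, not part of the extension, and the helper names are made up for illustration. Our guess, not confirmed, is that an index value meant for one column is being written to or read from the firstName column.

```python
import csv
import io

# First rows of the input and output files quoted in this post.
INPUT_CSV = """firstName;lastName;countryIso2
Blas;PEREZ+HENRIQUEZ;
A.+Craig;COPETAS;
Abdel;AISSOU;
Abderrahman;BEDDI;
Achmad+Danny;GAZALI;
"""

OUTPUT_CSV = '''"firstName";"lastName";"countryIso2";"genderScale";"gender"
"Blas";"PEREZ+HENRIQUEZ";;0.0;"Male"
"A.+Craig";"COPETAS";;1.0;"Female"
"Abdel";"AISSOU";;2.0;"Unknown"
"Blas";"BEDDI";;0.0;"Male"
"A.+Craig";"GAZALI";;1.0;"Female"
'''


def read_rows(text):
    """Parse semicolon-separated text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def first_seen_order(values):
    """Return distinct values in order of first appearance."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


def rows_follow_index_pattern(input_text, output_text):
    """For each output row, check whether firstName and gender both equal
    the entry at position int(genderScale) in their first-seen lists."""
    in_rows = read_rows(input_text)
    out_rows = read_rows(output_text)
    names = first_seen_order(row["firstName"] for row in in_rows)
    genders = first_seen_order(row["gender"] for row in out_rows)
    checks = []
    for row in out_rows:
        index = int(float(row["genderScale"]))
        checks.append(
            row["firstName"] == names[index] and row["gender"] == genders[index]
        )
    return checks


if __name__ == "__main__":
    print(rows_follow_index_pattern(INPUT_CSV, OUTPUT_CSV))
```

If every row follows the pattern, the wrong firstName is probably not random: it looks like the same small index is being used to look up a string in both the firstName and gender columns.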