New Extension for Applied Onomastics (name recognition) on GitHub + help needed

User: "NamSor"
New Altair Community Member
Updated by Jocelyn
Hi,

Last month we prototyped a RapidMiner integration with the NamSor GendRE API to recognize the gender of names, using the 'Enrich Data by Webservice' operator:
http://namesorts.com/2014/04/23/rapidminer-to-enrich-gender-data/

We've started building a custom extension to offer more functionality, but we're running into problems:
https://github.com/namsor/rapidminer-onomastics-extension

1) The firstName values in the CSV output don't correspond to the input.
2) The REAL value is shown rounded instead of at full precision (ignore the values themselves; they are randomly generated).
3) We had to create a 'DummyOperator' with the name 'generate_extract', otherwise RM complains that the documentation is missing.

Otherwise, the integration seems to work with RM 5.3.015; the operator appears under /Onomastics/Name2Gender.

Any help welcome!
Thanks,
Elian

Input file:
firstName;lastName;countryIso2
Blas;PEREZ+HENRIQUEZ;
A.+Craig;COPETAS;
Abdel;AISSOU;
Abderrahman;BEDDI;
Achmad+Danny;GAZALI;
Ada;COLAU;
Adam;GREEN;
Adam+S.;POSEN;
Adeline;BRAESCU+KERLAN;
Aditya;GARG;
Adnan;BALI;
Adnane;EL+FASSI;
Adriaan;SMIT;
Adrian;MCGINN;
Adrián;MICHEL+ESPINO;
Adriana;VERDIER;
Adrien;REGNIER+LAURENT;fr
Adrien;SURU;
Илья;Ковальчук;ru


What we get in the output (genderScale is a random number):

"firstName";"lastName";"countryIso2";"genderScale";"gender"
"Blas";"PEREZ+HENRIQUEZ";;0.0;"Male"
"A.+Craig";"COPETAS";;1.0;"Female"
"Abdel";"AISSOU";;2.0;"Unknown"
"Blas";"BEDDI";;0.0;"Male"
"A.+Craig";"GAZALI";;1.0;"Female"
"Blas";"COLAU";;0.0;"Male"
"Abdel";"GREEN";;2.0;"Unknown"
"Blas";"POSEN";;0.0;"Male"
"Blas";"BRAESCU+KERLAN";;0.0;"Male"
"Blas";"GARG";;0.0;"Male"
"Abdel";"BALI";;2.0;"Unknown"
"A.+Craig";"EL+FASSI";;1.0;"Female"
"Blas";"SMIT";;0.0;"Male"
"A.+Craig";"MCGINN";;1.0;"Female"
"Abdel";"MICHEL+ESPINO";;2.0;"Unknown"
"Abdel";"VERDIER";;2.0;"Unknown"
"A.+Craig";"REGNIER+LAURENT";"fr";1.0;"Female"
"A.+Craig";"SURU";;1.0;"Female"
"Blas";"Ковальчук";"ru";0.0;"Male"

    User: "Marco_Boeck"
    New Altair Community Member
    Hi,

    cool stuff 8)

    1) I don't quite get the problem. Which CSV output?
    2) By default, RapidMiner rounds to 3 fraction digits when displaying data. You can change this in the preferences under "General" -> "rapidminer.general.fractiondigits.numbers". Calculations always use the actual values.
    3) Not quite sure what that is about. Do you also get this warning in the console when you remove your extension? I don't think it has anything to do with it.

    Regards,
    Marco
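To illustrate the difference between the displayed value and the stored value, here is a plain-Java sketch. It is not RapidMiner's formatting code; `display()` is a hypothetical helper that renders a double with a fixed number of fraction digits, and 3 is the default Marco mentions. The threshold comparison in `doWork()` would run against the stored double, not against the rounded text.

```java
import java.util.Locale;

public class FractionDigitsDemo {

    // Hypothetical helper: render a value with a fixed number of fraction digits,
    // as a table view limited to that many digits would. The double itself is untouched.
    static String display(double value, int fractionDigits) {
        return String.format(Locale.US, "%." + fractionDigits + "f", value);
    }

    public static void main(String[] args) {
        double genderScale = 0.1004;
        double threshold = 0.1;
        System.out.println(display(genderScale, 3));  // shown as 0.100
        System.out.println(genderScale);              // stored as 0.1004
        System.out.println(genderScale > threshold);  // true: comparisons use the stored value
    }
}
```

So a value displayed as 0.100 can still be classified as "Female" with a threshold of 0.1, because only the display is rounded.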
    User: "NamSor"
    New Altair Community Member
    OP
    Hi Marco! Thanks for helping out.

    I've created a simple process that loads data from an Excel file containing:

    >firstName;lastName;countryIso2
    >Blas;PEREZ+HENRIQUEZ;
    >A.+Craig;COPETAS;
    >Abdel;AISSOU;

    Then I connected this Import Excel operator to my custom extension operator Name2Gender and wrote its output to a CSV file. Unfortunately, the output of my operator seems completely mixed up: the same firstName is repeated several times, the numeric values are incorrect, etc.

    I think the problem comes from the way I pass parameters in and out in the doWork method:


    @Override
    public void doWork() throws OperatorException {

        ExampleSet exampleSet = inputSet.getData();
        Attributes attributes = exampleSet.getAttributes();
        Attribute fnAttribute = attributes.get(ATTRIBUTE_FN);
        Attribute lnAttribute = attributes.get(ATTRIBUTE_LN);
        Attribute iso2Attribute = attributes.get(ATTRIBUTE_ISO2);

        String mashapeAPIKey = getParameterAsString(MASHAPE_API_KEY);
        String defaultISO2 = getParameterAsString(DEFAULT_COUNTRY_ISO2);
        double threshold = getParameterAsDouble(ATTRIBUTE_THRESHOLD);

        Attribute genderScaleAttribute = AttributeFactory.createAttribute(
                ATTRIBUTE_GENDERSCALE, Ontology.REAL);
        genderScaleAttribute.setTableIndex(fnAttribute.getTableIndex());
        attributes.addRegular(genderScaleAttribute);

        Attribute genderAttribute = AttributeFactory.createAttribute(
                ATTRIBUTE_GENDER, Ontology.STRING);
        genderAttribute.setTableIndex(fnAttribute.getTableIndex());
        attributes.addRegular(genderAttribute);

        for (Example example : exampleSet) {
            String firstName = example.getValueAsString(fnAttribute);
            String lastName = example.getValueAsString(lnAttribute);
            String iso2 = example.getValueAsString(iso2Attribute);
            if (iso2 != null && iso2.trim().length() == 2) {
                // real value
            } else if (defaultISO2 != null && defaultISO2.trim().length() == 2) {
                iso2 = defaultISO2.trim();
            } else {
                // invalid value, set to null
                iso2 = null;
            }

            double genderScale = 0d;
            if (MOCKUP) {
                genderScale = RND.nextDouble() * 2 - 1;
            } else {
                // API stuff goes here
            }
            String gender = "Unknown";
            if (genderScale > threshold) {
                gender = "Female";
            } else if (genderScale < -threshold) {
                gender = "Male";
            }
            example.setValue(genderScaleAttribute, genderScale);
            example.setValue(genderAttribute, gender);
        }
        outputSet.deliver(exampleSet);
    }

    Any idea?
    Thx,
    Elian
    User: "Marco_Boeck"
    New Altair Community Member
    Hi,

    the call

    genderScaleAttribute.setTableIndex(fnAttribute.getTableIndex());
    seems dangerous. Generally speaking, you can only append new attribute columns on the right. Does removing said line fix your problem?

    Regards,
    Marco
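Marco's point can be made concrete with a toy model. The sketch below is plain Java, not RapidMiner code: `Mapping`, `Attr`, `classify` and `run` are hypothetical stand-ins, and it assumes (as the `DoubleArrayDataRow` in RapidMiner 5 suggests) that each row stores every value as a double and that nominal attributes turn strings into integer indices through their own mapping. Because both new attributes point at firstName's column, each write overwrites the first name, and the gender index written last is what all three attributes read back. With fixed scales instead of random ones, it reproduces the pattern of the output in the first post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedColumnDemo {

    // Toy nominal mapping: each distinct string gets the next integer index (0, 1, 2, ...).
    static final class Mapping {
        private final List<String> values = new ArrayList<>();
        private final Map<String, Integer> indices = new HashMap<>();

        int toIndex(String value) {
            return indices.computeIfAbsent(value, v -> {
                values.add(v);
                return values.size() - 1;
            });
        }

        String fromIndex(int index) {
            return values.get(index);
        }
    }

    // Toy attribute: an optional nominal mapping plus the table column it reads and writes.
    static final class Attr {
        final Mapping mapping; // null means numeric (REAL)
        final int column;

        Attr(Mapping mapping, int column) {
            this.mapping = mapping;
            this.column = column;
        }

        void set(double[] row, double value) { row[column] = value; }

        void set(double[] row, String value) { row[column] = mapping.toIndex(value); }

        String get(double[] row) {
            return mapping == null ? Double.toString(row[column]) : mapping.fromIndex((int) row[column]);
        }
    }

    // Same thresholding as in doWork().
    static String classify(double scale, double threshold) {
        if (scale > threshold) return "Female";
        if (scale < -threshold) return "Male";
        return "Unknown";
    }

    static List<String> run() {
        String[] firstNames = {"Blas", "A.+Craig", "Abdel", "Abderrahman", "Achmad+Danny"};
        String[] lastNames = {"PEREZ+HENRIQUEZ", "COPETAS", "AISSOU", "BEDDI", "GAZALI"};
        double[] scales = {-0.5, 0.7, 0.0, -0.9, 0.4}; // fixed instead of random

        Attr fn = new Attr(new Mapping(), 0);
        Attr ln = new Attr(new Mapping(), 1);
        List<double[]> table = new ArrayList<>();
        for (int i = 0; i < firstNames.length; i++) {
            double[] row = new double[2];
            fn.set(row, firstNames[i]);
            ln.set(row, lastNames[i]);
            table.add(row);
        }

        // Like setTableIndex(fnAttribute.getTableIndex()): both new attributes share column 0.
        Attr genderScale = new Attr(null, fn.column);
        Attr gender = new Attr(new Mapping(), fn.column);

        for (int i = 0; i < table.size(); i++) {
            double[] row = table.get(i);
            genderScale.set(row, scales[i]);            // overwrites the first name's index
            gender.set(row, classify(scales[i], 0.1));  // overwrites genderScale with the gender index
        }

        List<String> out = new ArrayList<>();
        for (double[] row : table) {
            out.add(fn.get(row) + ";" + ln.get(row) + ";" + genderScale.get(row) + ";" + gender.get(row));
        }
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running `main()` prints `Blas;PEREZ+HENRIQUEZ;0.0;Male`, `A.+Craig;COPETAS;1.0;Female`, `Abdel;AISSOU;2.0;Unknown`, then `Blas;BEDDI;0.0;Male` and `A.+Craig;GAZALI;1.0;Female`: the first names cycle through the first three entries of the firstName mapping, and genderScale only ever shows the gender index 0.0, 1.0 or 2.0.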
    User: "NamSor"
    New Altair Community Member
    OP
    Hi Marco,

    Without this call, I get an ArrayIndexOutOfBoundsException. I took this approach from the "How-to-Extend-RapidMiner-5" documentation. Is there an updated document?

    Thx in advance for your help,
    Elian

    SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
    java.lang.ArrayIndexOutOfBoundsException: -1
            at com.rapidminer.example.table.DoubleArrayDataRow.set(DoubleArrayDataRow.java:61)
            at com.rapidminer.example.table.AbstractAttribute.setValue(AbstractAttribute.java:184)
            at com.rapidminer.example.table.DataRow.set(DataRow.java:85)
            at com.rapidminer.example.Example.setValue(Example.java:140)
            at com.namsor.api.rapidminer.Name2GenderOperator.doWork(Name2GenderOperator.java:160)
            at com.rapidminer.operator.Operator.execute(Operator.java:866)
            at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
            at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
            at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:375)
            at com.rapidminer.operator.Operator.execute(Operator.java:866)
    User: "Marco_Boeck"
    New Altair Community Member
    Hi,

    the document will be updated, but I cannot give a date yet.
    Please use these calls to add new attributes to an existing ExampleSet:

    exampleSet.getExampleTable().addAttribute(newAttribute);
    exampleSet.getAttributes().addRegular(newAttribute);

    Regards,
    Marco
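Roughly, the two calls split the work: the first gives the new attribute its own column in the underlying example table, the second makes it part of this ExampleSet's attributes. The toy sketch below is plain Java, not RapidMiner's classes; `Table`, `Attr` and `UNDEFINED` are hypothetical, and it assumes that a freshly created attribute has no column until it is added to the table, which would match the -1 in the stack trace above. Writing before registration fails; registering appends a column on the right and leaves existing columns such as firstName untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AppendColumnDemo {

    static final int UNDEFINED = -1; // assumed "no column yet" marker, matching the -1 in the trace

    // Toy numeric attribute: only remembers which column it reads and writes.
    static final class Attr {
        int column = UNDEFINED;
    }

    // Toy example table: rows of doubles that can grow by one column on the right.
    static final class Table {
        final List<double[]> rows = new ArrayList<>();
        int width;

        Table(int rowCount, int width) {
            this.width = width;
            for (int i = 0; i < rowCount; i++) rows.add(new double[width]);
        }

        // Rough analogue of getExampleTable().addAttribute(...): append a column, assign its index.
        void addAttribute(Attr attr) {
            for (int i = 0; i < rows.size(); i++) {
                rows.set(i, Arrays.copyOf(rows.get(i), width + 1));
            }
            attr.column = width++;
        }

        void set(int row, Attr attr, double value) {
            rows.get(row)[attr.column] = value; // column -1 throws ArrayIndexOutOfBoundsException
        }
    }

    public static void main(String[] args) {
        Table table = new Table(3, 2);        // e.g. firstName and lastName columns
        Attr genderScale = new Attr();

        try {
            table.set(0, genderScale, 0.42);  // not registered yet: column is -1
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("before addAttribute: " + e.getMessage());
        }

        table.addAttribute(genderScale);      // appended on the right, gets column 2
        table.set(0, genderScale, 0.42);
        System.out.println("genderScale column: " + genderScale.column);
        System.out.println(Arrays.toString(table.rows.get(0)));
    }
}
```

Appending instead of reusing an index is what keeps each attribute's values in its own column, which is the "only append on the right" rule from earlier in the thread.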
    User: "NamSor"
    New Altair Community Member
    OP
    Thanks a lot Marco, that worked! E.