De-identification of medical text

DocMusher

Hi,

I am planning a scientific paper on the use of RM in the medical field, and I would like to use a recent GitHub project (Python) in an RM process: https://github.com/vmenger/deduce/blob/master/setup.py

The RM community member who is able to integrate the Python code into an RM process, where the data consist of a column with text examples, will become a co-author of the paper.

Thanks

Sven

Accepted answers

MartinLiebig

Sven,

attached is a process that uses the function to "deidentify" an attribute named text. Tell me if you need more.

Best,

Martin

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="85">
        <parameter key="text" value="Dit is stukje tekst met daarin de naam Jan Jansen. De patient J. Jansen (e: j.jnsen@email.com, t: 06-12345678) is 64 jaar"/>
      </operator>
      <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document (2)" width="90" x="112" y="187">
        <parameter key="text" value="Another String with an email msch@rm.com"/>
      </operator>
      <operator activated="true" class="text:documents_to_data" compatibility="7.5.000" expanded="true" height="103" name="Documents to Data" width="90" x="447" y="85">
        <parameter key="text_attribute" value="text"/>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="782" y="85">
        <parameter key="script" value="import pandas&#10;from deduce import deduce&#10;&#10;# de-identify one text; the patient name is hard-coded for this demo&#10;def deduce_string(x):&#10;    annotated = deduce.annotate_text(x, patient_first_names=&quot;Jan&quot;, patient_surname=&quot;Jansen&quot;)&#10;    deidentified = deduce.deidentify_annotations(annotated)&#10;    return deidentified&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;    attribute = &quot;text&quot;&#10;    # write the de-identified text to a new attribute named deident&#10;    data[&quot;deident&quot;] = data[attribute].apply(deduce_string)&#10;    return data"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data" to_port="documents 2"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="Execute Python" to_port="input 1"/>
      <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
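
For readability, here is a minimal sketch of the Python part of the process above as a plain script. The `deduce.annotate_text` and `deduce.deidentify_annotations` calls are the ones used in the process. The helper `deidentify_column` is a hypothetical addition, not part of deduce or RapidMiner. It passes non-string values (for example missing texts) through unchanged, on the assumption that the de-identifier should only ever receive strings.

```python
def deidentify_column(texts, deidentify):
    """Apply `deidentify` to every string in `texts`.

    Values that are not strings (None, NaN, numbers) are returned unchanged
    instead of being handed to the de-identifier.
    """
    return [deidentify(t) if isinstance(t, str) else t for t in texts]


def deduce_string(text):
    # Imported here so the helper above can be tried without deduce installed.
    from deduce import deduce

    # Same two calls as in the process; the patient name is still hard-coded.
    annotated = deduce.annotate_text(
        text, patient_first_names="Jan", patient_surname="Jansen"
    )
    return deduce.deidentify_annotations(annotated)


def rm_main(data):
    # `data` is the pandas DataFrame that Execute Python passes in.
    # The result list has one entry per row, so it can be assigned as a column.
    data["deident"] = deidentify_column(data["text"], deduce_string)
    return data
```

Because the de-identifier is passed in as an argument, `deidentify_column` can be checked with any plain function before deduce is wired in.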

All comments

MartinLiebig

Sven,

Do I see it correctly that you just want to "un-identify" a single text attribute?

~Martin

DocMusher

Martin,

In fact, the code does more and is designed specifically for the Dutch language. With the explosive growth of medical data, most of it text (e.g. discharge notes), any preprocessing requires removal of Protected Health Information.

Thanks

Sven

Deduce: de-identification method for Dutch medical text

This project contains the code for DEDUCE: de-identification method for Dutch medical text, as described in Menger et al. (2017). De-identification of medical text is needed for using text data for analysis, to comply with legal requirements and to protect the privacy of patients. Our pattern-matching-based method removes Protected Health Information (PHI) in the following categories:

Person names, including initials
Geographical locations smaller than a country
Names of institutions that are related to patient treatment
Dates
Ages
Patient numbers
Telephone numbers
E-mail addresses and URLs

The details of the development and workings of the method, and its validation can be found in:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text, Telematics and Informatics, 2017, ISSN 0736-5853
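
To illustrate what "pattern matching" means for two of the categories listed above (telephone numbers and e-mail addresses), here is a small sketch using Python's `re` module. The patterns and the `<EMAIL>` / `<PHONE>` placeholders are hypothetical and deliberately simple. They are not DEDUCE's actual rules or output format, which are described in the paper.

```python
import re

# Hypothetical, simplified patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A leading 0, one to three more digits, a dash, then six to eight digits.
PHONE = re.compile(r"\b0\d{1,3}-\d{6,8}\b")


def mask_contact_details(text):
    """Replace e-mail addresses and dashed phone numbers with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


print(mask_contact_details("e: j.jnsen@email.com, t: 06-12345678"))
# prints: e: <EMAIL>, t: <PHONE>
```

Names, locations and institutions are much harder to catch with fixed patterns like these, which is why a purpose-built tool is used in the process rather than a few regular expressions.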
