How to create a 'real' random value?

kayman New Altair Community Member
edited November 2024 in Community Q&A
I am probably overlooking something, but I am struggling to get a random record that actually changes each time. There is an option to draw a random sample, but it always gives me the same record, and the same goes for generating a random value with the rand() function: I get the same random number every time, while I want a new one.

I simply want to get a single random record out of a recordset, but it should be a different record each time I call the set. 

Any ideas?

Best Answers

  • YYH
    Altair Employee
    Answer ✓
    Hi @kayman,

    You are correct. There is no truly random value on a deterministic computer. You may already have heard of pseudorandom numbers.

    “On a completely deterministic machine you can't generate anything you could really call a random sequence of numbers,” says Ward, “because the machine is following the same algorithm to generate them. Typically, that means it starts with a common 'seed' number and then follows a pattern.”
    https://www.howtogeek.com/183051/htg-explains-how-computers-generate-random-numbers/
    https://engineering.mit.edu/engage/ask-an-engineer/can-a-computer-generate-a-truly-random-number/

    For a more day-to-day example, the computer could rely on atmospheric noise or simply use the exact time you press keys on your keyboard as a source of unpredictable data, or entropy. For example, your computer might notice that you pressed a key at exactly 0.23423523 seconds after 2 p.m. 

    That is similar to how we can use date_now() or the process_start timestamp to derive the seed for rand(). The process below takes the current time in milliseconds modulo 10000 as the seed.


    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="1992"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="generate_macro" compatibility="9.2.001" expanded="true" height="68" name="Generate Macro" width="90" x="112" y="34">
            <list key="function_descriptions">
              <parameter key="seed" value="mod(date_millis(date_now()),10000)"/>
              <parameter key="Pseudorandom_num" value="rand(round(eval(%{seed})))"/>
            </list>
          </operator>
          <operator activated="true" class="generate_data" compatibility="9.2.001" expanded="true" height="68" name="Generate Data" width="90" x="514" y="34">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="1"/>
            <parameter key="number_of_attributes" value="1"/>
            <parameter key="attributes_lower_bound" value="-10.0"/>
            <parameter key="attributes_upper_bound" value="10.0"/>
            <parameter key="gaussian_standard_deviation" value="10.0"/>
            <parameter key="largest_radius" value="10.0"/>
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="%{seed}"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
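
For comparison, here is a minimal Python sketch of the same seeding idea. It is only an illustration, not how RapidMiner implements rand(): the helper names draw and time_seed are made up for this example, and the modulus 10000 simply mirrors the macro expression above.

```python
import random
import time


def draw(seed: int) -> int:
    """Return one integer in [1, 10] from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return rng.randint(1, 10)


def time_seed(modulus: int = 10000) -> int:
    """Derive a seed from the current time in milliseconds (hypothetical helper)."""
    return int(time.time() * 1000) % modulus


# A fixed seed reproduces the same value on every call and every run.
fixed_a = draw(1992)
fixed_b = draw(1992)

# A time-derived seed usually differs between runs, so the drawn value can change.
varying = draw(time_seed())
```

With a fixed seed such as 1992 the two draws are always identical, which is the behaviour described in the question. With the time-derived seed the result normally changes from run to run, although two runs that happen to land on the same seed will repeat.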
    


  • MartinLiebig
    Altair Employee
    Answer ✓
    Hi @kayman ,
    Try setting the random seed of the main process to -1. This sets the seed to a value tied to the system time, so you get different numbers each time you run the process.

    Best,
    Martin


Answers

  • kayman
    kayman New Altair Community Member
    Fair enough, but since there is an option to select a random attribute, it should also be theoretically feasible to select a random record, or to have a 'real' random sample, shouldn't it? Right now the only option we have is a 'fixed' random number, which seems rather limited to me.

    Of course I can always use the Python randint function instead, but it feels like a missing option in RapidMiner to me.
  • MartinLiebig
    Altair Employee
    As yy pointed out, there is no "real" random number in CS. The only thing one can do is define the seed as the current timestamp % something, or similar. We could of course define a new function that gives you a random seed, as yy does. But I am not sure that would make it better.
    Best,
    Martin
  • David_A
    David_A New Altair Community Member
    Hi,

    Setting the seed to something random is probably the best approach to achieve a higher degree of randomness.
    If, for some reason, you require a more refined approach, you could also query https://www.random.org/, whose API provides "true" randomness based on atmospheric noise.

    Best,
    David
  • kayman
    kayman New Altair Community Member
    OK, I never thought something as seemingly simple as 'give me a different random number between 1 and 10 each time I call you' would be so complex to achieve :-)

    I'll use Python's randint then. Thanks to all.
  • MartinLiebig
    Altair Employee

    randint also gives you just a pseudorandom number.

    Best,
    Martin
  • kayman
    kayman New Altair Community Member
    Hi @mschmitz, maybe there is some confusion about terminology here. By 'real' random I actually mean 'a different random number each time', whereas RapidMiner gives me the same (random) number every time, so it is not so random after all.
    Whether it is pseudorandom or not doesn't matter, as long as I get a different random number each time.

    I would therefore have expected the RapidMiner rand() function to do something similar, such as using the date to emulate some form of randomness, but it appears to be rather static instead.

    Anyway, I combined the suggestions given by yy and David and now use a pseudorandom seed value for the Shuffle operator. That gives me a different order each time, so I just have to pick the first record, and this is different / random enough for my purposes.
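
The shuffle-and-take-first idea can be sketched in Python roughly as follows. This is only an illustration of the approach, not the RapidMiner Shuffle operator: the function name pick_random_record and the choice of time.time_ns() as the seed source are assumptions made for this example.

```python
import random
import time


def pick_random_record(records, seed=None):
    """Shuffle a copy of `records` with a seeded generator and return the first one.

    If no seed is given, one is derived from the current time in nanoseconds,
    so repeated runs normally produce a different order.
    """
    if not records:
        raise ValueError("records must not be empty")
    if seed is None:
        seed = time.time_ns()  # assumption: clock-based seed, as in the thread
    shuffled = list(records)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[0]


records = ["row A", "row B", "row C", "row D"]
picked = pick_random_record(records)
```

Passing an explicit seed makes the pick reproducible, which is handy for testing. Leaving it out gives the 'different record each time' behaviour, with the caveat that the result is still pseudorandom.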
