Get flickr Data

User: "MBM"
New Altair Community Member
Updated by Jocelyn

Hey all, 

 

I am very new here and would appreciate some advice. What approach would you recommend for getting the metadata of flickr photos? I need things like geo data and the other data flickr provides, so that I can analyse it with the common data-mining methods in RapidMiner. My problem is: I know how to use the data-mining methods, but I don't know exactly how to get the data. Of course there is a flickr API, but I really don't know where to start. Should I start by learning how to use a web crawler, or by learning how to use the flickr API? I have an example RapidMiner process that uses an API, but I don't understand exactly what it does, and that drives me crazy. I want to understand the process.

 

Any help out there?

 

Best wishes 

Marcel

    User: "MBM"
    New Altair Community Member
    OP

    cool, thanks a lot! 

     

    So I can use the methods listed there, cool! But how do I build something in RapidMiner that uses them? It must be possible to call the flickr methods from RapidMiner, right?

     

    flickr.photos.getExif seems useful, I guess...

    Retrieves a list of EXIF/TIFF/GPS tags for a given photo. The calling user must have permission to view the photo.
    User: "Vaclav"
    New Altair Community Member
    Accepted Answer

    Hello,

    You can use the Get Page or Get Pages operator from the Web Mining extension. You then need the IDs of the images and an API key. With those, you can download the EXIF data in XML format using:

    https://api.flickr.com/services/rest/?method=flickr.photos.getExif&api_key=4471ecc1512fb9ab0f48aa1e1d0eb9ee&photo_id=28367629061&format=rest

     

    This example was taken from:

    https://www.flickr.com/services/api/explore/flickr.photos.getExif

     

    Best wishes,

    Vaclav
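
For readers who want to see how the request URL above is put together, here is a minimal Python sketch that assembles the same `flickr.photos.getExif` call from its parts. The function name `build_get_exif_url` and the placeholder key `YOUR_API_KEY` are illustrative assumptions, not part of the flickr API; the resulting string is what would go into the URL parameter of Get Page.

```python
from urllib.parse import urlencode

# REST endpoint as it appears in the example URL in this thread.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_get_exif_url(api_key, photo_id):
    """Assemble the flickr.photos.getExif request URL for one photo.

    build_get_exif_url is a hypothetical helper name used for illustration.
    """
    params = {
        "method": "flickr.photos.getExif",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "rest",  # XML response, as in the example URL
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # YOUR_API_KEY is a placeholder; use the key from your own flickr account.
    print(build_get_exif_url("YOUR_API_KEY", "28367629061"))
```

To fetch EXIF data for many photos, the same function can be called once per photo ID, which is roughly what Get Pages does with a list of URLs.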

    User: "MBM"
    New Altair Community Member
    OP

    ok, this makes sense, thank you! I will try my very best!

    User: "MBM"
    New Altair Community Member
    OP

    So, me again. I am one step further and have access to some of the data on flickr. Now I want the XML data in a table. I know I will need XPath, but in which operator do I apply it? In Read XML, or Cut Document, or somewhere else? I am a little confused.

    My process so far, and my thinking for getting a table, is:

    Get Page -> JSON to XML -> Write Document -> Read XML -> Data to Documents -> Process Documents (Cut Document (Remove Document Parts -> Extract Information))

     

    Am I on the right track?

     

     

    Most likely Read XML

     

    Best,

    Martin

    User: "MBM"
    New Altair Community Member
    OP

    Since the xml is like the following

    <?xml version="1.0" encoding="utf-8" ?>
    <rsp stat="ok">
    <comments photo_id="19043683190">
    <comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
    <comment id="7309457-19043683190-72157655239336425" author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment>
    <comment id="7309457-19043683190-72157656031039508" author="34303829@N08"  realname="">mooi gedaan Wouter</comment>
    </comments>
    </rsp>

    in "Read XML" I am now able to get e.g. "realname" and the text, e.g. "Mooi beeld!". But so far it is not very handy, because I have to select every single tag in the "Import Configuration Wizard". Is there a way in RM to automatically select the relevant tags one after another? Or is it even possible with XPath alone?

     

    Regards

    Marcel
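
On the XPath question: outside RapidMiner, a single path expression is enough to turn this response into rows. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset; the helper name `comments_to_rows` is an assumption made for illustration, and the sample is a shortened copy of the response quoted above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response quoted in this thread.
SAMPLE = """<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<comments photo_id="19043683190">
<comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
<comment id="7309457-19043683190-72157656031039508" author="34303829@N08" realname="">mooi gedaan Wouter</comment>
</comments>
</rsp>"""


def comments_to_rows(xml_text):
    """Return one dict per <comment> element (hypothetical helper)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    # "./comments/comment" selects every comment below the <rsp> root.
    for node in root.findall("./comments/comment"):
        rows.append({
            "id": node.get("id"),
            "author": node.get("author"),
            "realname": node.get("realname"),
            "comment": node.text,
        })
    return rows


if __name__ == "__main__":
    for row in comments_to_rows(SAMPLE):
        print(row)
```

Each attribute becomes a column and each `<comment>` becomes a row, which is the table shape the wizard produces tag by tag.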

    User: "Vaclav"
    New Altair Community Member
    Accepted Answer

    Hello Marcel,

    try this process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.2.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
    <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot; ?&gt;&#10;&lt;rsp stat=&quot;ok&quot;&gt;&#10;&lt;comments photo_id=&quot;19043683190&quot;&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157655160157292&quot; author=&quot;8866365@N08&quot; realname=&quot;Sabien&quot;&gt;Mooi beeld!&lt;/comment&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157655239336425&quot; author=&quot;128586472@N07&quot; realname=&quot;&quot;&gt;Jolies lignes, belle réalisation !&lt;/comment&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157656031039508&quot; author=&quot;34303829@N08&quot; realname=&quot;&quot;&gt;mooi gedaan Wouter&lt;/comment&gt;&#10;&lt;/comments&gt;&#10;&lt;/rsp&gt;"/>
    </operator>
    <operator activated="true" class="text:cut_document" compatibility="7.2.000" expanded="true" height="68" name="Cut Document" width="90" x="179" y="34">
    <parameter key="query_type" value="Regular Region"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="line" value="^(.*comment.*)"/>
    </list>
    <list key="regular_region_queries">
    <parameter key="text" value="&lt;comment .comment&gt;"/>
    </list>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    <process expanded="true">
    <operator activated="false" class="text:filter_tokens_by_content" compatibility="7.2.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="179" y="34">
    <parameter key="string" value="comment id"/>
    <parameter key="regular_expression" value="comment id.*"/>
    </operator>
    <connect from_port="segment" to_port="document 1"/>
    <portSpacing port="source_segment" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="loop_collection" compatibility="7.2.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" class="text:extract_information" compatibility="7.2.000" expanded="true" height="68" name="Extract Information" width="90" x="45" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="id" value="id=[&quot;']([^'&quot;]+)"/>
    <parameter key="author" value="author=[&quot;']([^'&quot;]+)"/>
    <parameter key="realname" value="realname=[&quot;']([^'&quot;]+)"/>
    <parameter key="comment" value="&gt;(.+?)&lt;/comment&gt;"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries">
    <parameter key="id" value="/*/comment/comments/@id"/>
    </list>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="7.2.000" expanded="true" height="82" name="Documents to Data (2)" width="90" x="179" y="34">
    <parameter key="text_attribute" value="text"/>
    </operator>
    <connect from_port="single" to_op="Extract Information" to_port="document"/>
    <connect from_op="Extract Information" from_port="document" to_op="Documents to Data (2)" to_port="documents 1"/>
    <connect from_op="Documents to Data (2)" from_port="example set" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="7.2.001" expanded="true" height="82" name="Append" width="90" x="447" y="34"/>
    <connect from_op="Create Document" from_port="output" to_op="Cut Document" to_port="document"/>
    <connect from_op="Cut Document" from_port="documents" to_op="Loop Collection" to_port="collection"/>
    <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

    Best wishes,

    Vaclav 
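
To make the logic of the process above easier to follow, here is a rough Python sketch of the same idea: cut the document into one region per `<comment ...>...</comment>`, then apply the four regular expressions from Extract Information to each region. It is an approximation for illustration, not a description of how the RapidMiner operators work internally; the names `REGION`, `FIELDS` and `extract_rows` are assumptions.

```python
import re

# Shortened copy of the response quoted in this thread.
SAMPLE = (
    '<rsp stat="ok">\n'
    '<comments photo_id="19043683190">\n'
    '<comment id="7309457-19043683190-72157655160157292" '
    'author="8866365@N08" realname="Sabien">Mooi beeld!</comment>\n'
    '<comment id="7309457-19043683190-72157656031039508" '
    'author="34303829@N08"  realname="">mooi gedaan Wouter</comment>\n'
    '</comments>\n'
    '</rsp>'
)

# Counterpart of Cut Document: one region per comment element.
# The trailing space keeps "<comments ...>" from matching.
REGION = re.compile(r"<comment .*?</comment>", re.DOTALL)

# Counterpart of Extract Information: the same four patterns as in the process.
FIELDS = {
    "id": re.compile(r"""id=["']([^'"]+)"""),
    "author": re.compile(r"""author=["']([^'"]+)"""),
    "realname": re.compile(r"""realname=["']([^'"]+)"""),
    "comment": re.compile(r">(.+?)</comment>"),
}


def extract_rows(xml_text):
    """Cut the text into comment regions and extract one row per region."""
    rows = []
    for segment in REGION.findall(xml_text):
        row = {}
        for name, pattern in FIELDS.items():
            match = pattern.search(segment)
            # An unmatched pattern (e.g. an empty realname) gives a missing value.
            row[name] = match.group(1) if match else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

Note that the `realname` pattern requires at least one character, so comments with `realname=""` end up with a missing value rather than an empty string.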

    User: "MBM"
    New Altair Community Member
    OP

    wow, thank you. That makes the data collection so much easier. I can use this for any similar cases!

    You made my day!

     

    best wishes

    marcel

    User: "MBM"
    New Altair Community Member
    OP

    @Vaclav thank you for everything. 

     

    It almost works. I manually obtained the flickr-specific frob, auth_token and api_sig to build the right URL for "Get Pages", but RapidMiner only returns this:

     

    <?xml version="1.0" encoding="utf-8" ?>
    <rsp stat="fail">
    <err code="95" msg="SSL is required" />
    </rsp>

    How can I make SSL queries in RapidMiner?

     

    Best wishes

     

    marcel
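
A note on the error above: flickr returns code 95 when the API is called over plain `http://`, so if that is what happened here, changing the URL given to Get Pages to start with `https://` (as in the example URL earlier in the thread) is the first thing to try. The Python sketch below shows both halves of that check: reading the error out of the response and rewriting the URL scheme. The helper names `flickr_error` and `force_https` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# The failure response quoted in this thread.
FAIL_RESPONSE = """<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="fail">
<err code="95" msg="SSL is required" />
</rsp>"""


def flickr_error(xml_text):
    """Return (code, msg) for a failed response, or None if stat is not 'fail'."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.get("stat") != "fail":
        return None
    err = root.find("err")
    return int(err.get("code")), err.get("msg")


def force_https(url):
    """Rewrite an http:// URL to https://, leaving everything else unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(flickr_error(FAIL_RESPONSE))
    print(force_https(
        "http://api.flickr.com/services/rest/"
        "?method=flickr.photos.getExif&photo_id=28367629061"
    ))
```

If the URL already uses `https://` and the error persists, the cause lies elsewhere and this sketch does not cover it.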