Problem reading SPSS SAV files. I have read the previous posts but am no wiser

LaurieMoseley New Altair Community Member
edited November 5 in Community Q&A

I have just downloaded RapidMiner version 9.8.

As my first task, I tried to read an SPSS SAV file. This was the data set from the UK Annual Psychiatric Morbidity survey, provided by the UK Data Service, which has a good reputation for its data curation standards. All I did was place the Read SPSS operator on the canvas, link it to the output node (on the right of the screen) and click the Run button. Yes, that was just one operator.

 

This screenshot shows RapidMiner's response.

 

 

The list of errors is too long to appear in the screenshot, so I have copied it and pasted it below.

 

  • Exception: java.lang.IndexOutOfBoundsException
  • Message: null
  • Stack trace:
  • java.io.FileInputStream.readBytes(Native Method)
  • java.io.FileInputStream.read(FileInputStream.java:255)
  • com.rapidminer.operator.io.BytewiseExampleSource.read(BytewiseExampleSource.java:131)
  • com.rapidminer.operator.io.BytewiseExampleSource.read(BytewiseExampleSource.java:123)
  • com.rapidminer.extension.file.connectors.operator.io.SPSSExampleSource.readStream(SPSSExampleSource.java:332)
  • com.rapidminer.operator.io.BytewiseExampleSource.createExampleSet(BytewiseExampleSource.java:90)
  • com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:53)
  • com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:32)
  • com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:272)
  • com.rapidminer.operator.Operator.execute(Operator.java:1022)
  • com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
  • com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:806)
  • com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:801)
  • java.security.AccessController.doPrivileged(Native Method)
  • com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:801)
  • com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:423)
  • com.rapidminer.operator.Operator.execute(Operator.java:1022)
  • com.rapidminer.Process.executeRoot(Process.java:1464)
  • com.rapidminer.Process.lambda$executeRootInPool$5(Process.java:1443)
  • com.rapidminer.studio.concurrency.internal.AbstractConcurrencyContext$AdaptedCallable.exec(AbstractConcurrencyContext.java:362)
  • java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
  • java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
  • java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
  • java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
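
One check that can be made independently of RapidMiner is which variant of the SAV format the file uses. The trace ends inside `FileInputStream.read`, which in Java raises `IndexOutOfBoundsException` when the requested offset or length does not fit the buffer, so a mis-read length field is one possible cause (that is only a guess). The sketch below is plain Python, not RapidMiner code. It decodes the first 84 bytes of a system file using the header layout described in the GNU PSPP developer documentation. The function names `describe_sav_header` and `read_sav_header` are made up for this example, and whether Read SPSS supports every variant it reports is not something this post can confirm.

```python
import struct

# Compression codes as given in the PSPP description of the file header.
COMPRESSION = {0: "none", 1: "bytecode", 2: "zlib"}


def describe_sav_header(header: bytes) -> dict:
    """Summarise the first 84 bytes of an SPSS system file header."""
    if len(header) < 84:
        raise ValueError("need at least 84 bytes of header")

    # Bytes 0-3: record type, "$FL2" for ordinary files, "$FL3" for ZSAV.
    magic = header[0:4].decode("ascii", errors="replace")
    # Bytes 4-63: product name, padded with spaces.
    product = header[4:64].decode("ascii", errors="replace").rstrip()

    # Bytes 64-67: layout code, normally 2 or 3. Trying both byte orders
    # tells us whether the file was written little- or big-endian.
    byte_order, prefix = None, None
    for name, candidate in (("little", "<"), ("big", ">")):
        (layout,) = struct.unpack(candidate + "i", header[64:68])
        if layout in (2, 3):
            byte_order, prefix = name, candidate
            break

    info = {"magic": magic, "product": product, "byte_order": byte_order}
    if prefix is None:
        # Layout code unreadable in either order: not a recognisable header.
        return info

    # Bytes 68-83: nominal case size, compression, weight index, case count.
    case_size, compression, _weight, ncases = struct.unpack(
        prefix + "4i", header[68:84]
    )
    info["nominal_case_size"] = case_size
    info["compression"] = COMPRESSION.get(compression, f"unknown ({compression})")
    info["ncases"] = ncases  # -1 means the writer did not record the count
    return info


def read_sav_header(path: str) -> dict:
    """Read the header bytes from a file on disk and describe them."""
    with open(path, "rb") as handle:
        return describe_sav_header(handle.read(84))
```

Calling `read_sav_header` on the downloaded file would show whether it reports `$FL2` or `$FL3` and which compression scheme it declares, which is useful detail to include in a bug report.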

 

I am using RapidMiner on an HP Compaq Pro 63000, with 16GB of RAM and running 64-bit Windows 10.

I would add that, although RapidMiner has worked well for me on most data sets in the past, there has always been a question mark over its performance specifically with SPSS data files. Reading CSV or Excel files has generally been fine.

 

In my experience, RapidMiner does not appear to recognize the SPSS metadata (such as Variable Labels and Value Labels, or sometimes Missing Values). As a result, for operations that require attribute names or values (e.g. Set Role, or producing pivot tables), RapidMiner does not appear to know the attribute names or their values. It certainly does not offer them for selection. With large studies, that places a cognitive burden on the user. Who can recall all the variable names and values out of thousands?
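
For readers less familiar with SPSS, this is roughly what that metadata carries. The sketch below is plain Python with an entirely invented codebook (the variable names, codes and labels are hypothetical and not taken from the survey). It shows how variable labels, value labels and declared missing values turn a row of bare codes into something readable, which is the information that goes missing when only the raw values are imported.

```python
# Hypothetical codebook: every name, code and label here is invented.
VARIABLE_LABELS = {"q1": "Sex of respondent", "q2": "Self-rated health"}
VALUE_LABELS = {
    "q1": {1: "Male", 2: "Female"},
    "q2": {1: "Good", 2: "Fair", 3: "Poor"},
}
MISSING_VALUES = {"q2": {-9}}  # codes declared as "missing" for a variable


def decode_row(row: dict) -> dict:
    """Replace variable names and codes with their labels where known."""
    decoded = {}
    for name, value in row.items():
        label = VARIABLE_LABELS.get(name, name)
        if value in MISSING_VALUES.get(name, set()):
            # A declared missing value is reported as None, not as a code.
            decoded[label] = None
        else:
            # Fall back to the raw code when no value label is defined.
            decoded[label] = VALUE_LABELS.get(name, {}).get(value, value)
    return decoded


print(decode_row({"q1": 2, "q2": -9}))
# prints {'Sex of respondent': 'Female', 'Self-rated health': None}
```

Without the labels, the same row is just `q1 = 2, q2 = -9`, and the user has to remember what each code means.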

 

I have also tried Ingo's earlier advice of linking the Read SPSS operator directly to a Store operator. That, though, merely produced what appeared to be the same error.

 

Given that SPSS is so widely used in academia and business, that could mean that RapidMiner is missing a trick. Perhaps I have done something wrong. If so, I would be very grateful if someone could point me in the right direction. All I want is for the system to work as it does in the tutorials!

 

With thanks in advance

 

Laurence Moseley

17th October 2020

 



Answers