Friday, 23 January 2009

The future IS super computing and the cloud

Web-based technologies, whether called Cloud, portals or virtual research environments, should provide us with fantastic opportunities for growth and development. My earlier post highlighted something of what lies ahead of us, but what are the opportunities? My own recent interactions with even the simplest technologies have deepened my understanding of how people can benefit from web-based communication. As a statistical researcher I see a future where standard statistical analysis programs (e.g. SPSS, Stata) are embedded into portals as a matter of course. This would be a big step forward for a number of reasons. Firstly, my present pet hate with SPSS (for example) is that I have to reinstall the package at least once a year due to upgrades or new licence issues. In the portal world, this is no longer needed. Provided the university has paid the annual fee to SPSS, and provided that I am a registered university employee or student, I would be able to access its functionality from any web browser. All upgrades and licence updates would be handled centrally by university computing services. A much greater benefit to users, and to how they innovate, would be the ability of the stats packages to embrace the 'Cloud' to handle GRID-enabled multiprocessor computation.

Datasets (i.e. the information that we collect on any matter we choose) have become larger, often running into terascale dimensions. Our ability to conduct useful estimations on data of this size is greatly diminished. It is no longer uncommon to hear of very large corporations having difficulty processing data for this reason, which limits their ability to take advantage of the latest estimation processes and slows innovation. A scientific example of this scale of data generation is provided by the CERN laboratory, which will generate terabytes of data per experiment. On a single computer it is practically impossible to run models or tests that would 'sift' this data to enhance knowledge. The cloud provides us with a 'super computing' platform to reduce this issue: all data storage and computation take place away from the user's machine, and the work is shared across many computers simultaneously, dramatically reducing computational time for many user groups. This is multiprocessor computation. The issue, however, is that many organisations may not allow their data to be stored and processed outside of their own IT networks due to data security risks; something we hear rumblings of already... For academic researchers this is likely to be much less of an issue.
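
To make the idea of multiprocessor computation a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it uses several processes on one machine as a stand-in for the many computers a cloud or grid would provide, and the function names (split_into_chunks, partial_sum, combine, parallel_mean) are hypothetical ones chosen for this example rather than part of any statistics package. The dataset is cut into pieces, each piece is summarised independently, and the small partial results are then combined into one answer.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of near-equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum(chunk):
    """Work done independently by one worker: a partial sum and a count."""
    return sum(chunk), len(chunk)


def combine(partials):
    """Merge the per-chunk results into one overall mean (assumes non-empty data)."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def parallel_mean(data, n_workers=4):
    """Compute the mean of data by sharing the chunks across worker processes."""
    chunks = split_into_chunks(data, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return combine(partials)


if __name__ == "__main__":
    values = list(range(1, 100_001))  # illustrative data only
    print(parallel_mean(values))
```

A mean is about the simplest statistic that can be split up this way, because each worker only has to return two numbers. Many real estimation procedures need more communication between workers, so the speed-up in practice depends heavily on the method being run.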

