Rajeswaran, Karthik Bharadhwaj. Lossless compression of SKA data sets. MSc Dissertation. Department of Electrical Engineering, University of Cape Town, 2013.
With astronomical data archives continuing to grow at an enormous rate, the providers and end users of astronomical data sets will benefit from effective data compression techniques. This dissertation explores different data compression techniques and aims to find an optimal algorithm for compressing the astronomical data obtained by the SKA, which are new and unique in the field of radio astronomy. The compression was required to be lossless and to take place while the data is being accessed. The project was carried out in conjunction with the SKA South Africa office.
Data compression reduces the time taken and the bandwidth used when transferring files, and can also reduce the costs involved with data storage. This is especially applicable when radio telescopes are located a long distance away from the centres where the astronomical data is processed and analysed, which is the case with the SKA project.
The SKA uses the Hierarchical Data Format (HDF5) to store the data collected from the radio telescopes; the data sets used in this study range from 29 MB to 9 GB in size. The compression techniques investigated include SZIP, GZIP, the LZF filter, LZ4 and the Fully Adaptive Prediction Error Coder (FAPEC).
It was found that LZ4 and PEC provided the best compression ratios and were the most time- and memory-efficient algorithms. A program using the LZ4 algorithm was developed to compress the data sets while they were being accessed from another machine, thus simulating the environment used by the SKA.
The dissertation concludes that PEC and LZ4 are the optimal compression algorithms for the SKA data sets at this point in time, and it presents suggestions for future work and possible improvements.