Data-Compression-Based Resource Management in Cloud Computing for Biology and Medicine

  • Zhu, Changming (College of Information Engineering, Shanghai Maritime University)
  • Received : 2015.06.14
  • Accepted : 2016.02.25
  • Published : 2016.03.30

Abstract

With the development and application of biomedical techniques such as next-generation sequencing, mass spectrometry, and medical imaging, the amount of biomedical data has been growing explosively. Processing such data raises the problems of big data, highly intensive computation, and high-dimensional data. Cloud computing offers significant advantages in resource allocation, data storage, computation, and sharing, and thus provides a way to address the big data problems of biomedical research. To improve the efficiency of resource management in cloud computing, this paper proposes a clustering method and adopts the Radial Basis Function to compress comprehensive biological and medical data sets with high quality, and stores the compressed data under cloud resource management. Experiments validate that, with this data-compression-based resource management, large data sets from biology and medicine can be stored in less capacity. Furthermore, by applying the reverse operation of the Radial Basis Function, the compressed data can be reconstructed with high accuracy.
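The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea (cluster, fit Radial Basis Functions, store the fit, reconstruct), not the paper's actual method. It assumes a 1-D signal, plain k-means on the sample positions to choose RBF centers, Gaussian basis functions of a fixed width, and least-squares weights. The function names (kmeans_1d, rbf_design, compress, reconstruct) and the parameters k and width are hypothetical choices made for illustration.

```python
import numpy as np


def kmeans_1d(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 1-D points; returns sorted cluster centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(points, size=k, replace=False)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = points[labels == j]
            if members.size:  # keep the old center if a cluster goes empty
                centers[j] = members.mean()
    return np.sort(centers)


def rbf_design(x, centers, width):
    """Gaussian RBF design matrix with one column per center."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))


def compress(x, y, k, width):
    """Lossy compression (assumed scheme): keep only k centers and k weights."""
    centers = kmeans_1d(x, k)
    phi = rbf_design(x, centers, width)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return centers, weights


def reconstruct(x, centers, weights, width):
    """Re-evaluate the stored RBF expansion at the sample positions."""
    return rbf_design(x, centers, width) @ weights


# Synthetic stand-in for a biomedical signal (not data from the paper).
x = np.linspace(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

centers, weights = compress(x, y, k=20, width=0.08)
y_hat = reconstruct(x, centers, weights, width=0.08)

stored = centers.size + weights.size
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(f"stored {stored} numbers instead of {y.size}; RMSE = {rmse:.4f}")
```

In this sketch the stored representation is the pair (centers, weights) plus the shared width, so the storage cost scales with k rather than with the number of samples. The reconstruction is approximate, and its accuracy depends on k, the width, and how smooth the data are.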
