Development of Collaborative Environment for Community-driven Scientific Data Curation

  • Received : 2017.06.21
  • Accepted : 2017.07.26
  • Published : 2017.09.28


The importance of data curation is increasingly recognized as the need for data reuse grows rapidly. Because of the recent data explosion, scientists spend almost 90% of their effort on retrieving and collecting the data needed for their studies. In this paper, we describe the development and application of a collaborative environment for community-driven data curation, which is essential for improving the reusability and citability of scientific data. The environment focuses on cross-linking data (or data collections) with their associated literature in order to capture and organize the inter-relations among research results in a specific domain. It also provides rich contextual information as metadata to help users understand the data. The cross-linking is implemented with the DOI system, which guarantees global accessibility to the data and to their relationships with the literature. The curation environment has been adopted by a globally well-known intrinsically disordered protein research group to build a community-driven curated DB. The curated DB is expected to substantially reduce the effort researchers spend on retrieving and collecting the data required for scientific discovery.
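The data-literature cross-linking described above can be illustrated with a minimal Python sketch. It is not the system presented in the paper: the class names (`Record`, `CrossLinkIndex`), the metadata keys, and the DOIs (which use a made-up `10.1234` prefix) are all assumptions chosen for illustration. The sketch only shows the idea of keeping bidirectional links between datasets and literature, keyed by DOI, with contextual metadata attached to each record.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """A dataset or a paper, identified globally by its DOI (hypothetical model)."""
    doi: str
    kind: str                                     # "dataset" or "literature"
    title: str
    context: dict = field(default_factory=dict)   # contextual metadata


class CrossLinkIndex:
    """Bidirectional DOI-to-DOI links between datasets and literature."""

    def __init__(self):
        self._records = {}
        self._links = defaultdict(set)

    def add(self, record):
        self._records[record.doi] = record

    def link(self, dataset_doi, literature_doi):
        # Both ends must already be registered and be of the expected kind.
        if self._records[dataset_doi].kind != "dataset":
            raise ValueError(f"{dataset_doi} is not a dataset")
        if self._records[literature_doi].kind != "literature":
            raise ValueError(f"{literature_doi} is not literature")
        # Store the link in both directions so either side can be the entry point.
        self._links[dataset_doi].add(literature_doi)
        self._links[literature_doi].add(dataset_doi)

    def related(self, doi):
        """Return the DOIs linked to `doi`, sorted for stable output."""
        return sorted(self._links.get(doi, ()))


# Illustrative records only; the DOIs and metadata values are invented.
index = CrossLinkIndex()
index.add(Record("10.1234/example.dataset.1", "dataset", "Example data set 1",
                 {"method": "NMR"}))
index.add(Record("10.1234/example.dataset.2", "dataset", "Example data set 2"))
index.add(Record("10.1234/example.paper.1", "literature", "Example paper"))

index.link("10.1234/example.dataset.1", "10.1234/example.paper.1")
index.link("10.1234/example.dataset.2", "10.1234/example.paper.1")

print(index.related("10.1234/example.paper.1"))
```

Storing each link in both directions lets a user start from either a paper or a data set and reach the related records, which is the kind of navigation the cross-linking is meant to support.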


Data Reusability;Community-Driven Data Curation;Collaborative Curation Environment;Data-Literature Interlinking;Context Information


Supported by : Korea Institute of Science and Technology Information (KISTI)

