A Semiotics Framework for Analyzing Data Provenance Research

  • Ram, Sudha (Department of MIS, Eller College of Management, 430J McClelland Hall, University of Arizona)
  • Liu, Jun (Department of MIS, Eller College of Management, 430J McClelland Hall, University of Arizona)
  • Published: 2008.09.30


Data provenance is the background knowledge that enables a piece of data to be interpreted and used correctly within its context. The importance of tracking provenance is widely recognized, as shown by significant research in areas including e-science, homeland security, data warehousing, and business intelligence. To advance research on data provenance, however, one must first understand the research conducted to date and identify specific topics that merit further investigation. In this work, we develop a framework based on semiotics theory to assist in analyzing and comparing existing provenance research at the conceptual level. We provide a detailed review of data provenance research and compare and contrast that research using the semiotics framework. We conclude by identifying challenges that will drive future research in this field.


