Appearance-Order-Based Schema Matching

  • Ding, Guohui (Department of Computer Science, Shenyang Aerospace University) ;
  • Cao, Keyan (Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, College of Information Science & Engineering, Northeastern University) ;
  • Wang, Guoren (Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, College of Information Science & Engineering, Northeastern University) ;
  • Han, Dong (Department of Computer Science, Shenyang Aerospace University)
  • Received : 2012.12.13
  • Accepted : 2014.05.21
  • Published : 2014.06.30

Abstract

Schema matching is widely used in applications such as data integration, ontology merging, data warehousing, and dataspaces. In this paper, we propose a novel matching technique based on the order in which attributes appear in the schema structure of query results. The appearance order reflects how important an attribute is to the user examining the query results. The core idea of our approach is to collect statistics about the appearance order of attributes from query logs and use them to find correspondences between attributes in the schemas to be matched. First, we use a matrix to organize the appearance-order statistics. Then, we consider two scoring functions to measure the similarity of the collected statistics. Finally, a traditional algorithm is employed to find the mapping with the highest score. Our approach complements existing matchers and can be combined with them to obtain more accurate results. We validate our approach with an experimental study, whose results demonstrate that it is effective and performs well.
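The three steps can be illustrated with a minimal Python sketch. The abstract does not specify the matrix layout, the two scoring functions, or the search algorithm, so everything below is an assumption made for illustration: the matrix stores, for each attribute, the fraction of logged queries in which it appears at each position; the similarity is a negative squared distance between rows; and the search is a brute-force enumeration that is only practical for very small schemas. The function names and sample logs are hypothetical.

```python
from itertools import permutations, zip_longest


def order_matrix(attributes, query_logs):
    """Row i, column p: fraction of logged queries whose result schema
    lists attributes[i] at position p (an assumed matrix layout)."""
    width = max(len(q) for q in query_logs)
    index = {a: i for i, a in enumerate(attributes)}
    counts = [[0.0] * width for _ in attributes]
    for query in query_logs:
        for pos, attr in enumerate(query):
            if attr in index:  # ignore attributes outside the schema
                counts[index[attr]][pos] += 1.0
    total = len(query_logs)
    return [[c / total for c in row] for row in counts]


def similarity(row_a, row_b):
    """Assumed scoring function: negative squared distance between two
    position distributions, zero-padding the shorter row."""
    return -sum((x - y) ** 2
                for x, y in zip_longest(row_a, row_b, fillvalue=0.0))


def best_mapping(src_attrs, src_logs, tgt_attrs, tgt_logs):
    """Exhaustively search one-to-one mappings (requires
    len(src_attrs) <= len(tgt_attrs)) and return the highest-scoring one."""
    src = order_matrix(src_attrs, src_logs)
    tgt = order_matrix(tgt_attrs, tgt_logs)
    best = max(
        permutations(range(len(tgt_attrs)), len(src_attrs)),
        key=lambda perm: sum(similarity(src[i], tgt[j])
                             for i, j in enumerate(perm)),
    )
    return {src_attrs[i]: tgt_attrs[j] for i, j in enumerate(best)}


# Hypothetical query logs: each entry is the attribute order of one result schema.
source_logs = [["name", "price"], ["name", "price", "stock"],
               ["name", "stock"], ["price", "name"]]
target_logs = [["title", "cost"], ["title", "cost", "qty"],
               ["title", "qty"], ["cost", "title"]]

mapping = best_mapping(["name", "price", "stock"], source_logs,
                       ["title", "cost", "qty"], target_logs)
```

With these sample logs the two schemas have identical appearance-order statistics, so the sketch maps `name` to `title`, `price` to `cost`, and `stock` to `qty`. The exhaustive search grows factorially with schema size, so a heuristic search would be needed for schemas of realistic size.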
