Fig. 1. Conceptual diagram for table integration support model
Fig. 2. Conceptual diagram of preprocessing step
Fig. 3. Conceptual diagram of analysis step
Fig. 4. Flowchart for similarity table clustering
Fig. 5. Procedure for cluster analysis
Fig. 6. The number of cluster according to average of Silhouette Coefficient
Fig. 7. Part of the SQL query where the table in the 72 cluster is used
Fig. 8. Part of the SQL query where the table in the 91 cluster is used
Table 1. Evaluation criteria using average silhouette coefficient of cluster
Table 2. Table integration criteria based-on co-occurence of tables in a cluster
Table 3. Part of top list of similar table
Table 4. Part of co-occurrence table pair and co-occurence probability
Table 5. Sample clusters with high possibility of table integration
Table 6. Table structure of 72 cluster with a high possibility of table integration
Table 7. Sample clusters that need to be reviewed by expert
Table 8. Table of 91 cluster that need to be reviewed by manually
References
- Jong Suk Lee, Chang Ho Lee, "Modeling on Data Performance for Very Large Database," Proceedings of the Korea Safety Management & Science, No. 1, pp. 383-391, 2012.
- Kang Soo Seo, "The Guide for Data Architecture Professional" Korea Data Agency, pp. 214-553, 2013.
- R. Y. Wang, V. C. Storey and C. P. Firth, “A Framework for Analysis of Data Quality Research,” Transactions on Knowledge and Data Engineering, Vol. 7, No. 4, pp. 623-640, Aug. 1995. https://doi.org/10.1109/69.404034
- Hae Kyung Rhee, "Harmfulness of Denormalization Adopted for Database for Database Performance Enhancement," Journal of the Institute of Electronics and Information Engineers, Vol. 42, No. 3, May 2005.
- Hye Young Seo, Seo Young Kwon, Jae Kwon Ahn, Young Jin Kim, "A Case Study on the Implementation of Master Data Management System for Global Manufacturing Company," Entrue Journal of Information Technology, Vol. 7, No. 2, pp. 91-102, Jul. 2008. https://doi.org/10.3923/itj.2008.91.97
- S. Castano, V. Antonellis, M. G. Fugini and B. Pernici, "Conceptual Schema Analysis: Techniques and Applications," ACM Transactions on Database Systems, Vol. 23, No. 3, pp. 286-333, Sep. 1998. https://doi.org/10.1145/293910.293150
- H. Kopcke and E. Rahm, "Frameworks for entity matching: A comparison," Data & Knowledge Engineering, Vol. 69, No. 2, pp. 197-210, Feb. 2010. https://doi.org/10.1016/j.datak.2009.10.003
- Hong Girl Lee et al., "A Study on the Database Integration Methodology using XML," Journal of Korean Navigation and Port Research, Vol. 29, No. 5, pp. 883-890, Dec. 2005. https://doi.org/10.5394/KINPR.2005.29.10.883
- Sanjay Madria et al., "An XML Schema Integration and Query Mechanism System," Data & Knowledge Engineering, Vol. 65, No. 2, pp. 265-303, May 2008.
- D. P. Groth, "Visual Representation of Database Queries using Structual similarity," Information Visualization, pp. 102-107, 2003.
- Yun Hee Han, "Design and Implementation of Database Cache engine based on Similarity Query Matching" Masterr's Thesis, Korea Polytechnic University, 2008.
- P. J. Rousseeuw, "Silhouette: a graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, Vol. 20, pp. 53-65, Nov. 1987. https://doi.org/10.1016/0377-0427(87)90125-7
- Soojung Lee, "Performance Analysis of Similarity Reflecting Jaccard Index for Solving Data Sparsity in Collaborative Filtering," The Journal of Korean Association of Computer Education, Vol. 19, No. 4, pp. 59-66, Jul. 2016. https://doi.org/10.32431/KACE.2016.19.4.006
- L. Kaufman and P. J. Rousseeuw, "Finding Groups in Data: An Introduction to Cluster Analysis," Wiley, New York, 1990.