Saliency Score-Based Visualization for Data Quality Evaluation
 Title & Authors
Kim, Yong Ki; Lee, Keon Myung
 Abstract
Data analysts explore collections of data in search of valuable information, using a variety of techniques. "Garbage in, garbage out" is a well-known idiom that emphasizes the importance of data quality in data analysis. It is therefore crucial to validate data quality at an early stage of the analysis, which requires an effective method for evaluating it. This paper introduces a method for visually characterizing the quality of data using the notion of a saliency score. The saliency score is a measure comprising five indexes that capture certain aspects of data quality. Experimental results are presented to show the applicability of the proposed method.
 Keywords
Data analysis; Visualization; Data quality analysis; Data quality metrics
 Language
English