Journal of Computing Science and Engineering
Journal Basic Information
Korean Institute of Information Scientists and Engineers
Editor in Chief :
In-Sup Lee / Il-Yeol Song / Jong C. Park / Tae-Whan Kim
Volume & Issues
Volume 1, Issue 2 - Dec 2007
Volume 1, Issue 1 - Sep 2007
Extending the Multidimensional Data Model to Handle Complex Data
Mansmann, Svetlana ; Scholl, Marc H. ;
Journal of Computing Science and Engineering, volume 1, issue 2, 2007, Pages 125~160
DOI : 10.5626/JCSE.2007.1.2.125
Data Warehousing and OLAP (On-Line Analytical Processing) have become key technologies for comprehensive data analysis. Originally developed for decision support in business, data warehouses have proven to be an adequate solution for a variety of non-business applications and domains, such as government, research, and medicine. The analytical power of OLAP technology comes from its underlying multidimensional data model, which allows users to see data from different perspectives. However, this model displays a number of deficiencies when applied to non-conventional scenarios and analysis tasks. This paper presents an attempt to systematically summarize various extensions of the original multidimensional data model that have been proposed by researchers and practitioners in recent years. The presented concepts are arranged into a formal classification consisting of fact types, factual and fact-dimensional relationships, and dimension types, supplied with explanatory examples from real-world usage scenarios. Both the static elements of the model, such as types of fact and dimension hierarchy schemes, and dynamic features, such as support for advanced operators and derived elements, are covered. We also propose a semantically rich graphical notation called X-DFM that extends the popular Dimensional Fact Model by refining and modifying the set of constructs so as to make it coherent with the formal model. An evaluation of our framework against a set of common modeling requirements summarizes the contribution.
Fast Conditional Independence-based Bayesian Classifier
Junior, Estevam R. Hruschka ; Galvao, Sebastian D. C. de O. ;
Journal of Computing Science and Engineering, volume 1, issue 2, 2007, Pages 162~176
DOI : 10.5626/JCSE.2007.1.2.162
Machine Learning (ML) has become very popular within Data Mining (KDD) and Artificial Intelligence (AI) research and their applications. In the ML and KDD contexts, two main approaches can be used for inducing a Bayesian Network (BN) from data, namely Conditional Independence (CI) and Heuristic Search (HS). When a BN is induced for classification purposes (Bayesian Classifier, BC), it is possible to impose specific constraints aimed at increasing computational efficiency. In this paper, a new CI-based approach to inducing BCs from data is proposed and two algorithms are presented. The approach uses the Markov Blanket concept to impose constraints on, and thereby optimize, the traditional PC learning algorithm. Experiments performed with the ALARM domain, as well as six other UCI and three artificial domains, revealed that the proposed approach tends to execute fewer comparison tests than the traditional PC. The experiments also show that the proposed algorithms produce competitive classification rates when compared with both PC and Naive Bayes.
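As a reading aid for the abstract above, the following minimal sketch computes the Markov blanket of a node using the textbook definition (parents, children, and the children's other parents). It is not one of the two algorithms proposed in the paper; the function name `markov_blanket` and the toy network are hypothetical.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps each node to the set of its parents (hypothetical input
    format chosen for this sketch).
    """
    # Start with the parents of the target.
    blanket = set(parents.get(target, ()))
    for node, node_parents in parents.items():
        if target in node_parents:
            blanket.add(node)             # `node` is a child of the target
            blanket.update(node_parents)  # the child's other parents (spouses)
    blanket.discard(target)
    return blanket


# Hypothetical toy network, not taken from the paper.
toy_parents = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
    "E": {"D", "F"},
    "F": set(),
}

blanket_of_d = markov_blanket(toy_parents, "D")
print(sorted(blanket_of_d))
```

For a classifier, only the variables in the class node's Markov blanket influence the class given the rest, which is why restricting structure learning to that neighbourhood can plausibly reduce the number of independence tests.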
A Data Mining Approach for Selecting Bitmap Join Indices
Bellatreche, Ladjel ; Missaoui, Rokia ; Necir, Hamid ; Drias, Habiba ;
Journal of Computing Science and Engineering, volume 1, issue 2, 2007, Pages 177~194
DOI : 10.5626/JCSE.2007.1.2.177
Index selection is one of the most important decisions in the physical design of relational data warehouses. Indices significantly reduce the cost of processing complex OLAP queries, but they incur storage cost and maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash, etc.) and multi-attribute indices (join indices, bitmap join indices). Bitmap join indices are well suited to optimizing star join queries, which are characterized by joins between a large fact table and multiple dimension tables and by selections on dimension tables. They require less storage due to their binary representation. However, selecting these indices is a difficult task due to the exponential number of candidate attributes to be indexed. Most approaches to index selection follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices using the pruned search space. In this paper, we first propose a data mining driven approach to prune the search space of the bitmap join index selection problem. As opposed to an existing technique that uses only the frequency of attributes in queries as a pruning metric, our technique uses not only frequencies but also other parameters, such as the size of the dimension tables involved in the indexing process, the size of each dimension tuple, and the page size on disk. We then define a greedy algorithm to select bitmap join indices that minimize processing cost and satisfy the storage constraint. Finally, in order to evaluate the efficiency of our approach, we compare it with some existing techniques.
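The second step described above (selection under a storage constraint) can be illustrated with a generic greedy sketch. This is not the paper's algorithm or cost model: the benefit values are made-up numbers, the size estimate assumes one uncompressed bitmap per distinct attribute value, and all names are hypothetical.

```python
def bitmap_index_bytes(cardinality, fact_rows):
    """Assumed size model: one bitmap of `fact_rows` bits per distinct value."""
    return cardinality * ((fact_rows + 7) // 8)


def greedy_select(candidates, fact_rows, budget_bytes):
    """Pick candidate attributes by benefit per byte until the budget is used.

    `candidates` is a list of (attribute, cardinality, estimated_benefit)
    tuples; cardinalities are assumed to be positive.
    """
    ranked = sorted(
        candidates,
        key=lambda c: c[2] / bitmap_index_bytes(c[1], fact_rows),
        reverse=True,
    )
    chosen, used = [], 0
    for attribute, cardinality, _benefit in ranked:
        size = bitmap_index_bytes(cardinality, fact_rows)
        if used + size <= budget_bytes:  # keep only what still fits
            chosen.append(attribute)
            used += size
    return chosen, used


# Hypothetical candidates: (attribute, distinct values, estimated benefit).
toy_candidates = [
    ("gender", 2, 50.0),
    ("city", 100, 400.0),
    ("month", 12, 240.0),
]

chosen, used = greedy_select(toy_candidates, fact_rows=1_000_000,
                             budget_bytes=2_000_000)
print(chosen, used)
```

Ranking by benefit per byte is one common greedy heuristic for budgeted selection; a real selector would recompute benefits after each pick, since indices on related attributes interact.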
A Clustered Dwarf Structure to Speed up Queries on Data Cubes
Bao, Yubin ; Leng, Fangling ; Wang, Daling ; Yu, Ge ;
Journal of Computing Science and Engineering, volume 1, issue 2, 2007, Pages 195~210
DOI : 10.5626/JCSE.2007.1.2.195
Dwarf is a highly compressed structure that compresses a data cube by eliminating semantic redundancies while the cube is computed. Although it has a high compression ratio, Dwarf is slower to query and more difficult to update because of its structural characteristics. Since the original purpose of a data cube is to speed up queries, we propose two novel clustering methods for query optimization: the recursion clustering method, which clusters the nodes in a recursive manner to speed up point queries, and the hierarchical clustering method, which clusters the nodes of the same dimension to speed up range queries. To facilitate the implementation, we design a partition strategy and a logical clustering mechanism. Experimental results show that our methods can effectively improve query performance on data cubes, and that the recursion clustering method is suitable for both point queries and range queries.
Trajectory Data Warehouses: Design and Implementation Issues
Orlando, Salvatore ; Orsini, Renzo ; Raffaeta, Alessandra ; Roncato, Alessandro ; Silvestri, Claudio ;
Journal of Computing Science and Engineering, volume 1, issue 2, 2007, Pages 211~232
DOI : 10.5626/JCSE.2007.1.2.211
In this paper, we investigate some issues and solutions related to the design of a Data Warehouse (DW) that stores several aggregate measures about trajectories of moving objects. First, we discuss the loading phase of our DW, which has to deal with overwhelming streams of trajectory observations, possibly produced at different rates and arriving in an unpredictable and unbounded way. Then, we focus on the measure presence, the most complex measure stored in our DW. This measure returns the number of distinct trajectories that lie in a spatial region during a given temporal interval. We devise a novel way to compute an approximate, but very accurate, presence aggregate function, which algebraically combines a bounded number of measures stored in the base cells of the data cube. We conducted many experiments to show the effectiveness of our method for computing this aggregate function. In addition, the feasibility of our innovative trajectory DW was validated with an implementation based on Oracle. We investigated the most challenging issues in realizing our trajectory DW using standard DW technologies: namely, the preprocessing and loading phase, and the aggregation functions to support OLAP operations.
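To see why presence is hard to aggregate, the sketch below shows that summing per-cell distinct counts overcounts trajectories that span adjacent cells, and that subtracting a stored border-crossing count is one possible algebraic correction. The abstract does not give the paper's formula, so this correction is an assumption made for illustration; the one-dimensional cells, the function names, and the toy data are all hypothetical.

```python
def presence(trajectories, cell):
    """Number of distinct trajectories that visit `cell`."""
    return sum(1 for cells in trajectories.values() if cell in cells)


def crossings(trajectories, a, b):
    """Number of distinct trajectories that move directly between a and b."""
    count = 0
    for cells in trajectories.values():
        steps = set(zip(cells, cells[1:]))  # consecutive cell pairs
        if (a, b) in steps or (b, a) in steps:
            count += 1
    return count


def exact_presence(trajectories, region):
    """Ground truth: distinct trajectories touching any cell in `region`."""
    return sum(1 for cells in trajectories.values() if region & set(cells))


# Hypothetical toy data: trajectory id -> sequence of visited cell ids.
toy = {"t1": [0, 1], "t2": [0], "t3": [1], "t4": [1, 0, 1]}

naive = presence(toy, 0) + presence(toy, 1)   # double-counts t1 and t4
corrected = naive - crossings(toy, 0, 1)      # assumed correction term
truth = exact_presence(toy, {0, 1})
print(naive, corrected, truth)
```

On this toy input the corrected value matches the exact count, but in general such a correction is only approximate, for example when a trajectory leaves one cell and reaches the other through cells outside the aggregated region.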