A single-phase algorithm for mining high utility itemsets using compressed tree structures

Bhat B, Anup; SV, Harish; M, Geetha

doi:10.4218/etrij.2020-0300

ETRI Journal

Volume 43, Issue 6 / Pages 1024-1037 / 2021 / 1225-6463 (pISSN) / 2233-7326 (eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Bhat B, Anup (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education) ;
SV, Harish (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education) ;
M, Geetha (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education)

Received : 2020.08.05
Accepted : 2021.01.22
Published : 2021.12.01

https://doi.org/10.4218/etrij.2020-0300

Abstract

Mining high utility itemsets (HUIs) from transaction databases takes into account factors such as the unit profit and purchased quantity of items. Two-phase tree-based algorithms transform a database into compressed tree structures and generate candidate patterns through a recursive pattern-growth procedure. Constructing the conditional pattern trees that this procedure needs consumes considerable memory and time. To address this issue, this study employs two compressed tree structures, namely, Utility Count Tree and String Utility Tree, to enumerate valid patterns and thus promote fast utility computation. Furthermore, the study presents an algorithm called single-phase utility computation (SPUC) that leverages these two tree structures to mine HUIs in a single phase by incorporating novel pruning strategies. Experiments conducted on both real and synthetic datasets demonstrate the superior performance of SPUC compared with the IHUP, UP-Growth, and UP-Growth+ algorithms.
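
To make the notion of utility concrete, the following minimal sketch computes itemset utilities by brute force on a toy database. It is not SPUC and uses neither the Utility Count Tree nor the String Utility Tree; it only illustrates the standard definition in which the utility of an itemset is the sum of quantity times unit profit over the transactions that contain all of its items. The data, the threshold, and the names `unit_profit`, `transactions`, `itemset_utility`, and `brute_force_huis` are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): unit profit per item,
# and purchased quantity of each item in each transaction.
unit_profit = {"a": 5, "b": 2, "c": 1}
transactions = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "b": 1, "c": 1},
]


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def brute_force_huis(transactions, unit_profit, min_util):
    """Enumerate every non-empty itemset and keep those reaching min_util.

    Exponential in the number of items; usable only on tiny examples.
    """
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


huis = brute_force_huis(transactions, unit_profit, min_util=15)
print(huis)  # {('a',): 15, ('a', 'b'): 21}
```

In this toy database the itemset {b} has utility 14 and falls below the threshold, while its superset {a, b} has utility 21 and exceeds it. Utility is therefore not downward-closed the way support is, which is one reason HUI mining algorithms rely on dedicated pruning strategies rather than plain frequency-style pruning.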

Keywords

Acknowledgement

This work was supported by Manipal Academy of Higher Education Dr. T.M.A Pai Research Scholarship under Research Registration No. 170900117.

