A Novel Approach for Mining High-Utility Sequential Patterns in Sequence Databases

  • Received : 2010.03.10
  • Accepted : 2010.08.02
  • Published : 2010.10.31

Abstract

Mining sequential patterns is an important research issue in data mining and knowledge discovery with broad applications. However, existing sequential pattern mining approaches consider only binary frequency values of items in sequences and assume equal importance (significance) for distinct items. Therefore, they cannot accurately represent many real-world scenarios. In this paper, we propose a novel framework for mining high-utility sequential patterns, which extracts information more applicable to real-life settings from sequence databases with non-binary frequency values of items in sequences and different importance values for distinct items. Moreover, we propose two new algorithms for mining high-utility sequential patterns: UtilityLevel, which uses a level-wise candidate generation approach, and UtilitySpan, which uses a pattern growth approach. Extensive performance analyses show that our algorithms are very efficient and scalable for mining high-utility sequential patterns.
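
To make the notion of "utility" concrete, the following is a minimal sketch of how the utility of a candidate sequential pattern might be evaluated against a small sequence database. The abstract does not spell out the utility definition, so the sketch rests on stated assumptions: an item's utility in an itemset is its quantity (the non-binary frequency) times a per-item profit (the importance value), and when a pattern occurs several times in one sequence the maximum-utility occurrence is used, which is one common convention and may differ from the paper's own definition. All names (`pattern_utility`, `database_utility`, `profit`, the toy data) are hypothetical, and this is neither UtilityLevel nor UtilitySpan, only the scoring step such algorithms would rely on.

```python
from typing import Dict, FrozenSet, List

Itemset = Dict[str, int]        # item -> quantity in this itemset (assumed representation)
Sequence = List[Itemset]        # ordered list of itemsets
Pattern = List[FrozenSet[str]]  # ordered list of item groups to match


def pattern_utility(pattern: Pattern, sequence: Sequence,
                    profit: Dict[str, int]) -> int:
    """Utility of `pattern` in one sequence, taken as the maximum over all
    occurrences (an assumed convention); 0 if the pattern does not occur."""
    # best[j] = highest utility found so far for matching the first j pattern
    # elements against the itemsets already scanned (None = not matched yet).
    best: List = [0] + [None] * len(pattern)
    for itemset in sequence:
        # Go from the last pattern element to the first so that one itemset
        # is never used for two consecutive pattern elements.
        for j in range(len(pattern), 0, -1):
            if best[j - 1] is None or not pattern[j - 1].issubset(itemset):
                continue
            gain = sum(itemset[item] * profit[item] for item in pattern[j - 1])
            candidate = best[j - 1] + gain
            if best[j] is None or candidate > best[j]:
                best[j] = candidate
    return best[-1] or 0


def database_utility(pattern: Pattern, database: List[Sequence],
                     profit: Dict[str, int]) -> int:
    """Sum of the per-sequence utilities of `pattern` over the database."""
    return sum(pattern_utility(pattern, seq, profit) for seq in database)


# Toy data (hypothetical): per-item profit and three customer sequences.
profit = {"a": 2, "b": 5, "c": 1}
database = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"b": 1}, {"a": 2, "c": 1}],
    [{"a": 3}, {"b": 1}, {"a": 1, "c": 2}],
]

a_then_c = [frozenset({"a"}), frozenset({"c"})]  # <(a)(c)>
a_with_b = [frozenset({"a", "b"})]               # <(a b)>

min_utility = 10  # hypothetical threshold
print(database_utility(a_then_c, database, profit))  # 13
print(database_utility(a_with_b, database, profit))  # 12
print(database_utility(a_then_c, database, profit) >= min_utility)  # True
```

In this toy database the pattern <(a)(c)> appears in only two of the three sequences, yet it clears the threshold because of the quantities and profits involved. This is the kind of information that a purely binary, equal-weight support count would not capture.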
