Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model

Lin, Jerry Chun-Wei; Djenouri, Youcef; Srivastava, Gautam; Li, Yuanfa; Yu, Philip S.

dc.contributor.author	Lin, Jerry Chun-Wei
dc.contributor.author	Djenouri, Youcef
dc.contributor.author	Srivastava, Gautam
dc.contributor.author	Li, Yuanfa
dc.contributor.author	Yu, Philip S.
dc.date.accessioned	2022-08-30T10:25:44Z
dc.date.available	2022-08-30T10:25:44Z
dc.date.created	2021-12-24T23:40:59Z
dc.date.issued	2021
dc.identifier.citation	ACM Transactions on Knowledge Discovery from Data. 2021, 16 (3), 60.	en_US
dc.identifier.issn	1556-4681
dc.identifier.uri	https://hdl.handle.net/11250/3014319
dc.description.abstract	High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industrial settings, the collected data is very large and cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to ensure the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.	en_US
dc.language.iso	eng	en_US
dc.publisher	Association for Computing Machinery (ACM)	en_US
dc.subject	High-utility sequential pattern mining	en_US
dc.subject	MapReduce	en_US
dc.subject	Large-scale	en_US
dc.subject	Parallel and distributed	en_US
dc.title	Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	acceptedVersion	en_US
dc.source.pagenumber	26	en_US
dc.source.volume	16	en_US
dc.source.journal	ACM Transactions on Knowledge Discovery from Data	en_US
dc.source.issue	3	en_US
dc.identifier.doi	10.1145/3487046
dc.identifier.cristin	1971975
dc.source.articlenumber	60	en_US
cristin.ispublished	true
cristin.fulltext	preprint
cristin.qualitycode	1

Associated file(s)

Filename: TKDD-final.pdf
Size: 887.4 KB
Format: PDF
Description: Article

This item appears in the following collection(s)

Publications from CRIStin - SINTEF AS [5802]
SINTEF Digital [2501]
