Show simple item record

dc.contributor.author: Lin, Jerry Chun-Wei
dc.contributor.author: Djenouri, Youcef
dc.contributor.author: Srivastava, Gautam
dc.contributor.author: Li, Yuanfa
dc.contributor.author: Yu, Philip S.
dc.date.accessioned: 2022-08-30T10:25:44Z
dc.date.available: 2022-08-30T10:25:44Z
dc.date.created: 2021-12-24T23:40:59Z
dc.date.issued: 2021
dc.identifier.citation: ACM Transactions on Knowledge Discovery from Data. 2021, 16 (3), 60. (en_US)
dc.identifier.issn: 1556-4681
dc.identifier.uri: https://hdl.handle.net/11250/3014319
dc.description.abstract: High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industrial settings, the collected data is very large and cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to ensure the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability, compared to the serial HUSP-Span approach. (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Association for Computing Machinery (ACM) (en_US)
dc.subject: High-utility sequential pattern mining (en_US)
dc.subject: MapReduce (en_US)
dc.subject: Large-scale (en_US)
dc.subject: Parallel and distributed (en_US)
dc.title: Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model (en_US)
dc.type: Peer reviewed (en_US)
dc.type: Journal article (en_US)
dc.description.version: acceptedVersion (en_US)
dc.source.pagenumber: 26 (en_US)
dc.source.volume: 16 (en_US)
dc.source.journal: ACM Transactions on Knowledge Discovery from Data (en_US)
dc.source.issue: 3 (en_US)
dc.identifier.doi: 10.1145/3487046
dc.identifier.cristin: 1971975
dc.source.articlenumber: 60 (en_US)
cristin.ispublished: true
cristin.fulltext: preprint
cristin.qualitycode: 1

