Show simple item record

dc.contributor.author: Sukhobok, Dina
dc.contributor.author: Nikolov, Nikolay
dc.contributor.author: Roman, Dumitru
dc.date.accessioned: 2018-03-22T06:38:25Z
dc.date.available: 2018-03-22T06:38:25Z
dc.date.created: 2018-03-19T22:32:57Z
dc.date.issued: 2017
dc.identifier.citation: 2017 International Conference on Big Data Innovations and Applications (Innovate-Data), Prague, Czech Republic, 21-23 Aug. 2017, 25-34 [nb_NO]
dc.identifier.isbn: 978-1-5386-0960-6
dc.identifier.uri: http://hdl.handle.net/11250/2491583
dc.description.abstract: One essential and challenging task in data science is data cleaning - the process of identifying and eliminating data anomalies. Different data types, data domains, data acquisition methods, and final purposes of data cleaning have resulted in different approaches to defining data anomalies in the literature. This paper proposes and describes a set of basic data anomalies in the form of anomaly patterns commonly encountered in tabular data, independently of the data domain, data acquisition technique, or the purpose of data cleaning. This set of anomalies can serve as a valuable basis for developing and enhancing software products that provide general-purpose data cleaning facilities, and as a basis for comparing different tools that aim to support tabular data cleaning. Furthermore, this paper introduces a set of corresponding data operations suitable for addressing the identified anomaly patterns and introduces Grafterizer - a software framework that implements those data operations. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.relation.ispartof: 2017 International Conference on Big Data Innovations and Applications (Innovate-Data), Prague, Czech Republic, 21-23 Aug. 2017
dc.title: Tabular Data Anomaly Patterns [nb_NO]
dc.type: Chapter [nb_NO]
dc.description.version: acceptedVersion [nb_NO]
dc.source.pagenumber: 25-34 [nb_NO]
dc.identifier.cristin: 1574174
dc.relation.project: EC/H2020/732590 [nb_NO]
dc.relation.project: EC/H2020/732003 [nb_NO]
dc.relation.project: EC/H2020/644497 [nb_NO]
cristin.unitcode: 7401,90,12,0
cristin.unitname: Nettbaserte systemer og tjenester
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1


Files in this item

This item appears in the following collection(s)
