Development of Tools for Creating Parallel Data Mining Algorithms
DOI:
https://doi.org/10.37934/araset.39.1.2642
Keywords:
Data mining, classification, clustering, association rules, parallel algorithms
Abstract
The purpose of this work is to develop tools for building parallel data mining algorithms that execute in a distributed environment. A formal model of a data mining algorithm is proposed in which the algorithm is represented as a set of independent operations that change the state of the knowledge model, together with structural blocks that allow the structure of the algorithm to be modified, including for parallel execution. A method for creating parallel data mining algorithms is proposed; unlike existing methods, it decomposes the algorithm into thread-safe functional blocks and allows parallelization both by changing the structure of the algorithm and by configuring its execution. A methodology for parallelizing data mining algorithms is proposed; it differs from known methodologies in that it applies the proposed method to sequential analysis algorithms while taking into account the characteristics of the distributed environment. To create parallel data mining algorithms, software templates are proposed that are built on the formal model and separate the implementation of the algorithm from the distributed execution tools. A library of parallel data mining algorithms for execution in a distributed environment, which includes the proposed templates, has been developed.
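
The idea of functional blocks and structural blocks can be illustrated with a minimal Python sketch. It is not the library described in the abstract: the names Model, count_items, prune, sequence and parallel are hypothetical, the knowledge model is assumed to be a simple table of item support counts, and a local thread pool stands in for the distributed environment. Each functional block returns a new model instead of mutating its inputs, which is what makes it safe to run on several data partitions at once.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical knowledge model: item -> number of transactions containing it.
Model = Counter


def count_items(data, model):
    """Functional block: add support counts from `data` to a copy of `model`."""
    new_model = Counter(model)
    for transaction in data:
        new_model.update(set(transaction))
    return new_model


def prune(min_support):
    """Functional block factory: keep only items with enough support."""
    def block(data, model):
        return Counter({k: v for k, v in model.items() if v >= min_support})
    return block


def sequence(*blocks):
    """Structural block: run blocks one after another on the same data."""
    def run(data, model):
        for block in blocks:
            model = block(data, model)
        return model
    return run


def parallel(block, workers):
    """Structural block: run `block` on data partitions and merge the models."""
    def run(data, model):
        chunks = [data[i::workers] for i in range(workers)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(lambda c: block(c, Counter()), chunks))
        merged = Counter(model)
        for partial in partials:
            merged.update(partial)  # Counter.update adds counts
        return merged
    return run


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]

# Same functional blocks, two different algorithm structures.
sequential_algorithm = sequence(count_items, prune(3))
parallel_algorithm = sequence(parallel(count_items, workers=2), prune(3))

sequential_result = sequential_algorithm(transactions, Model())
parallel_result = parallel_algorithm(transactions, Model())
```

In this sketch the parallel variant differs from the sequential one only in how the blocks are composed; the counting and pruning code is unchanged. Merging by addition is valid here because support counts over disjoint partitions are additive; other knowledge models would need their own merge step.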