Development of Tools for Creating Parallel Data Mining Algorithms

Karshiyev Zaynidin; Sattarov Mirzabek

doi:10.37934/araset.39.1.2642

Authors

Karshiyev Zaynidin Department of Computer Systems, Faculty of Computer Engineering, Samarkand Branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, 140100 Samarkand, Samarkand, Uzbekistan
Sattarov Mirzabek Department of Computer Systems, Faculty of Computer Engineering, Samarkand Branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, 140100 Samarkand, Samarkand, Uzbekistan


Keywords:

Data mining, classification, clustering, association rules, parallel algorithms

Abstract

The purpose of this work is to develop tools for building parallel data mining algorithms that execute in a distributed environment. A formal model of a data mining algorithm is proposed in which the algorithm is represented as a set of independent operations that change the state of the knowledge model, together with structural blocks that allow the structure of the algorithm to be modified, including for parallel execution. A method for creating parallel data mining algorithms is proposed; unlike existing methods, it decomposes the algorithm into thread-safe functional blocks and allows parallelization both by changing the structure of the parallel algorithm and by configuring its execution. A methodology for parallelizing data mining algorithms is also proposed; it differs from known methodologies in that the proposed method of creating parallel data mining algorithms, which takes the characteristics of a distributed environment into account, is applied to sequential analysis algorithms. To create parallel data mining algorithms, software templates are proposed that are built on the formal model and separate the implementation of an algorithm from the tools for its distributed execution. A library of parallel data mining algorithms, including the proposed templates, has been developed for execution in a distributed environment.
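The idea of an algorithm built from independent, state-changing operations plus rearrangeable structural blocks can be illustrated with a small sketch. This is not the authors' library: the names `count_values`, `merge_models`, `sequence`, and `parallel` are hypothetical, the "knowledge model" is assumed to be a toy table of attribute-value frequencies, and threads in one process stand in for a distributed environment. Each operation returns a new model instead of mutating shared state, which is what lets the `parallel` block run copies of it on disjoint data slices. The executor is passed in as any `map`-like callable, so the same algorithm structure can be run with the built-in `map` or with `ThreadPoolExecutor.map`; a cluster-backed executor would plug in the same way, but that is outside this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# Sketch only: all names below are hypothetical, not the authors' library API.
# Toy knowledge model (an assumption): attribute -> value -> frequency.
Model = Dict[str, Dict[str, int]]
Rows = List[Dict[str, str]]
# An operation maps (data, model) to a NEW model and never mutates its
# arguments, so independent copies can safely run on different data parts.
Operation = Callable[[Rows, Model], Model]
# An executor is any map-like callable (built-in map, ThreadPoolExecutor.map, ...).
Executor = Callable[..., Iterable[Model]]


def count_values(rows: Rows, model: Model) -> Model:
    """Operation: return a copy of `model` updated with value frequencies of `rows`."""
    new = {attr: dict(counts) for attr, counts in model.items()}
    for row in rows:
        for attr, value in row.items():
            counts = new.setdefault(attr, {})
            counts[value] = counts.get(value, 0) + 1
    return new


def merge_models(models: Iterable[Model]) -> Model:
    """Sum partial models; valid here only because frequencies are additive."""
    merged: Model = {}
    for model in models:
        for attr, counts in model.items():
            target = merged.setdefault(attr, {})
            for value, n in counts.items():
                target[value] = target.get(value, 0) + n
    return merged


def sequence(*ops: Operation) -> Operation:
    """Structural block: apply operations one after another to the same data."""
    def run(rows: Rows, model: Model) -> Model:
        for op in ops:
            model = op(rows, model)
        return model
    return run


def parallel(op: Operation, parts: int, executor: Executor) -> Operation:
    """Structural block: run `op` on `parts` disjoint slices via `executor`, then merge."""
    def run(rows: Rows, model: Model) -> Model:
        chunks = [rows[i::parts] for i in range(parts)]  # round-robin split
        partials = executor(lambda chunk: op(chunk, {}), chunks)
        return merge_models([model, *partials])
    return run


data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]

serial_algo = sequence(count_values)
# Same structure, two execution configurations: in-process map vs. a thread pool.
local_algo = sequence(parallel(count_values, parts=2, executor=map))
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded_algo = sequence(parallel(count_values, parts=2, executor=pool.map))
    assert threaded_algo(data, {}) == serial_algo(data, {})
assert local_algo(data, {}) == serial_algo(data, {})
```

The merge step is the main simplifying assumption: summing partial models is correct only because frequency counts are additive. Operations whose partial results do not combine this simply (for example, ones that depend on global order or on intermediate models) would need their own merge logic before they could be wrapped in a parallel block.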