Optimizing Machine Learning Training: A Comparative Study of Storage Types for Efficient Large Dataset Processing

Original scientific paper

Category: conference contribution (in proceedings)
Type: original scientific paper
Year: 2024
Parent publication: MIPRO 2024 : 47th ICT and Electronics Convention : mipro proceedings
Pages: 9178, 6
ISSN: 1847-3946
Status: published

Abstract

This paper compares storage types for handling large datasets in machine learning projects. The comparison focuses on the performance of read operations from three kinds of storage: text files, relational databases, and NoSQL databases. Reading large datasets efficiently matters in machine learning projects because the time required to train a model is related to the efficiency of data retrieval. The study therefore describes alternative storage approaches for datasets of varying sizes. The methodology uses datasets of different sizes, measures the duration of read operations, repeats the measurements for precision, and computes the average read duration. The results show that the choice of storage type should depend on the size of the dataset and on certain of its characteristics, which helps optimize the duration of the training process for machine learning models. By aligning storage choices with dataset size, the results support better decisions about dataset storage and contribute to the overall efficiency and effectiveness of machine learning systems.
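
The measurement procedure summarized in the abstract (datasets of several sizes, repeated timing of read operations, averaging) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' benchmark: the record does not name the tools used, so the CSV file, the SQLite table, the synthetic rows, and every function name here are assumptions. A NoSQL store is left out because the Python standard library has no NoSQL client; a read function for one could be passed to the same timing helper.

```python
import csv
import os
import sqlite3
import statistics
import tempfile
import time


def average_read_seconds(read_fn, repeats=5):
    """Call read_fn `repeats` times and return the mean wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def build_stores(directory, n_rows):
    """Write the same synthetic rows (an assumption) to a CSV file and a SQLite table."""
    rows = [(i, f"sample-{i}", i * 0.5) for i in range(n_rows)]
    csv_path = os.path.join(directory, f"data_{n_rows}.csv")
    db_path = os.path.join(directory, f"data_{n_rows}.sqlite")
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    connection = sqlite3.connect(db_path)
    connection.execute("CREATE TABLE samples (id INTEGER, label TEXT, value REAL)")
    connection.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    connection.commit()
    connection.close()
    return csv_path, db_path


def read_csv(path):
    """Read every row of the text file into memory."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def read_sqlite(path):
    """Read every row of the relational table into memory."""
    connection = sqlite3.connect(path)
    try:
        return connection.execute("SELECT id, label, value FROM samples").fetchall()
    finally:
        connection.close()


def compare(sizes=(1_000, 10_000), repeats=5):
    """Return the average read duration per storage type for each dataset size."""
    results = {}
    with tempfile.TemporaryDirectory() as directory:
        for n_rows in sizes:
            csv_path, db_path = build_stores(directory, n_rows)
            results[n_rows] = {
                "text_file": average_read_seconds(lambda: read_csv(csv_path), repeats),
                "relational": average_read_seconds(lambda: read_sqlite(db_path), repeats),
            }
    return results


if __name__ == "__main__":
    for n_rows, timings in compare().items():
        print(n_rows, {name: f"{seconds:.4f}s" for name, seconds in timings.items()})
```

Absolute timings from a sketch like this depend on hardware, operating-system caching, and how the data is parsed, so they say nothing about the paper's actual results; the point is only the shape of the procedure.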

Keywords

machine learning, datasets, storage types, performance, files, databases, NoSQL