Optimizing Machine Learning Training: A Comparative Study of Storage Types for Efficient Large Dataset Processing

original scientific paper

Aleksander Radovan

Type conference contribution (in proceedings)

Category original scientific paper

Year 2024

Parent publication MIPRO 2024 : 47th ICT and Electronics Convention : mipro proceedings

Pages 9178, 6

ISSN 1847-3946

Status published

Abstract

This paper examines storage types suitable for handling large datasets in machine learning projects. It compares performance, specifically the duration of read operations, across text files, relational databases, and NoSQL databases. Reading extensive datasets efficiently matters in machine learning projects because the time required to train a model is related to how efficiently the data can be retrieved. The study therefore describes alternative approaches for datasets of varying sizes. The methodology uses datasets of different sizes, measures read operation durations, repeats the measurements for precision, and computes the average read operation duration. The results show that the choice of storage type should depend on the size of the dataset and on certain of its characteristics in order to optimize the duration of the training process for machine learning models. By aligning storage choices with dataset dimensions, the results support better decisions when choosing a dataset storage type and contribute to the overall efficiency and efficacy of machine learning systems.
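The measurement procedure summarized in the abstract (datasets of several sizes, timed reads, repeated runs, averaged durations) can be illustrated with a minimal sketch. This is not the author's benchmark: the function names, the two-column schema, the dataset sizes, and the use of SQLite as the relational store are all assumptions made for illustration, and a NoSQL store is omitted because it would require an external service.

```python
import csv
import os
import sqlite3
import statistics
import tempfile
import time


def write_csv(path, rows):
    # Store the dataset as a plain text (CSV) file.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def write_sqlite(path, rows):
    # Store the same dataset in a relational table (hypothetical schema).
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE samples (id INTEGER, value REAL)")
    con.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    con.commit()
    con.close()


def read_csv(path):
    # Read and parse every row back into (int, float) tuples.
    with open(path, newline="") as f:
        return [(int(i), float(v)) for i, v in csv.reader(f)]


def read_sqlite(path):
    # Read every row from the table.
    con = sqlite3.connect(path)
    try:
        return con.execute("SELECT id, value FROM samples").fetchall()
    finally:
        con.close()


def average_read_seconds(read_fn, path, repeats=5):
    # Repeat the read several times and average the measured durations.
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn(path)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def compare(sizes=(1_000, 10_000), repeats=5):
    # For each dataset size, measure the average read time per storage type.
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for n in sizes:
            rows = [(i, i * 0.5) for i in range(n)]
            csv_path = os.path.join(tmp, f"data_{n}.csv")
            db_path = os.path.join(tmp, f"data_{n}.db")
            write_csv(csv_path, rows)
            write_sqlite(db_path, rows)
            results[n] = {
                "text file": average_read_seconds(read_csv, csv_path, repeats),
                "relational (SQLite)": average_read_seconds(read_sqlite, db_path, repeats),
            }
    return results


if __name__ == "__main__":
    for n, timings in compare().items():
        for storage, seconds in timings.items():
            print(f"{n:>7} rows  {storage:<20} {seconds:.6f} s")
```

Absolute timings from such a sketch depend on hardware, operating system caching, and row format, so they only indicate how a comparison of this kind might be set up.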

Keywords

machine learning, datasets, storage types, performance, files, databases, NoSQL

Other publications

Automating Virtual Environment Availability Using PowerCLI

Comparison of the Performance and Implementation of Graphics Card Passthrough on Hyper-V and KVM Hypervisors

SMB Over QUIC: A Performance Evaluation