Abstract
This paper explores storage types suitable for handling large datasets in machine learning projects. It compares their performance, focusing specifically on read operations from textual files, relational databases, and NoSQL databases. Efficiently reading large datasets is essential in machine learning projects, because the time required to train a model depends in part on how quickly the data can be retrieved. This research therefore describes alternative storage approaches for datasets of varying sizes. The comparative methodology uses datasets of different sizes, measures the duration of read operations, repeats each measurement to improve precision, and computes the average read duration. The results show that the choice of storage type should depend on the size of the dataset as well as on certain of its characteristics, which helps reduce the training time of machine learning models. By aligning storage choices with dataset size, the findings support a better-informed choice of dataset storage type and thereby contribute to the overall efficiency and efficacy of machine learning systems.
Keywords
machine learning, datasets, storage types, performance, files, databases, NoSQL
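The measurement procedure summarized in the abstract uses datasets of several sizes, times repeated reads, and averages the durations. Below is a minimal sketch of that procedure, assuming only Python's standard library. A CSV file stands in for textual-file storage and SQLite stands in for a relational database. A NoSQL store is omitted because it would need an external server. The function names, synthetic rows, dataset sizes, and repeat count are all hypothetical illustrations, not the paper's actual experimental setup.

```python
import csv
import os
import sqlite3
import statistics
import tempfile
import time
from typing import Callable


def make_rows(n: int) -> list[tuple[int, float, str]]:
    # Hypothetical synthetic rows (id, feature, label); illustrative data only.
    return [(i, i * 0.5, f"label{i % 3}") for i in range(n)]


def write_csv(path: str, rows: list[tuple[int, float, str]]) -> None:
    # Textual-file storage: one header line followed by one line per row.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "feature", "label"])
        writer.writerows(rows)


def read_csv(path: str) -> int:
    # Read every row from the CSV file; returns the number of data rows read.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header line
        return sum(1 for _ in reader)


def write_sqlite(path: str, rows: list[tuple[int, float, str]]) -> None:
    # Relational storage (SQLite as an assumed stand-in for a relational DBMS).
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE data (id INTEGER, feature REAL, label TEXT)")
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def read_sqlite(path: str) -> int:
    # Read every row from the table; returns the number of rows fetched.
    conn = sqlite3.connect(path)
    try:
        return len(conn.execute("SELECT id, feature, label FROM data").fetchall())
    finally:
        conn.close()


def mean_read_time(read_fn: Callable[[str], int], path: str, repeats: int = 5) -> float:
    # Repeat the read operation to reduce measurement noise, then average (seconds).
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn(path)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def benchmark(sizes: list[int], repeats: int = 5) -> dict[int, dict[str, float]]:
    # For each dataset size, write the same rows to both stores and time the reads.
    results: dict[int, dict[str, float]] = {}
    with tempfile.TemporaryDirectory() as tmp:
        for n in sizes:
            rows = make_rows(n)
            csv_path = os.path.join(tmp, f"data_{n}.csv")
            db_path = os.path.join(tmp, f"data_{n}.sqlite")
            write_csv(csv_path, rows)
            write_sqlite(db_path, rows)
            results[n] = {
                "csv": mean_read_time(read_csv, csv_path, repeats),
                "sqlite": mean_read_time(read_sqlite, db_path, repeats),
            }
    return results


if __name__ == "__main__":
    for n, timings in benchmark([1_000, 10_000, 100_000]).items():
        print(n, {store: f"{t:.4f}s" for store, t in timings.items()})
```

Averaging over several repeats reflects the iterative measurements described above. Absolute timings will vary with hardware, file-system caching, and the chosen sizes, so only the relative trends across sizes are meaningful.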