Towards Synthetic Augmentation of Training Datasets Generated by Mobility-on-Demand Service Using Deep Variational Autoencoders

original scientific paper

Edouard Ivanjko

Category journal article

Type original scientific paper

Year 2025

Journal Applied sciences (Basel)

Volume 15

Issue 9

Pages 4708, 28

DOI 10.3390/app15094708

EISSN 2076-3417

Status published

Abstract

Machine learning-based approaches to analysing the mobility needs of users are currently the most prevalent in mobility-on-demand (MoD) analysis. Their efficiency relies on the comprehensiveness and consistency of the training datasets. This is also their biggest challenge, as high-quality training data are often difficult to obtain. Variational Autoencoders (VAEs) are therefore investigated as potential generators of synthetic samples for augmenting MoD-based datasets. The MoD-based dataset used here is created from real-world taxi trip data recorded in the Manhattan district of New York City, USA. Augmentation with synthetic samples can potentially enable larger, more balanced, and more consistent datasets for machine learning analysis of MoD-based data. The proposed VAE approaches are compared with common dimensionality reduction techniques and standard autoencoders in terms of their efficiency in 2-dimensional clustering of the collected MoD-based data. The proposed 2-dimensional convolutional VAE framework achieved clustering results comparable with those of the other analysed approaches. It is therefore used to generate synthetic samples, known as “deepfakes”, which are added in different percentages to the initial dataset of real-world MoD-based data to create augmented versions of it. Models that predict the cluster of each sample are then trained to evaluate the impact of the augmented datasets on accuracy and learning convergence relative to the initial dataset. The results show that accuracy and learning convergence improve when the predictive models are trained on an augmented dataset that includes up to 10% synthetic samples for each cluster.
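The per-cluster augmentation step mentioned in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `augment_per_cluster` and `jitter_generator` are hypothetical, the 10% ratio is interpreted here as a fraction of the real samples in each cluster, and a simple Gaussian jitter stands in for the trained VAE decoder, which this record does not describe.

```python
import math
import random


def augment_per_cluster(samples, labels, synthesize, ratio=0.10):
    """Return copies of (samples, labels) extended with synthetic samples.

    Assumption: for a cluster with n real samples, floor(ratio * n)
    synthetic samples are appended and given that cluster's label.
    """
    # Group the real samples by their cluster label.
    by_cluster = {}
    for x, y in zip(samples, labels):
        by_cluster.setdefault(y, []).append(x)

    out_samples, out_labels = list(samples), list(labels)
    for cluster, members in by_cluster.items():
        n_new = math.floor(ratio * len(members))
        for _ in range(n_new):
            out_samples.append(synthesize(members))
            out_labels.append(cluster)
    return out_samples, out_labels


def jitter_generator(rng, scale=0.05):
    """Hypothetical stand-in for a trained VAE decoder.

    It perturbs a randomly chosen real sample with Gaussian noise;
    a real pipeline would decode latent vectors instead.
    """
    def synthesize(members):
        base = rng.choice(members)
        return [v + rng.gauss(0.0, scale) for v in base]
    return synthesize


# Toy data: 50 two-dimensional samples in two clusters (30 and 20 samples).
rng = random.Random(0)
real = [[rng.random(), rng.random()] for _ in range(50)]
labels = [0] * 30 + [1] * 20

aug_x, aug_y = augment_per_cluster(real, labels, jitter_generator(rng), ratio=0.10)
```

Passing the generator as a callable keeps the augmentation logic independent of how synthetic samples are produced, so a VAE-based sampler could be swapped in without changing the per-cluster bookkeeping.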

Keywords

variational autoencoders; mobility on demand; training datasets augmentation; dimensional reduction; intelligent transportation systems
