Abstract
Machine learning-based approaches are currently the most prevalent means of analysing users' mobility needs in mobility-on-demand (MoD) analysis. Their efficiency relies on the comprehensiveness and consistency of the training datasets. This is also the biggest challenge, as high-quality training data are often difficult to obtain. Therefore, Variational Autoencoders (VAEs) are investigated as potential generators of synthetic samples for augmenting MoD-based datasets. The MoD-based dataset is created from real-world taxi trip data recorded in the Manhattan district of New York City, USA. Augmentation with synthetic samples can potentially yield larger, more balanced, and more consistent datasets for machine learning analysis of MoD-based data. The proposed VAE approaches are compared with common dimensionality reduction techniques and standard autoencoders with respect to their efficiency in 2-dimensional clustering of the collected MoD-based data. The proposed 2-dimensional convolutional VAE framework achieved clustering results comparable with those of the other analysed approaches. It is therefore used to generate synthetic samples, known as “deepfakes”, which are added in different percentages to the initial dataset of real-world MoD-based data to create augmented versions of it. Models that predict the cluster of each sample are used to evaluate the impact of the augmented datasets on accuracy and learning convergence compared with the initial dataset. The results show that accuracy and learning convergence improve when the predictive models are trained on an augmented dataset that includes up to 10% synthetic samples for each cluster.
Keywords
variational autoencoders; mobility on demand; training dataset augmentation; dimensionality reduction; intelligent transportation systems