Transport Mode Detection (TMD) from GPS trajectories is a fundamental task in mobility analytics, with applications ranging from transport planning and accessibility analysis to environmental assessment and public health. Real-world GPS datasets, however, are usually noisy, irregularly sampled, and largely unlabeled. This makes it difficult to separate overlapping modes such as car, bus, and cycling, and to scale up supervised models. Previous work by Sadeghian et al. proposed a stepwise approach that combines rule-based walk detection, clustering, and GIS validation to retrieve high-confidence labels from unlabeled GPS data. Though useful, this framework relies on hand-crafted features and classical clustering, and thus does not leverage recent advances in transformer-based sequence modelling.
This thesis extends the stepwise approach into a hybrid framework that integrates transformer-based representation learning, via a self-supervised SegmentBERT encoder, for TMD on a Swedish GPS dataset. First, GPS points are cleaned, split into trips, and chunked into fixed-length segments, and speed and distance rules are used to identify walk and bike segments. Next, the SegmentBERT encoder is pretrained in a self-supervised manner on kinematic feature sequences (speed, acceleration, jerk, bearing dynamics, stop indicators), learning generic, non-discriminative segment-level embeddings without manual labels. The embeddings are then clustered using multiple algorithms (HDBSCAN, Gaussian Mixture Models, Spectral, Agglomerative, and BIRCH clustering), and silhouette scores and coverage are computed for each. Finally, GIS layers in QGIS are used to validate high-confidence clusters for the car, bus, and train modes, yielding a reference label set that balances label quality and coverage.
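As an illustration of the kinematic feature sequences mentioned above, the following is a minimal sketch of how speed, acceleration, jerk, bearing rate, and a stop indicator might be derived from timestamped GPS points. It is not the thesis implementation: the function names, the `(t_seconds, lat, lon)` input layout, and the 0.5 m/s stop threshold are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees within [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def _rate(values, dts):
    """Finite-difference rate of change; the first entry is padded with 0."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append((values[i] - values[i - 1]) / dts[i])
    return out


def kinematic_features(points, stop_speed_mps=0.5):
    """Compute per-step features from (t_seconds, lat, lon) tuples.

    `points` must be sorted by strictly increasing time. One feature dict
    is returned per consecutive pair of points. The stop threshold is a
    hypothetical default, not a value taken from the thesis.
    """
    speeds, bearings, dts = [], [], []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        dts.append(dt)
        speeds.append(haversine_m(la0, lo0, la1, lo1) / dt)
        bearings.append(bearing_deg(la0, lo0, la1, lo1))

    accels = _rate(speeds, dts)
    jerks = _rate(accels, dts)

    # Absolute turning rate; the difference is wrapped into [-180, 180)
    # so that a change from 359 to 1 degrees counts as 2, not 358.
    bearing_rates = [0.0]
    for i in range(1, len(bearings)):
        diff = (bearings[i] - bearings[i - 1] + 180.0) % 360.0 - 180.0
        bearing_rates.append(abs(diff) / dts[i])

    return [
        {
            "speed": s,
            "accel": a,
            "jerk": j,
            "bearing_rate": b,
            "stop": 1 if s < stop_speed_mps else 0,
        }
        for s, a, j, b in zip(speeds, accels, jerks, bearing_rates)
    ]
```

Under these assumptions, a fixed-length segment would be the list of feature dicts for its points, which could then be stacked into a sequence for an encoder. Real trajectories would also need the cleaning step described above before these differences are meaningful, since GPS noise is amplified by each successive derivative.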
On this GIS-validated subset, classical supervised models are compared with hybrid and transformer-based classifiers built on SegmentBERT embeddings. The results show that embedding-based clustering can achieve high coverage for non-walk segments, and that hybrid models using the embeddings can achieve accuracy and F1 scores competitive with classical baselines while improving performance on minority modes such as bus and train.
By combining modern representation learning with a transparent, stepwise pipeline, this work provides a scalable and interpretable solution for TMD from unlabeled GPS trajectories.