Master's Thesis: Continuous and periodic data processing of large-scale traffic datasets
- Type of listing: Offer
- Job category: Internship
- Employment type: Part-time
- Job field: Research
- Salary (gross): EUR 3,141 on a full-time basis according to the FKV, classification D1
- City: Salzburg
- Postal code: 5020
- Federal state: Salzburg
- Country: Austria
- Posted: 19 November 2024, 7:49
- Expires in: 10 days, 19 hours
Description
Field of study: Computer Science | Duration: 6 months
The Mobility & Transport Analytics research group of Salzburg Research Forschungsgesellschaft processes large volumes of GNSS position data from vehicles to derive traffic parameters such as speed values or travel times, referenced to Austria's official transport graph, the Graph Integration Platform (GIP). Travel times are then stored and aggregated into traffic statistics. The process involves several batch jobs for daily data generation, a periodic export into a central data store, and finally a data processing task that calculates the data sets on the GIP data model for the relevant time period using the cluster processing framework Apache Spark.
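To make the aggregation step concrete, the following is a minimal sketch in plain Python, not the group's actual Spark job. It assumes each observation is a hypothetical tuple of link identifier, timestamp, and travel time in seconds, and it groups observations into hourly windows per link. The names `window_start`, `aggregate_travel_times`, and the `link-1`/`link-2` identifiers are illustrative assumptions and do not come from the GIP data model.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def window_start(ts: datetime, minutes: int = 60) -> datetime:
    """Floor a timezone-aware timestamp to the start of its fixed-size window."""
    epoch = int(ts.timestamp())
    size = minutes * 60
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def aggregate_travel_times(observations, minutes: int = 60):
    """Group (link_id, timestamp, travel_time_s) tuples by link and window.

    Returns a dict mapping (link_id, window_start) to the observation count
    and the mean travel time in seconds. The record layout is an assumption.
    """
    groups = defaultdict(list)
    for link_id, ts, travel_time_s in observations:
        groups[(link_id, window_start(ts, minutes))].append(travel_time_s)
    return {
        key: {"count": len(values), "mean_s": mean(values)}
        for key, values in groups.items()
    }


# Hypothetical sample data: three observations on one link, one on another.
observations = [
    ("link-1", datetime(2024, 11, 19, 7, 5, tzinfo=timezone.utc), 40.0),
    ("link-1", datetime(2024, 11, 19, 7, 50, tzinfo=timezone.utc), 50.0),
    ("link-1", datetime(2024, 11, 19, 8, 10, tzinfo=timezone.utc), 60.0),
    ("link-2", datetime(2024, 11, 19, 7, 30, tzinfo=timezone.utc), 30.0),
]
stats = aggregate_travel_times(observations)
```

In a batch setting, a function like this would run once over a full export. A stream processing framework would instead maintain the same per-link, per-window state incrementally as observations arrive.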
In recent years, stream processing frameworks such as Apache Flink have gained considerable momentum. This thesis aims to evaluate whether the existing process could be optimized or even replaced using patterns, tools, and frameworks from the stream processing domain instead of the cluster processing currently in use.
The envisaged evaluation should address the following topics:
– Desk research and comparison of open-source frameworks suitable for the hybrid task of continuous data processing combined with periodic (batch-style) tasks.
– Preferred data structures and storage systems for use with these frameworks, and the changes they might require in the existing data storage.
– Usage and deployment complexity under the described varying workloads (e.g. autoscaling) in an on-premises environment (preferably Kubernetes-based).
– Performance and economic considerations in terms of resource requirements of the redesigned data processing pipeline compared to the existing system.
Photo copyright: Salzburg Research / Shutterstock