Project Files

bonus
src
tests
Makefile
README.md
requirements.txt
Project screenshot

TARDIS

Project Overview

TARDIS analyzes SNCF's historical train delay data to predict delays for specific routes and dates. The application cleans the data through a dedicated pipeline, trains several machine learning models, and serves predictions through a Streamlit interface.

Key Features

  • Data Cleaning Pipeline: Corrects decimal values, standardizes station names, and handles missing data
  • Multi-Model Approach: Compares Linear Regression, Gradient Boosting, Random Forest, and Neural Networks
  • Interactive Predictions: Get delay forecasts for specific routes and dates
  • Visual Analytics: Heatmaps, station comparisons, and trend visualizations
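
The multi-model comparison can be illustrated with a small harness that fits each candidate on a training split and scores it on held-out rows. This is only a sketch under stated assumptions: the two baseline classes, the `compare_models` helper, and the choice of mean absolute error are hypothetical and are not taken from the project's code. The baselines stand in for the real models so that the example runs with the standard library alone.

```python
import random
import statistics


class MeanBaseline:
    """Hypothetical baseline: predicts the global mean delay seen in training."""

    def fit(self, X, y):
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class RouteMeanBaseline:
    """Hypothetical baseline: predicts the mean delay per route (first feature)."""

    def fit(self, X, y):
        by_route = {}
        for row, target in zip(X, y):
            by_route.setdefault(row[0], []).append(target)
        self.route_means_ = {r: statistics.fmean(v) for r, v in by_route.items()}
        self.global_mean_ = statistics.fmean(y)
        return self

    def predict(self, X):
        # Unseen routes fall back to the global mean.
        return [self.route_means_.get(row[0], self.global_mean_) for row in X]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted delays."""
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def compare_models(models, X, y, test_fraction=0.25, seed=0):
    """Fit every model on the same training split and return its held-out MAE.

    Needs at least two rows so that the training split is not empty.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores
```

Because the harness only relies on `fit(X, y)` and `predict(X)`, scikit-learn regressors such as `LinearRegression` or `RandomForestRegressor` could be passed in the same dictionary, provided the features are encoded numerically first.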

Technologies

| Component | Tools |
|-----------|-------|
| Frontend | Streamlit |
| Backend | Python 3.9+ |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Machine Learning | Scikit-learn, TensorFlow |
| Code Quality | Ruff Formatter |

Data Cleaning

The tardis_eda.ipynb notebook performs:

  • Decimal Correction: Groups stations to normalize numerical values
  • Text Standardization: Fixes typos in:
    • Departure station
    • Arrival station
    • Date columns
  • Feature Engineering: Creates consistent temporal features
  • Data Reduction: Removes non-essential columns while preserving key metrics
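
The cleaning steps above could look roughly like the following. This is a minimal sketch rather than the notebook's actual code: the station list, the function names, and the thresholds are hypothetical, and the decimal-correction rule (rescaling values far above their station group's median) is only one assumed reading of "groups stations to normalize numerical values".

```python
import difflib
import statistics
from datetime import date

# Hypothetical canonical names; the real list would be derived from the dataset.
KNOWN_STATIONS = ["PARIS LYON", "LYON PART DIEU", "MARSEILLE ST CHARLES"]


def standardize_station(raw, known=KNOWN_STATIONS, cutoff=0.8):
    """Map a possibly misspelled station name to its closest canonical name."""
    cleaned = " ".join(raw.upper().split())  # normalize case and whitespace
    matches = difflib.get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


def correct_decimals(values, factor=10.0, tolerance=5.0, max_steps=3):
    """Rescale values that look like they lost their decimal point.

    Assumption: within one station group, a value more than `tolerance`
    times the group median is divided by `factor` (at most `max_steps`
    times) until it falls back into range.
    """
    median = statistics.median(values)
    if median <= 0:
        return list(values)
    corrected = []
    for value in values:
        steps = 0
        while value > tolerance * median and steps < max_steps:
            value /= factor
            steps += 1
        corrected.append(value)
    return corrected


def temporal_features(day):
    """Derive consistent temporal features from a date (Monday is weekday 0)."""
    return {"year": day.year, "month": day.month, "weekday": day.weekday()}
```

For instance, `standardize_station("paris  lyonn")` resolves to `"PARIS LYON"`, and `correct_decimals([12.5, 11.0, 130.0, 12.0])` rescales only the third value. The median-based rule assumes that most values in a group are already correct.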

Frontend

  1. Route Prediction Tool

    • Select departure/arrival stations
    • Choose travel date
    • Receive delay prediction
  2. Historical Analysis

    • Monthly delay heatmaps
    • Station performance comparisons
    • Cancellation rate trends
  3. Model Insights

    • Variable importance charts
    • Prediction confidence intervals
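
As an illustration of the data behind a monthly delay heatmap, the sketch below aggregates delay records into a station-by-month matrix of mean delays. The record layout `(station, month, delay_minutes)` and the function name are assumptions made for this example, not the project's actual schema.

```python
import statistics
from collections import defaultdict


def monthly_delay_matrix(records):
    """Aggregate (station, month, delay_minutes) records into a grid.

    Returns (stations, months, matrix), where matrix[i][j] is the mean delay
    for stations[i] in months[j], or None when no record exists for that cell.
    """
    buckets = defaultdict(list)
    for station, month, delay in records:
        buckets[(station, month)].append(delay)

    stations = sorted({station for station, _ in buckets})
    months = sorted({month for _, month in buckets})

    matrix = [
        [
            statistics.fmean(buckets[(s, m)]) if (s, m) in buckets else None
            for m in months
        ]
        for s in stations
    ]
    return stations, months, matrix
```

The resulting grid could then be handed to a plotting library, with the `None` cells converted to whatever missing-value marker that library expects.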
Tardis | Théo CREPIN