
TARDIS
Project Overview
TARDIS analyzes train trajectory data to predict delays for specific routes and dates. It cleans SNCF's historical delay data, trains several machine learning models on the result, and serves predictions through a Streamlit interface.
Key Features
- Data Cleaning Pipeline: Corrects decimal values, standardizes station names, and handles missing data
- Multi-Model Approach: Compares Linear Regression, Gradient Boosting, Random Forest, and Neural Networks
- Interactive Predictions: Get delay forecasts for specific routes and dates
- Visual Analytics: Heatmaps, station comparisons, and trend visualizations
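The project compares scikit-learn and TensorFlow models. The library-free sketch below shows only the comparison step. It fits two stand-in models (a constant mean baseline and one-feature least squares) on a training split and ranks them by mean absolute error (MAE) on a held-out split. The single numeric feature, the toy numbers and every function name are hypothetical and do not come from the TARDIS code.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]  # (hypothetical feature value, observed delay in minutes)
Model = Callable[[float], float]


def fit_mean_baseline(train: List[Sample]) -> Model:
    """Always predicts the average training delay, ignoring the feature."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_linear(train: List[Sample]) -> Model:
    """One-feature ordinary least squares: delay = slope * x + intercept."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_absolute_error(model: Model, test: List[Sample]) -> float:
    """Average absolute gap between predicted and observed delays."""
    return mean(abs(model(x) - y) for x, y in test)


def compare_models(train: List[Sample], test: List[Sample]) -> Dict[str, float]:
    """Fit each candidate on the training split and score it on the held-out split."""
    fitters: Dict[str, Callable[[List[Sample]], Model]] = {
        "mean_baseline": fit_mean_baseline,
        "linear": fit_linear,
    }
    return {name: mean_absolute_error(fit(train), test) for name, fit in fitters.items()}


if __name__ == "__main__":
    # Toy numbers for illustration only; not taken from the SNCF dataset.
    train = [(100.0, 5.0), (200.0, 9.0), (300.0, 15.0), (400.0, 19.0)]
    test = [(150.0, 7.0), (350.0, 17.0)]
    scores = compare_models(train, test)
    print(scores, "best:", min(scores, key=scores.get))
```

With the real candidates, the same loop would wrap each estimator's fit and predict calls. The model with the lowest held-out MAE wins. Scoring on data the models never saw is what keeps an overfit model from looking best.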
Technologies
| Component | Tools |
|-----------|-------|
| Frontend | Streamlit |
| Backend | Python 3.9+ |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Machine Learning | Scikit-learn, TensorFlow |
| Code Quality | Ruff Formatter |
Data Cleaning
The tardis_eda.ipynb notebook performs:
- Decimal Correction: Groups records by station to normalize numerical values
- Text Standardization: Fixes typos in the Departure station, Arrival station, and Date columns
- Feature Engineering: Creates consistent temporal features
- Data Reduction: Removes non-essential columns while preserving key metrics
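Below is a minimal standard-library sketch of the cleaning steps above. It rests on three hypothetical assumptions:
- some delays arrive with a French decimal comma ("4,5");
- station typos can be snapped to a known canonical list with fuzzy matching;
- missing delays are filled with the median delay of the same departure station.

The notebook itself works in Pandas and may differ in every detail. CANONICAL_STATIONS, the row keys, the 0.8 similarity cutoff and all function names are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date
from difflib import get_close_matches
from statistics import median
from typing import Dict, List, Optional

# Hypothetical canonical spellings; a real list would be derived from the dataset.
CANONICAL_STATIONS = ["PARIS LYON", "MARSEILLE ST CHARLES", "LYON PART DIEU"]


def parse_decimal(raw: Optional[str]) -> Optional[float]:
    """Parse '4,5' or '4.5' as 4.5; blanks and unparseable strings become None."""
    if raw is None or not raw.strip():
        return None
    try:
        return float(raw.strip().replace(",", "."))
    except ValueError:
        return None


def standardize_station(name: str) -> str:
    """Upper-case, collapse whitespace, then snap to the closest canonical name."""
    cleaned = " ".join(name.upper().split())
    matches = get_close_matches(cleaned, CANONICAL_STATIONS, n=1, cutoff=0.8)
    return matches[0] if matches else cleaned


def temporal_features(iso_day: str) -> Dict[str, int]:
    """Derive consistent calendar features from an ISO date such as '2023-07-14'."""
    day = date.fromisoformat(iso_day)
    return {"year": day.year, "month": day.month, "weekday": day.weekday()}


def clean_rows(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Standardize stations, parse delays, add date features, impute missing delays."""
    cleaned = [
        {
            "departure": standardize_station(row["departure"]),
            "arrival": standardize_station(row["arrival"]),
            "delay": parse_decimal(row.get("delay")),
            **temporal_features(row["date"]),
        }
        for row in rows
    ]
    # Group observed delays by departure station so gaps can be filled per station.
    observed: Dict[str, List[float]] = defaultdict(list)
    for row in cleaned:
        if row["delay"] is not None:
            observed[row["departure"]].append(row["delay"])
    for row in cleaned:
        values = observed.get(row["departure"])
        if row["delay"] is None and values:
            row["delay"] = median(values)
    return cleaned
```

Snapping names before grouping matters. Otherwise "Paris  Lyon" and "PARIS LYON" would be imputed from different buckets.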
Frontend
- Route Prediction Tool
  - Select departure and arrival stations
  - Choose a travel date
  - Receive a delay prediction
- Historical Analysis
  - Monthly delay heatmaps
  - Station performance comparisons
  - Cancellation rate trends
- Model Insights
  - Variable importance charts
  - Prediction confidence intervals
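The ML models behind the Route Prediction Tool are not reproduced here. As a hypothetical stand-in, the sketch below answers the same question (route and date in, expected delay out) with a simple baseline: the mean historical delay for that route in the same calendar month. The class name, the tuple layout of the history and the month-level granularity are all assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

RouteMonth = Tuple[str, str, int]  # (departure, arrival, calendar month)


class RouteDelayBaseline:
    """Hypothetical fallback predictor: mean historical delay per route and month."""

    def __init__(self, history: Iterable[Tuple[str, str, date, float]]) -> None:
        buckets: Dict[RouteMonth, List[float]] = defaultdict(list)
        for departure, arrival, day, delay in history:
            buckets[(departure, arrival, day.month)].append(delay)
        self._means = {key: mean(values) for key, values in buckets.items()}

    def predict(self, departure: str, arrival: str, travel_date: date) -> Optional[float]:
        """Expected delay in minutes, or None when the route/month was never observed."""
        return self._means.get((departure, arrival, travel_date.month))
```

In the Streamlit app, the inputs could come from widgets such as st.selectbox for the stations and st.date_input, which returns a datetime.date. Returning None for unseen routes lets the interface say so explicitly instead of guessing.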
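A monthly delay heatmap is drawn from a station-by-month matrix of average delays. This sketch builds that matrix from hypothetical (station, month, delay) records, and station/month pairs with no records become NaN. The function name and record layout are assumptions, not the project's code.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def monthly_delay_grid(
    records: Iterable[Tuple[str, int, float]],
) -> Tuple[List[str], List[List[float]]]:
    """Rows are stations (sorted), columns are months 1-12, cells are mean delays.

    Station/month pairs with no records are filled with NaN.
    """
    delays: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for station, month, delay in records:
        if not 1 <= month <= 12:
            raise ValueError(f"month out of range: {month}")
        delays[(station, month)].append(delay)
    stations = sorted({station for station, _ in delays})
    grid = [
        [mean(delays[(s, m)]) if (s, m) in delays else math.nan for m in range(1, 13)]
        for s in stations
    ]
    return stations, grid
```

The resulting grid could be passed to seaborn.heatmap, with yticklabels=stations to label the rows.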
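The README does not say how prediction confidence intervals are computed. One simple option, swapped in here as an assumption, is an empirical interval built from held-out residuals. Take the 5th and 95th percentiles of (actual - predicted) and add them to a new prediction. The function name and the 90% default coverage are illustrative.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def residual_interval(
    predicted: Sequence[float],
    actual: Sequence[float],
    coverage: float = 0.9,
) -> Tuple[float, float]:
    """Residual offsets (low, high) that bracket about `coverage` of held-out errors."""
    if len(predicted) != len(actual) or len(actual) < 2:
        raise ValueError("need at least two paired predictions and observations")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    residuals = sorted(a - p for p, a in zip(predicted, actual))
    tail = round((1.0 - coverage) / 2 * 100)  # percent trimmed from each side
    if tail == 0:
        return residuals[0], residuals[-1]
    # 99 cut points: cuts[k - 1] is the k-th percentile of the residuals.
    cuts = quantiles(residuals, n=100, method="inclusive")
    return cuts[tail - 1], cuts[99 - tail]
```

For a new prediction p, the reported interval would be (p + low, p + high). The approach assumes held-out errors are representative of future ones, and it is not necessarily the method TARDIS uses.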