
ML Package

ml

ML — F1 race prediction service.

Provides model training, feature engineering, and inference for predicting F1 race finishing positions.

Subpackages

db: Database connection and CRUD operations.
models: SQLAlchemy ORM models and Pydantic schemas.
routers: FastAPI endpoint definitions.
services: Inference service for generating predictions.
training: Model training pipeline and feature engineering.

config

ML service configuration.

All settings are loaded from environment variables with sensible defaults. Variables are prefixed with F1BOARD_ML_ for service-specific values or F1BOARD_ for project-wide values.
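
The prefix convention can be illustrated with a small sketch that reads settings from an environment mapping and falls back to defaults. The variable names (F1BOARD_ML_MODEL_DIR, F1BOARD_DATABASE_URL), the default values, and the Settings class are hypothetical; the real config module may be organised differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names are illustrative only."""

    model_dir: str
    database_url: str


def load_settings(env=None) -> Settings:
    """Read settings from an environment mapping, falling back to defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Service-specific value: F1BOARD_ML_ prefix (variable name assumed).
        model_dir=env.get("F1BOARD_ML_MODEL_DIR", "./models"),
        # Project-wide value: F1BOARD_ prefix (variable name assumed).
        database_url=env.get("F1BOARD_DATABASE_URL", "sqlite:///f1board.db"),
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to test without touching the process environment.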

db

Database layer for the ML service.

connection

Async database engine and session management.

get_db async
get_db()

Yield an async database session for dependency injection.

init_db async
init_db()

Create all tables defined in the ORM metadata.

close_db async
close_db()

Dispose of the database engine and release connections.
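
The yield-based session dependency can be sketched without any database at all. In the example below, FakeSession is a stand-in (it is not SQLAlchemy's AsyncSession) and handle_request is a hypothetical driver that steps the generator by hand, roughly as a dependency-injection framework would.

```python
import asyncio


class FakeSession:
    """Stand-in for an async database session (not SQLAlchemy's AsyncSession)."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def get_db():
    """Yield one session per request and close it afterwards."""
    session = FakeSession()
    try:
        yield session
    finally:
        # Runs when the consumer finishes (or fails), so the session never leaks.
        await session.close()


async def handle_request():
    """Drive the generator by hand, roughly as a DI framework would."""
    gen = get_db()
    session = await gen.__anext__()  # code before `yield` runs here
    closed_while_in_use = session.closed
    await gen.aclose()  # triggers the `finally` block
    return closed_while_in_use, session.closed
```

The try/finally around the yield is what guarantees cleanup even when the request handler raises.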

crud

CRUD helpers for the ML service database.

Every public function accepts an AsyncSession as its first argument and performs a single, focused database operation.

get_driver async
get_driver(db: AsyncSession, driver_id: str) -> models.Driver | None

Fetch a driver by their string identifier.

get_driver_by_id async
get_driver_by_id(db: AsyncSession, id: int) -> models.Driver | None

Fetch a driver by primary key.

get_all_drivers async
get_all_drivers(db: AsyncSession) -> list[models.Driver]

Return all drivers.

create_driver async
create_driver(db: AsyncSession, driver: DriverCreate) -> models.Driver

Insert a new driver row.

get_or_create_driver async
get_or_create_driver(db: AsyncSession, driver: DriverCreate) -> models.Driver

Return an existing driver or create a new one.
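
A minimal sketch of the get-or-create pattern, using the synchronous sqlite3 module as a stand-in for an AsyncSession. The table layout, column names, and the extra "created" flag in the return value are assumptions made for illustration; the documented function returns only the driver.

```python
import sqlite3


def get_or_create_driver(conn, driver_id, name):
    """Return (row, created) for the driver with the given string identifier."""
    row = conn.execute(
        "SELECT id, driver_id, name FROM drivers WHERE driver_id = ?", (driver_id,)
    ).fetchone()
    if row is not None:
        return row, False  # already present: no write needed
    cur = conn.execute(
        "INSERT INTO drivers (driver_id, name) VALUES (?, ?)", (driver_id, name)
    )
    conn.commit()
    return (cur.lastrowid, driver_id, name), True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drivers (id INTEGER PRIMARY KEY, driver_id TEXT UNIQUE, name TEXT)"
)
first, created_first = get_or_create_driver(conn, "driver_a", "Driver A")
second, created_second = get_or_create_driver(conn, "driver_a", "Driver A")
```

The second call finds the row inserted by the first, so repeated data syncs do not create duplicates.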

get_constructor async
get_constructor(db: AsyncSession, constructor_id: str) -> models.Constructor | None

Fetch a constructor by its string identifier.

create_constructor async
create_constructor(db: AsyncSession, constructor: ConstructorCreate) -> models.Constructor

Insert a new constructor row.

get_or_create_constructor async
get_or_create_constructor(db: AsyncSession, constructor: ConstructorCreate) -> models.Constructor

Return an existing constructor or create a new one.

get_race async
get_race(db: AsyncSession, season: int, round: int) -> models.Race | None

Fetch a race by season and round number.

get_all_races async
get_all_races(db: AsyncSession) -> list[models.Race]

Return all races ordered by season and round.

get_upcoming_race async
get_upcoming_race(db: AsyncSession, target_date: date) -> models.Race | None

Return the next race after target_date.
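
One way to read "next race after target_date" is the earliest race dated strictly later than the target. The sketch below works on an in-memory list; the Race dataclass, its race_date field, and the strict comparison are assumptions, since the listing does not say whether a race on target_date itself counts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Race:
    season: int
    round: int
    race_date: date  # hypothetical field name


def get_upcoming_race(races, target_date) -> Optional[Race]:
    """Return the earliest race dated strictly after target_date, or None."""
    later = [r for r in races if r.race_date > target_date]
    return min(later, key=lambda r: r.race_date, default=None)
```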

create_race async
create_race(db: AsyncSession, race: RaceCreate) -> models.Race

Insert a new race row.

get_or_create_race async
get_or_create_race(db: AsyncSession, race: RaceCreate) -> models.Race

Return an existing race or create a new one.

create_race_result async
create_race_result(db: AsyncSession, result: RaceResultCreate) -> models.RaceResult

Insert a new race result row.

get_race_results async
get_race_results(db: AsyncSession, race_id: int) -> list[models.RaceResult]

Return all results for a race, eager-loading driver and constructor.

get_driver_results async
get_driver_results(db: AsyncSession, driver_id: int) -> list[models.RaceResult]

Return all race results for a specific driver.

get_all_results async
get_all_results(db: AsyncSession) -> list[models.RaceResult]

Return every race result with related race, driver, and constructor.

create_prediction async
create_prediction(db: AsyncSession, prediction: PredictionCreate) -> models.Prediction

Insert a single prediction row.

get_predictions_for_race async
get_predictions_for_race(db: AsyncSession, race_id: int) -> list[models.Prediction]

Return all predictions for a race ordered by position.

replace_predictions_for_race async
replace_predictions_for_race(db: AsyncSession, race_id: int, predictions: list[PredictionCreate]) -> list[models.Prediction]

Delete existing predictions for a race and insert new ones.
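
Delete-then-insert is safest inside a single transaction, so readers never observe a race with no predictions. The sketch below shows the idea with the synchronous sqlite3 module; the table and column names are hypothetical, and the real function works through an AsyncSession with PredictionCreate objects.

```python
import sqlite3


def replace_predictions_for_race(conn, race_id, predictions):
    """Swap all stored predictions for race_id in one transaction."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("DELETE FROM predictions WHERE race_id = ?", (race_id,))
        conn.executemany(
            "INSERT INTO predictions (race_id, driver_id, position) VALUES (?, ?, ?)",
            [(race_id, driver_id, position) for driver_id, position in predictions],
        )
    return conn.execute(
        "SELECT driver_id, position FROM predictions "
        "WHERE race_id = ? ORDER BY position",
        (race_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (race_id INTEGER, driver_id TEXT, position INTEGER)"
)
replace_predictions_for_race(conn, 1, [("driver_a", 1), ("driver_b", 2)])
stored = replace_predictions_for_race(
    conn, 1, [("driver_b", 1), ("driver_c", 2), ("driver_a", 3)]
)
```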

get_stats async
get_stats(db: AsyncSession) -> dict

Return aggregate statistics (driver/race/result counts, seasons).

main

FastAPI application entry point for the ML service.

Configures the CORS middleware, registers routers, and manages the application lifespan (database initialisation, model loading).

lifespan async

lifespan(app: FastAPI)

Manage application startup and shutdown.

On startup the database tables are created and the trained ML model is loaded into memory. On shutdown the database engine is disposed.
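
The startup and shutdown ordering can be sketched with contextlib.asynccontextmanager, which is the shape FastAPI expects for a lifespan handler. The three helper functions below are stand-ins that only record when they were called; the events list and run_app are hypothetical and exist purely to make the order visible.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records the order of calls for illustration


async def init_db():
    events.append("init_db")  # stand-in for creating tables


def load_model():
    events.append("load_model")  # stand-in for loading the trained model
    return True


async def close_db():
    events.append("close_db")  # stand-in for disposing the engine


@asynccontextmanager
async def lifespan(app):
    # Startup: runs once before the application serves requests.
    await init_db()
    load_model()
    try:
        yield
    finally:
        # Shutdown: runs once after the application stops serving.
        await close_db()


async def run_app():
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(run_app())
```

With FastAPI, a context manager like this is typically passed as FastAPI(lifespan=lifespan) rather than entered by hand.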

root async

root()

Return service metadata.

models

ORM models and Pydantic schemas for the ML service.

database

SQLAlchemy models used by the ML service.

Prediction

Bases: Base

AI-generated podium prediction for a race.

schemas

Pydantic schemas for request/response validation in the ML service.

DriverBase

Bases: BaseModel

Shared driver fields.

ConstructorBase

Bases: BaseModel

Shared constructor fields.

RaceBase

Bases: BaseModel

Shared race fields.

RaceResultBase

Bases: BaseModel

Shared race-result fields.

PredictionBase

Bases: BaseModel

Shared prediction fields.

PredictionRequest

Bases: BaseModel

Client request to generate predictions for a race.

DriverPrediction

Bases: BaseModel

A single driver's predicted finish.

PredictionResponse

Bases: BaseModel

Full prediction response containing multiple driver predictions.

AIPodiumEntry

Bases: BaseModel

Single podium entry in an AI prediction.

AIPodiumResponse

Bases: BaseModel

AI-predicted podium for a race.

HealthResponse

Bases: BaseModel

Health-check response schema.

StatsResponse

Bases: BaseModel

Aggregate database statistics.

SyncRequest

Bases: BaseModel

Data sync request specifying season range.

SyncResponse

Bases: BaseModel

Data sync result summary.

routers

FastAPI routers for the ML service.

health

Health and readiness endpoints for the ML service.

health_check async
health_check(db: AsyncSession = Depends(get_db))

Return service health including DB connectivity and model status.

readiness async
readiness()

Lightweight readiness probe.

predictions

Prediction endpoints for the ML service.

Provides routes to generate, retrieve, and cache AI race-finish predictions.

predict_race async
predict_race(request: PredictionRequest, db: AsyncSession = Depends(get_db))

Generate finish-position predictions for the requested drivers.

get_or_create_race_podium async
get_or_create_race_podium(season: int, round: int, db: AsyncSession = Depends(get_db))

Return the AI podium for a race, generating it if not yet stored.
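
The "return stored, otherwise generate" behaviour might look like the sketch below. It assumes a podium is the three best-placed predictions, uses a plain dict in place of the database, and takes a hypothetical generate callable in place of the inference service; the dict keys driver_id and position are also assumptions.

```python
def podium_from_predictions(predictions):
    """Return the three best-placed entries, ordered by predicted position."""
    ranked = sorted(predictions, key=lambda p: p["position"])
    return ranked[:3]


def get_or_create_podium(cache, key, generate):
    """Return the stored podium for key, generating and storing it on a miss."""
    if key not in cache:
        cache[key] = podium_from_predictions(generate())
    return cache[key]
```

Keying the store by (season, round) means the model runs at most once per race until the stored predictions are replaced.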

get_or_create_current_race_podium async
get_or_create_current_race_podium(db: AsyncSession = Depends(get_db))

Return the AI podium for the next upcoming race.

get_or_create_current_race_podium_alias async
get_or_create_current_race_podium_alias(db: AsyncSession = Depends(get_db))

Backward-compatible alias for current race podium predictions.

get_predictions async
get_predictions(season: int, round: int, db: AsyncSession = Depends(get_db))

Fetch stored predictions for a specific race.

services

Business-logic services for the ML service.

inference

ML model loading and race prediction service.

InferenceService

Loads a trained model and generates race-finish predictions.

load_model
load_model() -> bool

Load the trained model and feature engineer from disk.

Returns:

bool: True if the model was loaded successfully, False otherwise.

predict_race async
predict_race(db: AsyncSession, season: int, round: int, driver_ids: list[str]) -> list[dict]

Predict race results for the specified list of driver IDs.

predict_and_store_race async
predict_and_store_race(db: AsyncSession, season: int, round: int) -> list[dict]

Predict results for all known drivers and persist them.

training

Model training pipeline, feature engineering, and predictor.

features

Feature engineering for the F1 race prediction model.

Computes per-driver, per-constructor, and per-circuit aggregate statistics from historical race results.

FeatureEngineer

Compute and store aggregate statistics for ML features.

Call fit() on a training DataFrame, then transform() to produce the feature matrix. At inference time, use get_prediction_features().
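
The fit-then-transform workflow can be illustrated with a toy, dependency-free class. Here plain lists of dicts stand in for the DataFrame, and the single feature (a driver's mean historical finish) is an assumption chosen for brevity; the real FeatureEngineer computes its own set of per-driver, per-constructor, and per-circuit statistics.

```python
from collections import defaultdict


class MeanFinishFeature:
    """Toy feature engineer: one feature, the driver's mean historical finish."""

    def fit(self, rows):
        totals = defaultdict(lambda: [0, 0])  # driver -> [sum, count]
        for row in rows:
            totals[row["driver_id"]][0] += row["position"]
            totals[row["driver_id"]][1] += 1
        self.driver_mean_ = {d: s / n for d, (s, n) in totals.items()}
        # Fallback for drivers never seen during fit.
        self.global_mean_ = sum(r["position"] for r in rows) / len(rows)
        return self

    def transform(self, rows):
        # One row of features per input row, in the same order.
        return [
            [self.driver_mean_.get(r["driver_id"], self.global_mean_)] for r in rows
        ]

    def get_feature_names(self):
        return ["driver_mean_finish"]
```

Statistics are learned once in fit and only looked up in transform, which keeps information from the rows being transformed out of their own features.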

fit
fit(df: DataFrame)

Compute aggregate statistics from historical results.

transform
transform(df: DataFrame) -> np.ndarray

Transform a results DataFrame into a feature matrix.

get_feature_names
get_feature_names() -> list[str]

Return human-readable feature column names.

get_prediction_features async
get_prediction_features(db: AsyncSession, driver_id: int, race_id: int) -> list | None

Build a feature vector for a single driver/race pair at inference time.

model

Gradient-boosting classifier wrapper for F1 finish-position prediction.

RacePredictor

Wraps a GradientBoostingClassifier to predict race finishing positions.

Finishing positions are capped to the range 1-20 and encoded as class labels via LabelEncoder.
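
What capping and label encoding amount to can be shown without scikit-learn. The sketch below clamps positions into 1-20 and then maps the distinct values to indices in sorted order, which mirrors how scikit-learn's LabelEncoder assigns classes; the helper names are hypothetical and the treatment of values below 1 is an assumption.

```python
def cap_positions(positions, low=1, high=20):
    """Clamp each finishing position into the inclusive range [low, high]."""
    return [min(max(p, low), high) for p in positions]


def fit_label_encoding(labels):
    """Map each distinct label to an index in sorted order."""
    classes = sorted(set(labels))
    return classes, {label: index for index, label in enumerate(classes)}


capped = cap_positions([1, 3, 22, 3, 25])
classes, to_index = fit_label_encoding(capped)
encoded = [to_index[p] for p in capped]
decoded = [classes[i] for i in encoded]  # inverse transform
```

Because the encoding is built from the labels seen during training, a position that never occurred in the training data has no class and cannot be predicted.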

fit
fit(X: ndarray, y: ndarray)

Train the classifier on features X and labels y.

predict
predict(X: ndarray) -> np.ndarray

Predict finishing positions for feature matrix X.

predict_proba
predict_proba(X: ndarray) -> np.ndarray

Return class probability estimates for X.

score
score(X: ndarray, y: ndarray) -> float

Return accuracy score on X / y.
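
Accuracy here means the fraction of samples whose predicted position matches the actual one exactly, so a prediction that is off by one place counts the same as one that is off by ten. A dependency-free sketch (the function name and error handling are illustrative):

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted position equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty sample")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```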

train

Model training pipeline.

Loads historical race results from the database, engineers features, trains a RacePredictor model, and serialises the artefacts to disk.

load_training_data async
load_training_data(start_year: int | None = None, end_year: int | None = None) -> pd.DataFrame

Query race results from the database and return them as a DataFrame.

train_model
train_model(df: DataFrame) -> tuple[RacePredictor, FeatureEngineer]

Train a RacePredictor on the provided results DataFrame.

save_model
save_model(model: RacePredictor, feature_engineer: FeatureEngineer)

Serialise the trained model and feature engineer to disk.

main async
main(start_year: int | None = None, end_year: int | None = None)

End-to-end training entrypoint: load data, train, and save.
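
The load, train, save sequence can be sketched end to end with stand-ins for each stage. Everything below is illustrative: the loader returns in-memory rows instead of querying a database, the "model" is just a dict, pickle is an assumed serialisation format, and the path argument to main is hypothetical (the documented main takes only the year bounds).

```python
import asyncio
import pickle
import tempfile
from pathlib import Path


async def load_training_data(start_year=None, end_year=None):
    """Stand-in loader: returns in-memory rows instead of querying a database."""
    rows = [
        {"season": 2022, "driver_id": "driver_a", "position": 1},
        {"season": 2023, "driver_id": "driver_a", "position": 2},
        {"season": 2023, "driver_id": "driver_b", "position": 5},
    ]
    return [
        r for r in rows
        if (start_year is None or r["season"] >= start_year)
        and (end_year is None or r["season"] <= end_year)
    ]


def train_model(rows):
    """Stand-in trainer: the 'model' is just each driver's best finish."""
    model = {}
    for r in rows:
        best_so_far = model.get(r["driver_id"], r["position"])
        model[r["driver_id"]] = min(r["position"], best_so_far)
    return model


def save_model(model, path):
    """Serialise the artefact with pickle (the real format is not documented)."""
    Path(path).write_bytes(pickle.dumps(model))


async def main(path, start_year=None, end_year=None):
    """Load data, train, and save, in that order."""
    rows = await load_training_data(start_year, end_year)
    model = train_model(rows)
    save_model(model, path)
    return model


artefact = Path(tempfile.mkdtemp()) / "model.pkl"
trained = asyncio.run(main(artefact, start_year=2023))
restored = pickle.loads(artefact.read_bytes())
```

Only the loader is a coroutine, matching the listing above, where load_training_data and main are async while train_model and save_model are not.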
