
Project Overview

Numerical Weather Prediction (NWP) models consistently over- or under-predict temperature at local scales because of limited terrain representation, coarse grid resolution, and boundary parameterization errors. This project trains an LSTM to learn and correct these systematic biases. The work was published at IEEE ICRITO 2025.

R² Score: 0.9408
Accuracy Gain: 13%
Bias Reduction: 65%
RMSE Improvement: 1.2°C

Why NWP Models Have Bias

NWP models divide the atmosphere into a 3D grid. Terrain features smaller than the grid cell — valleys, coastal cliffs, urban heat islands — are averaged out. This causes systematic, predictable errors: a model might consistently over-predict temperature in a coastal city by 2–3°C every morning due to imprecise sea-surface temperature coupling.

Because the bias is predictable and repeatable, it is an ideal regression target for machine learning.
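
To make the idea concrete, the sketch below builds a synthetic series in which the forecast runs warm by a fixed amount during morning hours, then recovers that offset by averaging forecast-minus-observation per hour of day. The data, the 2.5 °C offset, and the helper name `hourly_mean_bias` are illustrative assumptions, not values or code from the project.

```python
import numpy as np

def hourly_mean_bias(hours, forecast, observed):
    """Return a length-24 array with the mean (forecast - observed) per hour of day."""
    hours = np.asarray(hours)
    bias = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return np.array([bias[hours == h].mean() for h in range(24)])

def rmse(a, b):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic data (assumed for illustration): 30 days of hourly values
rng = np.random.default_rng(0)
hours = np.tile(np.arange(24), 30)
observed = 15 + 5 * np.sin(2 * np.pi * (hours - 9) / 24)

# The stand-in "model" runs 2.5 °C warm between 06:00 and 09:00, unbiased otherwise
offset = np.where((hours >= 6) & (hours <= 9), 2.5, 0.0)
forecast = observed + offset + rng.normal(0, 0.3, hours.size)

# Estimate the repeatable part of the error, then subtract it
table = hourly_mean_bias(hours, forecast, observed)
corrected = forecast - table[hours]

print("estimated bias at 07:00:", round(float(table[7]), 2))   # near the injected offset
print("estimated bias at 15:00:", round(float(table[15]), 2))  # near zero
print("RMSE raw vs corrected:", rmse(forecast, observed), rmse(corrected, observed))
```

A per-hour lookup table like this only captures bias that depends on time of day. The LSTM described below is meant to capture bias that also depends on recent weather history.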

Data Pipeline

Python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

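# Align forecasts and observations on timestamp and keep rows in chronological
# order, which the look-back windows used by the LSTM depend on
nwp = pd.read_csv('nwp_forecasts.csv', parse_dates=['time'])
obs = pd.read_csv('station_obs.csv', parse_dates=['time'])
merged = nwp.merge(obs, on='time').sort_values('time').reset_index(drop=True)
# Positive bias means the NWP forecast ran too warm
merged['bias'] = merged['temp_forecast'] - merged['temp_observed']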

# Cyclical time encoding (avoids hour 23 ↔ hour 0 distance problem)
merged['hour_sin'] = np.sin(2*np.pi*merged['time'].dt.hour/24)
merged['hour_cos'] = np.cos(2*np.pi*merged['time'].dt.hour/24)
merged['doy_sin']  = np.sin(2*np.pi*merged['time'].dt.dayofyear/365)
merged['doy_cos']  = np.cos(2*np.pi*merged['time'].dt.dayofyear/365)

features = ['temp_forecast','humidity_nwp','pressure_nwp',
            'wind_speed_nwp','hour_sin','hour_cos','doy_sin','doy_cos']
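# Note: fitting on all rows uses test-period statistics; for a held-out evaluation,
# fit the scaler on the training rows only and call transform() on the rest
scaler = StandardScaler()
X = scaler.fit_transform(merged[features])
y = merged['bias'].values    # regression target: bias in °C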
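
The effect of the cyclical encoding can be checked in isolation. The sketch below compares the gap between 23:00 and 00:00 under a raw integer hour and under the sine/cosine pair. `encode_hour` and `distance` are hypothetical helpers introduced only for this illustration.

```python
import math

def encode_hour(hour):
    """Map an hour of day (0-23) to a point on the unit circle."""
    angle = 2 * math.pi * hour / 24
    return (math.sin(angle), math.cos(angle))

def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

raw_gap = abs(23 - 0)                                  # raw hours look 23 units apart
cyc_gap = distance(encode_hour(23), encode_hour(0))    # encoded: one ordinary 1-hour step
one_hour = distance(encode_hour(11), encode_hour(12))  # reference 1-hour step at midday
half_day = distance(encode_hour(0), encode_hour(12))   # opposite sides of the circle

print("raw gap 23h -> 0h:", raw_gap)
print("encoded gap 23h -> 0h:", round(cyc_gap, 4))
print("encoded gap 11h -> 12h:", round(one_hour, 4))
print("encoded gap 0h -> 12h:", round(half_day, 4))
```

Both coordinates are needed: sine alone maps 03:00 and 09:00 to the same value, and cosine alone does the same for 06:00 and 18:00.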

LSTM Model

Python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout

LOOK_BACK = 72  # 72-hour look-back window
model = Sequential([
    Input(shape=(LOOK_BACK, len(features))),  # (time steps, features per step)
    LSTM(64, return_sequences=True),          # pass the full sequence to the next LSTM
    Dropout(0.2),
    LSTM(32, return_sequences=False),         # keep only the final hidden state
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1)   # predict bias value
])
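model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# X_train_seq has shape (n_windows, LOOK_BACK, len(features)); y_train holds one bias value per window.
# validation_split holds out the last 15% of the training windows (selected before shuffling).
model.fit(X_train_seq, y_train, epochs=80, batch_size=32,
          validation_split=0.15)

# Apply correction at inference time: bias = forecast - observed, so subtract it.
# nwp_test_forecast is the raw NWP temperature aligned with the test windows' target times.
predicted_bias = model.predict(X_test_seq).flatten()
corrected_temp = nwp_test_forecast - predicted_bias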
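
The training block refers to `X_train_seq`, `y_train`, `X_test_seq`, and `nwp_test_forecast` without showing how they are built. A minimal sketch of one way to produce them follows. It assumes each window of `LOOK_BACK` consecutive rows is paired with the bias at the window's last time step, and that the split is chronological at 80/20. The pairing, the split ratio, and the helper name `make_sequences` are all assumptions; the source does not specify them. Random arrays stand in for the scaled feature matrix, the bias target, and the raw forecast.

```python
import numpy as np

LOOK_BACK = 72  # same window length as the model definition

def make_sequences(X, y, look_back):
    """Slice a time-ordered (n_rows, n_features) array into overlapping windows.

    Window i covers rows i .. i + look_back - 1 and is paired with y at its
    last row (an assumed alignment). Returns (X_seq, y_seq, target_idx).
    """
    n_windows = len(X) - look_back + 1
    if n_windows <= 0:
        raise ValueError("need at least look_back rows")
    X_seq = np.stack([X[i:i + look_back] for i in range(n_windows)])
    target_idx = np.arange(look_back - 1, len(X))
    return X_seq, y[target_idx], target_idx

# Stand-in data (assumed): 500 time-ordered rows with 8 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.normal(size=500)
temp_forecast = rng.normal(15, 5, size=500)

X_seq, y_seq, target_idx = make_sequences(X, y, LOOK_BACK)

# Chronological split (assumed 80/20): no shuffling, so the test period follows training
split = int(0.8 * len(X_seq))
X_train_seq, y_train = X_seq[:split], y_seq[:split]
X_test_seq, y_test = X_seq[split:], y_seq[split:]

# Raw forecast at the same time steps as the test targets
nwp_test_forecast = temp_forecast[target_idx[split:]]

print(X_train_seq.shape, X_test_seq.shape, nwp_test_forecast.shape)
```

With real data, `X` and `y` would come from the data pipeline. Fitting the scaler only on rows that precede the first test target keeps test-period statistics out of training.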

Results vs Baselines

  • Raw NWP: R² = 0.81, RMSE = 2.8°C
  • Climatological mean correction: R² = 0.87, RMSE = 2.1°C
  • Linear regression: R² = 0.90, RMSE = 1.9°C
  • Our LSTM (72h window): R² = 0.9408, RMSE = 1.6°C ✅
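
For reference, the two metrics in this list can be computed as shown below. This is a generic sketch of RMSE and the coefficient of determination, not the project's evaluation script, and the sample temperatures are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (°C here)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up observed temperatures and a forecast that runs a constant 2 °C warm
observed = np.array([10.0, 12.0, 15.0, 18.0, 16.0, 13.0])
raw_forecast = observed + 2.0
corrected = raw_forecast - 2.0   # subtract the (here, known) bias

print("raw:      ", rmse(observed, raw_forecast), r2_score(observed, raw_forecast))
print("corrected:", rmse(observed, corrected), r2_score(observed, corrected))
```

Both metrics are computed against observed temperature, so a constant offset in the forecast lowers R² even when the forecast tracks the shape of the observations perfectly.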

IEEE Paper: DOI 10.1109/ICRITO66076.2025.11241393 — ICRITO 2025, IEEE Xplore.

Tags: LSTM, TensorFlow, NWP, IEEE ICRITO 2025, Time Series, Python