Project Overview
Numerical Weather Prediction (NWP) models consistently over- or under-predict temperature at local scales because of unresolved terrain, coarse grid resolution, and boundary parameterization errors. This project trains an LSTM to learn and correct these systematic biases. The work was published at IEEE ICRITO 2025.
Why NWP Models Have Bias
NWP models divide the atmosphere into a 3D grid. Terrain features smaller than the grid cell — valleys, coastal cliffs, urban heat islands — are averaged out. This causes systematic, predictable errors: a model might consistently over-predict temperature in a coastal city by 2–3°C every morning due to imprecise sea-surface temperature coupling.
Because the bias is predictable and repeatable, it is an ideal regression target for machine learning.
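As a toy illustration of why a repeatable bias is easy to correct, the sketch below uses made-up numbers (not project data) and the sign convention bias = forecast - observed. The constant 2.5 °C offset and the `rmse` helper are assumptions chosen for illustration; real biases vary with time of day and season, which is why a learned model is used instead of a single constant.

```python
import numpy as np

# Synthetic, illustrative values only (not data from the project).
observed = np.array([14.0, 15.5, 17.0, 16.0, 14.5])  # station temperatures, °C
forecast = observed + 2.5                            # NWP runs a constant 2.5 °C warm

bias = forecast - observed        # sign convention: positive means over-prediction
mean_bias = bias.mean()           # the "learned" correction in this toy case
corrected = forecast - mean_bias  # subtracting the bias recovers the observations


def rmse(pred, truth):
    """Root-mean-square error between two equal-length arrays."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


rmse_raw = rmse(forecast, observed)
rmse_corrected = rmse(corrected, observed)
print(rmse_raw, rmse_corrected)
```

In this idealized case the corrected error drops to zero because the bias is exactly constant. With a bias that depends on context, the same subtraction applies, but the bias term has to be predicted per time step.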
Data Pipeline
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
nwp = pd.read_csv('nwp_forecasts.csv', parse_dates=['time'])
obs = pd.read_csv('station_obs.csv', parse_dates=['time'])
merged = nwp.merge(obs, on='time')
merged['bias'] = merged['temp_forecast'] - merged['temp_observed']
# Cyclical time encoding (avoids hour 23 ↔ hour 0 distance problem)
merged['hour_sin'] = np.sin(2*np.pi*merged['time'].dt.hour/24)
merged['hour_cos'] = np.cos(2*np.pi*merged['time'].dt.hour/24)
merged['doy_sin'] = np.sin(2*np.pi*merged['time'].dt.dayofyear/365)
merged['doy_cos'] = np.cos(2*np.pi*merged['time'].dt.dayofyear/365)
features = ['temp_forecast', 'humidity_nwp', 'pressure_nwp',
            'wind_speed_nwp', 'hour_sin', 'hour_cos', 'doy_sin', 'doy_cos']
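# NOTE: fitting on all rows lets test-set statistics leak into training;
# in practice, fit the scaler on the training split only and reuse it.
scaler = StandardScaler()
X = scaler.fit_transform(merged[features])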
y = merged['bias'].values  # regression target: bias in °C
LSTM Model
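The training call in this section uses X_train_seq and y_train, which the data pipeline does not construct. The following is a minimal sketch of one way to build them, not necessarily the method used in the paper. The helper names `make_sequences` and `chronological_split`, the 80/20 split, and the choice to pair each window with the bias at its last time step are all assumptions. It also assumes rows are sorted by time, hourly, and free of gaps.

```python
import numpy as np


def make_sequences(X, y, look_back):
    """Hypothetical helper: stack sliding windows of `look_back` rows.

    Window i covers rows i .. i + look_back - 1 and is paired with the
    target at the window's last row, y[i + look_back - 1] (an assumption).
    """
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y, dtype=np.float32)
    n_windows = len(X) - look_back + 1
    if n_windows <= 0:
        raise ValueError("need at least `look_back` rows")
    X_seq = np.stack([X[i:i + look_back] for i in range(n_windows)])
    y_seq = y[look_back - 1:]
    return X_seq, y_seq


def chronological_split(X, y, train_frac=0.8):
    """Hypothetical helper: split by time order, never shuffling."""
    split = int(len(X) * train_frac)
    return X[:split], X[split:], y[:split], y[split:]


# Demonstration on synthetic data (100 rows, 8 features, 12-step windows).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
y_demo = rng.normal(size=100)

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
X_train_seq, y_train = make_sequences(X_tr, y_tr, look_back=12)
X_test_seq, y_test = make_sequences(X_te, y_te, look_back=12)
print(X_train_seq.shape, y_train.shape, X_test_seq.shape)
```

With this pairing, the first `look_back - 1` rows of each split have no prediction, so a forecast array used at inference time would need the same trimming (for example `forecast[look_back - 1:]`) to line up with the predicted bias.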
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
LOOK_BACK = 72 # 72-hour look-back window
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(LOOK_BACK, len(features))),
    Dropout(0.2),
    LSTM(32, return_sequences=False),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1)  # predict bias value
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(X_train_seq, y_train, epochs=80, batch_size=32,
          validation_split=0.15)
# Apply correction at inference time
predicted_bias = model.predict(X_test_seq).flatten()
corrected_temp = nwp_test_forecast - predicted_bias
Results vs Baselines
- Raw NWP: R² = 0.81, RMSE = 2.8°C
- Climatological mean correction: R² = 0.87, RMSE = 2.1°C
- Linear regression: R² = 0.90, RMSE = 1.9°C
- Our LSTM (72h window): R² = 0.9408, RMSE = 1.6°C ✅
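For reference, the two metrics in this list can be computed as sketched below. This is an illustrative sketch with made-up numbers, not the evaluation code behind the reported results. It assumes both metrics compare observed temperature against the (corrected) forecast temperature, and the function names `rmse` and `r2` are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (°C here)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up example values, not project data.
observed = [10.0, 12.0, 14.0, 16.0]
corrected = [11.0, 12.0, 13.0, 16.0]

print(round(rmse(observed, corrected), 4))  # 0.7071
print(round(r2(observed, corrected), 4))    # 0.9
```

Applying the same two functions to the raw forecast, each baseline, and the LSTM-corrected forecast on a single held-out period keeps the comparison like for like.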
IEEE Paper: DOI 10.1109/ICRITO66076.2025.11241393 — ICRITO 2025, IEEE Xplore.