Why LSTMs for Weather Prediction?
Traditional weather forecasting relies on physics-based Numerical Weather Prediction (NWP) models that simulate the atmosphere with systems of differential equations. These models are computationally expensive and produce systematic errors at local scales.
Long Short-Term Memory (LSTM) networks instead learn complex temporal dependencies directly from data. For a 24-hour weather sequence, an LSTM captures both short-term hourly patterns and longer seasonal trends simultaneously.
Related Research: This work connects to our IEEE ICRITO 2025 paper on NWP bias correction.
How LSTM Gates Work
An LSTM cell solves the vanishing gradient problem of vanilla RNNs using three gates:
- Forget Gate: Outputs 0–1 per cell state value — 0 = discard, 1 = keep.
- Input Gate: Sigmoid decides what to update; tanh creates new candidate values.
- Output Gate: Filters cell state → new hidden state passed to next time step.
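The interplay of the three gates can be sketched in plain NumPy. This is a minimal single-cell time step; the weight shapes, gate ordering, and function names are illustrative, not Keras internals:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D) input weights, U: (4H, H)
    recurrent weights, b: (4H,) bias, with gate rows ordered
    [forget, input, candidate, output]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[:H])        # forget gate: 0 = discard, 1 = keep
    i = sigmoid(z[H:2*H])     # input gate: what to update
    g = np.tanh(z[2*H:3*H])   # candidate values
    o = sigmoid(z[3*H:])      # output gate: filters cell state
    c = f * c_prev + i * g    # new cell state
    h = o * np.tanh(c)        # new hidden state for the next time step
    return h, c

# Toy dimensions: 5 input features, hidden size 4
rng = np.random.default_rng(0)
D, H = 5, 4
h, c = np.zeros(H), np.zeros(H)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

Because the cell state `c` is updated additively (`f * c_prev + i * g`), gradients can flow across many time steps without vanishing, which is the core fix over vanilla RNNs.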
Data Preprocessing
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils.class_weight import compute_class_weight

# Load hourly observations indexed by timestamp
df = pd.read_csv('weather_data.csv', parse_dates=['datetime'])
df.set_index('datetime', inplace=True)
df.dropna(inplace=True)

# Scale features to [0, 1]. In practice, fit the scaler on the training
# split only and reuse it on validation/test data to avoid leakage.
features = ['temperature', 'humidity', 'wind_speed', 'pressure', 'dew_point']
scaler = MinMaxScaler()
df[features] = scaler.fit_transform(df[features])

def create_sequences(data, labels, look_back=24):
    """Slide a look_back-hour window over the data; each window
    predicts the weather class at the following hour."""
    X_seq, y_seq = [], []
    for i in range(len(data) - look_back):
        X_seq.append(data[i : i + look_back])
        y_seq.append(labels[i + look_back])
    return np.array(X_seq), np.array(y_seq)

X_data, y_data = create_sequences(df[features].values, df['weather_class'].values)

# Balanced class weights to counter the dominance of common classes
classes = np.unique(y_data)
weights = compute_class_weight('balanced', classes=classes, y=y_data)
class_weight_dict = dict(zip(classes, weights))
```

Class Imbalance Warning: Without class_weight, the model predicts "Sunny" 78% of the time: high accuracy on paper, but useless for rare classes like Snow and Thunderstorm.
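To see what the 'balanced' mode actually computes, here is a toy example with made-up labels (6 "sunny", 2 "rain", 1 "snow"); the formula is n_samples / (n_classes * count_per_class), so rarer classes get proportionally larger weights:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels: class 0 x6, class 1 x2, class 2 x1
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])
classes = np.unique(y)
weights = compute_class_weight('balanced', classes=classes, y=y)
# class 0: 9 / (3 * 6) = 0.5
# class 1: 9 / (3 * 2) = 1.5
# class 2: 9 / (3 * 1) = 3.0
print(dict(zip(classes, weights)))
```

During training, each sample's loss is multiplied by its class weight, so misclassifying a rare class costs the model more than misclassifying a common one.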
Model Architecture
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Chronological split (no shuffling), so validation data is strictly
# later in time than training data
split = int(len(X_data) * 0.8)
X_train, X_val = X_data[:split], X_data[split:]
y_train, y_val = y_data[:split], y_data[split:]

model = Sequential([
    # Each input window is 24 time steps x 5 features
    LSTM(128, return_sequences=True, input_shape=(24, 5), recurrent_dropout=0.1),
    Dropout(0.3),
    BatchNormalization(),
    LSTM(64, return_sequences=False, recurrent_dropout=0.1),
    Dropout(0.2),
    BatchNormalization(),
    Dense(32, activation='relu'),
    Dropout(0.1),
    Dense(8, activation='softmax')   # one output per weather class
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100, batch_size=64,
    callbacks=[EarlyStopping(patience=12, restore_best_weights=True),
               ReduceLROnPlateau(factor=0.5, patience=5)],
    class_weight=class_weight_dict
)
```

Results
Key Lessons Learned
- 24h look-back beats 6h and 12h — captures full diurnal temperature cycles.
- BatchNorm between LSTM layers is critical — without it, val_accuracy oscillated ±5%.
- recurrent_dropout applies same mask across time steps — correct for temporal learning.
- class_weight is non-negotiable for imbalanced weather data.
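Tying these pieces together, inference on one new 24-hour window reduces to post-processing the softmax output. A minimal sketch, where `probs` stands in for `model.predict(window)[0]` and the class names and their order are illustrative, not from the source:

```python
import numpy as np

# Illustrative label order matching the 8-unit softmax output
weather_classes = ['Sunny', 'Cloudy', 'Rain', 'Drizzle',
                   'Fog', 'Snow', 'Thunderstorm', 'Hail']

# Stand-in for model.predict(window)[0] on one (1, 24, 5) input window
probs = np.array([0.05, 0.10, 0.55, 0.10, 0.05, 0.05, 0.05, 0.05])

predicted = weather_classes[int(np.argmax(probs))]
confidence = float(probs.max())
print(f"{predicted} ({confidence:.0%})")   # → Rain (55%)
```

Reporting the softmax confidence alongside the label is useful in practice: a 55% "Rain" call warrants different handling than a 95% one.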
Full code on GitHub: github.com/06Neel