Why LSTMs for Weather Prediction?
Traditional weather forecasting relies on physics-based Numerical Weather Prediction (NWP) models that simulate the atmosphere with systems of differential equations. These models are computationally expensive and produce systematic errors at local scales.
Long Short-Term Memory (LSTM) networks instead learn complex temporal dependencies directly from data. For a 24-hour weather sequence, an LSTM captures both short-term hourly patterns and longer seasonal trends simultaneously.
Related Research: This work connects to our IEEE ICRITO 2025 paper on NWP bias correction.
How LSTM Gates Work
An LSTM cell solves the vanishing gradient problem of vanilla RNNs using three gates:
- Forget Gate: Outputs 0–1 per cell state value — 0 = discard, 1 = keep.
- Input Gate: Sigmoid decides what to update; tanh creates new candidate values.
- Output Gate: Filters cell state → new hidden state passed to next time step.
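The interplay of the three gates can be sketched in plain NumPy. This is a minimal single-cell time step; the weight shapes, gate ordering, and function names are illustrative, not Keras internals:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D) input weights, U: (4H, H)
    recurrent weights, b: (4H,) bias, with gate rows ordered
    [forget, input, candidate, output]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[:H])        # forget gate: 0 = discard, 1 = keep
    i = sigmoid(z[H:2*H])     # input gate: what to update
    g = np.tanh(z[2*H:3*H])   # candidate values
    o = sigmoid(z[3*H:])      # output gate: filters cell state
    c = f * c_prev + i * g    # new cell state
    h = o * np.tanh(c)        # new hidden state for the next time step
    return h, c

# Toy dimensions: 5 input features, hidden size 4
rng = np.random.default_rng(0)
D, H = 5, 4
h, c = np.zeros(H), np.zeros(H)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

Because the cell state `c` is updated additively (`f * c_prev + i * g`), gradients can flow across many time steps without vanishing, which is the core fix over vanilla RNNs.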
Data Preprocessing
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils.class_weight import compute_class_weight

# Load hourly observations indexed by timestamp
df = pd.read_csv('weather_data.csv', parse_dates=['datetime'])
df.set_index('datetime', inplace=True)
df.dropna(inplace=True)

# Scale features to [0, 1]. In practice, fit the scaler on the training
# split only and reuse it on validation/test data to avoid leakage.
features = ['temperature', 'humidity', 'wind_speed', 'pressure', 'dew_point']
scaler = MinMaxScaler()
df[features] = scaler.fit_transform(df[features])

def create_sequences(data, labels, look_back=24):
    """Slide a look_back-hour window over the data; each window
    predicts the weather class at the following hour."""
    X_seq, y_seq = [], []
    for i in range(len(data) - look_back):
        X_seq.append(data[i : i + look_back])
        y_seq.append(labels[i + look_back])
    return np.array(X_seq), np.array(y_seq)

X_data, y_data = create_sequences(df[features].values, df['weather_class'].values)

# Balanced class weights to counter the dominance of common classes
classes = np.unique(y_data)
weights = compute_class_weight('balanced', classes=classes, y=y_data)
class_weight_dict = dict(zip(classes, weights))
```

Class Imbalance Warning: Without class_weight, the model predicts "Sunny" 78% of the time: high accuracy on paper, but useless for rare classes like Snow and Thunderstorm.
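To see what the 'balanced' mode actually computes, here is a toy example with made-up labels (6 "sunny", 2 "rain", 1 "snow"); the formula is n_samples / (n_classes * count_per_class), so rarer classes get proportionally larger weights:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels: class 0 x6, class 1 x2, class 2 x1
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])
classes = np.unique(y)
weights = compute_class_weight('balanced', classes=classes, y=y)
# class 0: 9 / (3 * 6) = 0.5
# class 1: 9 / (3 * 2) = 1.5
# class 2: 9 / (3 * 1) = 3.0
print(dict(zip(classes, weights)))
```

During training, each sample's loss is multiplied by its class weight, so misclassifying a rare class costs the model more than misclassifying a common one.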
Model Architecture
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Chronological split (no shuffling), so validation data is strictly
# later in time than training data
split = int(len(X_data) * 0.8)
X_train, X_val = X_data[:split], X_data[split:]
y_train, y_val = y_data[:split], y_data[split:]

model = Sequential([
    # Each input window is 24 time steps x 5 features
    LSTM(128, return_sequences=True, input_shape=(24, 5), recurrent_dropout=0.1),
    Dropout(0.3),
    BatchNormalization(),
    LSTM(64, return_sequences=False, recurrent_dropout=0.1),
    Dropout(0.2),
    BatchNormalization(),
    Dense(32, activation='relu'),
    Dropout(0.1),
    Dense(8, activation='softmax')   # one output per weather class
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100, batch_size=64,
    callbacks=[EarlyStopping(patience=12, restore_best_weights=True),
               ReduceLROnPlateau(factor=0.5, patience=5)],
    class_weight=class_weight_dict
)
```

Results
Key Lessons Learned
- 24h look-back beats 6h and 12h — captures full diurnal temperature cycles.
- BatchNorm between LSTM layers is critical — without it, val_accuracy oscillated ±5%.
- recurrent_dropout applies same mask across time steps — correct for temporal learning.
- class_weight is non-negotiable for imbalanced weather data.
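Tying these pieces together, inference on one new 24-hour window reduces to post-processing the softmax output. A minimal sketch, where `probs` stands in for `model.predict(window)[0]` and the class names and their order are illustrative, not from the source:

```python
import numpy as np

# Illustrative label order matching the 8-unit softmax output
weather_classes = ['Sunny', 'Cloudy', 'Rain', 'Drizzle',
                   'Fog', 'Snow', 'Thunderstorm', 'Hail']

# Stand-in for model.predict(window)[0] on one (1, 24, 5) input window
probs = np.array([0.05, 0.10, 0.55, 0.10, 0.05, 0.05, 0.05, 0.05])

predicted = weather_classes[int(np.argmax(probs))]
confidence = float(probs.max())
print(f"{predicted} ({confidence:.0%})")   # → Rain (55%)
```

Reporting the softmax confidence alongside the label is useful in practice: a 55% "Rain" call warrants different handling than a 95% one.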
Full code on GitHub: github.com/06Neel