Project Overview
Multi-class weather classification from historical meteorological time-series data using a stacked LSTM network. The model predicts one of 8 weather categories (Sunny, Cloudy, Rainy, Snowy, Thunderstorm, Foggy, Windy, Hail) from 24-hour input windows.
Dataset & Features
Hourly meteorological readings from weather stations over 5 years. Five input features fed into 24-hour sequences:
- Temperature (°C) — scaled 0–1 via MinMaxScaler
- Humidity (%) — normalized
- Wind Speed (km/h) — log-transformed to reduce skew
- Pressure (hPa) — standardized
- Dew Point (°C) — computed from temp + humidity
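The dew point derivation and the 24-hour windowing described above can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the source does not say which dew point formula was used, so the Magnus approximation stands in here. The helper names dew_point_magnus and make_windows are hypothetical, as is the choice to label each window with the class of the hour that follows it. The demo data is synthetic, and the per-feature scaling from the list is left out for brevity.

```python
import numpy as np

def dew_point_magnus(temp_c, rel_humidity_pct):
    """Approximate dew point in deg C from temperature and relative humidity.

    Assumption: Magnus approximation with a = 17.62 and b = 243.12 deg C.
    """
    temp_c = np.asarray(temp_c, dtype=float)
    rh = np.asarray(rel_humidity_pct, dtype=float)
    a, b = 17.62, 243.12
    gamma = np.log(rh / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def make_windows(features, labels, window=24):
    """Slice (T, F) hourly rows into (N, window, F) sequences.

    Assumption: each window is labeled with the class of the hour
    immediately after it, so N = T - window.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    n = len(features) - window
    if n <= 0:
        raise ValueError("need more rows than the window length")
    X = np.stack([features[i:i + window] for i in range(n)])
    y = labels[window:window + n]
    return X, y

# Synthetic demo data: 30 hours of readings, 5 features per hour.
rng = np.random.default_rng(0)
hours = 30
temp = rng.uniform(-5, 30, hours)
humidity = rng.uniform(20, 100, hours)
wind = np.log1p(rng.uniform(0, 60, hours))   # log-transform to reduce skew
pressure = rng.uniform(980, 1040, hours)
dew = dew_point_magnus(temp, humidity)

features = np.column_stack([temp, humidity, wind, pressure, dew])
labels = rng.integers(0, 8, hours)           # 8 weather classes, encoded 0..7

X_win, y_win = make_windows(features, labels, window=24)
print(X_win.shape, y_win.shape)
```

The resulting X_win has the (samples, 24, 5) layout that the LSTM's input_shape=(24, 5) expects. In a real pipeline the scaling statistics would normally be fitted on the training split only, so that validation data does not leak into preprocessing.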
Class Imbalance: Sunny (34%) dominated the dataset. Without class_weight, the model reached 78% accuracy while over-predicting Sunny, and it failed on the rare Snowy and Thunderstorm classes. Passing balanced class weights to model.fit via class_weight fixed this.
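To make the weighting concrete, the sketch below reproduces the "balanced" heuristic that scikit-learn documents for compute_class_weight, n_samples / (n_classes * count_per_class), using plain NumPy. The function names balanced_class_weights and majority_baseline_accuracy and the toy label array are hypothetical and serve only as illustration. The numbers are not the project's.

```python
import numpy as np

def balanced_class_weights(y):
    """Return {class: weight} with weight = n_samples / (n_classes * count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

def majority_baseline_accuracy(y):
    """Accuracy of a classifier that always predicts the most common class."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / len(y)

# Toy labels: class 0 is common, class 2 is rare.
y_toy = np.array([0] * 6 + [1] * 3 + [2] * 1)

toy_weights = balanced_class_weights(y_toy)
toy_baseline = majority_baseline_accuracy(y_toy)

print(toy_weights)    # rarer classes receive larger weights
print(toy_baseline)   # share of the majority class
```

Rare classes get proportionally larger weights, so their misclassifications contribute more to the loss. The majority baseline is a useful reference point when reading overall accuracy on an imbalanced dataset.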
Model Architecture
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.utils.class_weight import compute_class_weight
import numpy as np
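# y_train holds integer class labels, as sparse_categorical_crossentropy requires.
# Weight each class inversely to its frequency so rare classes count more in the loss.
classes = np.unique(y_train)
cw = compute_class_weight('balanced', classes=classes, y=y_train)
class_weight_dict = dict(zip(classes, cw))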
model = Sequential([
    LSTM(128, return_sequences=True, input_shape=(24, 5), recurrent_dropout=0.1),
    Dropout(0.3),
    BatchNormalization(),
    LSTM(64, return_sequences=False, recurrent_dropout=0.1),
    Dropout(0.2),
    BatchNormalization(),
    Dense(32, activation='relu'),
    Dropout(0.1),
    Dense(8, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=100, batch_size=64,
          callbacks=[EarlyStopping(patience=12, restore_best_weights=True),
                     ReduceLROnPlateau(factor=0.5, patience=5)],
          class_weight=class_weight_dict)
Results by Class
Overall 95.57% accuracy. The toughest classes — Thunderstorm and Hail — had lower recall (88%) due to fewer training samples: