Table 3 Dataset transformation: conceptual view of selected features before vs. after preprocessing.
| Feature | Before Preprocessing (Raw Dataset) | After Preprocessing (Processed Dataset) | Transformation Applied |
|---|---|---|---|
| Packet Size | Missing values (NaN) in some records | Missing values imputed with mean (e.g., 1200.0) | Imputation using SimpleImputer (Mean) |
| Flow Duration | Large heterogeneous values (e.g., 34,000 µs, 150,000 µs) | Standardized values (z-score normalization, e.g., 0.15, −1.20) | Standardization with StandardScaler |
| Protocol | Text categories: {TCP, UDP, ICMP} | Encoded as integers: {0, 1, 2} | Encoding with LabelEncoder |
| Device Class (Target Variable) | Semantic labels: {Smart Speaker, Smart Camera, Smart TV,...} | Encoded as integers: {0 = Smart Speaker, 1 = Smart Camera, 2 = Smart TV,...} | Encoding with LabelEncoder |
| Other Numeric Features | Raw values with varying scales (e.g., Bytes Sent, Packets/sec) | Standardized (mean = 0, std = 1) | Standardization with z-score |
| Other Categorical Features | Non-numeric labels (e.g., “Established”) | Converted to numeric codes (e.g., 0 = No, 1 = Yes) | Encoding with LabelEncoder |
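
The transformations listed in Table 3 can be illustrated with a minimal scikit-learn sketch. The toy values and the variable names (`packet_size`, `flow_duration_us`, `protocol`, `device_class`) are assumptions made for illustration only. They are not taken from the dataset and this is not necessarily the exact pipeline used in the study.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy records; values are illustrative, not taken from the dataset.
packet_size = np.array([[1000.0], [np.nan], [1400.0]])
flow_duration_us = np.array([[34_000.0], [150_000.0], [92_000.0]])
protocol = np.array(["TCP", "UDP", "ICMP"])
device_class = np.array(["Smart Speaker", "Smart Camera", "Smart TV"])

# Mean imputation: the NaN is replaced by the mean of the observed values.
packet_size_imputed = SimpleImputer(strategy="mean").fit_transform(packet_size)

# z-score standardization: the column is rescaled to mean 0 and unit variance.
flow_duration_scaled = StandardScaler().fit_transform(flow_duration_us)

# Label encoding: class names are sorted, then mapped to 0..n_classes-1.
protocol_encoder = LabelEncoder()
protocol_codes = protocol_encoder.fit_transform(protocol)

device_encoder = LabelEncoder()
device_codes = device_encoder.fit_transform(device_class)

print(packet_size_imputed.ravel())
print(flow_duration_scaled.ravel())
print(dict(zip(protocol_encoder.classes_, range(len(protocol_encoder.classes_)))))
print(dict(zip(device_encoder.classes_, range(len(device_encoder.classes_)))))
```

Note that `LabelEncoder` assigns integer codes in sorted order of the class names, so the codes produced in practice may differ from the illustrative mappings shown in the table. The fitted `classes_` attribute records the actual correspondence, and `inverse_transform` recovers the original labels.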