Table 3 Dataset transformation: conceptual view of selected features before vs. after preprocessing.

Feature	Before Preprocessing (Raw Dataset)	After Preprocessing (Processed Dataset)	Transformation Applied
Packet Size	Missing values (NaN) in some records	Missing values imputed with the column mean (e.g., 1200.0)	Imputation with SimpleImputer (mean strategy)
Flow Duration	Large heterogeneous values (e.g., 34,000 µs, 150,000 µs)	Standardized values (z-score, e.g., 0.15, −1.20)	Standardization with StandardScaler
Protocol	Text categories: {TCP, UDP, ICMP}	Encoded as integers in sorted order: {ICMP = 0, TCP = 1, UDP = 2}	Encoding with LabelEncoder
Device Class (Target Variable)	Semantic labels: {Smart Speaker, Smart Camera, Smart TV,...}	Encoded as integers in sorted order of class name (e.g., Smart Camera < Smart Speaker < Smart TV)	Encoding with LabelEncoder
Other Numeric Features	Raw values with varying scales (e.g., Bytes Sent, Packets/sec)	Standardized (mean = 0, std = 1)	Standardization with StandardScaler (z-score)
Other Categorical Features	Non-numeric labels (e.g., “Established”)	Converted to numeric codes (e.g., 0 = No, 1 = Yes)	Encoding with LabelEncoder
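The transformations summarized in Table 3 can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code: the column names (packet_size, flow_duration_us, protocol, device_class), the helper function preprocess, and the four toy records are hypothetical. The sketch shows mean imputation, z-score standardization, and integer encoding of a categorical feature and of the target.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy records: column names and values are illustrative only,
# not taken from the paper's dataset.
raw = pd.DataFrame({
    "packet_size": [1000.0, np.nan, 1400.0, 1200.0],
    "flow_duration_us": [34_000.0, 150_000.0, 80_000.0, 52_000.0],
    "protocol": ["TCP", "UDP", "ICMP", "TCP"],
    "device_class": ["Smart Speaker", "Smart Camera", "Smart TV", "Smart Camera"],
})


def preprocess(df: pd.DataFrame):
    """Hypothetical helper applying the Table 3 steps to a copy of df."""
    out = df.copy()

    # 1. Replace missing packet sizes with the column mean.
    imputer = SimpleImputer(strategy="mean")
    out[["packet_size"]] = imputer.fit_transform(out[["packet_size"]])

    # 2. z-score standardization of numeric features (mean 0, std 1).
    numeric = ["packet_size", "flow_duration_us"]
    scaler = StandardScaler()
    out[numeric] = scaler.fit_transform(out[numeric])

    # 3. Integer encoding; LabelEncoder assigns codes in sorted order.
    proto_enc = LabelEncoder()
    out["protocol"] = proto_enc.fit_transform(out["protocol"])
    target_enc = LabelEncoder()
    out["device_class"] = target_enc.fit_transform(out["device_class"])

    return out, imputer, scaler, proto_enc, target_enc


processed, imputer, scaler, proto_enc, target_enc = preprocess(raw)
print(imputer.statistics_)      # mean used to fill the missing packet size
print(list(proto_enc.classes_))  # class order defines the integer codes
print(processed)
```

Two design points are worth noting. First, LabelEncoder orders codes by the sorted distinct values, so the integer assigned to each class depends on the full label set. The scikit-learn documentation intends LabelEncoder for target values; OrdinalEncoder or one-hot encoding is the usual choice for input features. Second, the imputer and scaler should be fitted on the training split only and then applied to validation and test data, which avoids leaking test statistics into training.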
