Table 2 Care Assist-GPT model architecture.
| Layer | Type | Parameters | Output shape | Description |
|---|---|---|---|---|
| Input | Data ingestion | – | (224, 224, 3) | Multimodal input layer for X-ray, vitals, text |
| Conv1 | Convolutional | 64 filters | (224, 224, 64) | Feature extraction using 3 × 3 kernel |
| Dropout1 | Dropout | p = 0.3 | (224, 224, 64) | Prevents overfitting by randomly dropping neurons |
| Pool1 | Max pooling | 2 × 2 filter | (112, 112, 64) | Down-samples features by max pooling |
| Conv2 | Convolutional | 128 filters | (112, 112, 128) | Deeper feature extraction with 3 × 3 kernel |
| Dropout2 | Dropout | p = 0.3 | (112, 112, 128) | Prevents overfitting at the second level |
| Pool2 | Max pooling | 2 × 2 filter | (56, 56, 128) | Further down-sampling via max pooling |
| FC1 | Fully connected | 1024 units | (1024) | Dense layer for high-level feature aggregation |
| Output | Fully connected | 3 units | (3) | Final layer providing diagnosis, risk, and recommendations |
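
The output shapes in Table 2 can be checked with a short shape trace. The sketch below is illustrative only and rests on assumptions the table does not state: convolutions use "same" padding with stride 1, pooling is non-overlapping with stride 2, and the Pool2 output is flattened before FC1. The helper names (`conv_same`, `max_pool`, `trace_shapes`, `conv_params`, `dense_params`) are hypothetical and not taken from the original work.

```python
# Shape walk-through of Table 2.
# ASSUMPTIONS (not stated in the table): 'same' padding and stride 1 for
# convolutions, stride-2 non-overlapping pooling, implicit flatten before FC1.


def conv_same(shape, filters):
    """A 3x3 convolution with 'same' padding keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, size=2):
    """Non-overlapping max pooling divides height and width by `size`."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


def trace_shapes(input_shape=(224, 224, 3)):
    """Return the output shape of each layer, in table order."""
    shapes = {"Input": input_shape}
    x = conv_same(input_shape, 64)
    shapes["Conv1"] = x
    shapes["Dropout1"] = x          # dropout never changes the shape
    x = max_pool(x)
    shapes["Pool1"] = x
    x = conv_same(x, 128)
    shapes["Conv2"] = x
    shapes["Dropout2"] = x
    x = max_pool(x)
    shapes["Pool2"] = x
    # Assumed flatten step: the table goes straight from Pool2 to FC1.
    shapes["Flatten (assumed)"] = (x[0] * x[1] * x[2],)
    shapes["FC1"] = (1024,)
    shapes["Output"] = (3,)
    return shapes


if __name__ == "__main__":
    for layer, shape in trace_shapes().items():
        print(f"{layer:18s} {shape}")
```

Under these assumptions the trace reproduces every shape listed in the table. The assumed flatten step yields 56 × 56 × 128 = 401,408 features, so `dense_params(401408, 1024)` puts FC1 at roughly 411 million parameters, far more than the two convolutional layers combined.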