Table 6 The network architectures used for the feature-extraction modules \(\phi _1\) and \(\phi _2\). k denotes the kernel size, f the number of filters in the convolutional layers, and M the final embedding dimension. \(\mathbb {I}\) denotes the unit interval [0, 1].
| Feature extractor | \(\phi _1(\mathbf {x})\) | \(\phi _2(\mathbf {x})\) |
|---|---|---|
| Input | \(\mathbf {x}_{1k} \in \mathbb {R}^{3\times 500}\) | \(\mathbf {x}_{2k} \in \mathbb {I}^{500}\) |
| | 3-DOF acceleration | HT, FT hists |
| Layer 1 | Conv1D \(k=8\), \(f=32\) | Dense \(500 \rightarrow 100\) |
| | LReLU (\(\alpha = 0.2\)) | LReLU (\(\alpha = 0.2\)) |
| | MaxPool \(k=2\) | Dropout \(p=0.1\) |
| Layer 2 | Conv1D \(k=8\), \(f=32\) | Dense \(100 \rightarrow 50\) |
| | LReLU (\(\alpha = 0.2\)) | LReLU (\(\alpha = 0.2\)) |
| | MaxPool \(k=2\) | Dropout \(p=0.1\) |
| Layer 3 | Conv1D \(k=16\), \(f=16\) | Dense \(50 \rightarrow M\) |
| | LReLU (\(\alpha = 0.2\)) | |
| | MaxPool \(k=2\) | |
| Layer 4 | Conv1D \(k=16\), \(f=16\) | |
| | LReLU (\(\alpha = 0.2\)) | |
| | MaxPool \(k=2\) | |
| Layer 5 | Flatten | |
| | Dense \(320 \rightarrow M\) | |
| Output | \(\mathbf {h}_{1k} \in \mathbb {R}^M\) | \(\mathbf {h}_{2k} \in \mathbb {R}^M\) |
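The \(\phi_2\) branch in the table is fully specified, so it can be sketched directly. The following is a minimal NumPy sketch, not the authors' implementation: the weights are randomly initialised, the embedding dimension `M = 16` is an illustrative choice (M is a hyperparameter in the table), and the inverted-dropout scaling is an assumption about how dropout is applied at training time.

```python
import numpy as np

M = 16  # embedding dimension; illustrative choice, M is a hyperparameter
rng = np.random.default_rng(0)

def lrelu(x, alpha=0.2):
    # Leaky ReLU with negative-side slope alpha, as in the table
    return np.where(x >= 0, x, alpha * x)

# Randomly initialised weights for the three dense layers of phi_2
w1, b1 = rng.normal(size=(500, 100)), np.zeros(100)
w2, b2 = rng.normal(size=(100, 50)), np.zeros(50)
w3, b3 = rng.normal(size=(50, M)), np.zeros(M)

def phi2(x, train=False, p=0.1):
    """Forward pass of phi_2: Dense 500->100, LReLU, Dropout;
    Dense 100->50, LReLU, Dropout; Dense 50->M."""
    h = lrelu(x @ w1 + b1)
    if train:  # inverted dropout (assumed convention): scale kept units by 1/(1-p)
        h *= rng.binomial(1, 1 - p, size=h.shape) / (1 - p)
    h = lrelu(h @ w2 + b2)
    if train:
        h *= rng.binomial(1, 1 - p, size=h.shape) / (1 - p)
    return h @ w3 + b3  # embedding h_2k in R^M

x2k = rng.uniform(0.0, 1.0, size=500)  # histogram input in the unit interval
h2k = phi2(x2k)
print(h2k.shape)  # (M,)
```

At inference time (`train=False`) dropout is skipped entirely, which is why the kept units are rescaled by \(1/(1-p)\) during training.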