Abstract
In modern smart city environments, securing facial biometric authentication is crucial for preventing unauthorized access to citizen data and safeguarding it from spoofing. This research proposes a multimodal deep learning model with a cryptographic framework for securing facial biometric authentication. The multimodal system utilizes a convolutional neural network (CNN), the Residual Network ResNet-50, and ElGamal cryptography to extract features from the face and secure the user's facial information against spoofing attacks. The facial biometric mappings are obtained in the first layer using a CNN with several convolutional layers, which retain spatial features such as local texture, essential for the facial mapping function. The results of the first layer are passed to ResNet-50, which identifies high-level semantic patterns using skip connections to predict faces accurately under varying pose and lighting conditions. Finally, the predicted facial marking is secured using the ElGamal cryptosystem to ensure security and privacy when transmitting facial data within smart city networks. The proposed work uses the CelebA Faces dataset to evaluate the model's efficiency in secure data transmission within smart cities and to enhance security features that minimize authorization attacks. The facial mapping prediction accuracy of the proposed model is 97.1%, with a low mean loss of 0.04, indicating that the combined multimodal system performs well in a smart city environment. The proposed fused multimodal approach, which combines CNN low-level feature preservation, ResNet-50 high-level feature extraction, and ElGamal encryption, significantly outperforms traditional models, exceeding the accuracy of CNN by 1.2%, ResNet-50 by 2.2%, and the Brakerski–Gentry–Vaikuntanathan algorithm by 1.1%. This shows that the proposed deep learning model effectively handles spoofing and secure data transmission, making it a suitable solution for preventing unauthorized access in future smart city environments.
Introduction
Smart cities have emerged as a concept that represents innovation, efficiency, and connectedness amid the changing face of urbanization. Smart cities utilise modern technology to enhance the delivery of essential services, optimise available resources, and improve the living standards of their citizens. Smart city environments rely on high-end digital systems to ensure security and privacy, making facial biometric authentication a convenient method to verify identities based on citizens' physiological traits. The growth of digital interactions and the expansion of urban populations have necessitated digitised integration in smart city infrastructure, and security measures are becoming increasingly important as smart cities continue to develop. Given the increase in digital contacts and the volume of sensitive data shared in urban areas, it is crucial to ensure that authentication systems are both secure and reliable. Biometric identification technologies offer a viable solution to this challenge, as they provide a secure and practical way of verifying people's identities based on unique biological or behavioural attributes, such as voiceprints, iris patterns, fingerprints, and facial patterns. However, traditional biometric authentication methods are accompanied by several challenges, including privacy concerns, inaccuracies, and low resistance to spoofing attacks. Smart cities face complex and constantly evolving security issues that conventional security measures struggle to address, which has increased the demand for innovative ideas that safeguard sensitive data, critical infrastructure, and public safety. Therefore, artificial intelligence-driven biometric facial cryptosystems provide an excellent means to enhance smart city security. The spoofing attacks processed by the attacker include faking biometric face samples with videos and 3D masks. Conditions such as variations in lighting and facial expressions further increase the challenge of providing security in smart city environments.
Figure 1 illustrates the attacker spoofing process, a significant limitation that involves holding a photo of the original citizen in front of the camera. The facial verification system cannot distinguish between a live face and a spoofed image, leading to incorrect authentication of the attacker. This raises serious concerns about the vulnerability of traditional facial biometric recognition systems to spoofing and replay attacks.
Artificial intelligence-based facial recognition is increasingly popular in the current technological era, thanks to advanced facial sensors that can map each facial feature with high precision. Encryption and decryption-based facial transactions serve as a robust and flexible security function in smart cities, protecting user data from tampering. The proposed model utilizes ResNet-50, where each facial mapping is stored in a separate residual block. The CNN analyses the structure of the input face to provide advanced biometric authentication in smart cities, processing facial data with high accuracy and achieving greater success through a deep learning framework.
The benefit of this multimodal system is that it utilizes a biometric cryptosystem to accurately recognise people's faces by examining various facial features, including the eyes, nose, lips, and forehead. Initially, the biometric data is forwarded to a CNN that accurately maps the facial markings of the particular person, thereby enhancing reliability. This extraction prevents changes to face markers, such as facial spoofing attacks. The final biomarkers are forwarded to ResNet-50, which extracts the facial features and preprocesses them into biometric data. Then, the ElGamal cryptosystem generates a key to store the person's facial biomarkers. During transmission, the biomarkers undergo another extraction, and the smart city environment is notified if any unauthorised entry occurs. This layer enhances the privacy and security of personal information by enabling precise authentication and expression pose analysis in low-light conditions without compromising the user's integrity. The proposed multimodal system achieves high accuracy in recognising and transmitting people in smart city environments.
Combining CNN and ResNet-50 for facial biometric security enhances the efficiency of safeguarding personal biomarkers, enabling the accurate identification of individuals through facial mapping with markers in various poses. Initially, a single person's pose is analysed and transferred into a convolutional neural network with multiple layers to classify biomarkers for individuals numbered from 1 to n, facilitating feature extraction. This process enhances the efficiency of the proposed methodology in handling facial biomarker data, ensuring the integrity of each face in the smart city person-identification server. After this initial analysis, ResNet uses skip connections to handle the weights for different faces effectively, identifying each person's pose and marking variations by fixing weights for different person images. To increase the confidentiality of the proposal, an additional security layer is added with ElGamal key generation to ensure security against changes to or manipulations of a person's facial record. Moreover, the final security system controls unauthorized access during verification and transactions. Bringing together multimodal deep learning and biometric cryptosystems helps keep data safe and private in smart city settings by protecting an individual's information from attacks like spoofing1.
The proposed model is a futuristic one suitable for urban environments with digital transactions and verification. The proposed model is essential due to the increasing number of security attacks, such as spoofing, that are prevalent in this AI era. The proposed multimodal system represents a significant advancement by ensuring the integrity of personal identity and privacy during digital interactions within smart city environments.
Motivation of the paper
The discussion below outlines the motivation behind this proposed work.
- The initial goal is to develop an advanced fused multimodal deep learning technique specifically designed for biometric identification systems used in smart cities.
- The research aims to prevent spoofing attacks on users' facial markings by training with different facial poses and expressions under varying lighting conditions.
- An additional advantage of the proposed work is the incorporation of a cryptosystem to ensure privacy during data storage and retrieval within the smart city environment.
- The smart city is an evolving technological environment that requires a model more secure than traditional deep learning approaches, which are time-consuming in producing predictions.
- The model must be efficient and scalable, capable of handling the substantial volumes of biometric data and authentication requests associated with smart city environments, while keeping responsiveness and speed high and minimising computational overhead.
The motivation behind the proposed multimodal model is its enhanced resistance to attacks such as spoofing and adversarial examples, as well as its integration of additional security measures into smart city environments. Using biometric authentication in a fused multimodal system with CNN and ResNet-50, along with additional factors such as facial mappings and lighting conditions, can enhance security. Privacy-preserving approaches such as the ElGamal encryption algorithm safeguard individuals' biometric data, and data communications between devices and servers are protected using Ed25519 signatures for secure communication. The improved system framework incorporates CNN and ResNet models for feature extraction, biometric cryptography for key generation, and additional security protocols to enhance the safety of AI-driven biometric face encryption in smart cities.
Organization of the paper
The research paper is organised as follows: Sect. 2 gives an overview of the relevant literature on deep learning facial prediction methods. Section 3 highlights the methods and materials used. Section 4 details the proposed algorithm. Section 5 evaluates and compares the proposed methodology against existing techniques. Finally, Sect. 6 concludes the paper and outlines its future scope.
Related works
Smart cities use millions of sophisticated gadgets to create a more connected and efficient urban environment. These devices include smart industries, smart automobiles, and other communication systems. However, the security of these devices can be challenging since they often include sensors that are incompatible with standard security protocols. Additionally, some manufacturers do not prioritise software upgrades to address known security issues. This article proposes a standard style that integrates both soft computation and deep learning approaches, ensuring the proper and secure performance of sensors in smart cities. Researchers have developed numerous machine learning algorithms to detect irregularities and defend smart devices from cyberattacks. However, there have been limited efforts to create continuous neural networks specifically for this purpose. The frequency of assaults against smart devices has increased in recent years, coinciding with their growing use. To mitigate the security risks associated with such attacks, reliable detection measures are necessary.
Crihan et al. (2024)2 proposed a facial identification model utilising the homomorphic encryption technique to enhance the security of data access for various users. This model achieves 96.80% accuracy and retrieves data in seconds, which is slower than our proposed model. The hardware implementation of this model is also time-consuming: if the power generation unit fails, the entire system must restart from the initial process to recover from the hardware issue. Our suggested model uses a hybrid deep learning model that combines ResNet-50 and CNN to record and analyse the facial security function. In AI-powered smart cities, this model is 97% accurate.
Boddeti et al. (2018)3 implemented the efficient homomorphic face matching function using the NTL C++ library, a process that requires significant computational time. In contrast, our proposed model employs a hybrid model that preprocesses the facial contents with a minimal amount of computation, reducing the computational resources required. The proposed model matches facial poses more accurately than the NTL C++ implementation, which mismatches face-matching poses on large facial datasets. The CNN facial pose structure analyser identifies facial poses with an accuracy of 97%, compared to the homomorphic encryption function, which has an accuracy of 96.74%.
Yang et al. (2019)4 proposed a random projection-based transformation with a weight factor to process two models of biometric data analysis. They use both fingerprint and face to process feature proportion with an accuracy of 90%, but this method decreases system performance. In our proposed model, all face data are stored in residual blocks, and security encryption is performed using ElGamal encryption, which improves system performance and achieves an accuracy of 97%.
Yang et al. (2022)5 suggested using the FaceNet neural network to find the encrypted image and recover the original image after face classification, thereby enhancing user privacy using a traditional CNN. This takes 1.712 s to process. The proposed model, on the other hand, employs a hybrid deep learning model that combines ResNet-50 and CNN to capture and analyse the facial security function, achieving 97% accuracy in AI-powered smart cities and processing data in 0.62 ms.
In their 2021 study6, Jaswal et al. suggested combining palm prints and finger knuckles to enhance the security of biometric authentication, thereby reducing the risk of gathering sensitive information during evidence collection. They use the LOP encoding CKKS algorithm, which has a recognition accuracy of 96%. However, the proposed model employs a hybrid deep learning model that combines ResNet-50 and CNN to mitigate the gradient vanishing issue when examining facial mapping. After decrypting the image without pixel loss, it achieves an accuracy of 97% in AI-powered smart cities and completes the entire function in a processing time of 0.62 ms.
Win et al. (2023)7 conducted research using linear regression to map various facial expressions onto a latent space, thereby calculating different conditions. An accuracy of 67.5% falls short of meeting the condition for sending a single person’s notification to the specific security system. The research mentioned above employs a standalone machine learning technique that requires more resources for training, resulting in a high computational cost during predictions. The above research is not suitable for evolving smart city environments.
Jindal et al. (2020)8 proposed a dimensional feature vector with real and imaginary parts, using a one-person face as ciphertext, with regeneration of the original work taking 2.83 ms. This biometric template differentiates between the regeneration of original data from two vectors: one representing a real person’s face and the other representing a spoof image of that person. The above research utilises a predefined vector template that connects to a centralised server. Image regeneration takes longer during training, and predicting spoofed images requires more time than expected.
Sardar et al. (2020)9 suggested a face hashing technique with the RSA security algorithm. The work defines the feature extraction process using a face protection template to generate the correct facial output. The protection template processes the hash within a specific time with a prediction accuracy of 86.27%. This research uses hash codes represented as 0 and 1, and some results are dropped due to the extraction of face outcomes for different poses, leading to high loss.
Gavisiddappa et al. (2020)10 employed the Chebyshev distance measure. The method predicts facial mapping with a mathematical function. After calculating the mathematical function, it predicts the result of the face as a hit or a miss. These hit-and-miss values are transferred into a support vector machine learning classifier, achieving a detection accuracy of 97%. However, this method is prone to the vanishing gradient problem, which can lead to overfitting. The presence of miss and hit blocks has reduced image clarity, which in turn increases the false negative rate during perfect face recognition.
Malarvizhi et al. (2020)11 proposed an adaptive fuzzy genetic model that categorises the different facial biomarking techniques. The model first separates the facial images into men’s and women’s categories with a detection accuracy of 96%. The model’s separation led to some data mismatch, as the training required additional computation time for image separation. This soft computing neural network utilises neurons to capture facial dependencies, which can lead to security leaks that are prone to facial spoofing attacks.
Sharma et al. (2025)12 utilize a hybrid model that combines multimodal technology to perform intrusion detection mapping by extracting various features from the Internet of Medical Things. The model has a validation loss of 0.044, indicating that the training is efficient in identifying the correct mapping. The demographic conditions of the Internet of Medical Things are distinct and differ from traditional cyberattacks. In comparison, the proposed model's facial mapping achieves high accuracy (97%) and low loss (0.04), making the combined multimodal model strong in incorporating various facial biometric security features.
Sharma and Shambharkar (2025)13 suggested a dynamic adaptive reinforcement learning to generate different keys for dynamic threats, ensuring generative security for data storage servers within an attribute-based access control domain. The access control domain retrieves the expression, compares it with various encryption techniques from multiple servers, and achieves a detection accuracy rate of 99%. Meanwhile, the proposed model employs a hybrid deep learning model that fuses ResNet-50 and CNN. This model associates facial expressions with a cryptographic function to safeguard the facial authentication system against attacks. This ensures that transactions are processed successfully 97% of the time in AI-powered smart cities.
The study by Sharma et al. (2025)14 employed different types of security methods to change data representation, such as the extraction method and local binary patterns, as well as an intrusion detection model. Using temporal pattern features for a blockchain model, it achieved low transaction reversal with high probability, resulting in 98.56% multiclass classification accuracy. However, our proposed model sends the facial extraction for double-standard verification, incorporating security functions to achieve facial mapping. By combining an ElGamal cryptography function with a matching key, the proposed system outperforms the above wavelet model by 0.3%.
The study by Vidya et al. (2019)15 proposed an entropy-based facial image extraction technique for use on a cloud server. This method is designed explicitly for cloud-oriented service function data accessed from multiple biometric sensors. However, combining and authenticating these data leads to a time-consuming process with an accuracy of 91%. However, the proposed model employs a hybrid deep learning model that combines ResNet-50 and CNN. This model maps data using an AI model, analyses the facial security function, and achieves an accuracy of 97% in AI-powered smart cities.
Jagadiswary et al. (2016)16 analysed facial biometric function using a fused multimodal model, which has an ample feature space. This model is highly efficient against spoofing attacks, but it requires a significant amount of time for face regeneration. Our proposed model utilizes CNN structure verification to map the correct facial structure during the regeneration process, achieving a facial recognition accuracy of 97% with minimal loss in the retrieved results.
Table 1 highlights the information about the model used in the facial biometric authentication model, along with its key highlights from the literature review.
This research proposes a model for safe smart cities utilising AI-based biometric face cryptography, addressing various technical inadequacies and difficulties identified in the relevant literature review. The literature review identified several issues. Traditional authentication methods may not adequately secure sensitive data and smart city systems against sophisticated attacks. High false acceptance rates may compromise the reliability and effectiveness of the authentication process, resulting in security gaps and confidentiality breaches. Large-scale smart city deployments require scalable biometric authentication solutions. Due to computational complexity and resource requirements, biometric authentication solutions may be challenging to implement in environments with limited resources. Specific biometric authentication systems may not be flexible enough to accommodate the complexities of smart city environments. It is necessary to develop a comprehensive and efficient ML model that integrates advanced techniques in artificial intelligence, biometrics, cryptography, and system architecture to overcome these technological shortcomings and barriers. The CNN and ResNet algorithms are used for secure and reliable biometric identification, ensuring privacy, scalability, efficiency, and adaptability to various smart city environments.
Materials and methods
The methods and materials combine CNN and ResNet-50 into a hybrid model that enhances face security authentication in smart cities, improving facial cryptography and authentication compared with traditional biometric models.
Materials
Dataset
The dataset used in this research is the CelebA dataset. It comprises more than 200,000 facial images of diverse individuals and is freely available online. Each image is annotated with 40 binary facial attributes, so people with similar facial markings can still be distinguished by their annotations17. This model integrates a CNN to extract the facial image structure and ResNet-50 to store the retrieved facial structure in various residual blocks, and incorporates ElGamal encryption to match the authenticated face without any loss or deviation in the regeneration of the original facial image during data transmission. A face recognition model is trained using the tagged identity of each person portrayed in each facial image18,19.
Data pre-processing
Before training, the images are verified for consistency and compatibility with the CNN and ResNet-50 architectures. The images are resized to a consistent resolution to simplify input into the neural network models, and pixel values are scaled to a standard range of [0, 1] to maintain consistency among images. The dataset used in this research is CelebA, where face images are extracted and fine-tuned with the initial layer of a CNN, which performs feature extraction through different convolutional layers, each with a distinct block. The block analyses the facial mapping as a feature extraction method for facial recognition, and then this face is transmitted to the next layer for ResNet training. In this layer, faces are assigned different weights for various individuals, and skip connections facilitate model evaluation and training20.
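As a concrete illustration, the pre-processing steps above can be sketched as follows; this is a minimal example assuming a standard torchvision pipeline, and the file path is illustrative rather than the authors' actual loader.

```python
# Minimal pre-processing sketch: resize to the 224 x 224 x 3 resolution used
# by the CNN and ResNet-50 paths and scale pixel values into [0, 1].
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # uniform resolution for both backbones
    transforms.ToTensor(),          # float tensor with values in [0, 1]
])

img = Image.open("celeba/000001.jpg").convert("RGB")  # illustrative path
x = preprocess(img)                 # tensor of shape (3, 224, 224)
```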
The evaluation of the proposed model is analysed using biometric facial security training, focusing on accuracy, precision, F1 score, and recall on the CelebA dataset. An additional activation layer enhances the model by using various samples, including printed face photographs, to identify spoofed facial images. This feature makes the proposed model an efficient tool for identifying and preventing facial spoofing attacks. The model processes 40 binary attributes, using a resolution of 224 × 224 × 3 for colour image pixel analysis within each layer to make final predictions in the smart city environment. To protect the privacy of individuals who have not consented to being part of research or development projects, photographs are blurred or anonymised. The privacy of individuals featured in the freely available labelled CelebA face dataset is respected, and biometric data usage adheres to established rules and ethical standards. By utilising the CNN and ResNet models along with this dataset, experts and innovators can develop precise AI-driven facial recognition systems tailored to protect urban areas effectively21.
Methods
Securing smart cities involves integrating technologies like AI-based biometric facial multimodal systems with CNN and ResNet-50. Additionally, ElGamal cryptography necessitates the implementation of various solutions to address diverse security and privacy concerns. The following categories of techniques are applied here:
Detection and recognition of face
The detection of facial mapping, processed by a multimodal approach, captures live images or video through a device. After capturing the image or mask, the facial features are extracted using a CNN + ResNet-50 model. Then, the features are modified into a binary form of 0 and 1 for matching. After matching, the spoofing detection function is used to analyse the images and 3D mask of the particular citizen, including changes in facial expression and pose under varying light conditions. The verified citizen's facial data is then stored and further processed for encryption. The encryption is processed using ElGamal encryption, and the signature compares the extracted features of the face. Authentication is then processed on both the matched face and the spoofed face for the transaction. Finally, the real-time monitoring model enables adaptive learning to ensure security and privacy within a smart city environment. The process is elaborated in Fig. 2 [22,23,24].
First layer - convolutional neural network (CNN)
CNN’s convolutional layer accepts a sequence of facial images as input, achieving this by isolating the bits 0 at the back end and 1 at the front end of the facial feature vector as low-level spatial information. Here, the CNN technique employed integrates cutting-edge architectural designs, optimisation approaches, and training methodologies in a biometric face authentication system.
a) Connected layer: The fully connected convolutional layer is expressed in Eq. 1, where Convo(l) is the output tensor of the lth layer, W(l) is the weight matrix of the lth layer, Prev(l−1) is the input from the previous layer, and b(l) is the bias vector of the lth layer.
b) Pooling layer: Max pooling is represented in Eq. 2, where Out(l) is the output of the pooling layer.
c) Dense layer: The output of the fully connected dense layer is expressed in Eq. 3, where · represents matrix multiplication.
d) Activation function: The common tanh activation function, applied elementwise to each element, is expressed in Eq. 4, where Convo defines the input tensor.
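Since the typeset equations do not reproduce here, Eqs. 1–4 can be written in a standard form consistent with the variable definitions above (a reconstruction, not necessarily the authors' exact notation):

$$\mathrm{Convo}^{(l)} = W^{(l)} \cdot \mathrm{Prev}^{(l-1)} + b^{(l)} \quad (1)$$

$$\mathrm{Out}^{(l)}_{i,j} = \max_{(p,q) \in \mathcal{R}_{i,j}} \mathrm{Convo}^{(l)}_{p,q} \quad (2)$$

$$\mathrm{Dense} = W \cdot \mathrm{Convo} + b \quad (3)$$

$$A = \tanh(\mathrm{Convo}) \quad (4)$$

where $\mathcal{R}_{i,j}$ denotes the pooling window at output position $(i, j)$.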
Repeat the steps until the number of repetitions reaches the optimal solution25,26. This approach describes how to combine facial recognition with authentication techniques using a CNN architecture. The data processing in each step is shown in Fig. 3, which illustrates the data flow diagram of the proposed model.
The working procedure of the proposed multimodal system is expressed below.
Step 1
Raw facial images of the CelebA dataset are pre-processed and sent towards the proposed multimodal approach.
The CNN path extracts local facial features, such as edges and facial texture, through convolutional layers. Equations 1, 2, and 3 represent the convolutional, pooling, and dense layer functions, and Eq. 4 represents the activation function.
The ResNet-50 path captures high-level deep features, such as poses and facial expressions, with the help of a residual network with skip connection functions, as expressed in Eqs. 5 to 8. This layer overcomes the vanishing gradients problem through skip connection, preventing overfitting of results27.
Step 2
The fusion layer concatenates the features from CNN and ResNet-50, as expressed in Eq. 9, into VE_fusion, a unified feature vector that enhances the spoofing detection function.
Step 3
The vector data is processed to the next layer of ElGamal cryptography, where a signature is added using Eq. 10 to identify the noise and prevent security breaches during data transmission.
Step 4
The transaction feature from the cryptography layer processes the facial mapping vector from the databases, as expressed in Eqs. 11 and 12, for authentication purposes, granting access with a threshold function.
Step 5
After access is granted, only properly decrypted citizen face features that match are processed through all functions, with high confidence expressed in Eq. 13 as the final output. A compact sketch of this pipeline is given below.
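The five steps can be sketched in code as follows; this is a minimal PyTorch illustration assuming standard torchvision backbones, with layer sizes chosen for clarity rather than taken from the paper.

```python
# Sketch of Steps 1-2: CNN low-level path, ResNet-50 high-level path, and
# feature fusion by concatenation (Eq. 9). Dimensions are illustrative.
import torch
import torch.nn as nn
from torchvision import models

class FusedFaceModel(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN path: convolution + tanh + pooling for local texture (Eqs. 1-4)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.Tanh(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.Tanh(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # ResNet-50 path: residual blocks with skip connections (Eqs. 5-8)
        resnet = models.resnet50(weights=None)
        self.resnet = nn.Sequential(*list(resnet.children())[:-1], nn.Flatten())

    def forward(self, x):
        low = self.cnn(x)                     # local edges and texture
        high = self.resnet(x)                 # pose and expression semantics
        return torch.cat([low, high], dim=1)  # VE_fusion (Eq. 9)

ve_fusion = FusedFaceModel()(torch.randn(1, 3, 224, 224))  # shape (1, 2176)
```

The fused vector would then be encrypted and compared against the template database (Steps 3-5).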
Fused deep learning model
The fused process of the proposed model is defined to express the roles of CNN and ResNet-50 as separate roles and the advantages of the combination mentioned below:
The first layer of the proposed model is based on low-level feature extraction using a CNN. The input face is processed through the CNN, where spatial features, such as face edges, texture, and colour, are extracted. The process, similar to facial authentication, involves capturing low-level features that are crucial for predicting variations in facial mappings, such as the eyes, mouth, and chin. The initial layer preserves the fine-grained textual facial mapping feature, which accurately predicts skin colour and variation inconsistencies under different lighting conditions in facial authentication, effectively handling these inconsistencies28.
The second layer is processed with ResNet-50, which extracts high-level facial semantic features, such as facial patterns and deep facial mapping structures, as spatial relationships. The skip connection effectively handles the vanishing gradient problem, allowing further model training. The ResNet-50 effectively extracts complex facial patterns, such as pose changes and various facial expressions, to preserve the integrity of facial data29.
Finally, combining the CNN and ResNet-50 makes the proposed model an effective way to secure citizen data. The CNN layer processes low-level extraction and passes the facial feature extraction to ResNet-50 for higher semantic learning, allowing for deep facial contextual mapping to be stored. The fused model effectively prevents spoofing by verifying both facial texture authentication and facial structure identification. This makes it difficult for an attacker to use fake images or moving videos to bypass the proposed hybrid model shown in Fig. 430,31.
a) Residue input: The input to the ResNet-50 residual block is expressed in Eq. 5, where Convo(l+1) is the input tensor of the (l+1)th layer and Prev(l) is the input from the previous layer.
b) Residue connection: The shortcut connection (SC) adds the input tensor Prev(l) to the output of the residual block, as expressed in Eq. 6.
c) Normalization: Batch normalization is represented in Eq. 7.
d) Inner activation function: The ReLU activation function is applied after normalization in the shortcut connection, as expressed in Eq. 8.
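A standard reconstruction of Eqs. 5–8 from the definitions above (the typeset equations do not reproduce here):

$$\mathrm{Convo}^{(l+1)} = F\left(\mathrm{Prev}^{(l)}\right) \quad (5)$$

$$\mathrm{SC} = F\left(\mathrm{Prev}^{(l)}\right) + \mathrm{Prev}^{(l)} \quad (6)$$

$$\hat{x} = \frac{x - \mu_{\mathcal{B}}}{\sqrt{\sigma^{2}_{\mathcal{B}} + \epsilon}}, \qquad y = \gamma \hat{x} + \beta \quad (7)$$

$$\mathrm{ReLU}(x) = \max(0, x) \quad (8)$$

where $F(\cdot)$ is the residual mapping and $\mu_{\mathcal{B}}, \sigma^{2}_{\mathcal{B}}$ are the batch mean and variance.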
The proposed methodology has the following steps for further processing into a fusion model of CNN and ResNet 50.
e) Fused model: The feature vectors extracted by CNN and ResNet-50 are merged into a single feature vector in Eq. 9, where ⊙ represents the concatenation of the CNN and ResNet-50 features.
f) Privacy-preserving function: ElGamal encryption is expressed in Eq. 10; the fused vector is encrypted using ElGamal encryption with Pk as the public key.
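For reference, textbook ElGamal over a group with prime modulus $p$, generator $g$, private key $x$, and public key $Pk = g^{x} \bmod p$ encrypts the (suitably encoded) fused vector with a fresh random $k$; this is a plausible reading of Eq. 10, not necessarily the authors' exact notation:

$$\mathrm{Enc}_{Pk}(\mathrm{VE}_{\mathrm{fusion}}) = (c_1, c_2) = \left(g^{k} \bmod p,\; \mathrm{VE}_{\mathrm{fusion}} \cdot Pk^{k} \bmod p\right) \quad (10)$$

Decryption recovers $\mathrm{VE}_{\mathrm{fusion}} = c_2 \cdot (c_1^{x})^{-1} \bmod p$; the elliptic curve variant used later in the paper replaces modular exponentiation with scalar multiplication.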
The proposed fused model utilizes secure transmission with the help of the ElGamal cryptosystem, which handles encryption of large images within a hybrid framework to ensure the efficient and safe transmission of facial biometric images within smart city environments. After the fused model's facial output is mapped to a feature vector, it is converted into a binary format. Facial encryption is processed using the AES cipher, and the key is processed with the ElGamal elliptic curve variant for enhanced security. The cryptographic approach is computationally practical for secure key exchange, using Ed25519 signatures to sign the payload's facial metadata and ensure integrity by preventing man-in-the-middle and replay attacks32. This cryptographic approach ensures that the facial pattern extracted by CNN and ResNet-50 remains confidential. Facial mapping prevents unauthorized access and provides resilience against spoofing, supported by real-time detection capabilities. The combination of encryption and a digital signature ensures a secure biometric system, providing a futuristic solution for deploying next-generation smart city environments33.
g) Authorization: The authentication function is performed using the encrypted feature vector and a required parameter θ, as expressed in Eq. 11.
h) Facial marking analysis: The verification process compares the encrypted feature vector with the database of authorized facial image features, as expressed in Eq. 12.
i) Result: The overall fusion process for facial image authentication is expressed in Eq. 13.
j) Loss function: The safe transmission and storage of encrypted data within the smart city network are ensured using the hybrid CNN and ResNet-50 model with stride and L2 regularization, where γ is the regularization constant; the training module that minimizes the loss is expressed in Eqs. 14, 15, and 16 [34,35].
The loss is minimized using authentication and gradient descent to prevent gradient vanishing at the output, as expressed in Eq. 17.
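A standard form consistent with the stated L2 regularization constant $\gamma$ and gradient descent training (a reconstruction of the intent of Eqs. 14–17, since the typeset equations do not reproduce here):

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log \hat{y}_i + \gamma \lVert W \rVert_2^2$$

$$W \leftarrow W - \eta \, \nabla_W \mathcal{L}$$

where $N$ is the batch size, $\hat{y}_i$ the predicted probability, and $\eta$ the learning rate.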
The algorithm for Facial Biometric Security using CNN and ResNet-50 is discussed below36.
Cryptographic work flow
The cryptographic workflow aims to protect biometric facial templates while keeping verification latency low during testing. It involves a hybrid cryptographic process that applies PCA to the fused feature vector to reduce its dimensionality. The PCA output is then encrypted with AES-256, while the standard symmetric AES key exchange is replaced with an asymmetric approach using the ElGamal system processed over an elliptic curve during the initial enrolment phase. This implementation secures the confidentiality and authentication of stored facial data and improves the efficiency of the cryptographic function for feature vectors of arbitrary sizes, utilising Ed25519 signatures to sign the payload's facial metadata. The complete fused facial feature vector, a 2048-dimensional float32 vector, is treated as the AES plaintext. Encrypting such a vector directly under ElGamal is not feasible, so a hybrid AES + EC-ElGamal scheme with a key-wrapping function is used to secure large plaintext inputs, as in many encryption/decryption systems.
In the proposed model, the ElGamal system is used to wrap (encrypt) the symmetric AES key. The facial feature vector data is encrypted using AES, and data reproducibility is achieved through the quantisation scale and PCA dimensions in the verification phase, as shown in Fig. 5.
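The enrolment path can be sketched as follows; this is a minimal example assuming the Python `cryptography` package. The AES-GCM and Ed25519 calls are real library APIs, while `ec_elgamal_wrap` is a hypothetical placeholder for the EC-ElGamal key wrap, which the library does not provide directly.

```python
# Hybrid enrolment sketch: AES-256 encrypts the 2048-d float32 fused vector,
# the AES key would be wrapped with EC-ElGamal, and Ed25519 signs metadata.
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def ec_elgamal_wrap(aes_key: bytes, ec_public_key) -> bytes:
    # Hypothetical placeholder: encrypt the AES key under the verifier's
    # elliptic-curve public key (deployment-specific, not shown here).
    raise NotImplementedError

ve_fusion = np.random.rand(2048).astype(np.float32)  # stand-in fused vector
plaintext = ve_fusion.tobytes()

aes_key = AESGCM.generate_key(bit_length=256)        # per-template AES key
nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)

signer = Ed25519PrivateKey.generate()
metadata = nonce + len(ciphertext).to_bytes(4, "big")
signature = signer.sign(metadata + ciphertext)       # integrity / anti-replay
```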
The fused model combines a convolutional neural network (CNN) and ResNet-50 (Residual Network) to process facial biometric features for citizen authentication in smart city environments37. The CNN extracts low-level facial features from image-capturing devices, including the fine-grained facial markings essential for feature extraction. However, CNN suffers from vanishing gradient issues for deeper facial biometric capturing functions. To overcome this disadvantage, ResNet-50 is used with skip connections, enabling deeper training that mitigates the vanishing gradient problem and preserves high-level facial features. The skip connection enhances the efficiency of the learning pattern in the proposed work, facilitating the capture of complex facial expression features and the detection of spoofed images with 3D masks38,39. The fusion function enhances architectural efficiency through feature localisation, and deep residual facial classes create a more powerful multimodal approach to authenticate facial biometrics within smart cities. The proposed model utilises an additional encryption function based on ElGamal encryption to enhance data transmission security in smart cities, thereby increasing detection accuracy and reducing loss in training and testing evaluations. Thus, the proposed multimodal approach enables secure and fast facial biometric authentication in a smart city environment.
Analysis of limitation of algorithms
The proposed model limitation analysis can be broken into two parts: first, the fusion of a deep learning model (CNN and ResNet-50), and second, the ElGamal encryption40.
a) Fused Multimodal:
The CNN time complexity can be formulated from the number of layers n, convolutional kernel size s, input data size d, and number of channels c, as expressed in Eq. 18.
The ResNet-50 model complexity can be analysed using the number of layers l, feature map size f, kernel size s, and channels c; the vanishing gradient problem and its computational overhead are handled by the skip connections, as expressed in Eq. 19 [41].
With this fused DL model, the accuracy of detecting facial markings improves without increasing computational cost compared with traditional standalone CNN and ResNet-50 models.
b) ElGamal encryption:
Modular exponentiation dominates the limitation analysis of ElGamal encryption with the signature function, where n is the number of bits used for a particular key, as expressed in Eq. 20 [40,43].
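One consistent reading of the complexity expressions, reconstructed from the variables above (the typeset equations do not reproduce here):

$$T_{\mathrm{CNN}} = O\left(n \cdot s^{2} \cdot c^{2} \cdot d^{2}\right) \quad (18)$$

$$T_{\mathrm{ResNet}} = O\left(l \cdot s^{2} \cdot c^{2} \cdot f^{2}\right) \quad (19)$$

$$T_{\mathrm{ElGamal}} = O\left(n^{3}\right) \quad (20)$$

Equation 20 follows from square-and-multiply exponentiation, which performs $O(n)$ multiplications of $n$-bit numbers at $O(n^{2})$ bit operations each under schoolbook multiplication.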
From the above analysis, it is observed that the two different deep learning models have unique features in handling large amounts of data during the transmission and storage of facial markings, effectively managing computational limitations while preserving privacy in the facial security system. Table 2 below delves into the computational limitations of some traditional models.
Computational and scalability limitation
The limitational and practical considerations of distributed environments are mentioned below.
a) elgamal’s computational overhead
The ElGamal system requires elliptic-curve scalar multiplication, which makes it slower than symmetric AES encryption. However, the proposed model limits this overhead to a single public-key encryption for the entire facial template, so the per-template asymmetric transformation cost is stable and small compared to component-wise cryptographic functions. The model utilizes a CPU-based EC-ElGamal wrap that runs in 1 to 5 ms per operation, depending on hardware. This enables high-throughput real-time deployment, with Ed25519 signatures over the payload's facial metadata and the cost offloaded to a smart city-based trusted server.
b) Scalability and key management
Securing millions of encrypted facial templates is feasible, but key management and latency become a centralised problem. To address these issues per region, keys are stored in a hardware security module, and each decrypted facial template is matched in cache memory to avoid repeated AES operations. An indexing function reduces the template-searching cost, which introduces a trade-off between search efficiency and data privacy. The next step is to manage multiple verifier systems that perform final key management by processing decryption, where each verifier unwraps an AES key in its local system. Threshold encryption/decryption is then used to split the private key among users, with Ed25519 signatures over the payload's facial metadata. This enables efficient key management in a distributed system, with the server authenticating a proper channel to a trusted secure environment.
Experimentation and results
The proposed system is entirely trained and cross-tested on the CelebA face datasets, with a focus on facial detection and privacy preservation outcomes. This dataset comprises 200,000 facial images, which were used for both the training and testing phases. The wide variety of facial image sets is used to maintain a balance between two distinct user image classes and the manipulated outcome44.
Experimental setup
Here, the fused hybrid CNN and ResNet-50 model is used for data acquisition. Facial recognition information is gathered from security cameras and Internet of Things devices placed across the smart city. Before processing, the gathered data is cleaned and standardised to ensure accuracy and consistency. The hybrid model is used to identify faces in image mappings. Feature extraction involves using the CNN architecture to extract discriminative characteristics from identified faces and then integrating ResNet-50 with these retrieved features to further enhance feature representation and categorisation. ElGamal cryptography with the signature then adds security for successful data transmission, ensuring the security of citizens inside smart cities. For access control, the encrypted biometric data is securely authenticated and decrypted. Adaptive security, through continuous learning, utilises feedback mechanisms to update and improve the CNN and ResNet-50 models over time45,46. Combining an enhanced CNN with ResNet-50 and biometric facial encryption is suggested as a way to make smart cities safer.
Selection of dataset attributes
The selection of dataset attributes for facial biometric authentication in smart cities, as presented in Table 3, is explained below:
The depth of 256 enables the multimodal system to capture detailed facial markings, including features such as pose and facial expressions, thereby mitigating computational overhead. The dataset is divided into 70% for training and 30% for testing, ensuring proper data for effective learning of the multimodal model while preserving facial samples for unbiased evaluation. The dataset consists of 200,000 training images used to analyse and correct different facial variations across multiple individuals with varying facial expressions. This is essential for improving spoofing detection and maintaining proper detection accuracy for the validation of results47. Stride 2 in the convolution layer enables the processing of facial feature dimensions, practical computation, and the analysis of deeper features with multimodal data, allowing for accurate image prediction without losing spatial feature information. L2 regularization is applied to prevent overfitting by penalising large weights, thereby enhancing the ability to generalise to unseen facial data in a real-world smart city environment. Finally, the dataset is validated to ensure an equal split between fake and original faces, evaluating the model's ability to distinguish between authenticated citizens and spoofed attacks processed by attackers.
Generalisability of results
The generalizability of the experimental results depends on both the facial marking dataset and the model’s privacy-preserving function.
1. The selection of a dataset with clearly defined facial features, along with environments where the proposed model can generalise to surveillance setups in various lighting conditions, is crucial48.
2. The fused model outcome is strong because it combines ElGamal encryption with different face authentication transaction systems in various smart city locations. It also clearly identifies the user in more crowded areas, resulting in excellent outcomes49.
3. To make generalisations about other biometric features, the proposed model employs the fusion of various traits, such as iris and fingerprint authentication50.
Ultimately, the fused model performs well in various settings, successfully integrating two distinct models to manage computational complexity. The proposed work validates the well-designed architecture, clarifies the facial marking dataset, and ensures reliability through high-end testing in various smart city environments.
Proposed model testing and training
The efficiency and reliability of the advanced secure mechanism for an AI-based biometric face authentication system in smart cities are validated through testing and data analysis. During the training phase, 70% of the data is used for training purposes, and 30% is reserved for testing. The model generates training and testing output from 200,000 images, obtained through image segmentation, as shown in Fig. 6. The model utilized 224 × 224 colour images for facial mapping, a depth of 256 for each channel, a stride of 2 for the output of the residual layer with a minimum loss calculation, and the Adam optimiser with L2 regularization. This approach yielded an accurate result from the facial cryptographic function, facilitating successful user transactions. Each model follows lexicographic order for image segmentation to improve reproducibility. All tests are executed on a machine with a 64-bit Windows system, an Intel processor, and 16 GB of RAM.
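The stated configuration (70/30 split, Adam with L2 regularization, 50 epochs) can be sketched as follows; this minimal example reuses the FusedFaceModel sketch given earlier, substitutes random tensors for the real CelebA loader, and uses illustrative values for any hyperparameter not quoted in the text.

```python
# Training-loop sketch: 70/30 split, Adam with weight decay (L2), 50 epochs.
import torch
from torch.utils.data import TensorDataset, random_split, DataLoader

data = TensorDataset(torch.randn(1000, 3, 224, 224),        # stand-in images
                     torch.randint(0, 2, (1000,)))           # genuine/spoof
n_train = int(0.7 * len(data))                               # 70% training
train_set, test_set = random_split(data, [n_train, len(data) - n_train])

model = FusedFaceModel()                     # fusion backbone sketched above
head = torch.nn.Linear(2176, 2)              # genuine-vs-spoof classifier
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()),
                       lr=1e-4, weight_decay=1e-4)  # weight_decay = L2 term
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(50):                      # 50 epochs, as in the text
    for x, y in DataLoader(train_set, batch_size=32, shuffle=True):
        opt.zero_grad()
        loss = loss_fn(head(model(x)), y)    # cross-entropy + implicit L2
        loss.backward()
        opt.step()
```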
Performance metrics
The developed deep learning model integrates CNN and ResNet-50 for facial authentication in a smart city setting. The performance of the proposed system is analysed using various evaluation metrics, such as the confusion matrix, accuracy, precision, sensitivity, F1-score, and false positive rate, to achieve superior outcomes compared to other traditional deep learning models currently used in facial recognition security51,52.
Confusion matrix
It enhances the system’s efficiency by analysing how well the proposed model classifies and retrieves correct user facial mapping, calculating both positive and negative facial data using four available predicted class functions45. they are.
True Negative (TN): correctly identified but not matched facial data.
False Positive (FP): correctly identifies the facial data of an unknown user.
False Negative (FN): incorrectly identified as not matching data for a known user.
True Positive (TP): correctly identified as the matched data of the known user, where the confusion matrix for prediction is mentioned in Table 4, and Fig. 7 defines the confusion matrix of the predicted class.
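From these four counts, the metrics reported in Eqs. 21–24 below follow directly; a small sketch with illustrative counts:

```python
# Standard metrics from the confusion-matrix counts defined above
# (the counts passed in are illustrative, not the paper's results).
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

print(metrics(tp=950, tn=970, fp=30, fn=50))
```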
The cosine similarity matrix of the facial vector shown in Fig. 8 defines proper separation between the original and spoofed facial image attempts expressed in the red quadrant. The high interclass and low intraclass similarity demonstrate the efficiency of the proposed multimodal framework, achieving a detection accuracy of 97% within smart city environments.
Accuracy
The proposed model uses accuracy as a basic metric to assess its ability to match the user correctly. The equation for accuracy is given below as Eq. 21, followed by the accuracy comparison with some existing models in Table 5; Fig. 9 shows the accuracy comparison with some standalone DL models.
Precision
Precision measures the proportion of the proposed model's identifications that are correct, based on the mean of accurate positive predictions. The precision equation is given in Eq. 22, compared with some existing models in Table 6, and the precision comparison with some standalone DL models is shown in Fig. 10.
Recall
The proposed model utilizes recall, a fundamental metric, to accurately identify all positive facial instances of the user and achieve higher efficiency. The recall equation, as stated in Eq. 23, is compared with some existing models in Table 7, and a sensitivity comparison with some standalone DL models is shown in Fig. 11.
AUC
The AUC score is used to evaluate the performance of the classifier under the receiver operating characteristic curve. The curve defines the true positive rate as recognition of the same person's face and the false positive rate as acceptance of a different, spoofed image, evaluated under various thresholds. Figure 12 illustrates the AUC comparison with some standalone DL models.
F1 score
Based on false positives and false negatives, the F1 score assesses the system’s overall performance, facilitating an adequate evaluation of the proposed model’s efficiency. The F1 equation, as stated in Eq. 24, is used to compare with existing models in Table 8 and to define the F1 Score comparison with standalone DL models in Fig. 13.
Experimental validation
The proposed model is validated using other datasets, such as Replay-Attack and MSU-MFSD, with the anti-spoofing tests conducted under the standard protocols. Each protocol is used to test with a recommended dataset, with a split test allocated to a cross-dataset evaluation to ensure generalizability. Training is conducted on the CelebA dataset, and testing is performed on the Replay-Attack dataset for cross-dataset anti-spoofing robustness. Figure 14 demonstrates that the proposed model achieves a low equal error rate (EER) with high discrimination on spoofed data samples compared to models like CNN, ResNet-50, and FASNet. The proposed model achieves a low bona fide presentation classification error rate (BPCER) and average classification error rate (ACER). This indicates that the integration of the proposed model and feature regularisation effectively reduces both false positives and false negatives. The overall results and AUC scores across these diverse dataset validations show that the proposed model remains stable and computationally efficient for face anti-spoofing in a real-time facial biometric system.
The ablation study evaluates different model configurations to justify the performance contribution of each module: (1) CNN only, (2) ResNet-50 only, (3) CNN + ResNet-50 without cryptography, and (4) the complete proposed model with all configurations. Each test split reports a 95% confidence interval across multiple runs, as expressed in Table 9.
Finally, the inference latency analysis examines the computational distribution across the proposed model, with GPU-based feature extraction for CNN and ResNet-50. After the initial extraction on the GPU, feature compression is processed on the CPU, enabling lightweight data handling with memory usage of 25.7 MB. The cryptographic workflow adds a minimal latency overhead of about 2 ms to the combined model, ensuring that real-time performance is not reduced. The end-to-end encryption/decryption time is 41 ms, with a 95th percentile of 56 ms. The system can therefore operate with real-time biometric facial encryption, as shown in Fig. 15.
Proposed model result analysis
The proposed model features practical analysis and classification of spoofed faces. First, the facial matching score is calculated from the feature vector using the multimodal system, clearly distinguishing between a real person’s identity and a fake face. A high score indicates a real face, while a low score indicates a phoney face, as shown in Fig. 16. Principal component analysis is the next step used to create 2-dimensional features that are easier to visualise; the projection of these features in Fig. 17 shows how well the proposed model can distinguish between faces. Finally, a social label is used to identify the feature mapping for prediction validation with an encrypted similarity threshold. This enables the detection of which faces are original and which are spoofed, thereby facilitating secure transactions within the smart city environment, as illustrated in Fig. 18. Figure 19 illustrates the access granted for genuine individuals and the access denied for spoofed faces. This approach ensures that the proposed model enhances security and privacy in facial biometric spoofing detection with high accuracy within an evolving smart city environment.
The proposed multimodal facial authentication framework utilizes the strengths of CNN, ResNet-50, and ElGamal cryptography to achieve high detection accuracy in facial integrity. It prevents data spoofing and ensures security through secure transmission within smart city environments. The primary CNN layer captures low- to mid-level facial features, including skin tone and facial edges, while the next layer of ResNet-50 extracts high-level semantic features, providing stable training without the gradient vanishing problem. Variation in facial expression is correctly processed with ResNet-50. This combined model enhances the prevention of spoofing attacks by ensuring multi-level feature verification in the biometric security function. Finally, after the post-extraction, the ElGamal cryptosystem encrypts the facial mapping using a combination of asymmetric encryption and digital signatures, ensuring it remains secure and resilient against reverse engineering attacks. With this framework, training and testing are processed on the CelebA dataset with different facial mappings and expressions, achieving a high detection accuracy of 97.1% and a minimum validation loss of 0.04, which exceeds the accuracy of traditional models such as CNN by 1.2% and ResNet-50 by 2.2%. Finally, any model development will have some limitations, as is the case with the proposed model, which has a slightly higher computational load due to the multimodal architecture pipeline. Additionally, some blurred images require more training for mapping, which increases the time needed to achieve optimal performance.
Proposed model comparison analysis
Performance analysis is a crucial factor when evaluating the system. Detection and training time are both critical components of efficient time management. The system's capability, training model, and data processing affect the training time, which differs from the overall time required to complete training on a particular dataset. This methodology has the benefit of providing a shorter detection time than conventional approaches. However, the execution time of our approach increases for large datasets. It is critical to distinguish between the search time, which pertains to identifying a possible attack, and the total time necessary to categorise a particular test sample. The capabilities of our system and server determine how quickly the system can detect an assault. In the future, additional trials will be conducted to fine-tune the system settings and boost its efficiency and efficacy. Table 10 presents a comparison with some existing models and their prediction accuracy.
Fine-tuning is performed on the CNN and ResNet-50 deep learning models to produce a highly efficient model for facial authentication in smart cities, utilising the CelebA dataset. The fused model is then assessed through testing and training, resulting in an improved prediction accuracy of 97.1% and a faster regeneration time of 0.62 ms. Additionally, a minimum loss function over 50 epochs is used to decrease the likelihood of gradient vanishing during the testing phase and reduce the error.
Conclusion and future work
The current era is witnessing a boom in smart city environments, necessitating the implementation of security features to ensure secure transactions throughout the ecosystem. The proposed model of facial encryption, which combines a hybrid CNN and ResNet-50 deep learning model, achieves this. The proposed model enhances the security of the smart city environment, surpassing the traditional security model, and provides a robust defence against unauthorised entry, cyberattacks, and information leaks by protecting user privacy within the smart city limits. The model streamlines the authentication process with smart city monitoring by deploying IP cameras throughout the smart city. The proposed CNN-ResNet-50 facial security authentication model enhances smart city security by safeguarding critical infrastructure during various phases and accelerating the transmission of user-protected information compared to other conventional deep learning models. The proposed CNN-ResNet-50 facial security authentication system for smart cities achieves a high accuracy of 97.1% and retrieves the decrypted data within a time frame of 0.62 ms.
The current work relies on a single dataset, which limits generalization across diverse demographics; future work will therefore incorporate multiple datasets with varying poses and lighting conditions. It will also adopt advanced cryptographic functions, such as multi-party computation and homomorphic encryption, to maintain high security standards in facial authentication. Beyond face biometrics, future models may utilize palmprint and iris patterns, which complement each other and provide additional security for the proposed multimodal system. Techniques such as quantization and pruning will bring further advantages in future smart city environments, with minimal memory consumption and higher accuracy. Together, these extensions should make the proposed work more reliable and more broadly applicable across smart city environments.
Data availability
The datasets used and analyzed during the current study are available at https://github.com/yumingj/Talk-to-Edit. The code is available at https://github.com/TECHcode317/facialbio.
Funding
Open access funding provided by Symbiosis International (Deemed University).
Author information
Contributions
Conceptualization, M.S. and A.S.; Methodology, A.S.; Software, A.S.; Formal analysis and Validation, R.K.D.; Writing—Original Draft Preparation, S.A.D. and V.P.; Writing—Review and Editing, M.S.; Supervision, Project Administration and Funding Acquisition, R.K.D.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sureshkumar, A., Sathyamoorthy, M., Dhanaraj, R.K. et al. Secure facial biometric authentication in smart cities using multimodal methodology. Sci Rep 15, 44839 (2025). https://doi.org/10.1038/s41598-025-29048-5