Introduction

Artificial intelligence (AI) refers to creating algorithms, processes, and computer programmes that can perform tasks and exhibit behaviours, such as learning, making decisions, and making predictions, without every step in the process being explicitly programmed by a human. The use of AI in health has traditionally focused on processing large volumes of data, such as medical images and structured datasets1,2,3,4. Recent advancements in generative AI and natural language processing have extended this capability to managing and interpreting unstructured text data, such as that found in chatbot conversations, clinical notes, and social media posts5,6. Such applications, combined with the growing use of mobile devices, have stimulated interest in harnessing AI for broader health needs7,8,9,10,11,12,13.

The application of AI in sexual and reproductive health (SRH) represents an emerging field with potential to enhance healthcare delivery and access14,15,16,17,18,19,20,21,22. Health domains within SRH encompass topics such as contraception/family planning, infertility and fertility care, maternal health, sexually transmitted infections (STIs) including HIV, comprehensive abortion care, sexual health, and gender-based violence, as derived from the World Health Organization compendium of interventions for advancing universal health coverage23. These health issues are also often faced by populations in vulnerable situations or subjected to stigma24. As such, AI could be critical to expanding SRH access for populations underserved by traditional health service delivery mechanisms, but it could also put individuals in vulnerable situations at greater risk, depending on how the AI is developed, regulated, and deployed.

Despite the growing body of literature on the use of AI in SRH, existing work either focuses on specific health domains or application types (e.g., chatbots, AI-assisted medical devices)17, or consists of exploratory assessments of potential opportunities14,15,24. While this work provides a valuable starting point, it also underscores the need for a systematic effort to organize the available evidence across the field. This scoping review examines how AI is concretely being applied in SRH across a wide range of SRH topics, with no geographical restrictions, allowing for a comprehensive overview of the field. By synthesizing data from a wide array of studies, this review seeks to provide a foundational understanding that can identify research gaps and guide policymaking to ensure the effective and equitable use of AI across SRH.

Results

Study characteristics

Our search retrieved a total of 12,823 citations. After the removal of 196 duplicates, both manually and through Covidence, 12,627 unique articles remained for title and abstract screening. Of these, 2666 studies met the inclusion criteria at full-text screening and were included for data extraction (Fig. 1).
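The deduplication step can be illustrated with a minimal sketch. Covidence's actual matching logic is not described in this review; the sketch assumes, hypothetically, that records are matched on a normalized DOI or, when no DOI is available, on a normalized title. The function names and record fields are illustrative only, and the toy records are invented.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (a hypothetical matching key)."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())

def record_key(record: dict) -> str:
    """Prefer the DOI when present; otherwise fall back to the normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    return "doi:" + doi if doi else "title:" + normalize_title(record["title"])

def deduplicate(records: list[dict]) -> tuple[list[dict], int]:
    """Keep the first occurrence of each key; return unique records and the number removed."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique, len(records) - len(unique)

# Toy records (invented for illustration, not drawn from the review).
records = [
    {"title": "AI for Cervical Screening", "doi": "10.1000/xyz1"},
    {"title": "AI for cervical screening.", "doi": "10.1000/XYZ1"},  # same DOI, different case
    {"title": "Chatbots in Fertility Care", "doi": ""},
    {"title": "Chatbots in fertility care", "doi": None},            # same title once normalized
]
unique, removed = deduplicate(records)
print(len(unique), removed)  # 2 2

# The reported counts are internally consistent: 12,823 retrieved - 196 duplicates = 12,627.
assert 12_823 - 196 == 12_627
```

Real deduplication usually also needs fuzzy matching (e.g., for author or journal variants), which this sketch deliberately omits.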

Fig. 1

PRISMA flow diagram.

The highest proportion of studies was published in 2022 (n = 631, 23.7%), and the majority (n = 1628, 61.1%) were published since 2021, with the oldest dating back to 198925,26. The largest share of studies was conducted in high-income countries (n = 1344, 49.9%), followed by upper-middle-income countries (n = 964, 35.9%). The largest number of studies came from China (n = 662), followed by the United States (n = 523), India (n = 181), and the United Kingdom (n = 107) (Fig. 2).

Fig. 2: Geographical distribution of published studies on AI and SRH.

Darker shades of blue represent a greater number of studies.

The majority of included articles were validation studies (n = 1704, 63.9%), which described the process of developing AI models and testing their performance on subsets of the same dataset against a reference, such as standard diagnostic devices, physician decisions, or other AI models applied to the same dataset27,28,29,30. Observational studies accounted for 31.7% (n = 845) of the total and applied AI models, mainly machine learning, to retrospective data for descriptive and inferential analyses, such as identifying linkages between environmental exposures and SRH outcomes or identifying patterns and risk factors31,32,33,34,35,36. A small proportion of studies (n = 59, 2.2%) were sentiment analyses that used natural language processing (NLP) to analyse text from social media platforms such as Twitter (now X) and explore attitudes towards SRH topics, such as HPV vaccination and abortion37,38,39,40,41,42,43. Experimental studies (n = 43, 1.6%), in which an AI intervention was introduced and compared across participant groups, were relatively limited; they mainly evaluated chatbots that supported the intervention group on topics such as fertility and preconception care, while the control group received either standard care or no intervention44,45. Qualitative studies (n = 13, 0.4%) used methods such as focus group discussions, interviews, user testing, and participatory design approaches to understand the feasibility, acceptability, and user experience of AI tools across different SRH domains, including contraception, HIV prevention, and SRH education46,47,48. Economic evaluations (n = 2, 0.08%) used AI to estimate cost inputs and evaluate the cost-effectiveness of SRH interventions, such as in vitro fertilization procedures and HIV prevention programmes49,50.

Health domains and AI intended purpose

Maternal health emerged as the most represented SRH area, comprising 44.9% (n = 1198) of studies. Within this health domain, AI had diverse applications, predominantly for screening (n = 954, 80%), followed by understanding health trends (n = 96, 8%). This included predicting maternal complications (n = 392, 32.7%)51,52,53,54, such as postpartum haemorrhage, gestational diabetes, and preeclampsia, as well as foetal and neonatal complications (n = 323, 26.9%)55,56,57, such as foetal growth restriction and spina bifida. AI was also employed in routine antenatal care (ANC) to monitor maternal and foetal biometric and physiological measurements, such as gestational age and foetal weight58,59,60. During labour, AI applications extended to predicting the mode of delivery and supporting physicians through clinical decision support systems (CDSS)61,62,63. Machine learning was also commonly applied to understand health trends by analysing the associations of maternal behavioural and environmental exposures, such as nutrition, smoking, and pollutant exposure, with pregnancy and birth outcomes64,65,66,67. For postnatal care, AI was used to evaluate mental health outcomes, particularly postpartum depression and other mental health disorders68,69,70,71,72,73,74. These studies used various data sources, including electronic medical records (EMRs), research repositories with clinical parameters, and social media posts, to assess risk and provide early intervention68,69,70,71,72,73,74. AI modelling techniques were also used in clinical research and drug discovery in 5% (n = 62) of maternal health studies to identify drug toxicity and biomarkers for pregnancy and foetal conditions75,76,77,78 (Fig. 3).

Fig. 3: Intersection of SRH domain and AI intended purpose.

Darker shades of blue represent a higher concentration of studies, such as screening and diagnosis for maternal health and reproductive organ cancers, whereas lighter shades of yellow indicate fewer studies.
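A heat map such as Fig. 3 is, in essence, a cross-tabulation of two charted fields. The minimal sketch below assumes that each study was charted with one SRH domain and one intended purpose (the field names `domain` and `purpose` are hypothetical); the toy rows are invented and do not reproduce the review's counts.

```python
from collections import Counter

def cross_tabulate(studies: list[dict]) -> Counter:
    """Count studies for each (SRH domain, AI intended purpose) pair."""
    return Counter((s["domain"], s["purpose"]) for s in studies)

def to_matrix(counts: Counter) -> tuple[list[str], list[str], list[list[int]]]:
    """Arrange pair counts as a domain-by-purpose grid; empty cells become 0."""
    domains = sorted({d for d, _ in counts})
    purposes = sorted({p for _, p in counts})
    grid = [[counts.get((d, p), 0) for p in purposes] for d in domains]
    return domains, purposes, grid

# Invented example rows; a real table would come from the data-extraction chart.
studies = [
    {"domain": "Maternal health", "purpose": "Screening and diagnosis"},
    {"domain": "Maternal health", "purpose": "Screening and diagnosis"},
    {"domain": "Maternal health", "purpose": "Understanding health trends"},
    {"domain": "Cervical cancer", "purpose": "Screening and diagnosis"},
    {"domain": "Cervical cancer", "purpose": "Treatment and care management"},
]
domains, purposes, grid = to_matrix(cross_tabulate(studies))
for domain, row in zip(domains, grid):
    print(domain, row)
# Cervical cancer [1, 1, 0]
# Maternal health [2, 0, 1]
```

Each cell of the resulting grid would map to one colour-shaded tile in a heat map.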

Cervical cancer, as a specific gynaecological/reproductive cancer, represented 27.7% (n = 741) of all the studies, in which the primary intended purpose was screening and diagnosis (n = 612, 82.5%), followed by treatment and care management (n = 88, 11.8%)79,80,81,82. Other types of reproductive organ cancers, such as uterine, prostate, and ovarian cancers, comprised 1.4% (n = 38) of all studies. Many of the investigations utilized ultrasound and MRI data to develop diagnostic models capable of distinguishing benign from malignant masses and performing image segmentation for more precise diagnoses83,84. Across all reproductive organ cancers (n = 779, 29.2%), the primary outcomes focused on risk assessment85, cancer detection86, and disease prognosis87,88. Additionally, AI was used in radiation therapy and brachytherapy to enhance treatment precision by targeting affected areas, dynamically adjusting dosages, and projecting patient prognosis for clinical decision-making80,89 (Fig. 3).

In infertility and fertility care (n = 337, 12.6%), AI applications were prominent in supporting various steps of assisted reproductive technology, in both laboratory and clinical care settings90,91,92,93,94,95,96,97,98. These ranged from the early stages of embryo selection, where machine learning techniques were employed to evaluate embryo quality and viability99,100, to predicting assisted reproductive technology outcomes using models trained on selected clinical and laboratory features101,102. Assessment of fertility parameters, encompassing investigations of biological factors such as hormonal levels and reproductive organ health, formed the largest subset of outcomes (n = 182, 54%)98,103,104,105. Technologies such as computer-assisted sperm analysis, which traditionally automate the measurement of sperm motility and morphology, are now being augmented with AI object-detection models such as You Only Look Once (YOLO), which are applied to evaluate sperm parameters with greater speed and accuracy90,91. The primary data sources for training and applying the AI models were diagnostic images (n = 92, 27.2%) and laboratory specimens (n = 66, 19.5%).

The application of AI in HIV and other STIs encompassed 259 studies (9.7%), focusing primarily on predicting HIV-related complications (n = 105, 40.5%)108,109 and on risk assessment and early detection (n = 86, 33.2%) within key populations, such as men who have sex with men106,107. In the context of HIV care and treatment with antiretroviral therapy (n = 25, 9.6%), AI models were developed for drug monitoring and toxicology predictions. Data sources for the HIV studies predominantly included EMRs with clinical patient-level information (n = 75, 28.9%), along with research repositories (n = 41, 15.8%) and survey data (n = 40, 15.4%).

Fewer studies were identified in other SRH domains, including intimate partner and sexual violence, contraception, sexual health, menopause, and abortion. For intimate partner and sexual violence (n = 23, 0.8%), AI was used primarily for online data mining (n = 12, 52.1%) to perform risk assessments and detection110,111, identify gender-based violence (GBV) narratives112, and evaluate legislation related to domestic violence113, as a way to better understand reporting trends and attitudes in online GBV discussions. In contraception and family planning (n = 22, 0.8%), AI was applied to gain broad-scale insights into fertility and family planning use. Studies used population-level surveys113,114 to capture overarching trends and patterns, while online data were used to gauge public attitudes towards and perceptions of various contraceptive methods. The use of AI in sexual health (n = 12, 0.4%) was mainly through chatbots providing health information115,116, as well as through machine learning models that mined social media data to analyse youth communication patterns and trends in sexual health discussions and risk behaviours117. For menopause (n = 9, 0.3%), AI was applied to forecast conditions associated with the pre-menopausal and postmenopausal stages, including osteoporosis and endometrial alterations118,119,120. Additionally, AI was leveraged to assess healthcare behaviours and address the physiological needs of menopausal women121. In abortion (n = 6, 0.2%), AI was employed to examine public discourse about abortion on social media platforms such as X (formerly Twitter)122,123,124, and natural language processing was used to categorize the wording of abortion bills as restrictive or protective40.

Target end-users and populations of interest

Healthcare providers (n = 1792, 67.2%) emerged as the primary target end-users, with AI used to facilitate screening and diagnosis, primarily by identifying risk factors, or to assist in care provision, such as selecting treatment options for HIV and in vitro fertilization125,126,127,128,129,130,131,132,133,134,135,136. Researchers (n = 628, 23.5%) were the second-largest group of end-users, using AI to analyse health trends from research repositories and social media data137,138, and to advance clinical and pharmacological research139,140,141. Health service users/clients (n = 146, 5.4%) were direct beneficiaries of AI-driven tools, such as conversational agents (i.e., chatbots) for accessing tailored health information115,142,143. Additionally, sensors and wearables were used for personal health monitoring, including tracking ovulation and monitoring foetal health at home144,145,146. The underlying populations of interest included women (women of reproductive age, postpartum, menopausal, and postmenopausal; overall n = 1106, 41.5%), pregnant women (n = 852, 32%), foetuses (n = 251, 9.4%), people living with HIV (n = 172, 6.5%), and men (n = 152, 5.7%), mainly in fertility care, where semen samples often served as the unit of analysis147,148.

AI model development and lifecycle

The main data sources consisted of images from diagnostic devices, such as ultrasound and MRI (n = 928, 34.8%); EMRs (n = 519, 19.5%); laboratory reports, including genomic and molecular data (n = 368, 13.8%); research repositories and registries for secondary data analysis (n = 335, 12.6%); signal data (n = 172, 6.5%) obtained through foetal electrocardiograms (ECG), cardiotocography (CTG), photoplethysmography (PPG), and sensors; and surveys (n = 154, 5.8%). Laboratory data included the examination of blood plasma for various biomarkers, semen analysis for fertility assessments, and amniotic fluid evaluation. The remaining data were derived from audio recordings, mobile applications collecting behavioural or user-reported data (e.g., for fertility monitoring), self-reported questionnaire responses, regulatory authorities, and social media posts41,149,150,151. Data types ranged from structured clinical data, such as patient records and diagnostic results, to unstructured data, including free-text online content and other narrative data, which were analysed to capture user experiences, emotions, and sentiment.

Actual model deployment demonstrating real-world application was observed in 6.4% (n = 171) of all publications, for example in the introduction of conversational agents, the understanding of population-level trends, or use in clinical environments for screening and clinical decision support152,153,154,155. The majority of studies were in the model evaluation stage (n = 1946, 72.9%). These were often validation studies assessing the performance of algorithms prior to deployment, or observational studies determining whether machine learning models could accurately make predictions from existing datasets92,156,157. Such studies used training, validation, and test sets to develop and evaluate their AI models, with performance metrics such as precision, recall, and accuracy used to assess reliability. However, external validation, which involves testing models on entirely independent datasets from different sources, was not explicitly described in most studies. The next largest category consisted of studies at the earlier stage of model development (n = 549, 20.5%), whose primary objective was designing and training AI models or identifying key predictive features158,159,160,161,162,163.
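The internal evaluation workflow described above can be sketched with the Python standard library. The 70/15/15 split, the seed, and the toy labels are assumptions for illustration only and are not taken from any included study. All three subsets here come from one dataset; external validation, by contrast, would apply the same metric function to a model's predictions on an independent dataset from a different source.

```python
import random

def split_indices(n: int, train: float = 0.7, val: float = 0.15, seed: int = 0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test subsets of ONE dataset."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)  # round() avoids float artefacts such as 69.999...
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Precision, recall, and accuracy for a binary classifier (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15

# Toy test-set labels and predictions (invented): tp=3, fp=1, fn=1, tn=5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'accuracy': 0.8}
```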

Comparators often included traditional diagnostic methods (e.g., physician decisions, laboratory tests, or imaging modalities) and other AI models, providing benchmarks to evaluate improvements in efficiency and accuracy. The outcomes varied by study, with AI models frequently demonstrating superior performance in areas like predictive accuracy and diagnostic efficiency, though some studies highlighted challenges, such as overfitting or limited generalizability due to small sample sizes or lack of external validation. One study reported that AI performed slightly worse than the traditional predictive model against which it was compared156. We did not compare the performance of AI models across studies due to the heterogeneity in SRH domains, the nature of AI applications, and variations in data sources, model designs, and evaluation methods, which made cross-study comparisons unfeasible.

Discussion

The findings of this scoping review highlight the diversity of ways in which AI is used and studied in SRH, providing a comprehensive foundation to spur further analysis, research prioritization, and policy development. While AI was used across all country income categories, most studies were concentrated in high- and upper-middle-income countries, underscoring uneven uptake and limited evidence from settings where SRH challenges are often most acute. A contributing factor to this imbalance is the greater availability in high-income settings of datasets from research registries and academic institutions, partly owing to more advanced digital infrastructure and the greater availability of structured datasets suitable for AI modelling13. Our analysis, which captured both the origin of datasets and their deployment, revealed a trend of ad hoc machine learning analyses on convenience datasets, drawn primarily from academic institutions in higher-income settings, often without a clear pathway towards real-world implementation. For example, several studies with authors in low- and middle-income countries used datasets from research registries and academic institutions in high-income countries to develop and test AI models, highlighting the need for datasets that enable the use of AI in local contexts. However, some studies explicitly aimed to address this disparity by validating AI tools for use in rural or low-resource settings, for instance by exploring smartphone-based image analysis to offer affordable diagnostic tools, including for ultrasound and HIV viral load detection164,165,166,167,168,169,170,171,172,173.

Generally, in low- and lower-middle-income countries, AI applications were largely geared towards prediction tools addressing health system needs, such as contraceptive service use, comprehensive HIV prevention and treatment, and maternal risk assessment174,175,176,177. In contrast, the use of AI in high-income settings extended to advanced biomedical discovery and analyses that leverage imaging, molecular, and other complex datasets across multiple disciplines (e.g., environment and health)178,179,180. China, an upper-middle-income country with the highest number of studies, demonstrated a mixed pattern, applying AI both to conventional maternal health risk prediction and to more advanced diagnostic applications using image analysis and unsupervised learning181,182.

Across all settings, the review identified a strong emphasis on AI for screening and diagnosis, particularly in maternal health and cervical cancer. This may reflect the suitability of traditional AI approaches, such as risk prediction and classification, for these tasks, as well as the availability of diverse data sources, including ultrasound images, signals from ECGs and sensors, and electronic medical records1,183,184. These findings align with other analyses highlighting the potential of AI for diagnostic tools and personalized treatment14,185. This review also identified less documented uses of AI in SRH, such as drug discovery; understanding people's views on highly debated topics, including the HPV vaccine, abortion, and contraception; and epidemiological analyses of correlations between exposures and SRH outcomes.

With the majority of studies in the model development and evaluation stages, experimental studies of deployed AI tools and economic evaluations were notably scarce in the reviewed literature, restricting insights into their real-world application and scalability. These findings resonate with observations in the broader field of AI in health, where experimentation is growing rapidly, yet real-world implementation in clinical practice is not widespread5. The frequent absence of external validation further limits the generalizability of AI models to broader populations and diverse settings, and may yield tools that have not been vetted for use outside research settings or in new contexts. In addition, heterogeneity and a lack of transparency in reporting data sources undermined the robustness and scalability of some of the identified studies. With the development of frameworks such as SPIRIT-AI and CONSORT-AI, reporting should become more standardized, and these frameworks could be expanded to guide authors on key ethical considerations for reporting186,187. For example, although this review sought to extract data on adverse effects and ethical considerations, studies did not report these systematically or explicitly. Considering that SRH is often politicized and that the use of AI in this field raises concerns about upholding privacy, bodily autonomy, and rights14,24, a deeper analysis of the ethical implications of the included studies could advance understanding of the overall use of AI in sexual and reproductive health and rights (SRHR).

This is one of the first scoping reviews to comprehensively map the literature on AI in SRH. A key strength of this review is the extensive body of evidence synthesized, spanning 2666 studies across all SRH domains and geographic locations. The review adhered to rigorous methodological standards, including registration of the protocol before starting the review and a sensitive search strategy that was run in several electronic databases without date or language restrictions. It also serves as a foundation for sub-analyses across SRH domains and AI functionalities, while providing a taxonomy for classifying AI studies that can be broadened for use beyond SRH. Furthermore, consultation with AI and SRH experts enriched the identification of key themes and patterns used to formulate the underlying taxonomy and data charting process22.

Despite its strengths, this review has several limitations. The heterogeneity in study designs and the volume of studies provided valuable breadth but limited the ability to draw uniform conclusions on effects, which would require more extensive in-depth analyses across subgroups. The comprehensive inclusion criteria were consistent with scoping review methodology; however, future research efforts, particularly systematic reviews or meta-analyses, will need to differentiate across levels of study quality and maturity. In addition, the exclusion of HIV genomic and drug discovery studies may underestimate the level of AI investment and research in HIV. Lastly, the rapid pace of AI research and the volume of studies captured call for more efficient ways of continuously synthesizing the literature, including by leveraging AI to expedite updates to this review.

This structured synthesis distils the concrete uses of AI in SRH, highlights key patterns, and pinpoints critical areas for future research and impact. While substantial progress has been made in certain domains, significant gaps remain in geographic representation, methodological rigor, and deployment studies with external validity. Addressing these challenges requires a concerted effort to prioritize rigorous validation, equity, and sustainability for advancing the use of AI for SRH impact and fostering health equity and rights.

Methods

Identifying the research questions

The review was based on a published protocol and conducted using the established methodological framework by Arksey and O’Malley188,189. The PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) format was used to report the findings (Supplemental Note 1).

The primary research question for this scoping review was to identify the characteristics of AI systems and tools being applied to SRH in terms of health domains, intended purpose (e.g., screening, understanding health trends, health promotion and counselling), geographic scope, target users, implementation maturity, data sources, and outcomes of interest189. SRH domains were derived from the World Health Organization (WHO) Universal Health Coverage (UHC) Compendium and comprise maternal and perinatal health, contraception and family planning, infertility and fertility care, sexual health, female genital mutilation, intimate partner and sexual violence, reproductive cancers, and sexually transmitted infections, including HIV23. Within HIV, genomic studies were excluded from this review because of their complexity and their narrower focus on genomic research rather than the broader sexual and reproductive health dimensions of HIV; HIV studies focused on service delivery and access to care were included. There was no restriction on the language of the study, and where needed, articles were translated into English using Google Translate. The full version of the search strategy can be found in Supplemental Note 2.

Identifying relevant studies

We conducted searches across four electronic databases: MEDLINE (PubMed), Scopus, Web of Science, and CINAHL, covering all available studies up to October 2023. No start date was imposed for the search, allowing for the inclusion of all available studies regardless of their publication date. Table 1 outlines the inclusion and exclusion criteria for the studies. We included all primary research studies reporting on AI applications to aspects of SRH, provided they had a clearly described methodology, encompassing quantitative studies, qualitative studies, as well as programme evaluations and descriptions.

Table 1 Inclusion and exclusion criteria used in this scoping review

Study selection

A web-based tool for article screening and data extraction, Covidence (https://www.covidence.org/), was used to manage the screening and data extraction process (Table 1). Titles and abstracts were double-screened by six independent reviewers (A.F., A.P.B., S.M., M.B., C.M. and T.T.), and any conflicts were resolved through discussion between the screeners or by a third author. Articles that qualified for full-text review were independently double-screened by nine reviewers (S.M., T.T., M.A., G.M.P., G.M., S.T.N., R.W., J.D. and G.D.). No AI tools were employed at any stage of this review, including the screening and data extraction processes.
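The double-screening logic can be sketched as follows. This is not how Covidence implements it; the record IDs, votes, and the `resolve` callback (a hypothetical stand-in for discussion between screeners or a third reviewer) are all invented for illustration.

```python
from typing import Callable

def find_conflicts(first: dict[str, str], second: dict[str, str]) -> list[str]:
    """Return IDs of records that both screeners assessed but decided differently."""
    return sorted(rid for rid in first if rid in second and first[rid] != second[rid])

def final_decisions(
    first: dict[str, str],
    second: dict[str, str],
    resolve: Callable[[str], str],
) -> dict[str, str]:
    """Agreed decisions stand; each conflict is settled by `resolve` (e.g., a third reviewer)."""
    decisions = {}
    for rid in first.keys() & second.keys():
        decisions[rid] = first[rid] if first[rid] == second[rid] else resolve(rid)
    return decisions

# Invented screening votes for four records.
reviewer_a = {"r1": "include", "r2": "exclude", "r3": "include", "r4": "exclude"}
reviewer_b = {"r1": "include", "r2": "include", "r3": "include", "r4": "exclude"}

print(find_conflicts(reviewer_a, reviewer_b))  # ['r2']

# Hypothetical third reviewer who votes to include every disputed record.
result = final_decisions(reviewer_a, reviewer_b, resolve=lambda rid: "include")
print(sorted(rid for rid, d in result.items() if d == "include"))  # ['r1', 'r2', 'r3']
```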

Charting the data

A standardized chart for data extraction was developed, capturing the following data for each study: title, author, year, geographical location, study design, methodology, objectives, sample size, setting, SRH domain, AI intended purpose, target population, participant type, AI lifecycle stage, source of data, type of dataset, outcome, comparator, and any adverse effects or ethical considerations. Three authors (S.M., S.P. and T.T.) manually extracted data from all full-text articles and discussed the emergent responses. Twenty percent of the extracted data were rechecked (T.T.) for quality assurance and standardization (Table 2).
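The 20% quality-assurance recheck can be illustrated with a simple random sample. The review does not state how the rechecked records were selected, so simple random sampling with a fixed seed (for reproducibility) and rounding the sample size up are assumptions made here; the study IDs are placeholders.

```python
import math
import random

def recheck_sample(record_ids: list[str], fraction: float = 0.20, seed: int = 42) -> list[str]:
    """Draw a reproducible simple random sample covering at least `fraction` of the records."""
    k = math.ceil(len(record_ids) * fraction)  # round up so coverage is never below the target
    return sorted(random.Random(seed).sample(record_ids, k))

# Placeholder IDs for the 2666 included studies.
ids = [f"study-{i:04d}" for i in range(1, 2667)]
sample = recheck_sample(ids)
print(len(sample))  # 534
```

A stratified sample (e.g., by extractor or SRH domain) would be an equally reasonable design choice if the aim were to check each extractor's work evenly.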

Table 2 Characteristics of included studiesa

For the study design, we categorized studies into the following categories:

  • Validation studies, defined as studies that compare the accuracy of a measure with a gold standard or reference measure190.

  • Observational studies, which applied AI to analyse large datasets and draw inferences on the effects of an "exposure" or intervention191. These studies often leveraged survey, case-control, and cohort designs and applied machine learning, a subset of AI that focuses on the use of statistical and mathematical modelling techniques to define and analyse data192.

  • Experimental studies, which aim to assess the effects of an AI intervention that has been intentionally introduced on an outcome of interest (e.g., randomized controlled trials and quasi-experimental studies)193.

  • Qualitative studies, often conducted through interviews and focus group discussions, to explore the acceptability and feasibility of AI systems and tools.

  • Content and sentiment analysis, which includes the use of natural language processing to systematically identify, extract, quantify, and examine patterns of information194,195. Sentiment analysis is a subset of content analysis in which the focus is on mining data to explore attitudes, beliefs, and opinions194,195. Although these studies may also qualify as using qualitative methods, we included this as a separate category due to the specific way AI and natural language processing are applied to analyse unstructured text.
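As a deliberately simplified illustration of sentiment analysis, the sketch below scores short posts against a tiny hand-made word list. The included studies used a range of NLP methods that are not detailed here; this lexicon-counting approach, the word lists, and the example posts are all hypothetical and stand in for far more robust validated lexicons or trained models.

```python
import re

# Hypothetical mini-lexicon; real studies would use validated lexicons or trained models.
POSITIVE = {"safe", "effective", "grateful", "protects", "recommend"}
NEGATIVE = {"dangerous", "scared", "harmful", "distrust", "regret"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text: str) -> str:
    """Label a post by comparing counts of positive and negative lexicon words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented example posts.
posts = [
    "The HPV vaccine is safe and effective, I recommend it",
    "Honestly scared the vaccine is dangerous",
    "Clinic opening hours changed today",
]
for post in posts:
    print(sentiment(post), "|", post)
```

Simple word counting ignores negation ("not safe") and sarcasm, which is one reason published sentiment studies generally rely on trained NLP models.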

The intended AI purpose was based on a predefined classification developed by WHO through a consultative process used to develop a technical brief on the role of AI in sexual and reproductive health and rights (SRHR)22. Building on existing digital health classification frameworks196, the target population described the intended end-users of the AI tools across the following categories: clients/health service users, healthcare workers, health system managers responsible for oversight at facility level, researchers, and policy makers. The participant type described the population of interest or research subjects, such as foetuses, pregnant women, adolescents, people living with HIV, women of reproductive age, and men or women engaging with SRH services.

Collating, summarizing, and reporting the results

The extracted data were systematically organized across several key categories using tables to summarize observed patterns and accompanied by a narrative synthesis. In addition, we used the World Bank country classification for 2024197 to consolidate countries/geographic locations into country income groups.
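Consolidating countries into income groups amounts to a lookup followed by a count. The sketch below includes only the four countries named in the Results, with income groups assumed from the World Bank classification for 2024 (China upper-middle income, India lower-middle income, the United Kingdom and the United States high income); a full analysis would load the complete World Bank table. The per-country counts are those reported in the Results, and all other countries are ignored in this toy example.

```python
from collections import Counter

# Subset of the World Bank income classification (assumed for the 2024 edition);
# a real analysis would load the full country table.
INCOME_GROUP = {
    "China": "Upper-middle income",
    "India": "Lower-middle income",
    "United Kingdom": "High income",
    "United States": "High income",
}

def studies_by_income_group(country_counts: dict[str, int]) -> Counter:
    """Sum per-country study counts into income groups; unmapped countries are flagged."""
    totals: Counter = Counter()
    for country, n in country_counts.items():
        totals[INCOME_GROUP.get(country, "Unclassified")] += n
    return totals

# Per-country counts reported in the Results (the four largest contributors only).
counts = {"China": 662, "United States": 523, "India": 181, "United Kingdom": 107}
print(studies_by_income_group(counts))
```

Flagging unmapped countries as "Unclassified", rather than silently dropping them, makes gaps in the lookup table visible.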

We applied the AI lifecycle framework198 to classify and analyse the maturity of studies. The framework outlines five stages: data creation, data acquisition, model development, model evaluation, and model deployment, emphasizing a continuous process from data collection and pre-processing to the training, validation, and application of AI models in real-world contexts198. The following definitions were used to delineate the stages across the AI lifecycle198:

  • Model development: Studies that primarily describe the process of formulating the algorithms and building an AI model. These studies may also include preliminary testing to validate the AI models, but the core focus of the results is the developmental process.

  • Model evaluation: Studies that focus on testing the performance and efficacy of AI models within a controlled research setting. These studies may include the AI model development process as part of the methodology, but the core focus of the results is the AI model’s performance.

  • Model deployment: Studies that introduce the use of AI models in real-world implementations and that are able to draw conclusions extending beyond the performance of the AI model.
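The three definitions above can be read as a decision rule applied to each study's core focus. The sketch below encodes that reading for the three stages used here (not the data creation and acquisition stages); the boolean fields `real_world_use` and `reports_performance` are hypothetical stand-ins for judgements that, in this review, were made manually by the data extractors.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    MODEL_DEVELOPMENT = "model development"
    MODEL_EVALUATION = "model evaluation"
    MODEL_DEPLOYMENT = "model deployment"

@dataclass
class StudyFocus:
    # Hypothetical flags summarizing what a study's core results are about.
    real_world_use: bool        # AI model introduced in a real-world implementation
    reports_performance: bool   # core results are the model's performance in a research setting

def lifecycle_stage(study: StudyFocus) -> Stage:
    """Assign the most mature stage supported by the study's core focus."""
    if study.real_world_use:
        return Stage.MODEL_DEPLOYMENT
    if study.reports_performance:
        return Stage.MODEL_EVALUATION
    # Otherwise the core focus is formulating algorithms and building the model.
    return Stage.MODEL_DEVELOPMENT

print(lifecycle_stage(StudyFocus(real_world_use=False, reports_performance=True)).value)
# model evaluation
```

Checking deployment first mirrors the definitions: a deployment study may also report performance, but its conclusions extend beyond it.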

Data sources and types of datasets used in the studies were summarized to evaluate the nature and origin of the data underpinning the AI models. The study outcomes were categorized to highlight the range of specific areas covered under each SRH domain. Considering the lack of a taxonomy of AI data sources, we applied an iterative process, developing categories from the first 100 extracted studies and refining them over the course of data extraction.

Consulting stakeholders

We consulted stakeholders from the WHO technical expert group on AI for SRHR for feedback on classifying the functional purpose of AI systems and tools. This technical expert group was convened at the inception of the scoping review to inform the development of a technical brief that would leverage findings from this review. The early inputs from this consultative process were used to refine the research questions and data extraction needs.