Introduction

Pre-introduction

Urban environments significantly influence crime patterns, yet previous studies have primarily categorized urban locations based on general crime, with limited focus on sexual violence crimes. The spatiotemporal heterogeneity of sexual violence remains underexplored, as existing research often examines either spatial or temporal crime variations in isolation, failing to account for their combined effects. Moreover, the explainability of macro-level Built Environment (BE) attributes (e.g., land use, road networks) and micro-level Street Environment (SE) attributes (e.g., trees, cars, streetlights) on sexual violence has not been systematically compared.

To address these gaps, this study categorized Points of Interest (POIs) into crime defense, attraction, generation, and neutral categories, drawing on established criminological theories to refine their criminogenic roles in the context of sexual violence. Using Geographically Weighted Regression (GWR) and Geographically and Temporally Weighted Regression (GTWR), it examined the spatiotemporal heterogeneity of the impacts of BE and SE on sexual violence crimes in Mumbai. Furthermore, it compared the effects of BE and SE on sexual violence crimes to help decision-makers prioritize urban planning tasks. By integrating multi-scale urban attributes with machine learning-driven visual analysis, this research offers insights into crime-environment correlations that can guide evidence-based urban design and targeted policy interventions for women-friendly cities.

Background and contexts

Sexual violence in India has long been an international concern. Reportedly, more than 1 in 6400 women were victims of crime, and the number of recorded rape cases exceeded 3000 in 2021 (Rathore, 2023), which means that women’s travel safety in Indian cities is under serious threat (Roy and Bailey, 2021). Apart from socio-demographic factors, urban space, as the main setting for residents’ activities, strongly influences the occurrence of sexual violence crimes. For instance, open, well-maintained public spaces with adequate facilities may enhance natural surveillance, potentially deterring such crimes (Moffatt, 1983). In contrast, narrow, disorderly, and spatially complex alleys can inadvertently provide concealment, facilitating offenses (Sidebottom et al. 2018). Therefore, it is crucial for policymakers to understand what contributes to safer urban environments in order to protect women’s rights to mobility and quality of life.

Existing research has emphasized the associations between the macro-level BE and criminal behaviors, addressing the role of a quality environment in preventing crimes (Bruinsma and Johnson, 2018). For example, the “broken windows theory” (Kelling et al. 2003) holds that signs of neighborhood decline, such as unrepaired broken windows, graffiti, and trash, convey a lack of order and regulation, potentially fostering criminal activity. In contrast, the “defensible space theory” (Ostrom, 1975) discusses how “territorial marking” (e.g., buildings, fences, and sidewalks) can deter crimes by reinforcing a sense of ownership and surveillance. In addition, the “Crime Prevention Through Environmental Design” (CPTED) approach (Moffatt, 1983) aims to improve safety through changes to the BE, such as increasing the openness and lighting of the street. Most recently, advances in Street View Imagery (SVI) and Computer Vision (CV) have enabled large-scale assessments of micro-level SE by extracting key visual elements such as buildings, trees, and sidewalks, which makes it possible to study the correlation between the micro-level SE and crime patterns (Amiruzzaman et al. 2021; Chen et al. 2013; Clarke et al. 2010; Hipp et al. 2021; Khorshidi et al. 2021; Zhou et al. 2021). Although many studies have been devoted to clarifying the relationship between the BE and crime patterns, two main gaps remain.

First, there is insufficient evidence on the simultaneously spatial and temporal heterogeneity of the effects of the macro-level BE on gendered violence (particularly violence against female victims). Existing research has often addressed spatial (Dev, 2022; Lee and Lee, 2020) or temporal heterogeneity (He et al. 2020) separately, and has rarely differentiated by the victim’s gender. For instance, spatial features such as poorly lit or isolated areas, and temporal factors such as late hours, have been found critical to general crime incidents without differentiating victim genders. We hypothesize that violent crimes targeting women could have unique spatiotemporal patterns due to women’s more restricted mobility, which necessitates a joint analysis capturing both dimensions concurrently (Zhou et al. 2021).

Moreover, although prior studies have categorized the criminogenic mechanisms of macro-level BE (land use, traffic, sociodemographics, and/or POIs) into crime generators, attractors (Brantingham and Brantingham, 1995), detractors (Kim and Wo, 2023), and facilitators (Clarke and Eck, 2005) to explain general crime patterns, we argue that such presumptions are not fully applicable to gendered violence. Because the criminal motivations behind offenses targeting female victims are distinct, the presence of stores, for example, which has conventionally been depicted as a “crime generator” (Brantingham and Brantingham, 1995), may actually reduce sexual violence against women. Therefore, it is crucial to verify how the roles of macro-level BE in female-targeted violence diverge from their roles in general crime.

Second, the roles of micro-level SE and macro-level BE in gendered crime patterns have not been systematically compared. For example, although emerging criminology studies have integrated macro-level BE variables (e.g., POI density and road network features) with a few micro-level SE variables (e.g., view indexes of building, sky, and tree, and the presence of persons) to describe urban scenes comprehensively (Luo et al. 2022; Su et al. 2023; Yue et al. 2022), it is unclear to what extent these two levels of measurement complement or conflict with each other, let alone how their explanatory power for crime diverges. Investigating these questions will provide policymakers with evidence on which urban management strategies to prioritize in building gender-inclusive cities.

The present research

To address these gaps, this study explored the spatiotemporal relationship between the macro-level BE, micro-level SE, and gender-based sexual violence using Mumbai as a case study. The originality of this research includes the following.

First, to explore spatiotemporal variations of crime, Geographically Weighted Regression (GWR) models were constructed over four periods throughout a day (i.e., 00:00–06:00, 06:00–12:00, 12:00–18:00, and 18:00–24:00). Second, this study incorporated macro-level BE and micro-level SE variables into regression models separately or combined to systematically investigate their distinct roles. Third, to enhance the explanatory power of macro-level BE attributes, POIs were categorized into four criminogenic mechanism types in the modeling process: crime defense, attraction, generation, and neutral, to further clarify their impact on women’s safety.

In short, this study analyzed the spatiotemporal correlation between crime incidents and BE, underscoring the complex role of BE features in influencing crime patterns in Mumbai, which provided inspiration for future urban design and refined urban management.

Literature Review

The impact of macro-level BE on urban safety

CPTED theory suggests that urban planning and design can revitalize neighborhood life and reduce crime (Armitage and Ekblom, 2019; Crowe, 1991). For example, sources of natural surveillance (e.g., nightclubs, restaurants, pubs, bars, and buildings facing the street) can serve as the eyes on the street (Jacobs, 1989), playing a crucial role in making places safer. Moffatt (1983) proposed six characteristics of the first generation of CPTED: surveillance, movement control, territoriality, maintenance, activity support, and target hardening. Samuel (2014) investigated crime prevention through housing design. Newman’s Defensible Space Theory (DST) suggests that spaces with territorial markings, surveillance management signals, and ownership indicators deter crimes (Ostrom, 1975). For instance, installing or improving street lighting reduces crime by strengthening surveillance and enhancing social cohesion (Kim and Park, 2017). Studies have examined the impact of CPTED features (e.g., security systems, signs, personalized items, and landscapes) on strengthening territorial identity and found that these features affected the fear of crime and perceived community cohesion (Hedayati Marzbali et al. 2016). Cozens and Love (2015) adapted Moffatt’s six principles and incorporated geographical juxtaposition, recognizing that the surrounding area may affect the capacity for security and protection. Crime Pattern Theory (CPT) (Brantingham and Brantingham, 1995) classified spatial environments based on their impact on crime. Crime generators are areas where high pedestrian traffic naturally leads to increased criminal opportunities, such as shopping malls and subway stations. Crime attractors are locations that draw crime due to specific characteristics, such as unattended parking lots or urban boundary areas. Crime-neutral areas have neither of these characteristics: crimes occasionally occur there, but they lack predictability.

Overall, these theories qualitatively explain how environmental variables contribute to general crimes such as theft and homicide. However, whether these conclusions also apply to gender-based violence remains largely unexplored.

The impact of micro-level SE on urban safety

Recently, the widespread coverage of SVI data and the advancement of computer vision have received significant attention in public space research in multiple fields, including housing price, active travel, and quality of life (Dong et al. 2022; Qiu et al. 2023; Song et al. 2024; Zhao et al. 2024). Street view data has become one of the most intuitive, accurate, and effective ways to audit the environment (Kelly et al. 2013) and assess human perceptions (Lou et al. 2024; Wu et al. 2023). Many countries and cities around the world have launched urban-scale SVI datasets. Due to its comprehensive coverage and high accuracy, SVI has become a rapid collection method for high-precision urban appearance data to measure the quality of street space (Raskar et al. 2015; Shen et al. 2017; Wang et al. 2022). In recent years, scholars have used deep learning algorithms such as ResNet and SegNet to identify various visual features in SVIs, laying a solid foundation for more reliable research on design quality, visual perception (He and Li, 2021; Wang et al. 2019; Wu et al. 2019), and livability issues (Song et al. 2023).

For example, Yue et al. (2022) used fully convolutional deep learning image segmentation algorithms to evaluate visual elements such as trees, buildings, and streets in street scenes and explored the relationship between these variables and crime. The study found that areas with more trees are more prone to public theft, while areas with more roads and sidewalks are less likely to experience theft. However, other studies have shown that green areas increase the number of visits by residents, and this increase in natural surveillance reduces crime (Kondo et al. 2017). Zhou et al. (2021) extracted street view features such as traffic signs, terrain, and buildings from SVIs to evaluate the relationship between the micro-level architectural environment and drug-related activities. Luo et al. (2022) used a deep learning model to characterize the architectural environment and found that its interaction with socio-demographic variables changes by season, which helps explain the spatial distribution of street crime occurrences. Adachi and Nakaya (2022) found that the interaction between streetscape elements and locations affects the risk of crime. Even though there has been a surge in recent research using BE and SE variables to study crime, previous studies have not systematically compared the explanatory power of SE and BE variables for crime patterns. Clarifying such differences can help urban planners prioritize key tasks in inclusive urban development.

Spatial and temporal heterogeneity of impact factors of sexual violence crimes

Most previous studies have applied global regression models, including negative binomial regression, to study how architectural environmental factors affect sexual violence crimes. However, these global quantitative models cannot capture spatiotemporal non-stationarity. Some studies have attempted to address this problem in the past few years. For example, Su et al. (2023) applied spatial regressions to capture the impact of spatial variables on the volume of sexual violence crimes. However, that study relied on a global regression model, ignoring spatial heterogeneity. Wang et al. (2019) used GWR to study the spatial and social patterns of property and violent crimes in Toronto communities. Nevertheless, their explanatory variables were socio-demographic (age, marriage, etc.), leaving the physical BE in need of more quantitative research. Kim and Wo (2023) modeled every two hours of the day separately to capture the impact of time changes on crime. However, their global regression model does not consider spatial non-stationarity. Table 1 summarizes selected studies.

Table 1 Summary of related studies.

Data and Method

This study collected environmental variables and crime data through multiple sources and developed Ordinary Least Squares (OLS), GWR, and GTWR models considering the variables of micro-level SE and macro-level BE separately or in combination to clarify the impact of different levels of variables on sexual violence crimes while considering spatiotemporal heterogeneity.

Study area

According to the National Crime Records Bureau (NCRB) (https://ruralindiaonline.org/en/, accessed 2022/08/10), Mumbai reported 4583 crimes against women in 2020, ranking second among 19 major cities. It also has the most stalking incidents in India. Therefore, 227 administrative units (i.e., citizen constituencies) in Mumbai were selected as the study area.

Notably, to align with the socio-demographic statistical units and better assist administrative entities in formulating policies, we divided the area into six zones based on the local administrative areas (Fig. 1). To mitigate the impact of varying constituency sizes, the values of variables within each constituency were aggregated by a standard spatial unit (500 m × 500 m), and then the averaged values were calculated. A distance of 500 m is commonly considered an average neighborhood size, corresponding to approximately 10 min of walking (Papadikis et al. 2024; Tsai et al. 2021), and the smallest constituency in Mumbai also covers about 250,000 m².
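The grid-based aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be already projected to meters, values are summed per 500 m cell and then averaged over the cells that contain observations, and the function names and input layout are hypothetical rather than taken from the study.

```python
from collections import defaultdict
from statistics import mean

CELL_SIZE_M = 500  # standard spatial unit reported in the text (500 m x 500 m)


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a projected point."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def constituency_average(observations, cell_size=CELL_SIZE_M):
    """Sum values per grid cell, then average the cell totals.

    observations: iterable of (x_m, y_m, value) tuples that all fall inside
    one constituency (hypothetical layout). Cells without any observation
    are ignored, which is an assumption of this sketch.
    """
    cell_totals = defaultdict(float)
    for x_m, y_m, value in observations:
        cell_totals[cell_index(x_m, y_m, cell_size)] += value
    return mean(list(cell_totals.values()))
```

Averaging over equally sized cells, rather than using a raw total per constituency, is what keeps large and small constituencies comparable.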

Fig. 1 Locations of the sampled districts in Mumbai, Maharashtra, India.

Analytical framework

The analysis consists of three steps. First, crime data on sexual violence against women were aggregated by four evenly distributed time periods as the dependent variable. Second, through semantic segmentation, object detection, and feature point analysis, street view features are quantified as independent variables, together with other socio-demographic data and macro-level BE attributes which were extracted from POI data. Third, OLS, GWR and GTWR models were constructed for each period, and variables’ impact on crimes along with time and location were compared and discussed (Fig. 2).

Fig. 2 Analytical framework.

Variables

Data on sexual violence crimes against women

Safecity is a platform that crowdsources personal stories of harassment and abuse in public places, with over 25,000 reports by 2021. We extracted geo-tagged reports of sexual violence against women from 2010 to 2022 from the Safecity website (https://webapp.safecity.in/, accessed 2022/08/14), including the time, coordinates (latitude and longitude), and crime type. Safecity categorizes reports into four types: verbal abuse, such as catcalling and inappropriate comments; non-verbal abuse, such as stalking, leering, and indecent gestures; physical abuse, such as groping and assault; and other abuse.

We then divided the day into four equal time periods after examining the temporal distribution of the sexual violence crime data (Fig. 3): in the pre-dawn period, from 00:00 to 06:00, the crime frequency is very low; from 06:00 to 12:00, the crime frequency increases overall at a steady rate; from 12:00 to 18:00, the crime frequency surges and reaches its peak at 18:00; and in the evening period, from 18:00 to 24:00, the crime frequency begins to decline rapidly.
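As a minimal sketch of this binning step, assuming each report carries a timestamp, incidents can be assigned to the four six-hour periods by integer division of the hour. The function names and period labels are illustrative only.

```python
from collections import Counter
from datetime import datetime

# The four six-hour periods described in the text.
PERIODS = ["00:00-06:00", "06:00-12:00", "12:00-18:00", "18:00-24:00"]


def period_of(timestamp):
    """Map a datetime to its six-hour period label."""
    return PERIODS[timestamp.hour // 6]


def count_by_period(timestamps):
    """Count incidents per period, keeping periods with zero incidents."""
    counts = Counter(period_of(t) for t in timestamps)
    return {period: counts.get(period, 0) for period in PERIODS}
```

An incident reported at exactly 18:00 falls into the 18:00-24:00 period under this convention; the text does not state how boundary times were handled.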

Fig. 3 Number of crimes over time.

Macro-level BE attributes

Macro-level BE attributes were constructed based on Points of Interest (POI) and road network data. Crime pattern studies (Brantingham and Brantingham, 1995) suggest that the explanatory power of urban functions is more effective when categorized into three criminogenic types (i.e., generation, attraction, and neutrality). Additionally, certain public security management places are believed to prevent criminal incidents (Ostrom, 1975). Therefore, we categorized POIs, including road network density, into four criminogenic types (Appendix Table A1). Specifically, POIs such as police stations and fire stations are classified under the Crime Defense category. Casinos, nightclubs, cybercafes, etc., are categorized as Crime Attractions due to their inherent characteristics that create opportunities for criminal activities. Laundries, maternity homes, markets, etc., are considered part of the Crime Generation category due to their high concentration of women. Meanwhile, POIs that do not clearly fall into any of these classifications are categorized as Neutral.

Notably, because of the high density of POIs, a 500-meter buffer would be excessive. Therefore, we established a 200-meter buffer zone, as this distance is commonly deemed sufficient to accommodate the daily requirements of residents (Papadikis et al. 2024). POI data were obtained from the Google API (https://mapsplatform.google.com/, accessed 2022/09/02), and the road network data were obtained from the Humanitarian Data Exchange (HDX) platform (https://data.humdata.org/, accessed 2022/08/22), which is publicly accessible via the Office for the Coordination of Humanitarian Affairs (OCHA).
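The categorization and buffering described above might be sketched as below. The mapping contains only the example POI types named in the text (the full classification is in Appendix Table A1), the type strings are hypothetical identifiers, and counting POIs within 200 m of a reference point by great-circle distance is an assumption about how the buffer was applied.

```python
import math

# Example POI types named in the text; identifiers are hypothetical.
POI_CATEGORY = {
    "police_station": "defense", "fire_station": "defense",
    "casino": "attraction", "nightclub": "attraction", "cybercafe": "attraction",
    "laundry": "generation", "maternity_home": "generation", "market": "generation",
}
BUFFER_M = 200  # buffer radius reported in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def count_pois_in_buffer(center, pois, buffer_m=BUFFER_M):
    """Count POIs per criminogenic category within buffer_m of center.

    center: (lat, lon); pois: iterable of (lat, lon, poi_type).
    Types missing from POI_CATEGORY are treated as neutral.
    """
    counts = {"defense": 0, "attraction": 0, "generation": 0, "neutral": 0}
    lat0, lon0 = center
    for lat, lon, poi_type in pois:
        if haversine_m(lat0, lon0, lat, lon) <= buffer_m:
            counts[POI_CATEGORY.get(poi_type, "neutral")] += 1
    return counts
```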

Micro-level SE variables

SVI Collection

The GIS data of Mumbai’s roads in 2020 were extracted from HDX (Humanitarian Data Exchange). The average block size in Mumbai is ~300 m. Therefore, sampling SVI data at 200 m intervals ensures an adequate representation of the visual quality of each street segment. For each sample point, SVIs were downloaded in four directions (0°, 90°, 180°, and 270°) to capture the human eye-level visual quality (Tang et al. 2020) using the Google Maps API (Application Programming Interface). A 90° horizontal field of view (FOV) was used, with a 0° pitch controlling the camera’s up/down angle and an image resolution of 600 × 400 pixels (Qiu et al. 2022) for each SVI. After filtering out invalid images (interior scenes or blank images), a total of 55,244 valid SVIs were obtained from 13,822 sample points.
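For illustration, the four requests per sample point could be assembled as below. This sketch assumes the images were retrieved through the Google Street View Static API, which the text does not confirm; the helper name is hypothetical, the API key is a placeholder, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint: Google Street View Static API.
STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 90, 180, 270)  # the four viewing directions in the text


def streetview_urls(lat, lon, api_key):
    """Build one request URL per heading for a single sample point."""
    urls = []
    for heading in HEADINGS:
        params = {
            "size": "600x400",           # image resolution reported in the text
            "location": f"{lat},{lon}",
            "heading": heading,
            "fov": 90,                   # horizontal field of view
            "pitch": 0,                  # camera level with the horizon
            "key": api_key,              # placeholder, supplied by the caller
        }
        urls.append(f"{STREETVIEW_ENDPOINT}?{urlencode(params)}")
    return urls
```

If every one of the 13,822 sample points returned four images, that would be 55,288 images, so the 55,244 valid SVIs reported would correspond to 44 images removed by filtering.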

Semantic and Instance Segmentation

Streetscape features are defined by the pixel ratio of classified visual elements, which are crucial for urban environment auditing (Ito and Biljecki, 2021). The Pyramid Scene Parsing Network (PSPNet) can effectively classify common streetscape features at the pixel level through semantic segmentation (Qiu et al. 2022; Zhao et al. 2017). Based on the literature, this study selected 15 streetscape features linked to sexual violence crimes and calculated their pixel ratios (Chen et al. 2020). Moreover, to accurately count the cars and pedestrians appearing in SVIs, Mask Region-based Convolutional Neural Networks (Mask R-CNN) were used for instance segmentation. Additionally, the Harris Corner Detection algorithm was used to quantify the complexity of street facades at eye level (Harris and Stephens, 1988). This algorithm identifies and analyzes key points, or “corners,” in images, which often correspond to architectural details or structural features, providing insights into the design richness of urban environments. Figure 4 illustrates the results of semantic segmentation, object detection, and feature extraction from selected Google SVIs.
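The pixel-ratio definition can be illustrated with a small sketch that takes a segmentation mask as a 2D list of class labels. The mask layout and function names are hypothetical, and averaging the ratios over the four headings of a sample point is an assumption of this sketch rather than a step stated in the text.

```python
from collections import Counter


def pixel_ratios(label_mask):
    """Return the share of pixels per class in one segmentation mask.

    label_mask: 2D list (rows of pixels) holding a class label per pixel.
    """
    counts = Counter(label for row in label_mask for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def view_index(masks_by_heading, label):
    """Average the pixel ratio of one class over the images of a sample point."""
    ratios = [pixel_ratios(mask).get(label, 0.0) for mask in masks_by_heading]
    return sum(ratios) / len(ratios)
```

For a 2 × 2 mask with two sky pixels, one tree pixel, and one road pixel, the sky ratio is 0.5 and the tree ratio is 0.25.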

Fig. 4 Samples of (1) original SVI, (2) instance segmentation result from Mask R-CNN, (3) semantic segmentation result from PSPNet, (4) Harris corner detection of buildings.

Sociodemographics

Sociodemographic variables were obtained from WorldPop (https://www.worldpop.org/, accessed 2022/08/23) and Open City (https://opencity.in/, accessed 2022/08/22), including population, poverty level, distance from slums, and women’s household status. Their associations with crime incidents have been verified (Fleissner and Heinzelmann, 1996). Specifically, the poverty level is represented by the spatial distribution of children aged 12 to 23 months born in low-income families. Women’s household status is represented by the proportion of women aged 15–49 who do not participate in household decision-making. Table 2 provides a summary of descriptive statistics for all variables.

Table 2 Descriptive statistics for all variables.

Model Architecture

Variable Selection

Multicollinearity refers to a strong linear correlation between two or more independent variables in a multiple regression model. This issue can complicate the interpretation of how each independent variable contributes to predicting or explaining the dependent variable, potentially leading to misleading conclusions. To address this, researchers calculate the Variance Inflation Factor (VIF), which measures how strongly an independent variable is correlated with other independent variables in the model (Park et al. 2018). A higher VIF value indicates stronger multicollinearity, with a value of 1 suggesting no collinearity. Typically, when a variable’s VIF exceeds 10, it is considered problematic. Therefore, this study excluded such variables to improve the model’s accuracy and reliability.
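For reference, the standard definition of the VIF for the j-th independent variable is given below, where \({R}_{j}^{2}\) is the coefficient of determination obtained by regressing that variable on all the other independent variables (the symbol names are chosen here for illustration).

$${\mathrm{VIF}}_{j}=\frac{1}{1-{R}_{j}^{2}}$$

Under this definition, \({R}_{j}^{2}=0\) gives a VIF of 1 (no collinearity), and a VIF above 10 corresponds to \({R}_{j}^{2}>0.9\), meaning that more than 90% of the variance of that variable is explained by the other independent variables.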

Spatial Effects

OLS, Moran’s I Test and Lagrange Multiplier Test

Linear regression models such as OLS regression are widely used to examine the correlation between behavioral activities and architectural and environmental features due to their excellent interpretability (Keralis et al. 2020). OLS regression estimates the relationship between independent variables and dependent variables by minimizing the sum of squared residuals, making it a foundational tool in statistical analysis. However, when working with geographical data, spatial dependence is often present due to spatial interaction and diffusion effects, implying that observations from nearby locations are not genuinely independent (Park et al. 2018). Ignoring spatial dependence can lead to biased estimates in regression analysis. To detect whether spatial autocorrelation exists, this study first evaluated Moran’s I, a widely used statistical measure that quantifies the degree of spatial dependence by comparing observed spatial patterns with those expected under random distribution. A significant Moran’s I value suggests that spatial dependence is present and must be accounted for in the analysis. To further identify the specific type of spatial dependence, this study performed a Lagrange Multiplier (LM) test. While Moran’s I only indicates whether spatial dependence exists, LM helps determine whether the dependence is due to a spatial lag (where the dependent variable in one location is influenced by the dependent variable in nearby locations) or a spatial error effect (where spatial correlated residuals suggest the presence of unobserved spatial factors). Based on the results of the LM test, we selected the appropriate model adjustments to improve the robustness of our analysis.
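As a minimal sketch of the first diagnostic, global Moran’s I can be computed from a vector of values (for example, OLS residuals) and a spatial weights matrix. The binary adjacency weights used in the usage note are an assumption for illustration; the text does not state which weighting scheme was used, and significance testing is omitted here.

```python
def morans_i(values, weights):
    """Compute global Moran's I.

    values: list of n observations (e.g. OLS residuals per spatial unit).
    weights: n x n list of lists; weights[i][j] is the spatial weight
    between units i and j, with zeros on the diagonal.
    """
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross_product = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_sum = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross_product / squared_sum)
```

For four units arranged in a line with neighbor weights of 1, the steadily increasing values [1, 2, 3, 4] give I = 1/3 (positive autocorrelation), whereas the alternating values [1, 0, 1, 0] give I = -1 (negative autocorrelation).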

Spatiotemporal change and non-stationarity

Existing research suggests that, because the use and significance of the physical environment and the daily activities of people visiting an area change over the day, the physical environment has different impacts on crime patterns at different times of the day. Crime opportunities and guardianship capabilities therefore vary over time (Kim and Wo, 2023). Typically, Ordinary Least Squares (OLS) regression assumes spatial stationarity, with static relationships between variables. However, these relationships can vary across space and time. Thus, our study incorporated spatiotemporal non-stationarity. Geographically Weighted Regression (GWR) accounts for spatial variation in relationships by allowing regression coefficients to vary with location, addressing spatial non-stationarity. Extending this, Geographically and Temporally Weighted Regression (GTWR) further integrates temporal factors, enhancing the model’s capacity to handle spatiotemporal variability. Specifically, formulas (1), (2), and (3) give the mathematical expressions of OLS, GWR, and GTWR, respectively, where \(y\) is the dependent variable, \(x\) are the explanatory variables, \(\beta\) are the regression coefficients, and \(\varepsilon\) is the residual. \(({u}_{i},\,{v}_{i})\) denotes the spatial coordinates of unit \(i\), from which the spatial distance weights are derived, and \(({u}_{i},\,{v}_{i},\,{t}_{i})\) denotes its space-time coordinates, from which the weights based on space and time distance functions are derived.

$${y}_{i}={\beta }_{0}+\mathop{\sum }\limits_{k=1}^{m}{\beta }_{k}{x}_{ik}+{\varepsilon }_{i}$$
(1)
$${y}_{i}={\beta }_{0}({u}_{i},{v}_{i})+\mathop{\sum }\limits_{k=1}^{m}{\beta }_{k}({u}_{i},{v}_{i}){x}_{ik}+{\varepsilon }_{i}$$
(2)
$${y}_{i}={\beta }_{0}({u}_{i},{v}_{i},{t}_{i})+\mathop{\sum }\limits_{k=1}^{m}{\beta }_{k}({u}_{i},{v}_{i},{t}_{i}){x}_{ik}+{\varepsilon }_{i}$$
(3)
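In both GWR and GTWR, the local coefficients at a unit are estimated by weighted least squares, with weights that decay with distance from that unit. The sketch below shows one common way of building such weights. The Gaussian kernel form, the bandwidth, and the space and time scale factors are assumptions for illustration; the text does not specify which kernel or parameters were used.

```python
import math


def spatiotemporal_distance(a, b, space_scale=1.0, time_scale=1.0):
    """Combine spatial and temporal separation into one distance.

    a, b: (u, v, t) tuples. Setting time_scale to 0 ignores time,
    which gives the purely spatial distance used for GWR-style weights.
    """
    (u1, v1, t1), (u2, v2, t2) = a, b
    spatial_sq = (u1 - u2) ** 2 + (v1 - v2) ** 2
    temporal_sq = (t1 - t2) ** 2
    return math.sqrt(space_scale * spatial_sq + time_scale * temporal_sq)


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: 1 at distance 0, decaying smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_weights(target, observations, bandwidth, space_scale=1.0, time_scale=1.0):
    """Weights of all observations for the local regression at target."""
    return [
        gaussian_weight(
            spatiotemporal_distance(target, obs, space_scale, time_scale), bandwidth
        )
        for obs in observations
    ]
```

Observations close to the target in both space and time receive weights near 1, and distant ones receive weights near 0, which is what lets the coefficients in formulas (2) and (3) differ from unit to unit.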

Model selection

This study aimed to investigate how BE and SE factors impact sexual violence against women in Mumbai, considering spatiotemporal non-stationarity. To ensure the appropriate selection of regression models, we first conducted multicollinearity and spatial autocorrelation tests. Moran’s I confirmed the presence of mild spatial dependence, prompting further investigation using the robust Lagrange Multiplier (LM) test. The robust LM test detected minimal spatial interaction effects in the OLS models, leading us to dismiss traditional spatial econometric models such as the Spatial Lag Model (SLM) and the Spatial Error Model (SEM), which are specifically designed to account for strong spatial dependencies. Instead, we adopted GWR and its extension, GTWR, to address potential spatiotemporal variations.

This study explored the relationship between independent and dependent variables using three models: OLS, GWR, and GTWR. To evaluate the explanatory power of variables at different levels, we set up four combinations: (1) socio-demographic (SD) variables alone, (2) SD and macro-level BE variables, (3) SD and micro-level SE variables, and (4) SD, SE, and BE variables together. Each combination was tested across all three models to evaluate the impact of different variable sets on model performance. Notably, the baseline model includes only socio-demographic variables, with macro-level BE and micro-level SE variables added incrementally to assess their comparative impacts. Model performance was evaluated using two metrics. R-squared (R²) measures how well the independent variables explain the variation in the dependent variable; a higher R² value indicates that the model captures more of the variability in the data. The Akaike Information Criterion (AIC) assesses model fit while penalizing excessive model complexity; a lower AIC value suggests a better balance between explanatory power and simplicity, helping to avoid overfitting.
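The two metrics can be sketched as follows for a model with Gaussian errors. The AIC here is written up to an additive constant that is identical for models fitted to the same data, so only differences between models are meaningful. Function names are illustrative, and GWR software typically replaces the parameter count with an effective number of parameters, which this sketch does not reproduce.

```python
import math


def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Share of the variance in observed explained by predicted."""
    mean_observed = sum(observed) / len(observed)
    total_ss = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - residual_sum_of_squares(observed, predicted) / total_ss


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def gaussian_aic(observed, predicted, n_params):
    """AIC under Gaussian errors, omitting the constant term."""
    n = len(observed)
    rss = residual_sum_of_squares(observed, predicted)
    return n * math.log(rss / n) + 2 * n_params
```

A model that fits the same observations more closely with the same number of parameters yields a lower AIC, while each additional parameter raises the AIC by 2 unless it reduces the residual sum of squares enough to compensate.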

Results

Descriptive results

Figure 5 illustrates the hourly distribution of sexual violence incidents. The frequency of incidents is lowest at 4 am and highest at 7 pm. From 4 am to 4 pm, the frequency of sexual violence incidents exhibits a significant upward trend. The upward trend slows down from 4 pm to 7 pm. From 7 pm to 11 pm, there is a linear downward trend, becoming more gradual from 11 pm to 4 am the next day (Fig. 5).

Fig. 5 Time distribution of incidents of sexual violence against women.

Figure 6 presents the spatial distribution of incident density, dividing the 24-h period into four equal segments. Overall, crime volume is scattered across the city throughout the day, which reflects its spatial heterogeneity. High-incidence areas are concentrated near the central urban area and the southern financial district for the whole day. However, the distribution of crime in the north-central and southeastern regions of Mumbai changes irregularly across the four time periods, which means that the crime pattern exhibits spatiotemporal heterogeneity, varying over both time and space (Fig. 6).

Fig. 6 Spatial distribution of sexual violence against women in different time periods.

Table 3 presents descriptive statistics and pairwise comparisons for crime rates across different time periods. The results reveal substantial temporal and spatial variations in crime distribution. The mean crime rate is highest during the daytime (12–18 h, 0.940) and lowest during the nighttime (0–6 h, 0.118). Standard deviation (SD) and range values indicate greater intra-period variability during the daytime (SD = 1.333, Range = 8.571) than at night (SD = 0.243, Range = 1.833). The coefficient of variation (CV) provides further insights into relative variability. The night period has the highest CV (2.056), suggesting that, despite lower overall crime rates, certain units experience disproportionately higher rates than others. In contrast, the daytime periods (6–12 h and 12–18 h) show relatively lower CV values (1.570 and 1.418, respectively), indicating more consistent distributions of crime rates across spatial units.

Table 3 Descriptive Statistics and Pairwise Crime Rate Comparisons Across Time Periods.
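As a quick check of the arithmetic, the CV is the standard deviation divided by the mean. The sketch below recomputes it from the rounded means and SDs quoted in the text for the two periods where both are given; variable names are illustrative.

```python
def coefficient_of_variation(mean_value, std_dev):
    """Relative variability: standard deviation divided by the mean."""
    return std_dev / mean_value


# (mean, SD) pairs as reported in the text, rounded to three decimals.
REPORTED = {
    "12-18 h": (0.940, 1.333),
    "0-6 h": (0.118, 0.243),
}

cv_by_period = {
    period: round(coefficient_of_variation(mean_value, std_dev), 3)
    for period, (mean_value, std_dev) in REPORTED.items()
}
```

This gives 1.418 for 12–18 h, matching the reported value, and 2.059 for 0–6 h, close to the reported 2.056; the small gap is presumably because the published mean and SD are themselves rounded.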

Since the distribution of crime rates across units within each time period deviates from normality, this study employed Dunn’s Test (Dinno, 2015) instead of ANOVA (Gelman, 2005) to assess the significance of differences. Dunn’s Test is a robust non-parametric method that compares ranked data and identifies pairwise differences using p-values. The results confirm statistically significant differences in crime rates across almost all time periods (p < 0.05), highlighting substantial temporal crime rate variations, particularly between the nighttime and other periods.

Model results

OLS & GWR results

To avoid the issue of multicollinearity, the Variance Inflation Factor (VIF) test was conducted, and variables with VIF values exceeding 10 (i.e., Sky, Building, Fence, Wall, Plant, Car, and Truck) were excluded to ensure the model’s reliability (Table A2). As previously mentioned, four variable combinations were computed for each model, with the combination including only SD variables serving as the baseline model. For clarity, we assigned numerical labels to these models, as shown in Table 4, which lists the performance of the OLS and GWR models in terms of adjusted R2. Overall, the GWR models perform better, with higher R2, indicating that accounting for spatial non-stationarity substantially improves model performance.

Table 4 Adjusted R2 of OLS and GWR models.

For OLS models, compared with the baseline, the R-squared values of Models 1.1 with SD and BE variables and 1.2 with SD and SE variables have increased in all four time periods of the day. The strength of BE is close to that of SE, indicating that both BE and SE have significant relevance to crime events. Model 1.3, which incorporates all variables, exhibits a more substantial increase in R² than Models 1.1 and 1.2, suggesting that the combined influence of SE and BE is greater than that of either variable group alone. For GWR models, compared to OLS models, the improvement in the average R2 in the BE model (0.505) is more significant than in SE (0.36), indicating that the GWR model captures more explanatory power of the macro-level BE.

Table A3 in the Appendix shows the results of the best OLS model (Model 1.3). Moran’s I test is significant for the residuals in all time periods, with p-values close to 0.000. This indicates significant spatial autocorrelation in the OLS residuals, suggesting that ignoring spatial interactions may lead to biased estimates. Moreover, only the models for two time periods (6:00–12:00 and 12:00–18:00) are significant in the LM (lag) test, with p-values of 0.004 and 0.003. The spatial lag effect is thus evident in these periods but not in the others, suggesting that spatial econometric models (e.g., spatial lag/error) may not be appropriate for all time periods. Therefore, GWR is a better option to capture spatial heterogeneity.
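For readers unfamiliar with the residual diagnostic, a minimal sketch of global Moran’s I with a permutation-based pseudo p-value follows. The chain-shaped weights matrix, the residual values, and the function names are assumptions for illustration; the study's actual spatial weights are not specified here.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(v * v for v in z))

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation via random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def chain_weights(n):
    """Hypothetical layout: units on a line, each neighbouring the previous and next."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

# Illustrative OLS residuals that cluster in space (not taken from the study).
residuals = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]
w = chain_weights(len(residuals))
observed_i = morans_i(residuals, w)
pseudo_p = permutation_p_value(residuals, w)
```

A clearly positive statistic with a small pseudo p-value corresponds to the situation reported for the OLS residuals, where nearby units have similar errors.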

Appendix Table A4 reports the coefficient estimates for the best GWR model (Model 2.3), while Fig. 7 shows, for each of the four time periods, the number of spatial units in which each independent variable’s coefficient estimate is significant. Notably, during 0:00–6:00, the number of significant spatial units was considerably lower than in other time periods, indicating that the criminogenic mechanism after midnight is potentially different and that important latent variables remain for future studies to investigate. Moreover, using the lower quartile value as the threshold (the red dashed line), eight variables showed relatively weak explanatory power (i.e., In-decision, Railings, Railway station (D), Laundries, Casinos, Nightclubs, Cybercafes, and Playgrounds).

Fig. 7

The number of spatial units in which each independent variable’s coefficient estimate is significant (using P < 0.05 as the criteria).

Comparing the best-performing OLS model (Model 1.3) with the best-performing GWR model (Model 2.3), about 87.1% of the variable parameter estimates remain relatively stable after spatial non-stationarity is taken into account. Considering spatial non-stationarity further refines the explanatory power of variables in geographical space. For example, among the SE indicators, Trees, Railings, Street facade complexity, Persons, Buses, and Traffic lights are insignificant in OLS. In the GWR results for the 6:00–24:00 period, however, each of these variables is significantly correlated in more than 66.1% of the electoral districts. Figure 8 shows the average coefficient values of the best GWR model; the strengths of eight variables are markedly smaller (i.e., Buses, Grass, Railway station (D), Nightclubs, Traffic lights, Playgrounds, Laundries, and Railings).

Fig. 8

The average correlation coefficient for each variable in the GWR model.

GTWR results and spatiotemporal non-stationarity

The Geographically and Temporally Weighted Regression (GTWR) captures spatiotemporal variations (Wang et al. 2014). Model fit is assessed with the Akaike Information Criterion (AIC), which penalizes additional parameters to guard against overfitting. If the AIC difference between two models exceeds 3, the model with the smaller AIC value is considered a significantly better fit. As shown in Table 5, among the four models, the AIC values of the GWR and GTWR models are significantly lower than those of OLS, indicating that considering spatial and temporal non-stationarity can significantly improve model performance.
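The AIC decision rule can be sketched as follows. The Gaussian least-squares form of AIC (defined up to an additive constant) and the function names are assumptions made for illustration; for GWR and GTWR the parameter count would typically be an effective number of parameters rather than a simple count, and the paper does not state which AIC variant was used. Only the threshold of 3 comes from the text.

```python
import math

def gaussian_aic(rss, n_obs, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

def preferred_model(aic_by_model, threshold=3.0):
    """Return the lowest-AIC model and whether it beats the runner-up by more than threshold."""
    ranked = sorted(aic_by_model.items(), key=lambda item: item[1])
    best_name, best_aic = ranked[0]
    clearly_better = len(ranked) == 1 or (ranked[1][1] - best_aic) > threshold
    return best_name, clearly_better

# Hypothetical residual sums of squares and parameter counts (illustrative only).
aics = {
    "OLS": gaussian_aic(rss=120.0, n_obs=200, n_params=10),
    "GWR": gaussian_aic(rss=70.0, n_obs=200, n_params=35),
}
best, decisive = preferred_model(aics)
```

Because the penalty term grows with the number of parameters, a more flexible model is preferred only when its reduction in residual error outweighs that penalty.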

Table 5 Comparison of the results of three models.

Notably, the AIC value of the GWR model decreases significantly when considering BE and SD variables and their combination, showing a larger reduction than in the GTWR model. This shows that BE has stronger spatial non-stationarity than SE. The spatial non-stationarity of Model 3 is stronger than its spatiotemporal non-stationarity. Table A5 in the Appendix shows the GTWR coefficients. Compared to those of the GWR models (Table A4), 78.23% of the variables’ coefficient directions are consistent, further demonstrating the robustness of the GWR models.
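The share of coefficients whose direction agrees between two models can be computed as sketched below. The function name and the coefficient values are hypothetical; the sketch only illustrates the kind of sign comparison described, not the exact procedure used to obtain 78.23%.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_agreement(coefs_a, coefs_b):
    """Percentage of shared variables whose coefficients have the same sign in both models."""
    shared = [name for name in coefs_a if name in coefs_b]
    same = sum(1 for name in shared if sign(coefs_a[name]) == sign(coefs_b[name]))
    return 100.0 * same / len(shared)

# Illustrative average coefficients only (not values from Tables A4 or A5).
gwr = {"Streetlights": 0.20, "Trees": -0.10, "Road density": 0.30, "Buses": -0.05}
gtwr = {"Streetlights": 0.10, "Trees": 0.04, "Road density": 0.20, "Buses": -0.01}
agreement = sign_agreement(gwr, gtwr)
```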

The effects on crime

Given that the GWR model effectively captures spatial non-stationarity and has the best fit (Model 3), further discussion is based on the GWR results. This study applied the GWR model to calculate the coefficients of each variable for every spatial unit across four time periods. However, given the 500-meter grid sampling, reporting the coefficients of every unit individually is impractical. To provide actionable insights for urban planning, the study summarized the coefficients of all spatial units in Mumbai for each variable using five statistics (Table A4 in the Appendix): Minimum (MIN), Lower Quartile (LQ), Average (AVG), Upper Quartile (UQ), and Maximum (MAX). The AVG indicates the average influence of each variable on crime rates, while MIN and MAX give the range of variability, reflecting spatial heterogeneity. For instance, during 0:00–6:00, the Public Toilet coefficient ranges from −0.134 to 0.064, showing a negative correlation with crime rates in some areas and a positive correlation in others. Additionally, variations in AVG values across the four periods indicate temporal heterogeneity; for example, Public Toilet shows both negative and positive correlations depending on the time.
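The five summary statistics can be computed per variable and period as in the sketch below. The function name and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) are assumptions, and the coefficient list is illustrative: only its minimum and maximum match the Public Toilet range quoted above, while the interior values are made up.

```python
import statistics

def summarize_coefficients(coefficients):
    """Return MIN, LQ, AVG, UQ, MAX and a mixed-sign flag for one variable's local coefficients."""
    lq, _median, uq = statistics.quantiles(coefficients, n=4, method="inclusive")
    low, high = min(coefficients), max(coefficients)
    return {
        "MIN": low,
        "LQ": lq,
        "AVG": statistics.fmean(coefficients),
        "UQ": uq,
        "MAX": high,
        # True when the local coefficients change sign across spatial units.
        "mixed_sign": low < 0 < high,
    }

# Illustrative local coefficients for one variable in one period.
public_toilet_0_6 = [-0.134, -0.05, 0.0, 0.02, 0.064]
summary = summarize_coefficients(public_toilet_0_6)
```

A mixed-sign flag of this kind gives a quick indication of spatial heterogeneity, while comparing AVG across the four periods indicates temporal heterogeneity.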

Overall, 67.74% of the variables present a spatiotemporal binary structure, differing between suburbs and urban areas in space and between day and night in time. The correlations of variables in suburban areas 4 and 5, which surround the main urban area, change throughout the day; these areas have characteristics of both suburban and central urban BEs. The remaining 32.26% of the variables have correlation coefficients that are relatively stable throughout the day and across all areas. According to Figs. 7 and 8, the variables Nightclubs, Laundries, Railway station (D), Playgrounds, and Railings have both small correlation coefficients and few significantly correlated areas, so the study concluded that they have no significant impact on sexual violence crimes. Next, we discuss the remaining variables in detail by group.

Roles of micro-level SE variables

Figure 9 illustrates the daily variation in coefficient values for seven explanatory variables in the micro-level SE attribute group. Street facade complexity is mostly negatively correlated in all regions throughout the entire period, with an average correlation coefficient of −0.72. This suggests that complex street facades can enhance spatial surveillance and reduce the likelihood of crime, which is consistent with Newman’s natural surveillance theory (Newman, 1972). Streetlights, with the second highest absolute coefficient value, show a positive correlation in 79.17% of regions but are negatively correlated 75% of the time in Suburban 3 from 6:00–24:00. This might stem from Suburban 3’s lower development and isolation by rivers, where streetlights may deter crime. In other regions, however, streetlights may indicate higher-grade streets and busier traffic, which provide more opportunities for crime.

Fig. 9

Coefficient variations in the micro-level SE attribute group.

Some variables showed opposite correlations in Mumbai’s urban and suburban areas and between day and night. Trees and grass were consistently negatively correlated in the northern suburbs while displaying a positive correlation from 6:00 to 24:00 in the southern urban area. Street pedestrians similarly showed a positive correlation during these hours in the southern city and suburbs 4 and 5 but were negatively correlated elsewhere. This may be because street pedestrians, trees, and grass outside urban areas enhance community territoriality and management, potentially reducing sexual violence against women (Adachi and Nakaya, 2022). Buses exhibited a strong negative correlation with crime in the southern urban area during the same hours, suggesting that public transport aids monitoring, but a slight positive correlation in other areas indicated possible contributions to crime due to higher mobility and lesser capacity (Neiss, 2016). As the day progresses, the negative correlation gradually extends to suburbs 4 and 5. This finding shows that the suburbs around the main urban area fluctuate in both time and space. During work hours (6:00–18:00), these areas are less populated and more suburban. From 18:00 to 24:00, office workers return to these lower-cost areas, increasing the population and giving the suburbs an urban character (Stults and Hasbrouck, 2015). Traffic lights in the southern metropolitan area and its surrounding suburbs 4 and 5 showed a negative correlation from 12:00 to 18:00, whereas northern suburbs had a positive correlation from 6:00 to 24:00, suggesting that traffic intersections in the suburbs deter sexual violence with a diminishing effect by night.

Roles of macro-level BE attributes

This section examines the effects of four variable groups in urban environments on sexual violence (Fig. 10), which vary by time and space. Effects differ between day and night and across zones. Correlation coefficients for four crime-related sites remain stable. Neutral spaces and certain public areas without specific security functions do not exhibit clear positive or negative impacts on sexual violence crimes, which is consistent with intuition. Conversely, areas linked to security, high traffic, or high risk for women consistently demonstrate either positive or negative influences on sexual violence crimes, resulting in stable correlation signs over time and space.

Fig. 10

Coefficient variations in the macro-level BE attribute group.

In crime defense sites, police stations’ coefficients are consistently non-negative across different periods. This counterintuitive finding suggests that they may not effectively deter sexual violence. Conversely, the distance to fire stations exhibits positive coefficient values for 87.50% of the day, particularly in the western coastal regions. However, a weak negative correlation occurs near eastern Thane Creek Flamingo, where limited fire station infrastructure reduces their effectiveness in surveillance and crime prevention. This result suggests that the findings of previous research (Fleissner and Heinzelmann, 1996), which claim that public security management facilities can prevent crime, should be further examined. In particular, it is important to consider the varying effects of different criminal motivations and categories of public management on crime.

Crime-neutral sites show dynamic spatiotemporal relationships with crime. Worship places, for example, typically reduce crime in city centers and suburbs during specific times (0:00 to 6:00 and 18:00 to 24:00) due to their unique spiritual attributes but may increase it during other times (6:00 to 18:00) due to looser management. Distance to hospitals and schools shows variable effects depending on location and time, indicating that their influence can vary significantly across the urban landscape. Parking lots and restaurants also predominantly show negative correlations with crime throughout the day, implying a role in crime prevention through effective field monitoring. This finding further supports the CPT (Brantingham and Brantingham, 1995) definition of a crime-neutral location as one whose impact on crime is uncertain.

CPT (Brantingham and Brantingham, 1995) defines crime generation places as places where crime naturally occurs due to high traffic. However, our results show that crime generation places, except for transportation system variables, have binary tendencies, with some reducing and others increasing crime. General stores, with their visibility and monitoring, mostly deter crime, showing a negative correlation in 94.44% of areas during the day. In contrast, markets in suburban areas and the southern city center exhibit varying correlations, sometimes increasing crime due to their high pedestrian traffic. Distance to maternity homes is consistently negatively correlated with crime, suggesting that areas closer to them are more prone to crime due to the concentration of potential victims. Transportation systems such as bus stops affect urban and suburban areas differently. In the city center, they deter crime, but in the suburbs, they potentially facilitate crime by increasing mobility. Similarly, high road density correlates with more crime, indicating that denser traffic areas may concentrate potential crime targets. This result indicates spatiotemporal heterogeneity in crime generation locations. Therefore, the CPT classification alone is not a reliable basis for determining the impact of specific POIs on crime when exploring crime mechanisms.

In high-crime locations, suburban areas 4 and 5 exhibit a positive correlation between public toilets and sexual violence incidents throughout the day, suggesting they may facilitate crime there, contrasting with their deterrent effect in northern suburbs. Cybercafes strongly correlate with increased afternoon and evening crime risks, while casinos in the southern city center pose higher risks from noon to evening. These findings underscore the complex role of BE features in influencing crime patterns in Mumbai. Even when certain POIs appear to attract crime, their actual impact varies significantly across different time periods and spatial contexts.

Roles of socio-demographics

In socio-demographics (Fig. 11), the variable Wealth correlates positively with sexual violence against women in the main urban area. Conversely, the variable In-decision demonstrates a negative correlation in the main urban area and surrounding suburbs from 12:00 to 18:00, but with no specific pattern in suburban area 4.

Fig. 11

Coefficient variations in the socio-demographic attribute group.

A graphical analysis (Fig. 12) reveals positive correlations near train stations and airports on the outskirts, especially from 18:00 to 24:00, while most other areas show negative correlations. This pattern suggests that women with lower decision-making power may be less vulnerable to sexual violence due to protective effects in these areas.

Fig. 12

The number of regions that have a relevance in different variables.

The variable Slum (D) is negatively correlated with sexual violence in the main urban area from 6:00 to 24:00 but is positively correlated in other regions throughout the day. The lack of a clear negative correlation in northern suburbs, which is a counterintuitive result, may be due to low usage of a reporting app, leading to incomplete data. The variable Population shows weaker correlations in the main urban area from 0:00 to 6:00 but consistently positive correlations from 6:00 to 24:00. In the suburbs, most areas exhibit negative correlations all day, underscoring the robustness of the regression model in reflecting micro-level environmental impacts.

Discussion

The impact of BE and SE attributes on crime

The SE and BE variables robustly explain violence against women, corroborating findings from most physical activity studies (Jiang et al. 2022). Both BE and SE attributes significantly improved the adjusted R2 and AIC when spatial and temporal factors were considered, which highlights the necessity of considering spatiotemporal heterogeneity in crime research.

The consistency between micro-level street view variables and macro-level regression outcomes—such as the alignment between buses (SE group) and bus stops (BE group)—demonstrates the robustness of the regression model and underscores the importance of integrating both SE and BE attributes into crime analysis. In comparison, models incorporating macro-level BE attributes achieved higher explanatory power, indicating that these factors should be prioritized in urban planning and crime prevention strategies. However, micro-level SE variables provide crucial insights into preventing sexual violence against women in Mumbai. When ranking variables by their absolute correlation coefficients, the top five influential variables include three from the SE group (street facade complexity, streetlights, and pedestrian presence) and two from the BE group (road density and schools), with road density exhibiting the strongest correlation with increased sexual violence in urban areas.

In addition, this study revealed the complex mechanisms by which variables affect crime over time and space (Table A5). Some key variables, such as bus stops, show a positive correlation with sexual violence, supporting their CPT classification as crime generators. Conversely, general stores, traditionally categorized as crime generators in CPT due to high pedestrian volume, exhibit a negative correlation with sexual violence in Mumbai, which suggests that their surveillance function may mitigate crime risk. Similarly, the impact of casino (D) varies significantly across time periods: it is positively correlated with crime during 0:00–6:00 but negatively correlated during all other periods. This temporal inconsistency challenges the conventional assumption (Brantingham and Brantingham, 1995) that casinos function purely as crime attractors, indicating that their role in crime patterns is more nuanced.

These findings suggest that previous research on how POIs influence crime may be overgeneralized, often neglecting specific criminal motivations (e.g., targeting women) and spatiotemporal variations. Future studies should delve deeper into these contextual factors to refine crime prevention strategies and develop more targeted urban policies.

Implications for urban planning and design

This study provides insights for researchers, urban planners, and security authorities to improve city environments and reduce sexual violence against women. Design strategies include wider streets, controlled facades, and diverse small shops to boost neighborhood watchfulness. Managing tree heights and visibility in green spaces is also crucial to avoid creating secluded hideouts. For high-risk areas like casinos and public toilets, redesigns should include alarms and optimized window-wall ratios for better external monitoring. Additionally, Mumbai’s city managers must allocate resources wisely and implement detailed spatial-temporal security measures. Our research results also provide a scientific reference for the spatiotemporal deployment of security measures.

Limitations and future studies

This study has some limitations that can be improved in the future. First, the data were self-reported by app users, limiting population coverage and accuracy. Future research should employ more comprehensive data collection methods for sexual violence against women. Secondly, while this study employed the Geographically Weighted Regression (GWR) model to examine correlations between the BE and urban safety, it does not establish causal relationships and lacks certain latent variables that require further investigation. For example, graffiti and broken facilities might decrease urban safety because they indicate a chaotic neighborhood environment (Kelling et al. 2003). Public cameras and building windows are likely to curb crime in neighborhoods due to their surveillance properties (Piza et al. 2019). Additionally, while our study did not focus on enhancing prediction accuracy, other statistical techniques such as LASSO regression or machine learning models like random forests could be explored to improve predictive capabilities. Future studies should also consider broader contextual factors, such as culture, education (Buonanno, 2003), etc., to provide more actionable safety recommendations. Finally, due to the lack of nighttime SVI datasets, the micro-level SE variables are all extracted from daytime SVIs, which may overlook their temporal variations. Future work could generate nighttime SVIs via generative AI to capture the impact of nocturnal street scenes (especially luminance) on crime incidents at night (Liu et al. 2024; Ye et al. 2024).

Conclusion

Through an analysis of crowdsourced data on sexual harassment and abuse in public spaces, the study aimed to investigate the geographical distribution patterns of sexual violence crimes and thoroughly evaluate the underlying determinants of the BE and the spatial-temporal variability of crime volume. Using socioeconomic variables as the baseline model, the study primarily examined the impacts of two sets of attributes, macro-level BE and micro-level SE variables, on sexual violence crimes across four time periods in a day. Following multicollinearity and Moran’s I tests, four GWR models were developed to capture the geospatial correlation of criminal behavior for each period. Additionally, temporal trends were identified by comparing correlation coefficients of the same variables across different time periods. OLS and GTWR models were also constructed, the latter to account for spatiotemporal heterogeneity, and their performance was compared with that of the GWR model.

The results quantify the impact of environmental factors on sexual violence across different times and locations, aiding city planners and managers in developing strategies to mitigate such crimes. This research highlights the utility of multi-source data and AI tools in linking environmental factors with sexual violence, proposing a scalable approach for global trend analysis and safety strategy customization for different cities.