Introduction

Mechanisms of nephrotic syndrome remain to be characterized in pediatric patients with minimal change disease (MCD) and focal segmental glomerulosclerosis (FSGS). Therapeutic response in pediatric patients with FSGS is variable, whereas in MCD intervention is more likely to succeed and outcomes are more favorable. Development of glomerular disease biomarkers able to reliably differentiate subgroups could be useful for identifying molecular patterns in disease activity and potential response to therapy. Proteomic analyses of biofluids in these patient groups could lead to population-level stratification tools to better understand treatment outcomes and sensitivity. Identification and characterization of molecular signatures of nephrotic syndrome, and of changes in patient grouping based on quantitative cluster analysis of the proteome, promise more productive implementation of biomarkers in the clinic.

Characterization of the urine proteome has the potential to identify biomarkers of pediatric kidney diseases. Identifying minimally invasive indicators of diagnosis and response to therapy could decrease the need for invasive renal biopsies and improve treatment regimens. High-throughput quantitative mass spectrometry approaches have been successful in identifying renal disease biomarkers such as PLA2R1, HTRA12 and THSD7A3 in membranous nephropathy. Translation of such high-throughput methods to the study of large numbers of patient urine samples should enable identification of urine biomarkers that distinguish disease activity, offering diagnostic and prognostic markers of disease progression and therapeutic efficacy. In this context, the current studies were designed to test the hypothesis that characterization of urine proteomes in children with MCD and FSGS could enable identification of urinary biomarkers able to distinguish disease activity levels and identify molecular pathways regulating these differences. These studies utilized urine samples collected from children enrolled in the Cure Glomerulonephropathy (CureGN) Study, a large prospective multi-center observational study of ~2400 adults and children with biopsy-proven glomerular disease in whom detailed clinical data, as well as serial blood and urine biosamples, had been collected. Our current study utilizes quantitative proteomics approaches to characterize the urine proteomes in pediatric MCD and FSGS patients enrolled in the prospective CureGN cohort. We determined differences in patient populations with the same disease activity by performing clustering analysis on urine specimens to visualize differences in the nephrotic urine proteome, offering potential patient stratification and insights into disease heterogeneity.

Results

Patient characteristics and demographics

Urine proteomic analysis was used to cluster patients by unsupervised K-means clustering approaches as detailed below and in Fig. 1. Table 1 describes clinical characteristics and patient demographics in each cluster (CL1 and CL2) at the time of enrollment and biopsy collection. Included are age, sex, race, patient diagnosis, eGFR (estimated glomerular filtration rate), UPCR (urinary protein to creatinine ratio) and kidney failure rate at study follow-up. Table 2 shows clinical characteristics and patient demographics at the time of cluster assignment. UPCR was determined at University of Louisville (UofL) on each analyzed urine specimen at the time of proteomic sample preparation.

Fig. 1

Proteomic data processing and bioinformatic analysis. Flowchart illustrating proteomic data handling and processing. Each enclosed box indicates specific data acquisition and processing approaches used to determine patient stratification and clustering as a function of the urine proteome. Box 1) shows data acquisition by mass spectrometry and initial data handling after the proteome search by MaxQuant. High stringency identifications below 1% false discovery rate (FDR < 1%) are the initial list cut parameter, followed by removal of single peptide identifications, thus increasing stringency. Box 2) shows data filtering and missing value imputation calculated in MetaboAnalyst from the comma separated value (CSV) converted files from the initial proteome identification lists, and removal of any patients deemed as CKD5 to diminish data skewing. Box 3) shows the re-clustering approach from initial groups of patients based on UPCR > 2 that would normally be grouped together based on clinical data (UPCR, GFR, etc.). K-means clustering was applied to patients with UPCR > 2 to determine whether sub-groups existed based on urine proteome differences. Box 4) shows candidate selection by supervised multivariate analysis of patient clusters to determine the proteins with the most influence on, and that drive, group separation. Box 5) shows pathway and ontological network analyses to define changes in patient groups.

Table 1 Patient demographics and characteristics in cluster 1 and cluster 2 patients at biopsy and enrollment.
Table 2 Patient demographics, hypertension status, medication treatment, and kidney function at cluster assignment.

Urine proteome and clustering analysis

Quantitative differences in the pediatric urine proteome were measured by isobaric reporter ion tagging followed by 2D-LC–MS/MS. Overall, more than 1,100 proteins were identified with 2 or more peptides at less than 2.3% false discovery rate (see data processing scheme in Fig. 1). Following data filtering, log transformation and median normalization to remove highly variable values, and data imputation, 226 proteins from the filtered and normalized dataset were used for univariate and multivariate analyses in MetaboAnalyst. Unsupervised k-means clustering (Fig. 2A) identified two patient groups within the nephrotic and subnephrotic patient groups (UPCR > 2) (CL1-NP n = 71, CL2-NP n = 17). Multivariate analysis of the clusters (Fig. 2B) highlighted differences in proteome content between these patient groups; red is CL1 and green is CL2. Figure 2C shows VIP (variable importance in projection) scores for proteins that markedly influence the partial least-squares discriminant analysis (PLSDA) separation of the clusters, indicating the relative importance of each protein in the stratification. (Supplemental Table 1 sheets contain the TMT normalized protein list (Table S1A), k-means cluster list (Table S1B), processed data (Table S1C) and VIP scored proteins list (Table S1D).)
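The log transformation and median normalization step can be illustrated with a minimal sketch. It assumes a base-2 logarithm and per-sample median centering, neither of which is specified in the text, and it is not the MetaboAnalyst implementation. The function name and the toy intensities are hypothetical.

```python
import math
from statistics import median


def log2_median_normalize(samples):
    """Log2-transform intensities, then center each sample on its own median.

    samples: dict mapping sample id -> dict mapping protein -> intensity.
    Non-positive intensities are dropped here (assumption); the study
    instead describes imputing missing values.
    """
    normalized = {}
    for sample_id, intensities in samples.items():
        logged = {p: math.log2(v) for p, v in intensities.items() if v > 0}
        if not logged:
            raise ValueError(f"sample {sample_id!r} has no positive intensities")
        center = median(logged.values())
        # After centering, every sample has a median log2 intensity of 0,
        # which removes sample-wide loading differences.
        normalized[sample_id] = {p: x - center for p, x in logged.items()}
    return normalized


# Hypothetical reporter-ion intensities: sample s2 is s1 loaded at twice the amount.
toy = {
    "s1": {"A": 2.0, "B": 8.0, "C": 32.0},
    "s2": {"A": 4.0, "B": 16.0, "C": 64.0},
}
result = log2_median_normalize(toy)
```

In this toy case the two samples differ only by a constant loading factor, so their normalized profiles coincide, which is the behavior median normalization is meant to provide.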

Fig. 2

Unsupervised clustering analyses of the nephrotic urine proteome. (A) K-means clustering of nephrotic/subnephrotic patient (UPCR > 2) urine proteome. The dark boundary line is the median intensity of each cluster; the score plot shows the PCA (principal component analysis) of the normalized data. (B) Partial least squares discriminant analysis (PLSDA) plot shows separation of patients into two distinct clusters, CL1-NP (n = 71) and CL2-NP (n = 17). Inset histogram shows PLSDA model statistics: R2 = 0.95, Q2 = 0.73 for 5 components. (C) Plot of VIP (variable importance in projection) for the most influential proteins in each patient cluster from PLSDA analyses.

Hierarchical heatmapping (Fig. 3A) illustrates the distribution of proteins for each cluster with the top 40 most enriched proteins shown in Fig. 3B. These analyses indicate potential biological differences in the proteome contents in patients otherwise categorized as nephrotic (UPCR > 3 mg/mg) or “subnephrotic” (UPCR 2–3 mg/mg).

Fig. 3

Hierarchical heatmapping of the urine proteome in cluster 1 and 2 patients from PLSDA analysis. (A) Overall heatmap of total proteome post-filtering and normalization shows robust clustering and enrichment of cluster 2 proteins. (B) Heatmap of the top 40 enriched proteins.

Tissue specific network analysis

Determining the tissue origin of urine proteins in nephrotic syndrome could be helpful in understanding the mechanism of the disease. To this end, we utilized tissue specific network analysis of the VIP proteins for mapping gene/protein networks4. Kidney and podocyte proteins were mapped to modules for protein functional predictions. Fig. 4 illustrates the interactions between specific groups of proteins and their connectivity with other modules in the renal tubule and podocyte tissue sources. The tables show functional characteristics associated with protein groups within each module and the statistical value (q-val) for ontology enrichment. Modules enriched in the renal tubules (Fig. 4A) have predicted functions including neutrophil aggregation, complement activation, and JAK-STAT signaling. When networks are mapped to the podocyte (Fig. 4B), the functional characteristics map to endocytosis, immune responses, cytokine secretion and complement activation.

Fig. 4

Enriched proteins (VIP > 0.9 in PLSDA) were analyzed for tissue network enrichment to determine biological processes and functions. (A) Glomerular networked proteins are separated into modules with lines between proteins and modules showing interconnectivity. Tables show functional characteristics mapped to each module and statistical evidence for ontology enrichment (q-val). (B) Podocyte networked proteins grouped into 6 modules, the table shows ontological enrichment for each module.

Protein interaction network analysis

To further investigate functional characteristics of the proteins driving group stratification, proteins with a VIP score greater than 0.9 were submitted for String-DB analysis to determine the ontology and biological significance of the urine proteins in these two clusters (Fig. 5). Urine proteins in cluster 2 were elevated compared to cluster 1; enriched terms for this urine proteome included immune response, complement activation and leukocyte mediated immunity (Fig. 5B). Clustering analysis demonstrates the potential biological differences in the urine proteome contents of the two patient clusters, offering insights into patient stratification based on proteome profiles.

Fig. 5

STRING analysis of protein interactions for proteins from cluster 2 nephrotic patient urine with VIP > 0.9. Biological processes by gene ontology are sorted by false discovery rate; also included are the observed genes out of the total, the strength of association for each ontology term, and the false discovery rate.

Discussion

The current studies tested the hypothesis that comparison of urine proteomes in children with MCD and FSGS could enable identification of urinary biomarkers able to identify molecular pathways regulating differences in disease activity and to stratify patient populations based on differences in the urine proteome by unbiased cluster analysis. Quantitative proteomics was used to identify and quantify urine proteins associated with disease activity in children with MCD and FSGS in the CureGN Study. Unsupervised clustering differentiated the urine proteome into two groups of patients with UPCR > 2. The patients showed enrichment in groups of proteins involved in immune responses, complement and neutrophil infiltration.

Our data suggest a major role for the complement cascade in cluster 2 patients. Complement is a component of the innate immune system that is activated to promote a cascade of protein cleavages that allow for attachment to glycosylated proteins and further activation of downstream complement proteins to form a membrane attack complex and initiate inflammation5. Overactivation of complement has been implicated in many kidney pathologies including C3 Glomerulopathy, Dense Deposit Disease, IgA Nephropathy and Membranous Nephropathy, where C3 accumulation occurs in the mesangium and glomerular basement membrane and can lead to damage to these structures6,7,8. Other studies have shown that complement components are increased in the urine of FSGS patients9,10.

Urine proteins enriched in nephrotic patients were cross-referenced with the plasma proteome from the protein abundance database (PaxDb) to determine the relative contributions of kidney cell injury and plasma protein leakage11. Proteins we identified in urine such as DNAAF1, HMCN1, SLURP1 and RNASE2 are less prominent or absent in plasma and serum proteome studies, which suggests that kidney cell injury contributes to the large protein abundance changes in our dataset, rather than these changes resulting only from plasma protein leakage across the damaged glomerular basement membrane (see Supplemental Table 2). Additionally, comparison of the urine proteome dataset with single nuclear transcripts of the kidney indicated that ~43% of the identified urine proteins have cognate mRNA in kidney cells (see Supplemental Fig. 1)12. This suggests that a significant proportion of the urine proteome in our dataset could be derived from kidney cell sources.

Proteomic profiling of urine has been used to propose candidate biomarkers of chronic kidney diseases over the past few decades13,14. Early determination of disease onset may be achieved with a better understanding and characterization of the disease urinary proteome. Improvements in mass spectrometry and quantitative proteomics methods are beginning to provide tools for better interrogation and analysis of biofluids towards the goal of identifying useful biomarkers15. The lack of biomarkers for pediatric kidney diseases presents the opportunity to apply cutting-edge proteomics methods to well-defined urine specimens to bridge this knowledge gap. Recent studies have shown that circulating anti-nephrin antibodies are commonly elevated in pediatric MCD patients and in idiopathic nephrotic syndrome16. Our study did not account for anti-nephrin antibodies in the patient cohort; this is a limitation of the findings that needs to be considered in future proteomic analyses of pediatric patients with nephrotic syndrome.

In our current work, we investigated the urine proteome of MCD and FSGS in nephrotic syndrome patients with proteinuria. Urine samples were obtained from the CureGN biorepository from a pediatric cohort of patients under age 18 at the time of biopsy; urine was collected at initial and follow-up visits for analyses. We performed multivariate analysis of urine proteome quantitation to determine differences in patient populations. Changes in the urine proteome in response to disease progression and remission would be useful for determining biomarkers of therapeutic efficacy or resistance. The urine proteome is accessible, noninvasive, and can be repeatedly probed to determine changes in specific proteins.

Cluster analysis has been useful for distinguishing subgroups within groups of patients based on biomarker characteristics17,18. Clustering of the nephrotic/subnephrotic urine proteomes indicated the potential for stratifying patient groups based on differences in the quantitative profile of the urine proteome. Although we did not find correlation with clinical features such as eGFR, slope of eGFR or time to remission (see Supplemental Fig. 2), we do see major differences in the urine proteome within the nephrotic/subnephrotic groups when unsupervised k-means clustering is used to differentiate patient groups with the same clinical diagnosis. This suggests the potential for identifying meaningful differences in patient urine proteomes that could lead to a stratification approach for disease severity and potential for relapse.

Methods

Study approval, ethics statement, patients and urine collections

All research protocols and consent documents were approved by the Institutional Review Board of Nationwide Children’s Hospital as the coordinating center (approval numbers IRB07-00400, IRB12-00039 and IRB05-00544), as well as by each of the participating centers. See Tables 1 and 2 for patient demographics and Supplementary Methods for clinical data for patients in the proteomics clusters. All research was conducted in adherence with the Declaration of Helsinki. All enrolled participants provided either informed consent or assent obtained from a parent or legal guardian, as appropriate.

Pediatric patient selection criteria

CureGN (Cure Glomerulonephropathy Consortium) is a multi-center prospective observational cohort study of children and adults with biopsy-proven glomerular disease19. Eligible patients have a diagnostic kidney biopsy within 5 years of enrollment with MCD, FSGS, Membranous Nephropathy (MN) or IgA nephropathy. Patients are excluded if they have end stage kidney disease at the time of enrollment or any of the following before kidney biopsy: solid organ or bone marrow transplant, active HIV infection, hepatitis B or C infection, diabetes mellitus, lupus or active malignancy. Participants are followed prospectively 2–3 times per year with collection of demographics, comorbid health conditions, medications, local laboratory measurements and biospecimens at least yearly. For this study, CureGN participants were included if they were < 18 years at the time of biopsy and had one visit at the time of disease activity, as determined by locally reported labs, and a minimum of one additional follow-up visit. Biopsy-confirmed cases of MCD have poorer outcomes, so the cohort could be interpreted as biased towards analysis of patients with worse proteinuria. Active disease based on locally reported labs was defined for MCD and FSGS as UPCR > 3 mg/mg. Informatics analyses on the urine proteome indicated that many patients with UPCR below 3 had proteome signatures indicating nephrotic syndrome flare, leading us to further analyze these patient subgroups as “subnephrotic” and potentially distinguishable when considering the urine proteome content and quantification.

Urine sample collection and determination of creatinine and protein values

Spot urine samples (preserved with a protease inhibitor) were collected from 216 participants at two study visits: an initial active disease visit and a follow-up study visit. For patients without a subsequent 12-month visit, the closest visit to 12 months was selected. For the proteomics analysis we included only the MCD and FSGS specimens, though the raw data include IgAN and MN participant data. Urine creatinine and protein are reported as mg/dL. Urine creatinine was measured by the Jaffe rate method in the Synchron DXC system by reacting urine creatinine with picric acid and measuring the absorbance of the resulting creatinine-picrate complex at 520 nm. Urine protein was measured in a separate assay by reacting protein with pyrogallol red and molybdate and determining absorbance at 600 nm for the PYR-Mb-Protein complex. Both tests were performed using FDA approved calibrators, controls, reagents and instrumentation in a CLIA-certified hospital laboratory. The ratio of urine protein to urine creatinine (UPCR) was used to determine disease activity, with > 3.0 mg/mg defined as nephrotic and 2–3.0 mg/mg defined as subnephrotic.
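The UPCR calculation and the disease activity thresholds can be written as a short sketch. Function names and the "below threshold" label are hypothetical, and treating a UPCR of exactly 2.0 as subnephrotic is an assumption, since the text does not state how the boundary is handled.

```python
def upcr(protein_mg_dl: float, creatinine_mg_dl: float) -> float:
    """Urine protein-to-creatinine ratio in mg/mg.

    Both inputs are in mg/dL, so the volume units cancel.
    """
    if creatinine_mg_dl <= 0:
        raise ValueError("urine creatinine must be positive")
    return protein_mg_dl / creatinine_mg_dl


def classify_activity(ratio: float) -> str:
    """Map a UPCR (mg/mg) onto the activity categories used in the text."""
    if ratio > 3.0:
        return "nephrotic"
    if ratio >= 2.0:  # assumption: the lower boundary is inclusive
        return "subnephrotic"
    return "below threshold"  # hypothetical label for UPCR < 2


# Hypothetical spot urine values (mg/dL).
category = classify_activity(upcr(protein_mg_dl=450.0, creatinine_mg_dl=100.0))
```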

Protein concentration for proteomic analyses

Protein was concentrated in 5000 MWCO centrifugation filters (Sartorius). Urine was concentrated to ~ 200 μL and buffered in 50 mM HEPES supplemented with 1X HALT™ (Thermo) protease and phosphatase inhibitors, and 0.1 M EDTA. Samples were brought from 1 volume to 3 mL with deionized HPLC grade water. Final protein concentration was estimated by Bradford assay in triplicate. Equal amounts of urinary protein were then processed by S-trap proteolysis and TMT labeling as described in the supplemental methods and reference20. Mass spectrometry analysis is described in Supplemental Methods.

Bioinformatics analysis

Cluster analysis was applied to NP patients to distinguish patient groups based on the urine proteome. K-means clustering was used to determine whether NP groups with overlapping clinical phenotypes could be shown to cluster as a function of the urine proteome. MetaboAnalyst 5.0 was used to perform k-means clustering analyses. NP groups were clustered into two clusters and then regrouped based on proteome profile similarity for multivariate analyses. These groups were then analyzed by PLSDA (partial least squares discriminant analysis) to find the proteins most important in discriminating clusters. Proteins with a VIP score > 0.9 were then analyzed by String-DB to determine functional characteristics of the clusters and groups based on the PLSDA. Tissue network analysis was performed using the HumanBase tool (https://hb.flatironinstitute.org/module) to indicate contributions of specific network modules of proteins within the renal tubules and podocyte4.
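The two-cluster k-means step can be sketched in plain Python. This is an illustration of the general technique (Lloyd's algorithm), not the MetaboAnalyst implementation. The deterministic farthest-first initialization is an assumption chosen to keep the example reproducible, and the function name and toy profiles are hypothetical.

```python
from math import dist


def kmeans(points, k=2, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (labels, centroids). Initialization is farthest-first
    (assumption): start from the first point, then repeatedly add the
    point farthest from the centroids chosen so far.
    """
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(
            tuple(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical normalized two-protein profiles for five patients.
profiles = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9)]
labels, centroids = kmeans(profiles, k=2)
```

With real data, each tuple would hold one patient's normalized abundances across the filtered protein list, and the resulting labels would define the groups passed on to PLSDA.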