Global distribution of spoligotype families
The dataset consisted of 28,436 M. tuberculosis isolates with WGS, drug susceptibility testing, and geographic source data, with lineages assigned using TB-Profiler software (Table 1). Spoligotypes were predicted using the new Spolpred2 software developed as part of this study (see Materials and Methods) (Table 1). Most isolates were from the major global lineages (L4 50.3%, L2 25.9%, L3 11.2%, L1 10.1%), and the main spoligotype families identified were Beijing (L2; 30 spoligotypes; 25.3%), T (L4; 304 spoligotypes; 18.4%), LAM (L4; 187 spoligotypes; 12.9%), Central Asian Strain (CAS; L3; 125 spoligotypes; 9.3%), and EAI (L1; 207 spoligotypes; 8.2%). However, many samples had no designated family (n = 3,318; 11.7%), as family naming was discontinued after WGS-based lineage systems were developed. In total, there were 100 unique (sub-)lineages and 2,991 unique spoligotypes. The isolates represent a convenience sample, but among those with an assigned geographic source (n = 26,209; 92.2%), the most represented regions were Europe (36 countries; 39.6%), Africa (30 countries; 21.9%), Western Pacific (8 countries; 14.1%), and the Americas (14 countries; 11.2%). A small number of isolates had no reported country of origin (n = 2,227; 7.9%).
To improve the robustness of the analysis, all spoligotypes with fewer than 5 isolates were removed, leaving 24,661 (86.7%) isolates, 96 (96.0%) unique (sub-)lineages, and 415 (13.9%) unique spoligotypes (Table 1, Table 2, Fig. 1). This filtering revealed a large number of isolates with rare spoligotypes (n = 3,775; see S1 Table for the list) across most lineages (L4 57.5%, L1 19.2%, L3 14.6%, others 8.7%). After filtering (n = 24,661), the most frequent spoligotype family was Beijing (7,167; 29.1%), followed by T (4,829; 19.6%) and LAM (3,434; 13.9%), consistent with the pre-filtered data. However, the number of isolates with an unknown family decreased (n = 1,002; 4.1%) (Figure 1; Table 1; S2 Table). The most common WHO geographic regions were Europe (n = 8,602; 38.2%), Africa (n = 5,579; 24.6%), and Western Pacific (n = 3,590; 15.8%) (Table 1, Figure 1), which is also consistent with the pre-filtered data. Although many isolates occur in their expected geographic regions, such as Beijing strains in the Western Pacific and South-East Asia, there is great variability in the reported source. This reflects the spread of M. tuberculosis since the spoligotype labels were devised, as well as the convenience sampling, with its emphasis on transmission studies and clinically relevant investigations.
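The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the record layout (dictionaries with hypothetical `id`, `spoligotype`, and `lineage` keys), the function name `filter_rare_spoligotypes`, and the toy labels `spol_A` to `spol_C` are all assumptions made for the example.

```python
from collections import Counter


def filter_rare_spoligotypes(isolates, min_isolates=5):
    """Split isolates by whether their spoligotype is carried by
    at least `min_isolates` isolates in the whole dataset."""
    counts = Counter(iso["spoligotype"] for iso in isolates)
    kept = [iso for iso in isolates if counts[iso["spoligotype"]] >= min_isolates]
    removed = [iso for iso in isolates if counts[iso["spoligotype"]] < min_isolates]
    return kept, removed


# Toy data (hypothetical labels): spol_A has 6 isolates, spol_B has 5, spol_C has 2.
isolates = (
    [{"id": f"a{i}", "spoligotype": "spol_A", "lineage": "L2"} for i in range(6)]
    + [{"id": f"b{i}", "spoligotype": "spol_B", "lineage": "L4"} for i in range(5)]
    + [{"id": f"c{i}", "spoligotype": "spol_C", "lineage": "L1"} for i in range(2)]
)

kept, removed = filter_rare_spoligotypes(isolates)
print(len(kept), len(removed))  # 11 isolates kept, 2 removed
```

Counting over the full dataset before filtering matters here: the threshold applies to the number of isolates per spoligotype, so the two isolates carrying `spol_C` are removed together.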
Spoligotype families and lineages
Among the 24,661 isolates, there was strong concordance between spoligotype families and main lineages (Table 2; Table S3; Figure S1). At the main lineage level (L1 to L7), 408 (98.3%) spoligotypes appeared in only a single lineage. For example, the AFRI families appear only in isolates classified as L5 and L6, and the EAI, CAS, and Ethiopian families appear only within L1, L3, and L7, respectively. Similarly, the Cameroon, H, LAM, S, T, Turkey, and URAL spoligotype families appeared only in L4, consistent with it being the most genetically diverse lineage (Fig. 2). However, there were some discrepancies (20/7,167; <0.4%): a small fraction of the isolates with a Beijing family spoligotype were classified as L1 (n = 1) or L3 (n = 19) (S3 Table). These discrepancies could not be explained by low coverage of the direct repeat region. Isolates with a Manu family spoligotype were present in L2 (n = 38; 39.2%; Manu_ancestor) and L3 (n = 59; 60.8%; Manu3). Although most spoligotypes were exclusive to a single (sub-)lineage, in many cases they represented only a relatively small proportion of the total samples of that lineage (S3 Table). For example, the spoligotype EAI2-Nonthaburi is found only in L1, but it appears in only 5.8% of all samples of that lineage and is known to be localized in Thailand [14]. EAI2-Nonthaburi was originally found in the Philippines and resembles the EAI2-Manila spoligotype, the dominant strain in that country [15]. Conversely, as shown above, spoligotypes such as Beijing are highly prevalent in one lineage (L2; 98.8%) while also appearing in two other lineages (S2 Table).
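The two quantities discussed above, whether a spoligotype is exclusive to one main lineage and what share of that lineage's isolates it accounts for, can be computed together. The sketch below is illustrative only and assumes one `(spoligotype, lineage)` pair per isolate, with lineage strings written in dotted form (e.g. `L4.2.2`); the function names and toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def main_lineage(lineage):
    """Return the first-level lineage, e.g. 'L4.2.2' -> 'L4'."""
    return lineage.split(".")[0]


def exclusivity_and_share(records):
    """For each spoligotype seen in exactly one main lineage, return
    (lineage, share of that lineage's isolates carrying the spoligotype)."""
    per_spoligotype = defaultdict(Counter)
    per_lineage = Counter()
    for spoligotype, lineage in records:
        top = main_lineage(lineage)
        per_spoligotype[spoligotype][top] += 1
        per_lineage[top] += 1

    result = {}
    for spoligotype, lineage_counts in per_spoligotype.items():
        if len(lineage_counts) == 1:  # exclusive to a single main lineage
            (lineage, n), = lineage_counts.items()
            result[spoligotype] = (lineage, n / per_lineage[lineage])
    return result


# Toy data (hypothetical labels): spol_A and spol_B occur only in L1,
# spol_C occurs mostly in L2 but once in L3.
records = (
    [("spol_A", "L1.2.1")] * 2
    + [("spol_B", "L1.1")] * 8
    + [("spol_C", "L2.2.1")] * 9
    + [("spol_C", "L3")] * 1
)

result = exclusivity_and_share(records)
for spoligotype, (lineage, share) in sorted(result.items()):
    print(f"{spoligotype}: only in {lineage}, {share:.1%} of that lineage")
```

In this toy data `spol_A` is exclusive to L1 yet covers only a fifth of L1 isolates, which mirrors the pattern described for EAI2-Nonthaburi, while `spol_C` is dropped from the result because it spans two lineages.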
Subsequent analyses examined spoligotypes within second-level (e.g. L4.2), third-level (e.g. L4.2.2), fourth-level (e.g. L126.96.36.199), and deeper sub-lineages. At these finer sub-lineage levels, the number of spoligotypes found exclusively in a single sub-lineage decreased (2nd level: n = 300, 72.3%; 3rd level: n = 288, 69.4%; 4th level: n = 271, 65.3%) (Figure S2). For comparison, we repeated the analysis using the proposed set of 68 spacers [16]. Across the resulting 978 unique spoligotypes, the proportion assigned to a single (sub-)lineage was greater than when 43 spacers were used (1st level = 99.5%, 2nd level = 88.0%, 3rd level = 86.3%, 4th level = 84.2%) (Figure S2). Finally, several (43-spacer) spoligotypes with a high number of isolates (n > 20) provided a high degree of discrimination at fine-scale sub-lineage levels, including EAI2-Manila and EAI2-Nonthaburi (L188.8.131.52), Ancestral (L2.1), T4-CEU1 (L4.1.2), Turkey (L184.108.40.206), LAM1 and LAM2 (L220.127.116.11), and T2-Uganda (L18.104.22.168) (S4 Table). These spoligotypes can be used to update phylogenetic SNP barcodes.
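The level-by-level comparison above amounts to truncating each lineage label to a given depth and recounting how many spoligotypes map to a single label. The following is a minimal sketch under stated assumptions: lineage labels use dotted notation, the helper names `truncate_lineage` and `exclusive_fraction` are hypothetical, and an isolate resolved only to a shallower level keeps its shorter label (so `L4` stays `L4` at level 2), which may differ from how the study handled such cases.

```python
from collections import defaultdict


def truncate_lineage(lineage, level):
    """Keep the first `level` components, e.g. ('L4.2.2', 2) -> 'L4.2'."""
    return ".".join(lineage.split(".")[:level])


def exclusive_fraction(records, level):
    """Fraction of spoligotypes whose isolates all fall in a single
    (sub-)lineage once labels are truncated to `level`."""
    seen = defaultdict(set)
    for spoligotype, lineage in records:
        seen[spoligotype].add(truncate_lineage(lineage, level))
    exclusive = sum(1 for labels in seen.values() if len(labels) == 1)
    return exclusive / len(seen)


# Toy data (hypothetical labels), one (spoligotype, lineage) pair per isolate.
records = [
    ("spol_A", "L4.2.2"), ("spol_A", "L4.2.1"),
    ("spol_B", "L4.1.2"), ("spol_B", "L4.3.1"),
    ("spol_C", "L2.2.1"), ("spol_C", "L2.2.1"),
]

for level in (1, 2, 3):
    print(level, round(exclusive_fraction(records, level), 3))
```

In this toy data the exclusive fraction falls from 1.0 at the first level to 2/3 and then 1/3, the same direction of change reported for the real dataset as sub-lineage resolution increases.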