Gut microbiome strain-sharing within isolated village social networks

Abstract

When humans assemble into face-to-face social networks, they create an extended social environment that permits exposure to the microbiome of others, thereby shaping the composition and diversity of the microbiome at individual and population levels^1,2,3,4,5,6. Here we use comprehensive social network mapping and detailed microbiome sequencing data in 1,787 adults within 18 isolated villages in Honduras⁷ to investigate the relationship between network structure and gut microbiome composition. Using both species-level and strain-level data, we show that microbial sharing occurs between many relationship types, notably including non-familial and non-household connections. Furthermore, strain-sharing extends to second-degree social connections, suggesting the relevance of a person’s broader network. We also observe that socially central people are more microbially similar to the overall village than socially peripheral people. Among 301 people whose microbiome was re-measured 2 years later, we observe greater convergence in strain-sharing in connected versus otherwise similar unconnected co-villagers. Clusters of species and strains occur within clusters of people in village social networks, meaning that social networks provide the social niches within which microbiome biology and phenotypic impact are manifested.

Main

The microbiome is known to play a role in many human phenotypes⁸. In turn, diet, medications, lifestyle and environmental exposures affect microbiome composition^5,9,10. Because few bacterial components of the microbiome survive for very long outside the human body, most must somehow be acquired from other humans through physical contact. Although maternal transmission is one obvious pathway^6,11,12, adults may acquire microbial species from people other than their mothers through social interactions¹. Indeed, in studies of both mice and primates, gut microbiome information can predict a host’s social interactions^2,13,14,15,16,17,18. In humans, recent evidence indicates the salience of household and spousal transmission^1,3,4. Yet people have a substantially broader set of social relationships, in particular with unrelated people residing outside their household, and both these relationships and the details of the underlying interactions (for example, their duration or frequency) are also likely to be relevant to a person’s microbiome composition.

Study cohort and network mapping

We studied 1,787 adults in 18 isolated villages in Honduras who are part of a larger population-based cohort⁷. This is a traditional setting involving face-to-face interactions within a circumscribed population that partakes of a traditional diet and is relatively devoid of antibiotics and other medications. The average distance from each of the 18 villages to the nearest other village among the 18 is 1.1 km, and the average distance to the farthest other village is 24.7 km. The populations of these 18 villages range in size from 66 to 432 people, and the average household size is 3.49. The average age of participants is 41 years (s.d. = 17; range, 15–93); 62% are women and 41.8% are married.

We sociocentrically mapped face-to-face social networks for whole villages at two time points, collected a comprehensive set of individual and community-level characteristics, and obtained detailed gut microbiome sequencing data. The percentage of people in the village-level social networks for whom microbiome samples were collected ranged from 43% to 76% (Supplementary Table 2). We collected microbiome data for all 18 villages in 2019 and again for 4 of these villages (n = 301 people) roughly 2 years later. Both social network data^19,20 and microbiome data²¹ from such developing-world settings are scarce.

To map the social relationships within each village, we asked questions such as “With whom do you spend free time?” and “Who do you trust to talk about something personal or private?” (Supplementary Table 1). The total number of relationships identified within our cohort was as follows: partner/spouse (410), father (303), mother (594), sibling (1,059), child (427), close friends (1,627), spend free time (1,749), and personal or private conversation (1,902). Some of these relationships overlap, and, after network symmetrization, we identified 4,658 unique social network links. For people who report spending free time together, we also collected details such as how often they did so, whether they shared meals, and how they greeted each other. The networks were mapped roughly in 2019.

Microbiome profiling

Microbial species can have materially divergent strains²², and the sharing of a genetically distinctive strain between two people can offer suggestive evidence that the shared strain resulted from interpersonal transmission rather than from common exposure to an environmental factor such as diet (for example, fermented foods)^4,22,23,24,25.

We performed strain-level profiling with StrainPhlAn4 and detected putative transmission events between pairs of people²⁶. We summarized the strain-level similarity between two people with a strain-sharing rate, defined as the number of shared strains divided by the number of species with available strain profiles that are present in both samples²⁷. Overall, our data included information on 2,543 species and 339,137 strains (from the 841 species profiled by StrainPhlAn). We summarized species-level beta diversity using the Bray–Curtis dissimilarity, calculated on relative abundances, and the Jaccard index, calculated on species presence or absence.
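A minimal Python sketch of the three pairwise measures may make the definitions concrete. It assumes each sample has already been reduced to simple dictionaries: species mapped to a strain label (for the strain-sharing rate) and species mapped to relative abundance (for Bray–Curtis and Jaccard). In StrainPhlAn-based pipelines, whether two samples carry the same strain is typically decided from a per-species genetic-distance threshold; here, equal labels stand in for that call, and the input format and function names are hypothetical.

```python
from typing import Dict, Hashable


def strain_sharing_rate(a: Dict[str, Hashable], b: Dict[str, Hashable]) -> float:
    """Shared strains divided by species with a strain profile in both samples.

    `a` and `b` map species -> strain label (hypothetical input format); two
    people are taken to share a species' strain when the labels are equal.
    """
    common = a.keys() & b.keys()              # species profiled in both samples
    if not common:
        return float("nan")                   # rate undefined without overlap
    shared = sum(1 for sp in common if a[sp] == b[sp])
    return shared / len(common)


def bray_curtis(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Bray–Curtis dissimilarity on relative abundances (0 = identical)."""
    species = x.keys() | y.keys()
    num = sum(abs(x.get(s, 0.0) - y.get(s, 0.0)) for s in species)
    den = sum(x.get(s, 0.0) + y.get(s, 0.0) for s in species)
    return num / den if den else 0.0


def jaccard_index(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Jaccard index on presence/absence (1 = identical species sets)."""
    px = {s for s, v in x.items() if v > 0}
    py = {s for s, v in y.items() if v > 0}
    union = px | py
    return len(px & py) / len(union) if union else 1.0


# Two species profiled in both people, one of them with the same strain -> 0.5
print(strain_sharing_rate({"s1": "A", "s2": "B", "s3": "C"},
                          {"s1": "A", "s2": "D", "s4": "E"}))
```

Note that the denominator of the strain-sharing rate counts only species profiled in both people, so the rate is undefined (returned as NaN here) for pairs with no overlapping profiled species.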

Dimensionality reduction of the species-level relative abundances reveals differences in composition for most two-village comparisons and across all the villages combined (Extended Data Fig. 1).

Strain-sharing across relationship types

Pairs of people with diverse sorts of relationships (spouse, father, mother, sibling, child, close friend, free time, personal or private conversations) share significantly more microbial species and strains with each other than do pairs of people in the same village with no relationship, and strain-sharing follows a gradient across relationship types (two-sided Wilcoxon rank-sum tests, maximum adjusted P value (max P_adj) ≤ 0.05) (Fig. 1a). The presence of a relationship tie, whether to family or to a friend, is associated with a higher strain-sharing rate (linear mixed-effects regression, all relationships β = 2.912; P < 2 × 10⁻¹⁶; non-kin relationships β = 3.134; P < 2 × 10⁻¹⁶). Using a covariate permutation approach, we find that the presence of a tie between two people has a larger association with strain-sharing than the similarity of the two people with respect to other factors such as diet, medications or socio-demographic attributes (Fig. 1e and Supplementary Data 1).

**Fig. 1: Strain-sharing across multiple relationship types.**

Spouses and same-household relationships have the highest strain-sharing (median strain-sharing rate of 13.9% and 13.8%, respectively). Whereas previous studies have documented potential household and familial transmission^1,3,4, we also observe an elevated strain-sharing rate between non-kin pairs living in different households (median 7.8%, permutation P < 2.2 × 10⁻¹⁶). We observe less strain-sharing between people living in the same village who lack a social relationship (median 4.0%); this background rate might result from shared village environments or network-wide circulation of strains. We observe an even lower strain-sharing rate between people living in different villages (median 2.0%).

Since species distributions are to some extent village-dependent (Extended Data Fig. 1), and pairs of people in the same village have a higher strain-sharing rate than pairs in different villages (Fig. 1a), village-level sharing can serve as a baseline for comparison. To account for both the potential influence of village-wide microbiome niches and of village-level network structure, we compared each relationship distribution to 100 samples from a within-village relationship permutation (for example, swapping mother–child pairs in the same village; Methods) and observed the same pattern of variation in strain-sharing by relationship type (Supplementary Fig. 1). This result is also observed at the species level (Extended Data Fig. 2 and Supplementary Fig. 2), although to a lesser extent, possibly suggesting that strain-sharing is more likely to be a result of direct transmission than species-level sharing, which could potentially originate from, say, a shared environment.

For people who report spending free time together, we examined how strain-sharing may relate to how often they spend free time together, how often they share meals and how they typically greet each other (Fig. 1b–d). The frequency with which a person spends time with someone, whether in general or through a meal, is associated with higher strain-sharing (free time, Kruskal–Wallis test, χ² = 105.45, n = 1,703; P < 2.2 × 10⁻¹⁶; meals, Kruskal–Wallis test, χ² = 194.25, n = 1,737; P < 2.2 × 10⁻¹⁶). This result holds even when excluding the effect of kinship and living in the same house (free time, Kruskal–Wallis test, χ² = 12.96, n = 620; P = 1.53 × 10⁻³; meals, Kruskal–Wallis test, χ² = 10.6, n = 641; P = 0.014) (Supplementary Fig. 2), suggesting that close physical proximity and shared meals are potential transmission routes when people are not cohabiting. To be clear, shared meals can lead to similar gut microbiomes because eating similar foods at the same time can lead to microbial sorting in the gut, creating similar microbial communities even if there is no direct exchange of microbes between people²⁸. In certain analyses below, we accordingly adjust for diet, medications, water source and so on.
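The reported Kruskal–Wallis statistics can be checked against their P values. The sketch below computes the H statistic, with the usual tie correction (which matters here because strain-sharing rates are ratios of small integers and tie often), and the upper-tail χ² probability in closed form for integer degrees of freedom. The reported non-kin, different-household P values are consistent with 2 degrees of freedom for free time and 3 for meals, that is, with three and four frequency categories; that reading is an inference from the numbers, not something the text states. Function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1     # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal–Wallis H statistic with the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    if not ties:
        return h
    denom = 1 - ties / (n ** 3 - n)                # tie correction factor
    return h / denom if denom > 0 else 0.0


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X >= x), closed form for integer df."""
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * k + 1)
    return total


print(chi2_sf(12.96, 2))   # about 1.53e-3 (free time, 3 groups)
print(chi2_sf(10.6, 3))    # about 0.014 (meals, 4 groups)
```

The degrees of freedom equal the number of groups minus one, so the same H value yields a different P value depending on how many frequency categories are compared.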

Pairs of people who greet each other with a kiss on the cheek have the highest median strain-sharing rate (median 12.9%), although, perhaps owing to the small sample size and the diversity of greeting types, the strain-sharing rates across most greeting types are not significantly different (Fig. 1d and Supplementary Fig. 3). The strain-sharing rate for the subsample of non-kin living in different households who spend free time together almost every day (median of 7.1%) is higher than the strain-sharing rate for such people who see each other only once a week (6.0%) or a few times a month (4.8%) (Extended Data Fig. 3). A similar gradient is observed with the frequency with which non-cohabiting non-kin have meals together, with those having meals daily or weekly (median strain-sharing rate 6.9%) sharing more than those who have a meal together a few times a month or only once a month (6.3% and 5.9%). Finally, when the reciprocity of the relationship is considered (that is, both people need to nominate each other for the tie to be deemed present), we observe an increased strain-sharing rate in all relationship types except partners (Extended Data Fig. 4).

We find that mothers have a significantly higher strain-sharing rate with their children than fathers (two-sided Wilcoxon rank-sum test, P_adj ≤ 0.05) (Supplementary Fig. 4). Mothers may transmit bacterial strains to children during childbirth²⁹, and this higher strain-sharing rate may be a result of the retention of strains transmitted during infancy (indeed, the younger the child is, the higher the strain-sharing rate between mothers and their children; Supplementary Fig. 4). The higher mother–child strain-sharing rate may also relate to cultural practices that result in more opportunities for household transmission between mothers and their (adolescent or adult) children.

In contrast to previous analyses¹, we find no evidence that women are more likely to share strains with their direct social connections than men (two-sided Wilcoxon rank-sum test, P_adj ≥ 0.05) (Supplementary Fig. 5). In fact, at the species level, we observe the opposite: men are more microbially similar to their connections than women, based on Bray–Curtis dissimilarity (two-sided Wilcoxon rank-sum test, P_adj ≤ 0.05; Supplementary Fig. 5). A large portion of this seems to stem from brothers having more similar microbiomes to each other than sisters (median Bray–Curtis dissimilarity 0.615 and 0.696, respectively; two-sided Wilcoxon rank-sum test, P_adj ≤ 0.05; Supplementary Fig. 5). However, this difference does not appear with the Jaccard index, suggesting that brothers and sisters do not differ much in which species are present, but that sisters are more variable in their relative abundances than brothers. The contrast with previous work may relate to different social habits in Honduras (for example, compared with Fiji³⁰) or to differences between the oral and gut microbiomes.

Strain-sharing predicts relationships

To evaluate the strength of strain- and species-sharing across relationship types, we implemented a mixed-effects logistic regression model with cross-validation to predict whether any pair of people in a village has a social or familial tie. If there is a strong relationship between the social network and the microbiome network, we would expect the microbiome similarity between two people to be a strong predictor of a social tie. We also specified a second model that removed kin and household connections from our positive class. To account for potential confounding by socio-demographic factors, we created four versions of each model: with the strain-sharing rate as the only predictor (in addition to a random slope for each village); with all the socio-demographic variables only (that is, residing in the same household, age, sex, wealth, education, religion and indigenous status); with the strain-sharing rate, age and sex only; and with the strain-sharing rate and all the socio-demographic variables (Methods).

Using the strain-sharing rate as the only predictor, the classifier achieves moderately strong performance both across all relationships and in non-kin, different-household relationships (area under the receiver operating characteristic (ROC) curve (AUC) 0.71 ± 0.006 and AUC 0.67 ± 0.007, respectively) (Fig. 2b,e); Fig. 2a,d shows the respective model predictions as applied to an illustrative village. Prediction performance improves when socio-demographic covariates are added, reaching AUC 0.83 ± 0.005 and AUC 0.78 ± 0.006 when predicting all (familial and social) relationships and non-kin relationships, respectively. Species-level similarity, as measured by the Jaccard index or Bray–Curtis dissimilarity, achieves poor performance (all relationships: Jaccard, AUC 0.54 ± 0.008; Bray–Curtis, AUC 0.52 ± 0.008; Supplementary Fig. 6).
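For intuition about these AUC values: the rank-based AUC is the probability that a randomly chosen connected pair scores higher than a randomly chosen unconnected pair (ties count one half), which equals the Mann–Whitney U statistic divided by the product of the two group sizes. With a single predictor and a positive coefficient, a logistic model orders pairs exactly as the raw strain-sharing rate does, so, ignoring the village-level random effect and cross-validation, the AUC of the strain-sharing-only model is the AUC of the rate itself. This is an illustrative sketch, not the study's mixed-effects pipeline; the function name is hypothetical.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based AUC for connected (positive) vs unconnected (negative) pairs.

    Counts the share of (positive, negative) pairs in which the positive pair
    scores higher, with ties counted as one half. This equals U / (n_pos * n_neg).
    The double loop is O(n_pos * n_neg), which is fine for a sketch.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Strain-sharing rates of two connected and two unconnected pairs (made-up toy values)
print(auc([0.3, 0.2], [0.1, 0.2]))   # 0.875
```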

**Fig. 2: Strain-level models predicting social connections.**

We also performed two sensitivity analyses, involving stable ties and reciprocated ties. Using additional network data collected roughly 2.5 years earlier (in 2016), we classified a tie as ‘stable’ if the participant had reported the same connection in that earlier survey. The stable-tie classifier achieves performance similar to that of the model run only on the second time-point social network, both for all familial and social relationships (Fig. 2b,c) and for non-kin, different-household relationships (Fig. 2e,f). We also observed similar results when predicting relationship presence in the subset of strictly reciprocated ties (Supplementary Fig. 7 and Supplementary Table 3).

To understand how much more strongly strain-sharing indicates a social relationship compared with socio-demographic attributes, we again use permutation feature importance metrics (Methods)³¹, and we find that the strain-sharing rate is a stronger predictor of a relationship than similarity along any socio-demographic dimension (Extended Data Fig. 5).
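Permutation feature importance measures how much a fitted model's performance drops when one feature's values are shuffled across observations, which breaks that feature's link to the outcome while leaving its distribution intact. A minimal sketch, assuming a fitted scoring function `predict` (hypothetical) over rows of pairwise features, such as the strain-sharing rate and socio-demographic similarity indicators, and AUC as the performance metric; the number of repeats and the seed are arbitrary choices.

```python
import random
from typing import Callable, List, Sequence


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    rows: List[List[float]],
    labels: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUC when each feature column is shuffled independently."""
    rng = random.Random(seed)

    def score(data: List[List[float]]) -> float:
        scores = [predict(r) for r in data]
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        return auc(pos, neg)

    baseline = score(rows)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)                    # break the link to the label
            permuted = [r[:j] + [c] + r[j + 1:] for r, c in zip(rows, column)]
            drops.append(baseline - score(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only uses feature 0: feature 1 should get zero importance.
rows = [[float(i), 0.0] for i in range(20)]
labels = [0] * 10 + [1] * 10
print(permutation_importance(lambda r: r[0], rows, labels))
```

A feature that the model ignores, or that carries no information once shuffled, gets an importance of zero; larger values mean the model leans more heavily on that feature.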

Longitudinal analysis of strain-sharing

A subset of 301 people living in four villages were re-contacted after 2 years (roughly in 2021) and asked to provide a second stool sample. We first examined the fraction of strains retained over time by calculating the strain-sharing rate between pairs of samples provided by the same person; we observed a median value of 0.26 (interquartile range (IQR) 0.04–0.48), a retention rate lower than that reported in other cohorts³².

Then, by using the social network obtained at the outset (in 2019), we modelled the strain-sharing rate between pairs of (connected and unconnected) people in the same village at follow-up (Fig. 3; Methods). That is, we assessed how the existence of a tie between a pair of people, compared with the non-existence of a tie, was associated with any change in strain-sharing, comparing pairs of connected people with otherwise similar pairs of unconnected people in the same village 2 years later.

**Fig. 3: Microbiome strain-sharing across two time points.**

We observed that connected people have a higher strain-sharing rate at the subsequent time point than unconnected people (Fig. 3b). This was the case even after accounting for the socio-demographic (and dietary, medication and so on) similarity of the two people, their baseline level of strain-sharing and their village co-residence (Fig. 3b and Supplementary Data 2); that is, the coefficient associated with the existence of a previous relationship was positive (linear mixed-effects regression β = 0.25103; P = 8.28 × 10⁻¹¹). Moreover, as expected, the coefficient associated with strain-sharing between pairs of people at the first time point was also positive (linear mixed-effects regression β = 0.1033; P < 2 × 10⁻¹⁶), and the coefficient for the socio-demographic dissimilarity of the two people at baseline, as measured by the Mahalanobis distance, was negative (linear mixed-effects regression β = −0.0318; P = 6.4 × 10⁻⁴) (see Supplementary Data 2 for more analyses). We obtained similar results when modelling sharing status for each individual species in pairs of people (across all species combined) or when using a model with separate socio-demographic variables (Supplementary Data 2; Methods).
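The Mahalanobis distance summarizes how dissimilar two people are across several covariates at once while accounting for the covariates' scales and correlations: d(x, y) = sqrt((x − y)ᵀ Σ⁻¹ (x − y)), where Σ is the covariance matrix of the covariates. A dependency-free sketch follows, assuming numeric covariate vectors (categorical attributes such as religion would first need a numeric encoding) and a covariance matrix estimated across participants. The exact covariate set and covariance estimate used in the study are not reproduced here, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def covariance(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix (n - 1 denominator) of the covariate rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ z = b by Gauss–Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("covariance matrix is singular")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x: Sequence[float], y: Sequence[float],
                cov: Sequence[Sequence[float]]) -> float:
    """Mahalanobis distance between covariate vectors x and y."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, diff)                        # z = cov^{-1} (x - y)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


print(mahalanobis([2, 1], [0, 0], [[4, 0], [0, 1]]))   # sqrt(2), about 1.414
```

Solving the linear system instead of explicitly inverting Σ is the usual, numerically safer choice; with an identity covariance the measure reduces to the Euclidean distance.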

Network position and strain-sharing

The observed strain-sharing patterns within villages may reflect potential chains of transmission. For example, if a person’s microbiome is more similar to those of their friends than expected under the assumption that microbiome distribution and social network structure are independent, is this similarity also present between friends of friends? To explore this, we calculate the distribution of strain-sharing rates as a function of the geodesic (shortest-path) distance between two people. Under the null hypothesis that a person’s social network has no marginal relationship with their microbiome composition, we create a permutation-based null distribution by randomly reassigning microbiomes across people in the village and then comparing the resulting strain-sharing rates by geodesic distance. First-degree relationships have a much higher strain-sharing rate than we would expect under the null hypothesis (median 7.95%). This effect also extends to second-degree connections (5.10%) before falling off at a social horizon of third-degree connections (4.35%), where pairs of people have a median strain-sharing rate no higher than would be expected under the null hypothesis (Fig. 4a) (see Supplementary Fig. 8 for species-level analyses).
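The geodesic-distance analysis and its permutation null can be sketched as follows, assuming integer person identifiers, an undirected adjacency list for one village and a precomputed table of pairwise strain-sharing rates keyed by ordered sample pairs. Permuting which person holds which microbiome sample keeps both the network and the set of microbiomes fixed while breaking any link between them. The function names, the +1 correction in the P value and the default number of permutations are illustrative choices, not the study's settings.

```python
import random
import statistics
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]                    # undirected adjacency lists
Rates = Dict[Tuple[int, int], float]            # (sample_a, sample_b), a < b


def bfs_distances(adj: Graph, source: int) -> Dict[int, int]:
    """Geodesic (shortest-path) distance from `source` to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def median_ssr_by_distance(adj: Graph, ssr: Rates, owner: Dict[int, int]) -> Dict[int, float]:
    """Median strain-sharing rate for each geodesic distance.

    `owner[person]` is the microbiome sample assigned to that person.
    """
    groups: Dict[int, List[float]] = {}
    people = sorted(adj)
    for i, u in enumerate(people):
        dist = bfs_distances(adj, u)
        for v in people[i + 1:]:
            if v in dist:                       # skip unreachable pairs
                a, b = sorted((owner[u], owner[v]))
                groups.setdefault(dist[v], []).append(ssr[(a, b)])
    return {d: statistics.median(vals) for d, vals in sorted(groups.items())}


def permutation_null(adj: Graph, ssr: Rates, n_perm: int = 1000, seed: int = 0):
    """Medians by distance after randomly reassigning samples to people."""
    rng = random.Random(seed)
    people = sorted(adj)
    null = []
    for _ in range(n_perm):
        shuffled = people[:]
        rng.shuffle(shuffled)
        null.append(median_ssr_by_distance(adj, ssr, dict(zip(people, shuffled))))
    return null


def empirical_p(observed: Dict[int, float], null, distance: int) -> float:
    """One-sided P value: share of permutations at least as high as observed."""
    hits = sum(1 for d in null if d[distance] >= observed[distance])
    return (hits + 1) / (len(null) + 1)
```

Usage on a toy four-person path, where only adjacent people share strains at an elevated rate: `observed = median_ssr_by_distance(adj, ssr, {p: p for p in adj})` gives medians per distance, and `empirical_p(observed, permutation_null(adj, ssr), 1)` compares the first-degree median with the null.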

**Fig. 4: Strain-sharing and social network position.**

The strain-sharing patterns we observe allow us to view microbiome strain-sharing through the lens of ecology. People who are more socially central in the network may also be more microbially central and more exposed to strains potentially spreading within a network. That is, we might expect that central people are more microbially related to the rest of the village and more representative of the social microbiome (that is, the microbial metacommunity of transmittable strains within the village). After controlling for covariates, we tested whether a person’s microbiome centrality, measured by their average strain-sharing rate with others in the village, was related to their social network centrality (that is, degree centrality, normalized betweenness centrality or eigenvector centrality).

All three measures of social network centrality were correlated positively with a person’s average strain-sharing rate to the rest of the village, indicating that the microbiome of more socially central people is more representative of the microbiome in the village (linear mixed-effects regression; degree, β = 0.046; P = 3.14 × 10⁻¹⁰; normalized betweenness, β = 6.27; P = 1.21 × 10⁻⁴; eigenvector, β = 1.27; P = 1.67 × 10⁻¹⁰) (Fig. 4b and Supplementary Data 3). This effect is apparent visually in Fig. 4c, where participants are coloured based on their average strain-sharing rate with the rest of the village; more socially central people tend to have higher strain-sharing rates with everyone else than socially peripheral people.
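Two of the three social centrality measures, together with the microbiome centrality used here (a person's average strain-sharing rate with everyone else in the village), can be computed without dependencies, as sketched below. Normalized betweenness centrality (Brandes' algorithm) is omitted for brevity; in practice, `networkx.degree_centrality`, `networkx.betweenness_centrality(G, normalized=True)` and `networkx.eigenvector_centrality` provide all three. The input formats and function names are assumptions, and the covariate-adjusted regression is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[int]]                    # undirected adjacency lists


def degree_centrality(adj: Graph) -> Dict[int, float]:
    """Degree divided by n - 1, the largest possible degree."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def eigenvector_centrality(adj: Graph, iters: int = 200, tol: float = 1e-12) -> Dict[int, float]:
    """Leading adjacency eigenvector (unit length) via power iteration.

    Iterating with (A + I) instead of A keeps the same leading eigenvector but
    avoids oscillation on bipartite graphs such as stars and paths. Assumes a
    connected network, where that eigenvector is unique.
    """
    x = {u: 1.0 for u in adj}
    for _ in range(iters):
        new = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {u: v / norm for u, v in new.items()}
        if sum(abs(new[u] - x[u]) for u in adj) < tol:
            return new
        x = new
    return x


def microbiome_centrality(people: Sequence[int],
                          ssr: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Average strain-sharing rate of each person with everyone else."""
    out = {}
    for u in people:
        others = [ssr[tuple(sorted((u, v)))] for v in people if v != u]
        out[u] = sum(others) / len(others)
    return out


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(degree_centrality(star))        # the hub has degree centrality 1.0
print(eigenvector_centrality(star))   # hub about 0.707, leaves about 0.408
```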

We also hypothesized that, whereas socially central people are more microbially representative of the overall network, they may be less microbially similar to each of their own first-degree social connections. A very popular person may be more representative of the social group at large but, as a result of their many social interactions, more removed from each of their individual connections, in a paradox of popularity. Indeed, we observe that increases in all three social network centrality measures correlate with a decrease in average microbiome similarity to first-degree connections (linear mixed-effects regression; degree, β = −0.21; P = 1.92 × 10⁻¹¹; normalized betweenness, β = −20.97; P = 8.79 × 10⁻⁴; eigenvector, β = −2.40; P = 5.12 × 10⁻³) (Fig. 4d and Supplementary Data 3). Gregarious people are less intimately related microbially to their social connections. This is apparent visually in Fig. 4e, where participants are coloured based on their average strain-sharing rate with their first-degree connections.

Social clusters and microbiome clusters

The observed strain-sharing patterns along village, household, familial and social lines imply that social clusters (that is, ‘communities’ of more densely interconnected people) should also share particular sets of microbiome species and strains. That is, the phenomena documented so far should instantiate or reflect niches of microbiomes within niches of people (somewhat similar to soil biology³³).

At the smallest scale, people with a higher clustering coefficient (that is, transitivity) tend to have a higher average strain-sharing rate with their first-degree connections (linear mixed-effects regression, β = 3.24; P = 7.32 × 10⁻⁷). Having relationships with people who are also connected to each other may promote microbiome circulation, leading to the formation of microbiome niches within social groups. Indeed, people with a high clustering coefficient (greater than or equal to 0.75) have a high average strain-sharing rate with their first-degree connections (median 10.3%). Conversely, people with a low clustering coefficient (less than or equal to 0.25) have a lower strain-sharing rate (8.43%) (two-sided Wilcoxon rank-sum test, P_adj ≤ 0.05) (Fig. 5a).
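The local clustering coefficient is the fraction of pairs of a person's contacts who are themselves connected. A minimal sketch follows, assuming adjacency sets keyed by integer identifiers; it also computes the average strain-sharing rate with first-degree connections and applies the 0.75 and 0.25 cut-offs used in Fig. 5a. People with fewer than two contacts are assigned a coefficient of 0, a common convention (networkx uses it) that the study may or may not share. Function names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]                     # undirected adjacency sets


def local_clustering(adj: Graph, u: int) -> float:
    """Share of pairs of u's contacts who are connected to each other."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0                              # undefined; 0 by convention
    links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
    return links / (k * (k - 1) / 2)


def mean_ssr_to_neighbours(adj: Graph, ssr: Dict[Tuple[int, int], float], u: int) -> float:
    """Average strain-sharing rate between u and their first-degree connections."""
    vals = [ssr[tuple(sorted((u, v)))] for v in adj[u]]
    return sum(vals) / len(vals) if vals else float("nan")


def clustering_group(c: float) -> Optional[str]:
    """Bin a clustering coefficient using the high/low cut-offs from the text."""
    if c >= 0.75:
        return "high"
    if c <= 0.25:
        return "low"
    return None


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(adj, 2))   # one of three possible contact pairs is linked
```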

**Fig. 5: Social and microbiome strain niches.**

To observe this phenomenon at a village scale, we identify both social and microbiome clusters using Louvain clustering^34,35,36. If strain-sharing rates are significantly elevated within social network clusters, we would expect a correspondence between social network clusters and clusters of microbially similar people. We formed microbiome clusters based on the strain-sharing network within a village, with ties between people discerned solely by virtue of the extent to which they share microbiome strains and weighted by the strain-sharing rate (Fig. 5b and Supplementary Table 4). In parallel, we formed social clusters based on familial and social connections (without weighting) (Fig. 5c). On average, this method yielded social clusters of 11 people with an average of 24 intra-cluster relationships, and microbiome clusters of 17 people with an average intra-cluster strain-sharing rate of 8.5%. We can then paint the microbiome cluster membership onto the social network clustering and visualize the correspondence between social communities and microbiome communities (Fig. 5d) (as a robustness check, we also evaluated Leiden clustering³⁷; Supplementary Fig. 9).
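Louvain clustering greedily moves nodes between communities, and then aggregates communities, so as to increase modularity, which compares the weight of within-community ties with what would be expected given each node's total tie weight. The sketch below only scores a given partition of a weighted, undirected network (for the microbiome network, weights would be strain-sharing rates; for the social network, all weights would be 1). It does not implement the Louvain search itself, for which `networkx.community.louvain_communities` is one readily available implementation. The function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable, float]],
               community: Dict[Hashable, int]) -> float:
    """Newman–Girvan modularity of a partition of a weighted undirected graph.

    Q = sum over communities c of [ W_c / m - (S_c / (2m))^2 ], where m is the
    total edge weight, W_c the weight of edges inside c, and S_c the summed
    weighted degree (strength) of the members of c. Self-loops are not handled.
    """
    m = 0.0
    inside: Dict[int, float] = defaultdict(float)
    strength: Dict[int, float] = defaultdict(float)
    for u, v, w in edges:
        m += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Two disconnected triangles: splitting them apart gives Q = 0.5.
tri = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)]
print(modularity(tri, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))
```

Putting everyone into one community always gives Q = 0, so positive values indicate more within-community weight than expected by chance.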

Across the villages, social clusters overlap visually with microbiome clusters (shown for one village in Fig. 5b–d). To test this effect statistically, we evaluate the correspondence between social and microbiome cluster membership with the adjusted Rand index³⁸. To obtain the distribution of this statistic under independence between a host’s microbiome and their social network, we compare our observed index with a microbiome permutation null, in which we randomly swap the microbiome of every person in the village. We observe that social clusters correspond to microbiome clusters at a significant rate in all 18 villages (maximum P < 0.05) (Fig. 5e). Across 10,000 microbiome permutations, in only two villages does any random permutation ever lead to more overlap between social and microbiome clusters than the observed overlap (Sestao and Zarautz in Fig. 5e).
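The adjusted Rand index compares two partitions of the same people, correcting the raw count of agreeing pairs for chance so that independent partitions score about 0 and identical ones score 1. A sketch with a permutation P value follows. As a simplification (an assumption, not necessarily the study's procedure), it permutes microbiome cluster labels across people directly instead of swapping microbiomes and re-running the clustering on the permuted strain-sharing network; this shortcut matches re-clustering only if the clustering algorithm is deterministic and treats people symmetrically, which randomized Louvain runs are not guaranteed to be. Function names and the +1 correction are illustrative.

```python
import random
from collections import Counter
from math import comb
from typing import Hashable, Sequence, Tuple


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same people."""
    n = len(a)
    pairs = comb(n, 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / pairs            # chance agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                   # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def permutation_p(social: Sequence[Hashable], microbial: Sequence[Hashable],
                  n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Observed ARI and the share of label permutations reaching it."""
    rng = random.Random(seed)
    observed = adjusted_rand_index(social, microbial)
    labels = list(microbial)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                     # reassign clusters to people
        if adjusted_rand_index(social, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))   # 4/7, about 0.571
```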

If social clustering reinforces within-cluster microbial sharing, we would also expect different social clusters to have differentially abundant bacteria. To test this, we compared whether the relative abundance of each species differed across social clusters. After Benjamini–Hochberg multiple testing correction, we found 138 examples of species that were differentially abundant in different network communities out of 17,278 tests (Extended Data Fig. 6 shows the P value distributions of the Kruskal–Wallis tests). Figure 5f,g shows examples of two species (Enterococcus faecium (SGB7967) and Coriobacteriaceae SGB14372) that are differentially abundant in different network regions of an illustrative village.
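The Benjamini–Hochberg procedure controls the false discovery rate across many tests: the P values are sorted, the k-th smallest is scaled by m/k (with m tests), and a running minimum taken from the largest P value downwards keeps the adjusted values monotone. A minimal sketch, assuming the Kruskal–Wallis P values have been collected into a single list; the function name is illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini–Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0                           # adjusted values are capped at 1
    for rank in range(m, 0, -1):                # from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))   # about [0.02, 0.04, 0.04, 0.02]
```

Counting the adjusted values that fall below a chosen threshold (0.05 would be a conventional, assumed choice) then gives the number of species flagged as differentially abundant.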

Discussion

Using detailed social network mapping and strain-level microbiome genomics in 18 isolated Honduran villages, we find a substantial correspondence between social structure and microbiome sharing beyond familial or household relationships. The amount of strain-sharing seems to be modulated by the nature of the social relationships, even after accounting for other measured attributes (such as diet and medications). More intimate relationships share more strains, and strain-sharing rates increase monotonically with the frequency with which a pair of people shares meals or free time. The strain-sharing rate was a stronger predictor of social relationships than socio-demographic features such as wealth, religion or education. Pairs of people who are connected within a village also come to share more strains over a 2-year follow-up than otherwise similar pairs of unconnected people. Furthermore, we observe significantly elevated strain-sharing levels out to a social horizon of two degrees of separation. Host network position, whether central or peripheral, moderates exposure to the microbial metacommunity within the villages such that more socially isolated people tend to be more microbially isolated as well. Overall, the social network structure of human populations seems to provide a set of niches within which microbes can thrive or spread.

We are unable to distinguish direct transmission of strains from indirect transfer (for example, via unobserved social connections), nor can we infer the directionality of any potential transmission between two people sharing a strain. Although we control for factors such as diet, medication use and water source, and although we have longitudinal data for some analyses, it is not possible—with observational data alone—to fully distinguish shared environment from transmission. However, the genetic specificity of strains is consistent with transmission, especially in light of the human-host specificity of some transmitted species^39,40. Strain-level resolution helps shed light on the idea that similar microbial species seen within members of the same household may be based not only on a modulation by similar environmental conditions or shared genetics, but also on spread between people. Our ability to also find strain-sharing among people who are not genetically related and do not reside in the same household, but who are known to interact, bolsters this conclusion.

A previous study of 287 people in five villages in Fiji documented strain-sharing between spouses, household members and a subset of other social interactions¹. A study examining 7,646 people from 31 communities in 20 (mostly developed) countries also focused on kin and same-household ties⁴, and reported that the strain-sharing rate for the gut microbiome between non-cohabiting adults within the same village was generally 8%. Our estimate of this parameter was 4%. However, because we mapped a wider range of social relationships, beyond familial or household ties, we know more precisely whether village co-residents actually interact with one another. In other words, our estimate for pairs of people who are simply village co-residents includes only people who do not, in fact, interact socially.

Diverse phenomena, including phenotypes such as obesity and depression, have been shown to spread interpersonally in both observational and experimental studies^41,42. To the extent that the microbiome is associated with physical or mental states⁴³, any spread of the microbiome via biological contagion may partly explain the ostensible spread of certain other attributes via social contagion^41,44. It may prove to be the case that groups of interconnected people share phenotypes not only because of shared genes or transmitted behaviours, but also because of shared microbes.

Methods

Local involvement and ethics

We worked closely with the local population of Copan, sought approval and feedback from officials at the Ministry of Health (MOH) of Honduras, and endeavoured to provide practical benefits to the local community. When we began designing the underlying cohort project in 2013 (in 176 villages, including the 18 used here), the Bill and Melinda Gates Foundation introduced us to the Inter-American Development Bank (IDB), which supports and conducts work throughout Latin America, and the IDB in turn introduced us to the MOH. Because of this pathway to launching the project, we worked with local and regional public health agencies and with local leaders rather than with academic partners.

The area in which we chose to work, Copan, in the western highlands of Honduras, is very isolated. Over the years, as we built our data collection team in Copan, we developed deep ties to the local community, to local village leaders and to the few local health clinics there, as well as to local transportation and infrastructure providers. Because of these ties and our commitment to the local community, we regularly presented our results directly to these constituencies at the completion of our various projects.

We provided other material benefits to the local community, beyond simply providing them with information. When we tested people for stool parasites, we gave them the results of their tests and arranged for them to be treated. When we tested people for vision, we provided corrective glasses. We solicited ideas from the local community about what infrastructure improvements we could make, and we repaired many local playgrounds and clinics as a result. We arranged for an American company to provide free portable handheld ultrasound devices to the local health clinics, which was much appreciated by local providers. In terms of capacity building, we hired and trained over 100 local people, and many of our former data collectors have gone on to work for other public health and development entities. Finally, we offered a talented young person from Copan a position as a PhD student in the USA.

Throughout our work in Honduras, along with our extensive involvement at local and national levels, we have endeavoured to act with integrity, curiosity and respect in all our relationships.

This research would not have been prohibited in the USA. This work is not likely to result in stigmatization, incrimination or discrimination or personal risk for the participants, and we have safeguarded all data from threats to the privacy or security of our participants.

All participants provided informed consent, and our work was approved by the Yale Committee on Human Subjects (reference no. 2000020688).

Network construction

Village-level networks were mapped using standard ‘name generators’ administered to the whole village. After a photographic census of all adolescent and adult residents of each village, we conducted the main network survey: a detailed, hour-long survey⁷ covering demographic and health measures, together with a battery of name generators with which respondents identified relevant social relationships (friends, family members, people they spend free time with, and so on) from names and photographs shown in our TRELLIS software (available at trellis.yale.edu)⁴⁵. All the name generator questions are listed in Supplementary Table 1.

For questions in which the two members of a pair reported different levels of the same variable, such as greeting type or the amount of free time spent together, we symmetrized the responses as follows. For greeting type, we retained the greeting involving the most physical contact. For the frequency of free time and shared meals, we retained the response indicating more frequent contact. All other responses were symmetrized at the relationship level (that is, if either person nominated the other as a ‘close friend’, the tie was counted). When calculating degree distributions, centralities and clustering, we simplified the networks to remove multiplexity (that is, we collapsed all ties between a pair of people into a single tie) and symmetrized the ties (that is, we ignored who nominated whom).
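The symmetrization and simplification rules can be illustrated with a short sketch. This is not the authors' code. The ordinal encodings (a larger code means more physical contact or more frequent contact), the function names and the example people are all hypothetical.

```python
from collections import defaultdict

def symmetrize_ordinal(responses):
    """Collapse directed ordinal answers to one value per unordered pair.

    `responses` maps (respondent, alter) -> ordinal code; a larger code is assumed
    to mean more physical contact (greetings) or more frequent contact (free time,
    shared meals). Each pair keeps the larger of the two reports.
    """
    pair_value = {}
    for (a, b), value in responses.items():
        key = frozenset((a, b))
        pair_value[key] = max(value, pair_value.get(key, value))
    return pair_value

def simplify_multiplex(nominations):
    """Collapse directed, multiplex nominations into a simple undirected graph.

    `nominations` is an iterable of (respondent, alter, relationship_type) tuples.
    Two people are tied if either named the other under any relationship type.
    """
    adjacency = defaultdict(set)
    for a, b, _relationship in nominations:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)

# Hypothetical example: code 3 = greeting with the most physical contact
greetings = {("ana", "beto"): 1, ("beto", "ana"): 3}
pair_greeting = symmetrize_ordinal(greetings)
graph = simplify_multiplex([("ana", "beto", "friend"), ("beto", "ana", "kin"),
                            ("carla", "ana", "free_time")])
degree = {person: len(alters) for person, alters in graph.items()}
```

Taking the maximum, and the union of nominations, implements the "more contact wins" and "either person nominates" rules described above. Ana ends up with degree 2 even though she and Beto are linked by two different relationship types.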

Social network graphs were analysed and geodesic distances and centrality measures were calculated with igraph (v.1.3.5)⁴⁶ and plotted with the Fruchterman–Reingold algorithm. To protect the anonymity of our study villages, the villages were renamed to random town names from another country.
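Geodesic distance is the number of ties on the shortest path between two people, so second-degree connections are at distance 2. igraph provides this quantity directly, for example through `distances()`. The breadth-first search below is a minimal pure-Python sketch for an unweighted, undirected graph, shown only to make the definition concrete. The adjacency structure and function name are assumptions.

```python
from collections import deque

def geodesic_distances(adjacency, source):
    """Breadth-first search from `source` on an unweighted, undirected graph.

    `adjacency` maps each person to the set of people they share a tie with.
    Returns a dict person -> number of ties on the shortest path; people in a
    different connected component are absent from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}
print(geodesic_distances(adjacency, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

Person "e" has no ties, so no finite distance from "a" exists and it is omitted rather than assigned infinity.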

Sample collection and sequencing

Participants were instructed in how to self-collect faecal samples using a training module delivered in person in the villages, and were asked to return samples promptly to the local team. Samples were refrigerated immediately upon collection, transferred to liquid nitrogen at the collection site within 12 h of collection, and then moved to a −80 °C freezer in Copan Ruinas, Honduras. All villages followed the same procedures. Samples were shipped in randomized allotments on dry ice to the USA and stored in −80 °C freezers.

Stool material was homogenized using a TissueLyser (Qiagen), and the lysate was prepared for extraction with the Chemagic Stool gDNA extraction kit (PerkinElmer) and extracted on the Chemagic 360 instrument (PerkinElmer) following the manufacturer’s protocol. Sequencing libraries were prepared using the KAPA Hyper Library Preparation kit (KAPA Biosystems). Shotgun metagenomic sequencing was carried out on an Illumina NovaSeq 6000. Samples not reaching the desired sequencing depth of 50 Gbp were resequenced on a separate run. Raw metagenomic reads were deduplicated using prinseq-lite⁴⁷ (v.0.20.2) with default parameters. The resulting reads were screened for human contamination (hg19) with BMTagger and then quality-filtered with Trimmomatic⁴⁸ (v.0.36, parameters ‘ILLUMINACLIP:nextera_truseq_adapters.fasta:2:30:10:8:true SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:50’). This resulted in a total of 1,787 samples (average size 8.6 × 10⁷ reads).
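To make the quality-trimming settings concrete, the sketch below applies simplified versions of the SLIDINGWINDOW:4:15, LEADING:3, TRAILING:3 and MINLEN:50 steps, in that order, to a list of per-base Phred scores. It is an approximation for illustration only. Adapter clipping (ILLUMINACLIP) is omitted, and Trimmomatic's own sliding-window implementation may differ in details, for example in exactly where within the failing window the cut falls.

```python
def trim_read(quals, window=4, required=15, leading=3, trailing=3, min_len=50):
    """Return (start, end) of the kept slice of a read, or None if it is dropped.

    Simplified illustration of SLIDINGWINDOW:4:15, LEADING:3, TRAILING:3,
    MINLEN:50 applied in that order; not Trimmomatic's exact algorithm.
    """
    end = len(quals)
    # SLIDINGWINDOW: scanning 5'->3', cut at the first window whose mean is too low
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < required:
            end = i
            break
    # LEADING: drop low-quality bases from the 5' end
    start = 0
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop low-quality bases from the 3' end
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # MINLEN: discard reads that became too short
    if end - start < min_len:
        return None
    return start, end

read = [30] * 60 + [2] * 10
print(trim_read(read))  # (0, 59)
```

In this example the window starting at position 59 (qualities 30, 2, 2, 2) is the first with a mean below 15, so the read is cut there. The remaining 59 bases pass MINLEN:50.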

Species-level and strain-level profiling

Species-level profiling was performed using MetaPhlAn 4²⁶ with the Jan21 database and default parameters. Strain-level profiling was performed for the subset of species present in at least 50 samples using StrainPhlAn 4²⁶ with parameters ‘--marker_in_n_samples 1 --sample_with_n_markers 10 --phylophlan_mode accurate’. This resulted in a total of 841 species-level genome bins (SGBs) and 339,137 profiled strains. The StrainPhlAn ‘strain_transmission.py’ script was applied to the resulting phylogenetic trees to identify strain-sharing (transmission) events, yielding a total of 513,177 events. To ensure robust estimates, strain-sharing rates were calculated only for pairs sharing at least ten SGBs.
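The per-pair strain-sharing rate can be sketched as follows. The sketch assumes the usual definition: the number of SGBs for which a sharing event was called between two people, divided by the number of SGBs profiled in both. How events are called (for example, a distance threshold on the strain tree) is left to StrainPhlAn's script and is not reproduced here. The input sets and function name are hypothetical. The ten-SGB minimum mirrors the threshold stated above.

```python
def strain_sharing_rate(profiled_a, profiled_b, shared_sgbs, min_common=10):
    """Strain-sharing rate for one pair of people (illustrative sketch).

    profiled_a, profiled_b: sets of SGB identifiers with a strain profile in each
        person's sample.
    shared_sgbs: SGBs for which a sharing event was called for this pair.
    Returns None when fewer than `min_common` SGBs are profiled in both people,
    mirroring the ten-SGB robustness filter.
    """
    common = profiled_a & profiled_b
    if len(common) < min_common:
        return None
    return len(common & shared_sgbs) / len(common)

a = {f"SGB{i}" for i in range(12)}        # SGB0..SGB11
b = {f"SGB{i}" for i in range(2, 20)}     # SGB2..SGB19 -> 10 SGBs in common
print(strain_sharing_rate(a, b, {"SGB2", "SGB3", "SGB50"}))  # 0.2
```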

Beta diversity indices were calculated using the vegdist function from the vegan R package (v.2.6-2)⁴⁹.
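For reference, the two beta-diversity measures used later as microbiome similarity in the network predictions can be written out directly. The sketch follows the standard definitions: Bray–Curtis dissimilarity on abundances, and the Jaccard index on presence/absence. In vegdist, the latter corresponds to method = "jaccard" with binary = TRUE, which returns the dissimilarity 1 − J. This is illustrative code, not the authors' pipeline.

```python
def bray_curtis(x, y):
    """Bray–Curtis dissimilarity between two abundance vectors of equal length.

    Two all-zero samples are treated as identical here (an arbitrary convention).
    """
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0

def jaccard_index(x, y):
    """Jaccard similarity of the sets of taxa present (abundance > 0) in each sample."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y
    return len(present_x & present_y) / len(union) if union else 1.0

print(bray_curtis([2, 1, 0], [1, 1, 0]))    # 0.2
print(jaccard_index([1, 0, 2], [1, 1, 0]))  # 0.3333333333333333
```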

Separation of distances by village membership was tested by permutational multivariate analysis of variance (PERMANOVA) using the adonis function from the vegan R package with 999 permutations.
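PERMANOVA compares the spread of pairwise distances within groups with the spread across all samples, and assesses significance by shuffling group labels. The sketch below computes the standard one-way pseudo-F statistic, R² and a permutation P value from a full distance matrix. It is a minimal illustration with a hypothetical function name, not the vegan implementation.

```python
import random

def permanova(dist, labels, permutations=999, seed=1):
    """One-way PERMANOVA on a square distance matrix (list of lists).

    Returns (pseudo_F, R2, p_value), with p = (k + 1) / (permutations + 1), where
    k counts label permutations whose pseudo-F is >= the observed value.
    """
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: (1/n) * sum of squared distances over all pairs i < j
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def pseudo_f(lab):
        ss_within = 0.0
        for g in groups:
            members = [i for i in range(n) if lab[i] == g]
            ss = sum(dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:])
            ss_within += ss / len(members)
        ss_among = ss_total - ss_within
        return (ss_among / (a - 1)) / (ss_within / (n - a)), ss_among / ss_total

    f_obs, r2 = pseudo_f(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled)[0] >= f_obs:
            exceed += 1
    return f_obs, r2, (exceed + 1) / (permutations + 1)

positions = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
dist = [[abs(p - q) for q in positions] for p in positions]
f, r2, p = permanova(dist, ["a"] * 3 + ["b"] * 3)
```

The toy example has only six samples, so there are just 20 distinct label assignments. Two of them reproduce the observed grouping, so P stays near 0.1 however well separated the groups are. With 999 permutations and a P value of this form, the smallest attainable value is 1/1000 = 0.001.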

Statistical analyses

All statistical analyses were performed in R (v.4.1.3). Correction for multiple testing (Benjamini–Hochberg procedure, marked P_adj) was applied where appropriate, and significance was defined as P_adj < 0.05. All tests were two-sided unless otherwise specified. All egocentric regressions (that is, those assessing the relationship between network position and strain-sharing) used linear mixed-effects models with the following general specification:

$$\begin{array}{c}\text{Outcome of interest} \sim \text{predictor of interest} + \text{age} + \text{sex} \\ + \text{BMI} + \text{Bristol stool scale} + \text{household wealth index} \\ + \text{diet diversity score} + \text{medication usage} + \text{water source} \\ + \text{DNA concentration} + \text{sequencing depth} + \text{extraction date} \\ + \text{shipping batch} + \text{sequencing batch} + \text{extraction batch} \\ + (1 \mid \text{village}) + (1 \mid \text{building})\end{array}$$

That is, we controlled for age, sex, household wealth, Bristol stool scale and body mass index (BMI), as well as sample properties (DNA concentration, sequencing depth, extraction date, and shipping, sequencing and extraction batches), and included random intercepts for village and building. We also included household water source, individual medication usage in the last month and diet diversity (the number of food categories consumed on a daily basis¹⁰). Medication types included painkillers, antibiotics, anti-diarrhoeals, anti-parasitics, anti-fungals, anti-diabetics, antacids, laxatives and vitamins. Mixed-effects models were fitted with the lmerTest package (v.3.1.3)⁵⁰.

Network predictions

Mixed-effects logistic regression models were used for out-of-sample prediction of network ties. Class-balanced data sets were constructed by down-sampling unrelated pairs to equal the number of related pairs. Models were trained with threefold cross-validation (k = 3), and predictions from the three test folds were pooled. ROC curves were constructed from the average of five repetitions of threefold cross-validation. ROC curves and confidence intervals were calculated with the pROC package (v.1.18.0)⁵¹, and logistic regression models were fitted with the lmerTest package (v.3.1.3) using the binomial family and a random slope per village.
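The evaluation scheme can be sketched without the mixed-effects model itself. It has three steps: class balancing by down-sampling, threefold cross-validation with pooled out-of-fold predictions, and an area under the ROC curve. In the sketch, `fit` and `predict` are hypothetical placeholders for model training and scoring. The AUC is computed through its Mann–Whitney equivalence: the probability that a randomly chosen related pair scores higher than a randomly chosen unrelated pair, with ties counting one half. Averaging the AUC over five repetitions is a simplification of averaging the ROC curves themselves.

```python
import random

def balanced_downsample(labels, seed=0):
    """All positive (related) pair indices plus an equal number of randomly chosen
    negative (unrelated) pair indices. Assumes negatives outnumber positives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return pos + rng.sample(neg, len(pos))

def kfold(indices, k=3, seed=0):
    """Shuffle indices and split them into k folds of (nearly) equal size."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def auc(scores, labels):
    """ROC AUC as the Mann–Whitney probability P(score_pos > score_neg); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(x, labels, fit, predict, k=3, seed=0):
    """Balance classes, run k-fold CV, pool out-of-fold predictions, return one AUC.

    fit(train_x, train_y) -> model and predict(model, x) -> score are hypothetical
    stand-ins for the mixed-effects logistic regression.
    """
    folds = kfold(balanced_downsample(labels, seed), k, seed)
    scores, truth = [], []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        model = fit([x[i] for i in train], [labels[i] for i in train])
        scores += [predict(model, x[i]) for i in test]
        truth += [labels[i] for i in test]
    return auc(scores, truth)

# Toy data: score each pair by its microbiome similarity alone (no fitting).
similarity = [0.5 + 0.04 * i for i in range(10)] + [0.01 * i for i in range(30)]
related = [1] * 10 + [0] * 30
repeats = [cross_validated_auc(similarity, related, lambda xs, ys: None,
                               lambda model, v: v, seed=s) for s in range(5)]
mean_auc = sum(repeats) / len(repeats)
```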

The predictive model including all covariates was specified by the following formula:

$$\begin{array}{c}\text{Relationship} \sim \text{microbiome similarity} + \text{sex} \\ + \text{indigenous status} + \text{religion} + \text{age difference} \\ + \text{average age} + \text{wealth difference} + \text{average wealth} \\ + \text{education difference} + \text{average education} \\ + \text{medication usage} + \text{same water source} + \text{diet} \\ + \text{Bristol stool scale} + \text{household sharing} \\ + (0 + \text{microbiome similarity} \mid \text{village ID})\end{array}$$

where ‘microbiome similarity’ is either the strain-sharing rate, Jaccard index or Bray–Curtis dissimilarity calculated between the members of a pair.

Variable importance metrics were calculated using permutation feature importance with the car R package (v.3.0). The permutation feature importance of a feature is defined as the decrease in a model score when that feature's values are randomly shuffled⁵². Shuffling breaks the relationship between the feature and the target, so the drop in model score indicates how much the model depends on that feature. Importance was computed over 1,000 random permutations of each feature. Variance inflation factors were calculated to check the results for collinearity among variables and were all low (less than 2).
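Permutation feature importance, as defined above, can be sketched generically. The procedure scores a fitted model, shuffles one feature column, re-scores, and averages the drop over repeated shuffles. The `predict` and `score` callables below are hypothetical placeholders. In the study the model is the mixed-effects logistic regression, with 1,000 permutations per feature.

```python
import random

def permutation_importance(predict, rows, y, score, n_repeats=1000, seed=0):
    """Mean drop in `score` when each feature column is shuffled independently.

    predict: function mapping one feature row (a list) to a prediction.
    score:   function (y_true, y_pred) -> float, higher is better.
    """
    rng = random.Random(seed)
    baseline = score(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Replace only column j; all other features keep their original values
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - score(y, [predict(r) for r in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

rows = [[i, i % 5] for i in range(20)]      # feature 0 is informative, feature 1 is not
y = [int(i >= 10) for i in range(20)]
imp = permutation_importance(lambda r: int(r[0] >= 10), rows, y, accuracy, n_repeats=200)
```

Because the toy model never reads feature 1, shuffling that feature leaves every prediction unchanged, and its importance is exactly zero.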

Microbiome null permutations

Microbiome null permutations generate a null distribution of strain-sharing rates between any two people while preserving the network structure. Under the null hypothesis that a host’s microbiome composition and social network position are independent, we can sever this relationship by randomly permuting microbiomes among all people in a village and recalculating the metrics of interest (for example, strain-sharing by degree or clustering Rand indices). The structure of the network remains unchanged, but the microbiome assigned to each node is randomized. This yields the distribution our statistics would have if each person's microbiome arose independently of any social interactions.
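The village-wide permutation can be sketched for one statistic: the mean strain-sharing rate across socially connected pairs. Microbiome profiles are reassigned at random among village members while the ties stay fixed. The network structure is therefore preserved, and only the link between each person and their own microbiome is broken. The data structures (an edge list and a dictionary of pairwise rates keyed by unordered pairs) and the function name are hypothetical. Other statistics, such as Rand indices, would replace the mean in the same loop.

```python
import random
from itertools import combinations

def tie_permutation_test(people, edges, ssr, n_perm=1000, seed=0):
    """Mean strain-sharing rate over ties, observed versus a node-label null.

    people: list of person IDs in one village.
    edges:  list of (u, v) social ties among those people.
    ssr:    dict frozenset({a, b}) -> strain-sharing rate between the microbiomes
            of a and b.
    Returns (observed, null_values, one_sided_p).
    """
    def mean_tie_ssr(owner):
        # owner[p] is the person whose microbiome is assigned to network node p
        values = [ssr[frozenset((owner[u], owner[v]))] for u, v in edges]
        return sum(values) / len(values)

    observed = mean_tie_ssr({p: p for p in people})
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = people[:]
        rng.shuffle(shuffled)
        null.append(mean_tie_ssr(dict(zip(people, shuffled))))
    p_value = (sum(v >= observed for v in null) + 1) / (n_perm + 1)
    return observed, null, p_value

# Toy village: ties A-B, C-D, E-F; tied pairs share more strains than others.
people = list("ABCDEF")
edges = [("A", "B"), ("C", "D"), ("E", "F")]
ssr = {frozenset(pair): 0.1 for pair in combinations(people, 2)}
ssr.update({frozenset(edge): 0.5 for edge in edges})
observed, null, p_value = tie_permutation_test(people, edges, ssr)
```

Only one of the 15 ways of pairing six microbiomes reproduces the observed ties, so in this toy example the P value stays near 1/15 even though tied pairs share far more strains. Real villages are much larger.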

Village-wide microbiome permutations were used to compute null distributions for the strain-sharing rate by geodesic distance and for the clustering results. For the relationship-specific analyses in Supplementary Fig. 1, permutations were performed at the relationship level rather than across the whole village. The observed sharing for a specific relationship was compared with the sharing obtained when that relationship tie was permuted: for example, the sharing between a person and their friend was compared with the sharing between that person and the friends of 100 randomly chosen people in the same village. For inherently gendered relationships (husband/wife and mother/father of a child), we accounted for the sex of the ego; for relationships that are not necessarily gendered (for example, free time), we did not.
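The relationship-specific comparison can be sketched in the same spirit. For each observed tie of a given type (say, friendship) from ego to alter, the null replaces the alter with an alter named by another villager. The comparison is therefore between a person's own friends and other people's friends. This is a simplified sketch under stated assumptions. It ignores the sex matching used for gendered relationships and draws one replacement per tie in each of 100 rounds. The names and data structures are hypothetical.

```python
import random

def relationship_null(ties, ssr, n_rounds=100, seed=0):
    """Observed mean strain-sharing across directed ties of one relationship type,
    plus a null in which each ego is paired with other villagers' alters.

    ties: list of (ego, alter) pairs for one relationship type in one village.
    ssr:  dict frozenset({a, b}) -> strain-sharing rate.
    Assumes every ego has at least one candidate alter named by someone else.
    """
    rng = random.Random(seed)
    observed = sum(ssr[frozenset(t)] for t in ties) / len(ties)
    null_means = []
    for _ in range(n_rounds):
        values = []
        for ego, _alter in ties:
            # Alters named by *other* egos, excluding the ego themself
            candidates = [a for e, a in ties if e != ego and a != ego]
            values.append(ssr[frozenset((ego, rng.choice(candidates)))])
        null_means.append(sum(values) / len(values))
    return observed, null_means

# Toy village: three mutual friendships; friends share more strains than non-friends.
friend_ties = [(0, 1), (1, 0), (2, 3), (3, 2), (4, 5), (5, 4)]
ssr = {frozenset((a, b)): 0.1 for a in range(6) for b in range(6) if a != b}
ssr.update({frozenset(t): 0.6 for t in friend_ties})
observed, null_means = relationship_null(friend_ties, ssr)
```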

Longitudinal analyses

A subset of 301 people from four villages was followed up 2 years later and asked to provide a second stool sample. These samples were processed with the same pipeline used for the initial 1,787 samples.

We defined relationship ties using the social network from the initial wave and fitted the following linear mixed-effects model:

$$\mathrm{SSR}_{\mathrm{T2}} \sim \mathrm{SSR}_{\mathrm{T1}} + \text{relationship} + M + (1 \mid \text{village ID}) + (1 \mid \text{ego})$$

where $\mathrm{SSR}_{\mathrm{T1}}$ and $\mathrm{SSR}_{\mathrm{T2}}$ are the strain-sharing rates between the members of a pair at time points T1 and T2, respectively. We report standardized coefficients.

To decompose this sharing effect across individual species, we fitted a mixed-effects logistic model specified as follows:

$$\mathrm{T2}_{S} \sim \mathrm{T1}_{S} + \text{relationship} + M + (1 \mid \text{species}) + (1 \mid \text{village ID}) + (1 \mid \text{ego})$$

where $\mathrm{T1}_{S}$ and $\mathrm{T2}_{S}$ are binary variables indicating whether strain-sharing of species S was observed within the pair at time point T1 or T2, respectively; all species are combined in a single model. Random intercepts were included for each species, for village membership and for each person (ego).

In both models, ‘relationship’ is a dummy variable indicating the presence (or absence) of a tie between the pair of people, and M is the Mahalanobis distance calculated on the following covariates:

$$\begin{array}{c}M = \text{Mahalanobis}(\text{age}, \text{sex}, \text{BMI}, \text{Bristol stool scale}, \\ \text{household wealth index}, \text{diet diversity index}, \\ \text{medication usage}, \text{water source}, \text{building ID})\end{array}$$

The pairwise Mahalanobis distance was calculated on the covariate matrix using the D2.dist function from the biotools R package⁵³ (v.4.2). We also specified this model using the constituent variables rather than the Mahalanobis distance (Supplementary Data 2).
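For reference, the squared Mahalanobis distance between persons i and j with covariate vectors x_i and x_j is D² = (x_i − x_j)ᵀ S⁻¹ (x_i − x_j), where S is the sample covariance matrix of the covariates. biotools' D2.dist returns these squared distances. The pure-Python sketch below computes D² for numeric covariates only. The study's encoding of categorical covariates such as sex, water source and building ID is not described here, so any encoding is an assumption. The function names are hypothetical.

```python
def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def solve(matrix, vector):
    """Solve matrix @ z = vector by Gaussian elimination with partial pivoting.

    Assumes the matrix is invertible (a singular covariance raises ZeroDivisionError).
    """
    p = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * p
    for r in reversed(range(p)):
        z[r] = (aug[r][p] - sum(aug[r][c] * z[c] for c in range(r + 1, p))) / aug[r][r]
    return z

def squared_mahalanobis(x, y, cov):
    """D^2 = (x - y)^T cov^{-1} (x - y), computed without forming the inverse."""
    d = [a - b for a, b in zip(x, y)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))

rows = [[-1.0, -2.0], [1.0, -2.0], [-1.0, 2.0], [1.0, 2.0]]
cov = covariance(rows)                            # [[4/3, 0], [0, 16/3]]
d2 = squared_mahalanobis(rows[0], rows[3], cov)   # 3 + 3 = 6, up to rounding
```

Solving the linear system rather than inverting S is the usual numerically safer choice. The distance accounts for each covariate's variance, so a 4-unit difference in the second covariate counts the same as a 2-unit difference in the first here.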

Microbiome and social clustering

We used the Louvain and Leiden methods, as implemented in the igraph package, to cluster participants along social and microbiome lines. Louvain clustering is based on greedy modularity optimization. Modularity, which ranges from −0.5 (non-modular clustering) to 1 (fully modular clustering), measures the density of edges within communities relative to edges between communities. Maximizing modularity yields, in principle, the grouping of nodes that best separates densely connected communities; both algorithms approximate this optimum heuristically. When a pair shared too few SGBs to calculate a robust strain-sharing rate (fewer than ten), a strain-sharing rate of 0% was imputed to allow weight-based clustering. This occurred in 0.46% of pairwise comparisons (16,228 out of 3,560,769), and only 838 of these 16,228 comparisons involved people in the same village. The adjusted Rand index was calculated with the mclust package (v.6.0.0)⁵⁴.
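The two quantities can be computed directly for a partition that is already given. The first is weighted modularity, Q = Σ_c [W_c / W − (S_c / 2W)²], where W is the total edge weight, W_c the weight of edges inside community c and S_c the summed weighted degree of its members. The second is the adjusted Rand index between two labelings. Louvain and Leiden search for partitions with high Q, whereas this sketch only evaluates Q. It is illustrative pure Python, not the igraph or mclust implementation.

```python
from collections import Counter, defaultdict
from math import comb

def modularity(edges, community):
    """Weighted modularity of a partition of an undirected graph.

    edges: list of (u, v, weight); community: dict node -> community label.
    Q = sum over communities c of [W_c / W - (S_c / (2W))^2].
    """
    total = sum(w for _, _, w in edges)
    inside = defaultdict(float)      # W_c: weight of edges with both ends in c
    strength = defaultdict(float)    # S_c: summed weighted degree of c's members
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2 for c in strength)

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    pairs = comb(len(labels_a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / pairs
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:          # both labelings trivial: define ARI as 1
        return 1.0
    return (sum_ij - expected) / (maximum - expected)

triangles = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)]
split = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
print(modularity(triangles, split))                      # 0.5
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0
```

The ARI ignores label names, so the second call returns 1.0: the groupings are identical even though the labels are swapped.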

When testing species differential abundance across network communities with the Kruskal–Wallis test, we required as a robustness check that each social cluster contain more than five people and that the species be present in more than five people in the village; cases not meeting these criteria were excluded.
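The filtering rule and the test statistic can be sketched as follows. The Kruskal–Wallis H statistic is computed from mid-ranks with the usual tie correction. A P value would then come from the χ² distribution with (number of clusters − 1) degrees of freedom, for example via scipy.stats.kruskal, which is not used here to keep the sketch dependency-free. The exclusion rule above can be read more than one way. This sketch drops small clusters and skips rare species, which is an assumption, as are the function names.

```python
def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal–Wallis H statistic for a list of samples.

    Undefined (division by zero) when every value is tied.
    """
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    ranks, i, tie_term = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2          # mid-rank of tied block i..j-1
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(sum(ranks[v] for v in g) ** 2 / len(g)
                                 for g in groups) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))

def eligible_groups(abundance_by_cluster, min_people=6, min_present=6):
    """Keep clusters with more than five people; return None if the species is
    present (abundance > 0) in five or fewer people, or fewer than two clusters remain."""
    kept = [g for g in abundance_by_cluster if len(g) >= min_people]
    present = sum(1 for g in abundance_by_cluster for v in g if v > 0)
    if present < min_present or len(kept) < 2:
        return None
    return kept

print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 6))  # 7.2
```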

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Raw metagenomic data are available in the NCBI Sequence Read Archive under accession PRJNA999635. Abundance tables and certain strain-level information are available in Supplementary Data 4 and at Zenodo (https://zenodo.org/records/11150476)⁵⁵. Core metadata for each participant (age, sex, BMI, Bristol Stool Scale and village ID) are publicly available at Zenodo (https://zenodo.org/records/11150476)⁵⁵. Additional, more confidential metadata (restricted as required by human participant protections) are provided in two separate files at Zenodo (https://zenodo.org/records/11153185 (ref. ⁵⁶) and https://zenodo.org/records/11153210 (ref. ⁵⁷)). One file includes household ID, medications, diet, education, wealth, religion and indigenous status. The second file includes the social interaction data (the sociocentric graphs). Either or both of these additional files can be requested by academic researchers at established institutions (with IRB approval) by submitting a request directly through the Zenodo record. These two files may not be transferred to other investigators and are not for commercial use. Data release is subject to provisions in force at Yale University and the Yale Institute for Network Science at the time of release. Data access requests will be evaluated monthly, and approved requesters will promptly be given access to the Zenodo repository for direct download.

Code availability

Source code for data analysis and data for reproducing the figures are available on GitHub (https://github.com/human-nature-lab/strain_sharing/) and permanently deposited at Zenodo (https://doi.org/10.5281/zenodo.13737605)⁵⁸.

References

Brito, I. L. et al. Transmission of human-associated microbiota along family and social networks. Nat. Microbiol. 4, 964–971 (2019).

Sarkar, A. et al. Microbial transmission in animal social networks and the social microbiome. Nat. Ecol. Evol. 4, 1020–1035 (2020).

Dill-McFarland, K. A. et al. Close social relationships correlate with human gut microbiota composition. Sci. Rep. 9, 703 (2019).

Valles-Colomer, M. et al. The person-to-person transmission landscape of the gut and oral microbiomes. Nature 614, 125–135 (2023).

Gacesa, R. et al. Environmental factors shaping the gut microbiome in a Dutch population. Nature 604, 732–739 (2022).

Asnicar, F. et al. Studying vertical microbiome transmission from mothers to infants by strain-level metagenomic profiling. mSystems 2, e00164-16 (2017).

Airoldi, E. M. & Christakis, N. A. Induction of social contagion for diverse outcomes in structured experiments in isolated villages. Science 384, eadi5147 (2024).

Mohajeri, M. H. et al. The role of the microbiome for human health: from basic science to clinical applications. Eur. J. Nutr. 57, 1–14 (2018).

Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215 (2018).

Shridhar, S. V. et al. Environmental, socioeconomic, and health factors associated with gut microbiome species and strains in isolated Honduras villages. Cell Rep. 43, 114442 (2024).

Korpela, K. et al. Selective maternal seeding and environment shape the human gut microbiome. Genome Res. 28, 561–568 (2018).

Podlesny, D. & Fricke, W. F. Strain inheritance and neonatal gut microbiota development: a meta-analysis. Int. J. Med. Microbiol. 311, 151483 (2021).

Tung, J. et al. Social networks predict gut microbiome composition in wild baboons. eLife 4, e05224 (2015).

Raulo, A. et al. Social networks strongly predict the gut microbiota of wild mice. ISME J. 15, 2601–2613 (2021).

Johnson, K. V.-A., Watson, K. K., Dunbar, R. I. M. & Burnet, P. W. J. Sociability in a non-captive macaque population is associated with beneficial gut bacteria. Front. Microbiol. 13, 1032495 (2022).

Amato, K. R. et al. Patterns in gut microbiota similarity associated with degree of sociality among sex classes of a neotropical primate. Microb. Ecol. 74, 250–258 (2017).

Moeller, A. H. et al. Social behavior shapes the chimpanzee pan-microbiome. Sci. Adv. 2, e1500997 (2016).

Raulo, A. et al. Social and environmental transmission spread different sets of gut microbes in wild mice. Nat. Ecol. Evol. 8, 972–985 (2024).

Perkins, J. M., Subramanian, S. V. & Christakis, N. A. Social networks and health: a systematic review of sociocentric network studies in low- and middle-income countries. Soc. Sci. Med. 125, 60–78 (2015).

Apicella, C. L., Marlowe, F. W., Fowler, J. H. & Christakis, N. A. Social networks and cooperation in hunter-gatherers. Nature 481, 497–501 (2012).

Abdill, R. J., Adamowicz, E. M. & Blekhman, R. Public human microbiome data are dominated by highly developed countries. PLoS Biol. 20, e3001536 (2022).

Van Rossum, T., Ferretti, P., Maistrenko, O. M. & Bork, P. Diversity within species: interpreting strains in microbiomes. Nat. Rev. Microbiol. 18, 491–506 (2020).

Gardy, J. L. et al. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. N. Engl. J. Med. 364, 730–739 (2011).

Lloyd-Price, J. et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 550, 61–66 (2017).

Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10, e65088 (2021).

Blanco-Míguez, A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat. Biotechnol. 41, 1633–1644 (2023).

Ianiro, G. et al. Variability of strain engraftment and predictability of microbiome composition after fecal microbiota transplantation across different diseases. Nat. Med. 28, 1913–1923 (2022).

Pasolli, E. et al. Large-scale genome-wide analysis links lactic acid bacteria from food with the gut microbiome. Nat. Commun. 11, 2610 (2020).

Yassour, M. et al. Strain-level analysis of mother-to-child bacterial transmission during the first few months of life. Cell Host Microbe 24, 146–154.e4 (2018).

Brito, I. L. et al. Mobile genes in the human microbiome are structured from global to individual scales. Nature 535, 435–439 (2016).

Altmann, A., Toloşi, L., Sander, O. & Lengauer, T. Permutation importance: a corrected feature importance measure. Bioinformatics 26, 1340–1347 (2010).

Truong, D. T., Tett, A., Pasolli, E., Huttenhower, C. & Segata, N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 27, 626–638 (2017).

Jacoby, R. P. & Kopriva, S. Metabolic niches in the rhizosphere microbiome: new tools and approaches to analyse metabolic mechanisms of plant–microbe nutrient exchange. J. Exp. Bot. 70, 1087–1094 (2018).

Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).

Kanter, I., Yaari, G. & Kalisky, T. Applications of community detection algorithms to large biological datasets. Methods Mol. Biol. 2243, 59–80 (2021).

Didier, G., Valdeolivas, A. & Baudot, A. Identifying communities from multiplex biological networks by randomized optimization of modularity. F1000Res. 7, 1042 (2018).

Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 1–12 (2019).

Rand, W. M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971).

Mallott, E. K. & Amato, K. R. Host specificity of the gut microbiome. Nat. Rev. Microbiol. 19, 639–653 (2021).

Davenport, E. R. et al. The human microbiome in evolution. BMC Biol. 15, 127 (2017).

Christakis, N. A. & Fowler, J. H. The spread of obesity in a large social network over 32 years. N. Engl. J. Med. 357, 370–379 (2007).

Rosenquist, J. N., Fowler, J. H. & Christakis, N. A. Social network determinants of depression. Mol. Psychiatry 16, 273–281 (2011).

Smith, L. K. & Wissel, E. F. Microbes and the mind: how bacteria shape affect, neurological processes, cognition, social relationships, development, and pathology. Perspect. Psychol. Sci. 14, 397–418 (2019).

Finlay, B. B. & CIFAR Humans & Microbiome. Are noncommunicable diseases communicable? Science 367, 250–251 (2020).

Lungeanu, A. et al. Using Trellis software to enhance high-quality large-scale network data collection in the field. Soc. Networks 66, 171–184 (2021).

Csardi, G., Nepusz, T. & Others. The igraph software package for complex network research. InterJournal Complex Systems 1695, 1–9 (2006).

Cantu, V. A., Sadural, J. & Edwards, R. PRINSEQ++, a multi-threaded tool for fast and efficient quality control and preprocessing of sequencing datasets. Preprint at PeerJ https://doi.org/10.7287/peerj.preprints.27553 (2019).

Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

Oksanen, J. et al. vegan: Community Ecology Package. R package version 2.0-10 https://cran.r-project.org/web/packages/vegan/index.html (2008).

Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. lmerTest package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).

Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 12, 77 (2011).

Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

da Silva, A. R., Malafaia, G. & Menezes, I. P. P. Biotools: an R function to predict spatial gene diversity via an individual-based approach. Genet. Mol. Res. 16, gmr16029655 (2017).

Scrucca, L., Fop, M., Murphy, T. B. & Raftery, A. E. Mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R J. 8, 289–317 (2016).

Beghini, F. et al. Detailed social network interactions and gut microbiome strain-sharing within isolated Honduras villages. Zenodo https://doi.org/10.5281/zenodo.11150475 (2024).

Beghini, F., Christakis, N. & Nicoll, L. Detailed social network interactions and gut microbiome strain-sharing within isolated Honduras villages. Zenodo https://doi.org/10.5281/zenodo.11153184 (2024).

Beghini, F., Christakis, N. & Nicoll, L. Detailed social network interactions and gut microbiome strain-sharing within isolated Honduras villages. Zenodo https://doi.org/10.5281/zenodo.11153209 (2024).

Beghini, F. & Pullman, J. human-nature-lab/strain_sharing. Zenodo https://doi.org/10.5281/zenodo.13737605 (2024).

Acknowledgements

We thank all the study participants and the local organizations, doctors and community leaders in Honduras with whom we have interacted. We thank J. E. Gámez and E. J. Urrea Carbajal for coordinating the field work in Honduras; R. Negron, L. Nicoll, A. L. Rodriguez and T. Keegan for their support with respect to field operations, data collection and data management; YCGA (Yale Center for Genomic Analysis) for sequencing the metagenomic libraries; and Q. Shi for processing the specimens and handling the extractions. We thank M. Baym, E. Feltham, M. Gerstein and M. Jones for helpful comments on the manuscript. We have benefited from many local connections and support, including the following local partner organizations that played a role in the development or surveying of our cohort: Sistemas Soluciones para Estudios de la Salud (SES), MURE Consultores and World Vision. We thank the Ministry of Health in Honduras and the Inter-American Development Bank for their extensive support and cooperation. This work was supported by the NOMIS Foundation, with additional support from Schmidt Futures, the Pershing Square Foundation and the Rothberg Catalyzer Fund. The core cohort was originally established with support from the Bill and Melinda Gates Foundation.

Author information

These authors contributed equally: Francesco Beghini, Jackson Pullman

Authors and Affiliations

Yale Institute for Network Science, Yale University, New Haven, CT, USA

Francesco Beghini, Jackson Pullman, Marcus Alexander, Shivkumar Vishnempet Shridhar & Nicholas A. Christakis
Department of Statistics and Data Science, Yale University, New Haven, CT, USA

Jackson Pullman & Nicholas A. Christakis
Department of Biomedical Engineering, Yale University, New Haven, CT, USA

Shivkumar Vishnempet Shridhar & Nicholas A. Christakis
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA

Drew Prinster
Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA

Adarsh Singh & Ilana L. Brito
Soluciones para Estudios de la Salud, Copán, Honduras

Rigoberto Matute Juárez
Department of Statistics, Operations and Data Science, Fox School of Business, Temple University, Philadelphia, PA, USA

Edoardo M. Airoldi
Data Science Institute, Temple University, Philadelphia, PA, USA

Edoardo M. Airoldi
Department of Medicine, Yale School of Medicine, New Haven, CT, USA

Nicholas A. Christakis

Authors

Francesco Beghini

Jackson Pullman

Marcus Alexander

Shivkumar Vishnempet Shridhar

Drew Prinster

View author publications

You can also search for this author in
PubMed Google Scholar
Adarsh Singh

View author publications

You can also search for this author in
PubMed Google Scholar
Rigoberto Matute Juárez

View author publications

You can also search for this author in
PubMed Google Scholar
Edoardo M. Airoldi

View author publications

You can also search for this author in
PubMed Google Scholar
Ilana L. Brito

View author publications

You can also search for this author in
PubMed Google Scholar
Nicholas A. Christakis

View author publications

You can also search for this author in
PubMed Google Scholar

Contributions

F.B., M.A., I.L.B. and N.A.C. conceived and designed the study. I.L.B. and N.A.C. supervised the project. F.B., J.P., M.A., E.M.A., I.L.B. and N.A.C. contributed to the methodology design and analytic approach. F.B., M.A., R.M.J., I.L.B. and N.A.C. collected the data. F.B., J.P., M.A., S.V.S., A.S., D.P. and N.A.C. performed the statistical analyses and interpreted the findings. F.B., J.P., M.A., I.L.B. and N.A.C. wrote the manuscript.

Corresponding author

Correspondence to Nicholas A. Christakis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Visualization of microbiome species relative abundance data across villages.

Data are shown after ordination with principal coordinates analysis (PCoA) on the Bray-Curtis dissimilarity index, coloured by village membership, for the five most populous villages in the Honduras microbiome cohort (n = 881). Microbiome samples are distinguished by village membership for most pairs of villages (PERMANOVA P = 0.001, R² = 0.9% to 3.3%) and, to some extent, when all five villages are combined (PERMANOVA P = 0.001, R² = 3%). How sharply microbiomes separate by village appears to vary from village to village.
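The PERMANOVA statistics in this caption can be illustrated with a short, dependency-free sketch. It is not the authors' pipeline, which the caption does not specify: it computes Bray-Curtis dissimilarity, Anderson's one-way pseudo-F and R², and a label-permutation P value. The function names, the toy abundance vectors and the choice of 999 permutations are assumptions. With 999 permutations and the usual (hits + 1)/(permutations + 1) estimator, 0.001 is the smallest P value that can be reported, which is consistent with the P = 0.001 values above.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def _pseudo_f(dist, labels):
    """Return (pseudo-F, R^2) for one labelling, following one-way PERMANOVA."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        pairs = combinations(members, 2)
        ss_within += sum(dist[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    a = len(groups)
    f_stat = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f_stat, ss_among / ss_total


def permanova(samples, labels, n_perm=999, seed=0):
    """One-way PERMANOVA on Bray-Curtis distances with a permutation P value."""
    n = len(samples)
    dist = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    f_obs, r2 = _pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if _pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    # The +1 counts the observed labelling, so P is never below 1 / (n_perm + 1).
    p_value = (hits + 1) / (n_perm + 1)
    return f_obs, r2, p_value


# Hypothetical toy data: two "villages" with clearly different species profiles.
village_a = [[10, 1, 0], [9, 2, 0], [11, 1, 1], [10, 0, 1], [8, 2, 0]]
village_b = [[0, 1, 10], [1, 0, 9], [0, 2, 11], [1, 1, 10], [0, 0, 8]]
f_stat, r2, p = permanova(village_a + village_b, ["A"] * 5 + ["B"] * 5)
```

Under this formulation R² is SS_among / SS_total, so the R² values of 0.9% to 3.3% above can be read as the share of total squared Bray-Curtis dissimilarity attributable to village membership.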

Extended Data Fig. 2 Species-level sharing (Bray-Curtis).

A, The distribution of Bray-Curtis dissimilarity by relationship type. The final two boxes show the Bray-Curtis dissimilarity between individuals living in the same village without an identified relationship and between all pairs of individuals living in different villages, respectively. Data are represented as boxplots where the middle line is the median and the lower and upper hinges correspond to the first and third quartiles; the whiskers extend from the hinge to the largest or smallest value, but no further than 1.5 × IQR from the hinge. B, Observed Bray-Curtis dissimilarity for each relationship type compared to 100 draws from a within-village relationship permutation. All observed relationships, except for close friends, have a significantly lower Bray-Curtis dissimilarity than the scrambled networks; adjusted P values are reported in each panel (two-sided Wilcoxon rank-sum tests). C, Bray-Curtis dissimilarity by how often a pair spends free time together. D, Bray-Curtis dissimilarity by how often a pair shares meals. E, Bray-Curtis dissimilarity by greeting type. Median values for each distribution in panels A and C–E are reported at the top of each box.
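Panel B compares observed ties with ties scrambled within villages. The caption does not say exactly how the permutation was done, so the sketch below assumes one common scheme: people are relabelled at random among co-villagers, which keeps each village's membership and network shape while scrambling who is tied to whom. `within_village_null`, `tie_median` and the toy dissimilarity function are hypothetical names used only for illustration.

```python
import random
import statistics


def tie_median(ties, dissim):
    """Median of dissim(i, j) over a list of (i, j) ties."""
    return statistics.median(dissim(i, j) for i, j in ties)


def within_village_null(ties, village_of, dissim, n_draws=100, seed=0):
    """Tie medians after randomly relabelling people within each village.

    Each draw permutes person identifiers only among co-villagers, so every
    village keeps its members and its network shape, but which specific
    people are tied to each other is scrambled.
    """
    rng = random.Random(seed)
    by_village = {}
    for person, village in village_of.items():
        by_village.setdefault(village, []).append(person)
    null_medians = []
    for _ in range(n_draws):
        relabel = {}
        for members in by_village.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            relabel.update(zip(members, shuffled))
        permuted = [(relabel[i], relabel[j]) for i, j in ties]
        null_medians.append(tie_median(permuted, dissim))
    return null_medians


# Hypothetical toy example: two villages; tied pairs are similar (0.1),
# every other pair is dissimilar (0.9).
ties = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
village_of = {p: ("v1" if p < 6 else "v2") for p in range(10)}
tied = {frozenset(t) for t in ties}


def toy_dissim(i, j):
    return 0.1 if frozenset((i, j)) in tied else 0.9


observed = tie_median(ties, toy_dissim)
null = within_village_null(ties, village_of, toy_dissim)
```

Setting the observed median against the spread of the 100 null medians gives the observed-versus-scrambled contrast of panel B in its simplest form; the published comparison itself used two-sided Wilcoxon rank-sum tests.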

Extended Data Fig. 3 Non-kin different-house strain-sharing.

A, Strain-sharing among non-kin different-household relationships by frequency of free-time contact. B, Strain-sharing among non-kin different-household relationships by frequency of shared meals. C, Strain-sharing among non-kin different-household relationships by greeting type. P values for all significant comparisons are reported in each panel (two-sided Wilcoxon rank-sum test).

Extended Data Fig. 4 Strain-sharing rate in reciprocated versus unreciprocated ties.

The strain-sharing rate was calculated for pairs of people who reported a reciprocated (n = 2,653) or non-reciprocated (n = 3,035) non-kin/friendship social tie. For non-kin relationships, the strain-sharing rate is higher in reciprocated than in non-reciprocated ties for all relationship types except Partner (Wilcoxon rank-sum test: Close Friend P = 6.86 × 10⁻⁴, Partner P = 0.78, Personal or Private P = 2.68 × 10⁻¹⁴, Free time P = 2 × 10⁻¹⁷). Data are represented as boxplots where the middle line is the median and the lower and upper hinges correspond to the first and third quartiles; the whiskers extend from the hinge to the largest or smallest value, but no further than 1.5 × IQR from the hinge.
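The comparisons in Extended Data Figs. 2 to 4 rely on the two-sided Wilcoxon rank-sum test. As a hedged illustration of what that test computes, here is a dependency-free version using the large-sample normal approximation with a tie-corrected variance and no continuity correction. It is not necessarily the implementation or the options the authors used, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided P value). The variance of U is
    corrected for ties; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    # Two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return u, math.erfc(abs(z) / math.sqrt(2))
```

Library implementations such as SciPy's `mannwhitneyu` also offer exact and continuity-corrected variants, so their P values can differ slightly from this approximation for small samples.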

Extended Data Fig. 5 Strain-sharing relationship prediction model permutation feature importance.

Permutation feature importance results for all relationships (A), and for non-kin different-household relationships (B) generated from 100 permutations. In both models, the strain-sharing rate is the strongest predictor of a relationship. Orange bars at the top of each plot indicate 95% confidence intervals for the drop in model score.
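Permutation feature importance measures how much a fitted model's score drops when one feature's values are shuffled across samples, breaking that feature's link to the outcome. The sketch below assumes a classifier exposed as a plain `predict` function and uses accuracy as the score; the caption does not state which model or score was used, and the interval here is a simple empirical 2.5th to 97.5th percentile range over the repeats, which may differ from how the published 95% intervals were obtained. All names and the toy data are hypothetical.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=100, seed=0):
    """Drop in accuracy when each feature column is shuffled, over n_repeats.

    Returns {feature_index: (mean_drop, (low, high))}, where (low, high) is an
    approximate empirical 2.5th to 97.5th percentile range of the drops.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    results = {}
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            # Replace only column f; all other features keep their values.
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        drops.sort()
        low = drops[int(0.025 * (n_repeats - 1))]
        high = drops[int(0.975 * (n_repeats - 1))]
        results[f] = (statistics.mean(drops), (low, high))
    return results


# Hypothetical toy data: feature 0 stands in for a strain-sharing rate that
# determines the label; feature 1 is unrelated noise.
X = [[i / 19, (i * 7 % 5) / 5] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def toy_predict(rows):
    return [1 if row[0] > 0.5 else 0 for row in rows]


importance = permutation_importance(toy_predict, X, y)
```

In the toy call, shuffling feature 0 produces a clear drop in accuracy while shuffling feature 1 produces none, which is the pattern a strongest-predictor result like the one above corresponds to.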

Extended Data Fig. 6 Species niches P-value distributions.

A, Distribution of unadjusted P values from Kruskal-Wallis tests for differential abundance of species across network communities. The distribution is strongly concentrated near zero, indicating significant species clustering, whereas in B, under the null hypothesis that species are randomly distributed among village members, the distribution is uniform.
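The contrast between panels A and B rests on a standard property: when the null hypothesis holds and the test is valid, P values are approximately uniform between 0 and 1, so a pile-up of small P values signals real structure. The sketch below is a dependency-free Kruskal-Wallis test with the usual chi-square approximation, plus a small null simulation; it illustrates what is computed for each species across network communities under stated assumptions, is not the authors' code, and uses hypothetical helper names.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def chi2_sf(x, df):
    """P(X >= x) for X ~ chi-square(df), with df a positive integer.

    Evaluates the regularized upper incomplete gamma Q(df / 2, x / 2) by
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y))
    (odd df) and applying Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = max(x, 0.0) / 2  # guard against tiny negative H from rounding
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square approximate P value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical null simulation: values carry no community signal, so the
# P values should be roughly uniform and about 5% should fall below 0.05.
rng = random.Random(1)
null_p = [
    kruskal_wallis(*[[rng.random() for _ in range(10)] for _ in range(3)])[1]
    for _ in range(2000)
]
share_below_005 = sum(p < 0.05 for p in null_p) / len(null_p)
```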

Supplementary information

Supplementary Information

Supplementary Figs. 1–10, Supplementary Table 1, and legends for Supplementary Tables 1–4 and Supplementary Data 1–4.

Reporting Summary

Supplementary Data 1

Summary information of the mixed-effects model used in the strain-sharing regression analysis.

Supplementary Data 2

Summary information of the mixed-effects models used in the longitudinal analysis regressions.

Supplementary Data 3

Summary information of the mixed-effects models used for the centrality measures regressions.

Supplementary Data 4

MetaPhlAn 4 relative abundance table for the 1,787 analysed samples.

Supplementary Table 2

Village-level demographics and summary characterization of the 1,787 participants in the study.

Supplementary Table 3

Summary and per-village performance metrics of the mixed-effects logistic regression model, using all socio-demographic covariates, for predicting all ties, stable ties and reciprocated ties.

Supplementary Table 4

Social network and strain-sharing network comparison metrics. Clustering coefficient, degree distribution and fraction of shared edges were compared for the networks of each village.
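The three comparison metrics named for Supplementary Table 4 can be sketched for two undirected networks given as edge lists. The legend does not define them precisely, so this sketch assumes the mean local clustering coefficient (nodes with fewer than two neighbours counted as 0), a sorted degree sequence, and "fraction of shared edges" as the share of social ties that also appear in the strain-sharing network. That last definition in particular is an assumption, and all function names are hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency sets from an edge list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def degree_sequence(adj):
    """Node degrees sorted from largest to smallest."""
    return sorted((len(nbrs) for nbrs in adj.values()), reverse=True)


def shared_edge_fraction(edges_a, edges_b):
    """Assumed definition: share of edges in network A also present in B."""
    a = {frozenset(e) for e in edges_a if e[0] != e[1]}
    b = {frozenset(e) for e in edges_b if e[0] != e[1]}
    return len(a & b) / len(a) if a else 0.0
```

Computing these per village for both the social network and the strain-sharing network gives paired values that can be compared directly, whichever exact definitions the published table used.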

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Beghini, F., Pullman, J., Alexander, M. et al. Gut microbiome strain-sharing within isolated village social networks. Nature (2024). https://doi.org/10.1038/s41586-024-08222-1


Received: 27 March 2023
Accepted: 15 October 2024
Published: 20 November 2024
DOI: https://doi.org/10.1038/s41586-024-08222-1
