Escherichia coli is important as biology's premier model organism, a common commensal of the vertebrate gut, and a versatile pathogen of humans and animals. Molecular epidemiological studies have classified E. coli strains into a number of phylogroups (phylogroups A, B1, B2, D, and E) (13, 42), which are estimated to have diverged in the last 5 to 9 million years (37, 42). Commensal E. coli strains are beneficial to the host and rarely cause disease. However, certain clones of E. coli are responsible for a spectrum of diseases, including urinary tract infection, sepsis/meningitis, and diarrhea (for a review, see reference 15). Diarrheagenic E. coli strains are divided into enterotoxigenic E. coli (ETEC), enteroaggregative E. coli, enteroinvasive E. coli, diffusely adhering E. coli, enteropathogenic E. coli (EPEC), and enterohemorrhagic E. coli (EHEC) strains (15), which have different virulence mechanisms.
Whole-genome sequencing approaches have revealed that E. coli has a conserved core of genes common to both commensal and pathogenic strains. This conserved genome framework is decorated with genomic islands and small gene clusters that were acquired by horizontal gene transfer and that, in pathogenic strains, are often associated with virulence (for a review, see reference 32). EPEC strains provide a striking example of a pathovar highly adapted to virulence in the human intestine (8, 15), but until now no EPEC strain has been fully sequenced.
EPEC was the first pathovar of E. coli to be implicated in human disease (4) and remains a leading cause of infantile diarrhea in developing countries (for a review, see reference 6). Because EPEC strains were found neither to invade cells nor to release diffusible toxins, doubts about their pathogenic potential were raised in the 1960s and 1970s. However, induction of diarrhea in human volunteers (21) provided the decisive evidence that EPEC is a true human pathogen. As a result of this study, one of the strains tested, E2348/69 (serotype O127:H6), isolated in Taunton, United Kingdom, in 1969, became the prototype strain used globally to study EPEC biology and disease. Indeed, E2348/69 is probably the most-studied pathogenic E. coli strain, yet until now it has been impossible to place the vast amount of biological data on it in a genomic context.
Typical EPEC strains, which belong to a limited number of O serogroups, contain the EPEC adherence factor plasmid that encodes the bundle-forming pilus (BFP) (10) and also contain the gene regulator locus per (for a review, see reference 6). Typical EPEC strains are further divided into four distinct lineages, EPEC lineages 1 to 4 (18); E2348/69 belongs to EPEC lineage 1 and to the B2 phylogroup.
The hallmark of EPEC infection is formation of distinct attaching and effacing (A/E) lesions, which are characterized by effacement of the brush border microvilli and intimate bacterial attachment (for reviews, see references 6 and 8). The ability to induce A/E lesions is encoded on a pathogenicity island termed the locus of enterocyte effacement (LEE), which is also present in O157 and non-O157 EHEC strains and the mouse pathogen Citrobacter rodentium (24, 25; for a review, see reference 9). The LEE encodes the adhesin intimin, the structural components of a type III secretion system (T3SS) involved in translocation of effector proteins into the mammalian host cell, gene regulators, chaperones, translocators, and seven effector proteins (EspB, EspF, EspG, EspH, EspZ, Map, and Tir) (for a review, see reference 9). Recent studies have shown that the EPEC strain E2348/69 genome encodes several additional non-LEE effectors, including EspJ (23), EspG2, and EspI/NleA, as well as NleB, NleC, NleD, NleE, and NleH (for a review, see reference 9). Additional putative virulence factors include the autotransporter protein EspC (26), lymphostatin (LifA) (17), and several fimbrial operons. Here we report the genome sequence of E2348/69, describe a bioinformatics survey of this strain's virulence factors, and present the resul