How to read pan-genome analyses of microbial genomes


2017-08-03 22:16:22 GMT+0800

In 2005, the American scientist Tettelin and colleagues, studying group B streptococcus, first proposed the concept of the pan-genome. They pointed out that different strains of a bacterial species differ significantly, so the genetic information of a single strain cannot represent all the genetic information of the species.
Pan-genome analysis allows a comprehensive study of bacterial genetic diversity and of the evolutionary relationships between individuals. It also has important application value for discovering virulence factors and designing new vaccines.


The pan-genome can be divided into three parts: core genes, accessory genes and strain-specific genes.
Core genes are the genes shared by all strains. They are involved in basic biological processes such as gene expression, energy generation and amino acid metabolism.
Accessory genes are genes present in some, but not all, strains. They are related to the diversity of the species and can give individuals a competitive advantage.
Strain-specific genes exist in only one strain. They are generally acquired through horizontal gene transfer (HGT) and are often associated with special phenotypic characteristics of the strain, such as adaptation to a specific environment or unique pathogenicity.
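
The three-way split described above can be illustrated with a minimal sketch. It assumes each strain has already been reduced to a set of gene-family names; the function name `partition_pan_genome` and the toy strains and genes are hypothetical, made up for illustration, and are not taken from the article being discussed.

```python
from collections import Counter

def partition_pan_genome(strain_genes):
    """Split a pan-genome into core, accessory and strain-specific genes.

    strain_genes maps a strain name to the set of gene families it carries.
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene family occurs.
    counts = Counter(g for genes in strain_genes.values() for g in genes)
    # Core: present in every strain.
    core = {g for g, c in counts.items() if c == n_strains}
    # Strain-specific: present in exactly one strain (and not in all of them).
    specific = {g for g, c in counts.items() if c == 1 and c < n_strains}
    # Accessory: everything in between.
    accessory = set(counts) - core - specific
    return core, accessory, specific

# Toy, made-up data: three strains and five gene families.
toy = {
    "strainA": {"rpoB", "gyrA", "capA", "tetM"},
    "strainB": {"rpoB", "gyrA", "capA"},
    "strainC": {"rpoB", "gyrA", "phage1"},
}
core, accessory, specific = partition_pan_genome(toy)
print(sorted(core), sorted(accessory), sorted(specific))
```

Real analyses usually build the gene families by clustering orthologous sequences first, and often relax the core definition (for example, present in nearly all strains) to tolerate assembly gaps. The strict thresholds here are a simplifying assumption.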


Below is a recently published article that uses pan-genome analysis to study the heterogeneity of the core genome and pan-genome across different Streptococcus pneumoniae populations: Heterogeneity among estimates of the core genome and pan-genome in different pneumococcal populations.

The authors studied 3121 pneumococcal isolates from healthy individuals in four regions of the world (Reykjavik, Southampton, Boston, Maela).


  • Core genome comparison of the data sets from the 4 regions


The number of core genes is similar in three of the regions, Reykjavik (n = 1,059), Southampton (n = 1,052) and Boston (n = 1,029), whereas the Maela core genome contains only 394 genes.
The COG functional distribution of the 4 core gene sets is nevertheless similar: genes of unknown function make up the largest share (21.7-24.1%), followed in turn by translation, ribosomal structure and biogenesis (11.9-15.7%), amino acid transport and metabolism (7.1-8.6%), and transcription (6.7-7.9%).


  • Supercore genomes and essential genes


The supercore genome of the 3121 pneumococcal isolates contains 303 genes, and 461 genes are shared by Reykjavik, Southampton and Boston.
Earlier studies identified 397 genes as essential for survival, but only 127 of them are found in the supercore genome; these are mainly involved in basic cellular functions.
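
As a rough illustration of how a supercore genome relates to the per-population core genomes, the sketch below assumes the supercore is simply the intersection of the core gene sets of all populations, and then checks how many previously reported essential genes fall inside it. The function name `supercore` and the gene identifiers are hypothetical placeholders, not data from the paper.

```python
def supercore(core_by_population):
    """Return the genes present in the core genome of every population."""
    core_sets = list(core_by_population.values())
    if not core_sets:
        return set()
    return set.intersection(*core_sets)

# Toy, made-up core genomes for four populations.
core_by_population = {
    "Reykjavik": {"g1", "g2", "g3", "g4"},
    "Southampton": {"g1", "g2", "g3", "g5"},
    "Boston": {"g1", "g2", "g3", "g4"},
    "Maela": {"g1", "g2", "g6"},
}
# Toy list of genes assumed to be essential for survival.
essential = {"g1", "g3", "g7"}

shared_by_all = supercore(core_by_population)
essential_in_supercore = essential & shared_by_all
print(sorted(shared_by_all), sorted(essential_in_supercore))
```

In this toy data one population with a small core genome shrinks the intersection, much as the small Maela core genome limits the supercore in the study.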


  • Phylogenetic tree construction from the supercore genome


A phylogenetic tree was constructed from the 303 supercore genes, with reference to MLST.


  • Pan-genome comparison


In a pan-genome comparison of the pneumococcal populations from the four regions, the pan-genomes of Reykjavik, Southampton and Boston each contain 7277-7425 genes, while the pan-genome of Maela contains 15751 genes.
The pan-genomes of all four populations are open, that is, new genes continue to be found as more genomes are added.
The numbers of population-specific genes in the first three populations are also very similar, Reykjavik (n = 754), Southampton (n = 587) and Boston (n = 652), whereas the Maela population has 3668 unique genes.
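
What an "open" pan-genome means can be made concrete with a gene accumulation curve: genomes are added one at a time and the running total of distinct genes is recorded. The sketch below is a simplified illustration with made-up data and a hypothetical function name, `accumulation_curve`. Published analyses commonly average over many random genome orderings and fit a model such as a power law, which this sketch does not do.

```python
def accumulation_curve(genomes):
    """Track pan-genome growth as genomes are added in the given order.

    genomes is a list of gene sets. Returns two lists: the pan-genome size
    after each genome, and the number of new genes each genome contributed.
    """
    seen = set()
    sizes = []
    new_counts = []
    for genes in genomes:
        new = genes - seen          # genes not observed in earlier genomes
        new_counts.append(len(new))
        seen |= new
        sizes.append(len(seen))
    return sizes, new_counts

# Toy, made-up genomes: every genome still contributes one unseen gene.
genomes = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "e"},
    {"a", "b", "f"},
]
sizes, new_counts = accumulation_curve(genomes)
print(sizes, new_counts)
```

If the number of new genes per added genome does not fall to zero, the curve keeps rising, which is the informal signature of an open pan-genome.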


  • Long gene sequences in the Maela pan-genome


Among the 3668 genes unique to Maela, some may be derived from non-pneumococcal species, and sequences matching Filifactor alocis ATCC 35896 were found to contain transposable elements.


  • Conclusion


The core genome and the pan-genome of different populations of the same species are highly heterogeneous.
Extrapolating data from a single population to a larger population requires caution.






