
Genetic study EGP1K: Whole-Genome Sequencing of 1,024 Egyptians Characterises Population Structure and Genetic Diversity

Tautalus

Abstract

Middle Eastern and North African populations remain underrepresented in genomic databases, comprising less than 1% of genome-wide association study participants despite representing approximately 6% of the global population. Here we present the Egypt Genome Project (EGP1K), in which we performed whole-genome sequencing on 1,024 unrelated Egyptian individuals originating from 21 of Egypt's 27 governorates, recruited through eight clinical and research centers across Upper and Lower Egypt. We identified over 51.3 million variants, of which 17.1 million (33.4%) were absent from dbSNP. Allele frequency comparisons across 6.5 million shared variants showed the strongest concordance with Middle Eastern populations (r = 0.977). Principal component analysis and ADMIXTURE modeling at K = 7 revealed that Egyptians share a dominant ancestry component (71.8%) with Middle Eastern populations and carry a smaller Egyptian-enriched component (18.5%) that distinguishes them from neighboring groups. Runs of homozygosity varied substantially across subregions, with Upper Egypt showing the highest burden, paralleling elevated consanguinity rates. Carrier frequency analysis identified MEFV (Familial Mediterranean Fever) at 9.1% as the most prevalent pathogenic carrier state; when adjusted for the national consanguinity rate, MEFV carrier status alone projects approximately 6,600 affected births per year. HLA class I typing identified allele frequencies placing Egyptians within the Levantine-Eastern Mediterranean cluster, providing baseline immunogenetic data currently absent from international databases. Analysis of polygenic risk score distributions revealed substantial differences in threshold-based risk stratification between Egyptians and European reference populations. 
When the European-derived 90th percentile threshold was applied, 83.3% of Egyptians were assigned to high-risk strata for stroke, 76.4% for chronic kidney disease, and 72.8% for gout, compared to the intended 10% high-risk proportion. These distributional shifts were observed across several cardiometabolic traits (Cohen's d = 1.55-1.61), while other traits showed closer cross-population concordance, indicating that the degree of threshold miscalibration varies by trait. Together, these findings establish EGP1K as a genomic reference for Egypt and indicate that European-derived risk stratification thresholds may not be directly transferable to the Egyptian population, supporting the need for population-specific calibration of polygenic risk scores.
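To get a feel for how a shifted score distribution interacts with a fixed threshold, here is a minimal sketch. It is not the paper's method: it assumes both populations have normally distributed scores with equal variance, so that the target population is simply the reference shifted by Cohen's d standard deviations. The function name `fraction_above_reference_threshold` is made up for this illustration.

```python
from statistics import NormalDist


def fraction_above_reference_threshold(cohens_d, reference_percentile=0.90):
    """Share of a shifted population that exceeds a threshold set in the reference.

    Assumption (not from the paper): both score distributions are normal with
    equal variance, and the target population is the reference distribution
    shifted upward by `cohens_d` standard deviations.
    """
    standard_normal = NormalDist()
    # Threshold expressed in reference standard-deviation units.
    threshold = standard_normal.inv_cdf(reference_percentile)
    # Probability mass of the shifted distribution above that threshold.
    return 1.0 - standard_normal.cdf(threshold - cohens_d)


for d in (0.0, 1.55, 1.61):
    share = fraction_above_reference_threshold(d)
    print(f"d = {d:.2f}: {share:.1%} above the reference 90th percentile")
```

With d = 0 the function returns the intended 10%. With d around 1.55-1.61 it puts roughly six in ten people above the threshold under these simplified assumptions. The trait-specific percentages quoted in the abstract come from the real score distributions, which need not be normal or have equal variance, so this is only an illustration of the mechanism.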

PCA
jYZ7DFW.png


ADMIXTURE analysis
I expected a Northeast African component to play a bigger role, but as they say in the paper:
"The relatively low correlation with African reference populations, despite Egypt’s geographic position on the continent, arises from a known limitation of current reference panels: all available African references are sub-Saharan. Future comparisons against the Genome Aggregation Database (gnomAD) (Karczewski et al. 2020), which includes Middle Eastern samples, and the inclusion of North and Northeast African populations (e.g., Sudanese, Ethiopian, Somali) would provide a more complete picture of Egyptian genetic relationships within Africa."
qJFJdw6.png


Genetic distance from Egyptians to 37 global reference populations.
These distances look bizarre, but we have to consider that they are geometric, PCA-based Euclidean distances. According to the paper, they are distances between population centers (centroids) in PC1–PC20 space, much like measuring between points on a map. It's a geometric distance in a space built from 20 axes of variation, not a direct measure of ancestry. It's not a "non-Euclidean" distance based on allele frequencies, drift or population history.
d7MaPmZ.png
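To make the centroid idea concrete, here is a minimal sketch of that kind of distance. It is my own illustration, not the paper's code: the helper names `centroid` and `centroid_distance` and the toy coordinates are made up, and three PCs are used instead of twenty to keep it short.

```python
from math import dist


def centroid(points):
    """Mean position of a population's samples along each PC axis."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def centroid_distance(pop_a, pop_b):
    """Euclidean distance between two population centroids in PC space."""
    return dist(centroid(pop_a), centroid(pop_b))


# Hypothetical PC coordinates (one tuple per sample, one value per PC).
egyptians = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(4.0, 3.0, 0.0), (4.0, 5.0, 0.0)]

print(centroid_distance(egyptians, reference))
```

The result depends only on where the samples land on the chosen axes. Two populations with very different histories can sit close together if the PCs happen not to separate them, which is why such a number should not be read as an ancestry estimate.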


Uniparental haplogroup distributions across populations.

KkE0rc9.png
 
These methods seem rather outdated and, at the very least, grossly simplistic. The distances shown here appear to be merely generic projections in a PCA, probably made worse by the use of limited reference datasets (such as the 1000 Genomes Project).

Furthermore, the high-level subdivisions of Y-DNA and mtDNA haplogroups provided are not particularly informative. Reporting only the parent clades (E, J, T, R, etc.), which formed tens of thousands of years ago, offers very little insight into recent history. Without identifying specific deep SNPs or terminal subclades, the necessary granularity to understand the actual migratory paths and the complex layering of the Egyptian genetic landscape is lost. Essentially, it's a low-resolution map that misses the fine-scale details of the last 6,000 years.
 
Even if some of its analytical approaches are relatively simple, the paper helps fill a gap in global genomics. Populations from North Africa, including Egyptians, have long been under-represented in datasets such as the 1000 Genomes Project. By sequencing over a thousand individuals, the study provides a substantial modern genetic reference for Egypt, which is valuable regardless of interpretation.
 
Are Egyptians essentially Arabian-like, with additional North African and a tiny bit of sub-Saharan African? I've seen Copts score a ton of Natufian, almost up to Saudi ranges, and they're supposedly the best representatives of ancient Egyptians, as they didn't mix much outside their group. They even have a bit of Anatolian farmer in them, about 20%, again about as much as Saudis. The rest is smaller amounts of Iran Neolithic and sub-Saharan. Very interesting group!
 

'Arab' is not a meaningful genetic term. Egyptians likely reflect their geographical location, like anyone else; they are, of course, the North African population closest to the Levant, and this may also be reflected in their genome.

It's interesting that they've included these 1,000 Egyptian samples, especially as they're being published for the first time, but we need studies using less outdated and simplistic methods before drawing conclusions, not least because I doubt that Egyptians lack internal variation.
 