
Suggestion: Create an Italian Regional DNA Project similar to the Benelux & France one

Jackz1888

Hello Maciamo and Eupedia team,


First, thank you for the amazing work on Eupedia—it’s one of the best resources for population genetics and history. I especially love the detailed Benelux & France DNA project page, which compiles scientific studies and commercial data (like from the French Heritage Project) to provide regional subclade estimates (e.g., for Z56/Z193 in Provence).


I was wondering if there are any plans to create a similar dedicated project/page for Italy? There are several Italian Y-DNA studies (Boattini et al., Grugni et al., etc.) and growing commercial data, but subclade resolution for regions like Lombardy, Piedmont, Tuscany, etc., is often limited compared to France.


An “Italy Regional DNA Project” would be incredibly useful for many users interested in Italian genetics, especially with deeper clades like U152 branches (Z56, Z193, L2, etc.). I’d be happy to help contribute data or links if needed!


What do you think? Would this be feasible?


Thanks again for all your efforts!


Best regards,


Giacomo Zampini
 
You mean like this?

[Image: 1766593535567.jpeg]
 
I didn't include the regional breakdown of Z36 and Z56 in Italy because I didn't have enough data.
 
The underlying problem is that P312* is a macro>macro>macro-haplogroup of 210 million carriers.

L21 is a macrohaplogroup of 60 million.

ZZ11* is the meta-macrohaplogroup of 170 million.

DF27 is a macrohaplogroup of 120 million.

U152 is a macrohaplogroup of 50 million.

L2 is a macrohaplogroup of 30 million (these are examples of current scientific estimates).

Z56 can easily have between 1 and 10 million carriers, and therefore up to 10 million different terminal SNPs. You are trying to predict the future in order to understand the past.

Statistically, you need on the order of 10,000 samples per region, not per country, to classify all the branches downstream of P312 that arose between 3000 and 2400 BCE.

Z56* is very clearly Italic, with the largest number of branches, greatest age, and highest percentages.
[Image: IMG_4171.png]


Its consolidation was entirely Etruscan.

Its epicenter of ethnogenesis is in present-day Tuscany; no aDNA is needed—there are millions of living proofs not yet classified.

That is a minimum of 4,500 uninterrupted years.

You go around in circles because you don’t understand the absolutisms of the Y chromosome.

aDNA usually represents extinct lines. We don’t want to know merely that a line existed; we want to know what we are, so that we understand what we are looking for. Without refinement from present-day populations, ancient SNPs cannot be read: there is nothing with which to “read” the tree, or to see when and how it branches. That is why the vast majority of ancient samples can only be placed within an approximate 500-year window for their “terminal” SNP.

The lack of coverage that archaeogeneticists constantly mention is a fallacy used to avoid explaining a deeper reality. The SNP sequence in aDNA is like a wooden ladder with broken rungs, so the first thing you need is to know the complete sequence of rungs to be able to triangulate it. That is why, coincidentally, only R1b approximates its space-time TMRCA, but if you find an E-V13 you are left 1,500 years away from the last documented SNP.

Coverage depends exclusively on the ISOGG tree used for reading, not on the aDNA sample itself. The same sample will not give the same result in 2020 as it does in 2025.

If a sample has no reads for P312* but does have ZZ11*>U152*>L2*, it is called L2*. But it stays at L2* even if the sample dates to 2000 BCE, because the 20 rungs below it are missing: they have not yet been found in sampled populations.
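To make the ladder picture concrete, here is a minimal Python sketch of how a sample might be placed on a known tree path. It is only an illustration under stated assumptions, not how any real caller works: the function `deepest_call`, the dictionary format for reads, and the SNP name `HYPOTHETICAL_SNP` are all invented for this example. Only the path P312 > ZZ11 > U152 > L2 comes from the post.

```python
# Tree path from root to tip, as quoted in the post.
PATH_OLD = ["P312", "ZZ11", "U152", "L2"]

# ASSUMPTION: a later tree version adds one downstream rung.
# "HYPOTHETICAL_SNP" is a made-up name, not a real marker.
PATH_NEW = PATH_OLD + ["HYPOTHETICAL_SNP"]


def deepest_call(path, reads):
    """Return the deepest SNP on `path` that has a derived read.

    `reads` maps SNP name -> True (derived) or False (ancestral).
    A missing key means the sample has no coverage at that position.
    """
    deepest = None
    for snp in path:
        state = reads.get(snp)
        if state is False:
            break  # ancestral read: the sample leaves this path here
        if state is True:
            deepest = snp  # derived read; uncovered rungs above are implied
    return deepest


# No read at P312, derived reads further down, as in the example above.
sample = {"ZZ11": True, "U152": True, "L2": True, "HYPOTHETICAL_SNP": True}

print(deepest_call(PATH_OLD, sample))  # L2
print(deepest_call(PATH_NEW, sample))  # HYPOTHETICAL_SNP
```

The same reads give a deeper call once the tree gains a rung, which is the point about the reading depending on the tree version. A rung that is missing from the tree can never be called, no matter how good the sample is.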


What you need to do is a crowdfunding campaign to analyze the aDNA of Italic Bell Beaker horses, and you’ll be pleasantly surprised not to be “space Yamnayas.”

Nowhere in Europe have steppe horses of the DB haplogroup been found beyond the southern Danube. Western horses were all D*, P0*, or DA1*, with nothing in common with the steppe DB ponies, which stood 140 cm at the withers.

The D* horse haplogroup was first detected in Romania in 3800 BCE.

East of that epicenter there were DB*, west DA1*, both with DOM2 autosomal admixture.

Genetic studies of horses are a complete mess, but this is what emerges when all the data are overlaid.

One group, R1b-Z2103 and R1a-Z93, had D, P, and DB.

The “Mycenaeans” and Aegeans (R1b-PF7562, R1b-Z2109, E-V13, J2, and T) had D and DA1.

Atlantic-Mediterranean P312 had P0, D, and DA1, and they were the ones who, in the Iron Age, spread DAC, the haplogroup carried today by 80% of breeds.
 
Where does the map you posted get the data from?
 
I don’t remember anymore. These maps always use data from scientific papers as a base and then adjust the clades with data from private DNA companies. They’re never precise, only approximate.
 

In your opinion, which is the best Z56 map? The one from Passa or the one you posted?
 

Passa's maps, assuming they are accurate, are likely outdated. Unfortunately, not many recent studies have been published on uniparental markers, especially those of the Y chromosome, for various reasons.
 
I don't think the Passa maps are very reliable. They show white areas in Piedmont, Liguria, and northern Lombardy, and they also date back to 2016.
 
In your opinion, which is the best Z56 map? The one from Passa or the one you posted?
When dealing with populations, it is more important to know where the data come from and their main context than the data themselves.

Broadly speaking, exactly the same thing happens as with vote counts in politics.

Scientific data are very precise because they follow methodologies based on population cores and other “coherent structures,” whereas commercial DNA companies are completely chaotic and disordered. That is why scientific data, which may stay unchanged for 10 years, are used as a baseline and are then complemented with the “refinement” of deep clades provided by the DNA companies.

The main structure of U152>Z56 is Z56>Z43>Z145, and the rest are its “cousins,” since this structure easily represents more than half of all Z56 (for years now), and that is very unlikely to change.

I’ll give you some proportionality examples based on FTDNA so that you can “see” this problem.

L51 has 270,000 samples.
U152 has 30,000 samples.
Z56 has only 2,800 samples.

Italy specifically has only 144 samples, so once you move past Z56>, you are left with 4 or 10 samples per clade.
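The shrinkage at each step can be checked with a short Python sketch. It uses only the four counts quoted above. Two things are assumptions made for illustration: reading the 144 as Italian Z56 samples, and the clade counts of 14 and 36 passed to `per_clade`, which are hypothetical values picked to show how quickly 144 samples thin out.

```python
# Sample counts quoted in the post (FTDNA figures, per the author).
counts = [
    ("L51", 270_000),
    ("U152", 30_000),
    ("Z56", 2_800),
    ("Z56 in Italy", 144),  # ASSUMPTION: the 144 are Italian Z56 samples
]


def funnel(levels):
    """Return (name, count, share of the level above) for each level after the first."""
    rows = []
    for (_, parent_n), (name, n) in zip(levels, levels[1:]):
        rows.append((name, n, n / parent_n))
    return rows


def per_clade(total, n_clades):
    """Average samples per clade if `total` were spread evenly over `n_clades`."""
    return total / n_clades


for name, n, share in funnel(counts):
    print(f"{name}: {n} samples, {share:.1%} of the level above")

# HYPOTHETICAL clade counts, for illustration only.
for n_clades in (14, 36):
    print(f"{n_clades} clades -> about {per_clade(144, n_clades):.0f} samples each")
```

Each level keeps only a small fraction of the one above it, so a database that looks large at L51 has very little left by the time it reaches individual Italian subclades.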

That is what Maciamo means when he says there are not enough data—you are asking him to attempt a backward triple somersault.

The numbers drop very sharply as we approach bottlenecks; 2,800 samples for all of Z56 end up being very few.

That is why I point out that until studies start reaching 10,000 samples per region, no significant progress will be made at this stage. We are on a very steep slope, and over the last five years almost nothing has changed for this very reason.

The bias between countries is also enormous: northern countries have around 30,000 samples per country, while Italy, France, or Spain do not exceed 6,000.

How does this affect branching?

The largest documented clade to date is L21>DF13>Z39589, with 40 branches and 36,000 samples (2450 BCE).

U152>L2 has 33 branches with 20,000 samples (2650 BCE).

DF27>ZZ12_1 has 31 branches with 15,000 samples (2650 BCE).

If sampling were proportional and all of them had 36,000 samples:

ZZ12_1 would have 74.
L2 would have 59.
Z39589 would remain at 40.
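The projected figures above follow from simple linear scaling, which can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: it assumes branch count grows in proportion to sample count, which is the simplification used for this thought experiment and not a claim about how trees really grow. The function name `scaled_branches` is made up for the example.

```python
TARGET_SAMPLES = 36_000  # sample count of the largest clade quoted above

# (branches, samples) as quoted in the post.
clades = {
    "DF27>ZZ12_1": (31, 15_000),
    "U152>L2": (33, 20_000),
    "L21>DF13 clade": (40, 36_000),
}


def scaled_branches(branches, samples, target=TARGET_SAMPLES):
    """Branches expected at `target` samples under purely linear scaling."""
    return round(branches * target / samples)


for name, (branches, samples) in clades.items():
    print(f"{name}: {branches} -> {scaled_branches(branches, samples)}")
```

With these inputs the sketch gives 74, 59, and 40, matching the list above.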

Does that mean ZZ12_1 and L2 will end up having that many branches?

Probably not, but I mapped ZZ12_1 with 29 branches about six months ago and now it has 31. Everything can change, depending more on where sampling is done than on the sheer number of samples.

Therefore, the claim that 50% of northern Italics are U152 is reliable, and the figure of 5–15% peaks for Z56 across Italy today is the most accurate estimate we have. To go further, you will only find more answers or clues in archaeological objects.

With Z56, it may end up being somewhat larger or smaller, but a 15% peak in northern population cores is already a very notable figure.


P.S.: This long explanation applies to all haplogroups and SNPs after 2500 BCE. Excavating the “biblical” tree will be much slower than we might estimate, because the more we are given… the more we want to know.
 
Make an account on Genetic Homeland and use their "Mapping" feature to see distribution of SNPs. It's a good tool.
 