Admixtools admixtools2 TUTORIAL for WINDOWS.

Jalisciense · Nov 27, 2024

Celtion said:
I only added the outputformat parameter to the main 9 lines, but I have always had it between the 2nd fileset and the merged fileset, in other words on the 7th line. Also, I didn't have any empty lines, so it was just 10 lines with no blank lines in between.
You could try that, and if it fails again, try converting your 23andMe file to Plink again, but add the --keep-allele-order flag, redo the rest and see if that helps.

Other than that, I'm running out of ideas. Some of these issues can be a needle in a haystack. Originally I couldn't get Eigensoft to compile on my Mac until I found a post where someone showed the code that needed editing.

If all else fails, you might have to convertf the AADR to Plink format and then merge in Plink. I found v62.0 to be more stable than previous releases, so I was able to merge in Plink fine, although it's a longer procedure.

Thanks bro. How do I use the --keep-allele-order flag? Where do I put it (position)?
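Neither post spells out the exact command. A minimal sketch, assuming the 23andMe file is imported with PLINK 1.9's --23file option (raw file, then family ID and individual ID) and written out with --make-bed. PLINK flags can generally be given in any order, so --keep-allele-order can go anywhere after `plink`. The file name genome_John_Smith.txt, the IDs and the output prefix mydata are hypothetical, and the Python only builds and prints the command line:

```python
import shlex


def plink_import_cmd(raw_file, fid, iid, out_prefix):
    """Assemble (but do not run) a PLINK 1.9 command turning a 23andMe raw file into .bed/.bim/.fam."""
    return [
        "plink",
        "--23file", raw_file, fid, iid,  # raw file, then family ID and individual ID
        "--keep-allele-order",           # keep A1/A2 as in the input instead of reordering by frequency
        "--make-bed",                    # write binary .bed/.bim/.fam
        "--out", out_prefix,             # output prefix: mydata.bed, mydata.bim, mydata.fam
    ]


cmd = plink_import_cmd("genome_John_Smith.txt", "Smith", "John", "mydata")
print(shlex.join(cmd))
# plink --23file genome_John_Smith.txt Smith John --keep-allele-order --make-bed --out mydata
```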

About the mergeit parameter file, do you have it like this?

geno1: mydata.geno
snp1: mydata.snp
ind1: mydata.ind
geno2: v62.0_1240k_public.geno
snp2: v62.0_1240k_public.snp
ind2: v62.0_1240k_public.ind
outputformat: PACKEDANCESTRYMAP
genooutfilename: mydata_merged_with_1240k.geno
snpoutfilename: mydata_merged_with_1240k.snp
indoutfilename: mydata_merged_with_1240k.ind
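Since stray blank lines or trailing spaces are easy to miss in a text editor, one option is to generate the file. A small Python sketch (not part of EIGENSOFT) that writes exactly the 10 lines above, no blank lines, with outputformat on the 7th line as Celtion described; the output name merge_param.par is an assumption:

```python
from pathlib import Path

# Same ten entries as the mergeit file above; outputformat is the 7th line.
MERGEIT_PARAMS = [
    ("geno1", "mydata.geno"),
    ("snp1", "mydata.snp"),
    ("ind1", "mydata.ind"),
    ("geno2", "v62.0_1240k_public.geno"),
    ("snp2", "v62.0_1240k_public.snp"),
    ("ind2", "v62.0_1240k_public.ind"),
    ("outputformat", "PACKEDANCESTRYMAP"),
    ("genooutfilename", "mydata_merged_with_1240k.geno"),
    ("snpoutfilename", "mydata_merged_with_1240k.snp"),
    ("indoutfilename", "mydata_merged_with_1240k.ind"),
]


def write_parfile(path, params):
    """Write 'key: value' lines with no blank lines and no trailing spaces."""
    lines = [f"{key}: {value.strip()}" for key, value in params]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


lines = write_parfile("merge_param.par", MERGEIT_PARAMS)
print(len(lines), lines[6])  # 10 outputformat: PACKEDANCESTRYMAP
```

Then run mergeit -p merge_param.par as before.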

I didn't know you were using a Mac, but I agree, I haven't seen much info for macOS. I bet it was harder for you to do all this.

Uff bro, I tried to convert the v62.0 AADR to Plink as I wrote in post #297. I didn't get "killed", a core dump or a virtual machine crash, but after 4 hours it was still processing and seemed stuck, so I wasn't sure whether there was an error by then:

Code:

jalisciense@vbox:~/bin> convertf -p par.EIGENSTRAT.PED
parameter file: par.EIGENSTRAT.PED
genotypename: v62.0_1240k_public.geno
snpname: v62.0_1240k_public.snp
indivname: v62.0_1240k_public.ind
outputformat: PACKEDPED
genotypeoutname: v62_1240k.bed
snpoutname: v62_1240k.bim
indivoutname: v62_1240k.fam
## convertf version: 8600
read 1073741824 bytes
read 2147483648 bytes
read 3221225472 bytes
read 4294967296 bytes
read 5368709120 bytes
read 5435121304 bytes
packed geno read OK
end of inpack
before compress: snps: 1233013 indivs: 17629
after compress: snps: 1233013 indivs: 17629

Btw, I was thinking the "killed" message I get when I try to merge the files could be because I don't have enough RAM. Maybe installing a previous version of the AADR would work (not v54, but something like v52 or v44, for example)? What do you think?
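A rough check on the RAM question, assuming the packed .geno format stores 2 bits per genotype (4 genotypes per byte) with each SNP's row rounded up to whole bytes. Plugging in the SNP and individual counts from the convertf log above gives exactly the 5,435,121,304 bytes convertf reported reading, i.e. roughly 5 GiB just to hold the 1240k genotypes in memory, before any merging:

```python
import math


def packed_geno_bytes(n_snps, n_indivs):
    """Estimate packed genotype size: 2 bits per genotype, each SNP row rounded up to whole bytes."""
    bytes_per_snp = math.ceil(n_indivs / 4)  # 4 genotypes fit in one byte
    return bytes_per_snp * n_snps


size = packed_geno_bytes(1_233_013, 17_629)  # counts from the convertf log
print(size, f"{size / 1024**3:.2f} GiB")     # 5435121304 5.06 GiB
```

Under this assumption, fewer individuals or fewer SNPs shrink the footprint proportionally, which is why a smaller dataset or a population subset is more likely to fit.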

Jalisciense · Nov 27, 2024

Jovialis said:
Now THAT's what I call a PCA:

https://twitter.com/x/status/1861801542958911684

It looks pretty good bro! I really like how it looks, congrats!

Celtion · Nov 28, 2024

Jalisciense said:
Btw, I was thinking the "killed" message I get when I try to merge the files could be because I don't have enough RAM. Maybe installing a previous version of the AADR would work (not v54, but something like v52 or v44, for example)? What do you think?

I googled the “Killed” message in Linux, and yeah, it seems likely to be an out-of-memory problem. You might be able to tweak some settings on your computer to use the memory more efficiently.
There are a couple of options: either try merging with the v62.0_HO set, which is a smaller file, or use convertf to subset the 1240k dataset to the populations you want. Just add this parameter to the convertf parfile:
poplistname: yourfile.txt
yourfile.txt should contain a list of populations (one per line), where the populations are the labels in the last column of the input .ind file. Only the samples with listed labels will be output.
Then use mergeit to merge your file with the subset.
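To preview what a poplist will keep before running convertf, here is a small sketch that mimics the rule described above (keep .ind rows whose last column is a listed label) and also flags listed labels that never occur, which usually means a typo. It only previews; convertf does the real subsetting. The .ind rows and the misspelled label are made up; on real data you could pass open("v62.0_1240k_public.ind") and open("yourfile.txt"):

```python
def preview_poplist(ind_lines, wanted_pops):
    """Return (kept sample IDs, listed populations not found in the .ind)."""
    wanted = {p.strip() for p in wanted_pops if p.strip()}
    kept, seen = [], set()
    for line in ind_lines:
        fields = line.split()  # .ind columns: sample ID, sex, population label
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        sample_id, pop = fields[0], fields[-1]
        if pop in wanted:
            kept.append(sample_id)
            seen.add(pop)
    return kept, sorted(wanted - seen)


# Hypothetical .ind rows and poplist entries, for illustration only
ind = [
    "I0001   M   England_LIA.AG",
    "I0002   F   Norway_Viking.SG",
    "I0003   M   Some_Other_Pop.AG",
]
kept, missing = preview_poplist(ind, ["England_LIA.AG", "Norway_Viking.SG", "Norway_Vikng.SG"])
print(kept)     # ['I0001', 'I0002']
print(missing)  # ['Norway_Vikng.SG']
```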

Jalisciense · Nov 28, 2024

Celtion said:
I googled the “Killed” message in Linux, and yeah, it seems likely to be an out-of-memory problem. You might be able to tweak some settings on your computer to use the memory more efficiently.

I already used the maximum I could give my virtual machine before it crashes (6451 MB of RAM out of 8000 MB, and 6 processors out of 8).

I always thought the HO set was heavier because it has more samples than the normal 1240k one lol

Btw, is it necessary to download the AADR.anno file, or can I just download the AADR.snp, AADR.geno and AADR.ind?

About the option of using convertf to subset the 1240k dataset with the poplistname parameter (a list of population labels, one per line, taken from the last column of the input .ind file):

Could you give me an example, please? I understand better when I see one.

Celtion · Nov 28, 2024

Jalisciense said:
Btw, is it necessary to download the AADR.anno file, or can I just download the AADR.snp, AADR.geno and AADR.ind?

Could you give me an example, please? I understand better when I see one.

The .anno file just contains information about the samples. It's now in .xlsx format.

Here's an example of a convertf par file for subsetting:

genotypename: v62.0_1240k_public.geno
snpname: v62.0_1240k_public.snp
indivname: v62.0_1240k_public.ind
outputformat: PACKEDANCESTRYMAP
genotypeoutname: v62.0_1240k_public_subset.geno
snpoutname: v62.0_1240k_public_subset.snp
indivoutname: v62.0_1240k_public_subset.ind
poplistname: poplist_keep.txt

Here is what's listed in the poplist file:

Denmark_Zealand_Medieval_Saxon.AG
England_LIA.AG
France_GrandEst_IA1.SG
France_GrandEst_IA2.SG
France_HautsDeFrance_IA2.SG
France_Occitanie_IA2.SG
France_SouthEast_IA2.AG
Germany_EarlyMedieval_Saxon.AG
Germany_Drantum_Medieval_Saxon.AG
Germany_Dunum_Medieval_Saxon.AG
Germany_Issendorf_EarlyMedieval_Saxon.AG
Germany_Liebenau_EarlyMedieval_Saxon.AG
Germany_Schortens_EarlyMedieval_Saxon.AG
Ireland_Kilteasheen_EarlyMedieval_AngloSaxon_Norman.AG
Ireland_Viking.SG
Norway_Viking.SG
Scotland_Viking.SG

Jalisciense · Nov 28, 2024

Celtion said:
The .anno file just contains information about the samples. It's now in .xlsx format.

Thanks bro, you are very nice.

But I finally managed to merge my data with the AADR!!! I downloaded the v62.0_HO files and gave my virtual machine the maximum RAM and processors, and this time the system was able to "stay alive" and not get "killed" lol

So thank you so much for your help bro, you were very patient with me. Without you I would not have been able to model myself in qpAdm!

Celtion · Nov 28, 2024

Great to hear. Yeah, the HO set has more samples but fewer SNPs. If you ever want to do an analysis with the higher-quality samples from the 1240k set, you can use the subset option in convertf.

Jalisciense · Nov 28, 2024

Celtion said:
Great to hear. Yeah, the HO set has more samples but fewer SNPs. If you ever want to do an analysis with the higher-quality samples from the 1240k set, you can use the subset option in convertf.

I used individuals that don't end in .HO but in .DG, .SG, etc. So do the samples I used have the same SNPs as the same samples in the 1240k set? Is it just the ones ending in .HO that have ~500k SNPs?

Thanks. So when I merge them, do I have to add the parameter poplistname: poplist_keep.txt again in the merge_param.par file too, like this?

geno1: mydata.geno
snp1: mydata.snp
ind1: mydata.ind
geno2: v62.0_1240k_public.geno
snp2: v62.0_1240k_public.snp
ind2: v62.0_1240k_public.ind
poplistname: poplist_keep.txt
outputformat: PACKEDANCESTRYMAP
genooutfilename: mydata_merged_with_1240k.geno
snpoutfilename: mydata_merged_with_1240k.snp
indoutfilename: mydata_merged_with_1240k.ind

Celtion · Nov 28, 2024

Jalisciense said:
I used individuals that don't end in .HO but in .DG, .SG, etc. So do the samples I used have the same SNPs as the same samples in the 1240k set? Is it just the ones ending in .HO that have ~500k SNPs?

Thanks. So when I merge them, do I have to add the parameter poplistname: poplist_keep.txt again in the merge_param.par file too?

No, only in the convertf par file. Then you have to merge your individual file with the AADR subset file, not with the full AADR.

Although the ancient samples have the same labels in both datasets, the ones in the HO set have been merged with the Human Origins array, which leaves only about half as many SNPs for those samples.
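One way to check SNP counts yourself: in EIGENSTRAT/PACKEDANCESTRYMAP .snp files the SNP ID is the first whitespace-separated column, so intersecting the ID sets of your .snp and a reference .snp shows how many of your SNPs the reference also carries. A minimal sketch; the rows below are made up, and on real data you would pass the two opened .snp files:

```python
def snp_ids(snp_lines):
    """Collect SNP IDs (first column) from .snp lines, skipping blank lines."""
    return {line.split()[0] for line in snp_lines if line.strip()}


def overlap_report(mine_lines, ref_lines):
    """Return (SNPs in mine, SNPs in reference, SNPs in both)."""
    mine, ref = snp_ids(mine_lines), snp_ids(ref_lines)
    return len(mine), len(ref), len(mine & ref)


# Made-up rows: ID, chromosome, genetic position, physical position, allele 1, allele 2
mine = ["snpA 1 0.0 1000 A G", "snpB 1 0.0 2000 C T", "snpC 2 0.0 3000 G T"]
ref = ["snpA 1 0.0 1000 A G", "snpC 2 0.0 3000 G T"]
print(overlap_report(mine, ref))  # (3, 2, 2)
```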

Jalisciense · Nov 28, 2024

Celtion said:
No, only in the convertf par file. Then you have to merge your individual file with the AADR subset file, not with the full AADR.

Although the ancient samples have the same labels in both datasets, the ones in the HO set have been merged with the Human Origins array, which leaves only about half as many SNPs for those samples.

I see. Well, I'll see when I do it, but it's clearer now. And ok, if I can, I'll test with the highest-quality 1240k samples.

Just a few questions, if you don't mind:

1. How do I change my name, and why is it "control"?

When I try to change it to my real name (I'll just use Smith John as an example) like this:

Smith : John M John

I got this error:

2. I want to merge my family into the same merged v62 HO dataset file (so we will all be together there). Do you recommend that, or should I make a separate merged v62 HO file for each one?

3. If the ones I want to merge are female, do I have to change any number in their .fam file?

Lopez Sofia 0 0 1 -9

Or do I have to change a parameter in another file?

baeticvs · Nov 28, 2024

Jalisciense said:
1. How do I change my name, and why is it "control"?

When I try to change it to my real name (I'll just use Smith John as an example) like this:

Smith : John M John

I got this error:

That error you get is a basic Linux thing: it's because of the spaces. Instead of "John Smith", try "John_Smith" or "JohnSmith", and just modify the last column, the "control" one.
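A likely reason for the error: .ind (and .fam) files are whitespace-separated, so a name containing a space becomes two fields and every column after it shifts. A small sketch that keeps a name in one field; the helper names are made up for illustration:

```python
import re


def safe_id(name):
    """Replace each run of whitespace with '_' so a name stays one field."""
    return re.sub(r"\s+", "_", name.strip())


def ind_row(sample_id, sex, label):
    """Build a whitespace-separated .ind row: sample ID, sex (M/F/U), label."""
    return " ".join([safe_id(sample_id), sex, safe_id(label)])


print("John Smith M Control".split())         # ['John', 'Smith', 'M', 'Control']: four fields, not three
print(ind_row("John Smith", "M", "Control"))  # John_Smith M Control
```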

Jalisciense · Nov 28, 2024

baeticvs said:
That error you get is a basic Linux thing: it's because of the spaces. Instead of "John Smith", try "John_Smith" or "JohnSmith", and just modify the last column, the "control" one.

It's working now, thanks bro.

Jovialis · Nov 29, 2024

https://twitter.com/x/status/1862362292957925665

Celtion · Nov 29, 2024

Jalisciense said:
2. I want to merge my family into the same merged v62 HO dataset file (so we will all be together there). Do you recommend that, or should I make a separate merged v62 HO file for each one?

3. If the ones I want to merge are female, do I have to change any number in their .fam file?

Lopez Sofia 0 0 1 -9

Or do I have to change a parameter in another file?

You can merge them all into the same v62_HO set. For the females, you can manually change the 3rd number (sex) to 2 (female).
If you want, you can set yourself and your other family members to 'case' instead of 'control'. The equivalent in Plink is the 4th number: 2 for case, 1 for control; 'case' is equivalent to 'target' and 'control' is equivalent to 'reference'. However, the settings of these columns won't affect most of what you'll be doing.
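In a .fam line the six columns are family ID, individual ID, father, mother, sex and phenotype, so the "3rd number" is sex and the "4th number" is case/control. A minimal sketch for editing one line (the function name is hypothetical; editing the file in a text editor works just as well):

```python
from typing import Optional


def set_fam_fields(fam_line: str, sex: Optional[int] = None, pheno: Optional[int] = None) -> str:
    """Return a .fam line with sex (5th column) and/or phenotype (6th column) replaced.

    PLINK coding: sex 1 = male, 2 = female; phenotype 1 = control, 2 = case, -9 = missing.
    """
    fid, iid, father, mother, old_sex, old_pheno = fam_line.split()
    new_sex = old_sex if sex is None else str(sex)
    new_pheno = old_pheno if pheno is None else str(pheno)
    return " ".join([fid, iid, father, mother, new_sex, new_pheno])


print(set_fam_fields("Lopez Sofia 0 0 1 -9", sex=2))           # Lopez Sofia 0 0 2 -9
print(set_fam_fields("Lopez Sofia 0 0 1 -9", sex=2, pheno=2))  # Lopez Sofia 0 0 2 2
```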

Jalisciense · Nov 29, 2024

Celtion said:
You can merge them all into the same v62_HO set. For the females, you can manually change the 3rd number (sex) to 2 (female).
If you want, you can set yourself and your other family members to 'case' instead of 'control'. The equivalent in Plink is the 4th number: 2 for case, 1 for control; 'case' is equivalent to 'target' and 'control' is equivalent to 'reference'. However, the settings of these columns won't affect most of what you'll be doing.

The third number... so it would be like this?

Lopez Sofia 0 0 2 1

So the -9 should always be changed to 1, then?

I see, it's good to know about control and target, then.

Celtion · Nov 29, 2024

Jalisciense said:
The third number... so it would be like this?

Lopez Sofia 0 0 2 1

So the -9 should always be changed to 1, then?

I see, it's good to know about control and target, then.

0 0 2 2, since I suggest setting yourself and your family as ‘case’.
But it doesn’t affect most analyses. -9 is undefined, 1 is control. When using convertf, you must change all the -9s to either 1 or 2.
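With several family members in one file, it can be easier to fix every -9 at once. A sketch that rewrites the phenotype column of each .fam line; the sample rows and the function name are hypothetical. To apply it to a real file, read your .fam, pass its lines, and write the result back before running convertf:

```python
def fill_missing_pheno(fam_lines, value=2):
    """Replace a missing phenotype (-9) in the 6th .fam column with 1 (control) or 2 (case)."""
    if value not in (1, 2):
        raise ValueError("value must be 1 (control) or 2 (case)")
    fixed = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) == 6 and fields[5] == "-9":
            fields[5] = str(value)
        fixed.append(" ".join(fields))
    return fixed


fam = ["Smith John 0 0 1 -9", "Lopez Sofia 0 0 2 -9", "Smith Dad 0 0 1 1"]
print(fill_missing_pheno(fam))
# ['Smith John 0 0 1 2', 'Lopez Sofia 0 0 2 2', 'Smith Dad 0 0 1 1']
```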

Jalisciense · Nov 29, 2024

Celtion said:
0 0 2 2, since I suggest setting yourself and your family as ‘case’.
But it doesn’t affect most analyses. -9 is undefined, 1 is control. When using convertf, you must change all the -9s to either 1 or 2.

Which kinds of analysis would be affected, and which wouldn't, if I don't change the 1 to 2? I just finished merging my dad's, grandpa's and a friend's raw data, so I'd like to know.

Celtion · Nov 29, 2024

Jalisciense said:
Which kinds of analysis would be affected, and which wouldn't, if I don't change the 1 to 2? I just finished merging my dad's, grandpa's and a friend's raw data, so I'd like to know.

I don't think any of the tasks that you would do in Admixtools would be affected. If so, it would be mentioned in the manual.

However, because I often use ADMIXTURE, I do all the QC work in Plink. So if I filter for HWE (--hwe), the filter will by default ignore case individuals.

Jalisciense · Nov 30, 2024

Celtion said:
I don't think any of the tasks that you would do in Admixtools would be affected. If so, it would be mentioned in the manual.

However, because I often use ADMIXTURE, I do all the QC work in Plink. So if I filter for HWE (--hwe), the filter will by default ignore case individuals.

I see. Well, I usually wouldn't mind doing it again, but the problem is that I need more RAM each time, so I'm afraid I won't be able to merge any more of my family's raw data. That's why I'd rather get it right with the ones I haven't merged yet, just in case.

I asked ChatGPT, and it told me that writing 1 instead of 2 would affect analyses of genetic variation and phenotypes, but not qpAdm or PCA. Do you know if that is true?

Celtion · Nov 30, 2024

Jalisciense said:
I see. Well, I usually don't care if I have to do it again, but the problem is that I need more RAM each time, so I'm afraid I won't be able to merge any more of my family's raw data. That's why I'd rather get it right with the ones I haven't merged yet, just in case.

I asked ChatGPT, and it told me that writing 1 instead of 2 would affect analyses of genetic variation and phenotypes, but not qpAdm or PCA. Do you know if that is true?

You can change the .fam or .ind file any time before or after merging. Merging doesn't set the case/control selection in concrete, so you don't need to merge again if you change your mind.

When you convertf your Plink files to PAM (PACKEDANCESTRYMAP), it adds your family ID to the front of your individual ID with a colon, and then puts 'case' or 'control' in the 3rd column (instead of your family ID), depending on whether the 6th column in your Plink file was set to 2 or 1. So I recommend conforming to the PAM/Eigenstrat format: get rid of case/control altogether, replace it with your family ID, and delete the family ID from the front of the individual ID.

So when converted to PAM/Eigenstrat your individual sample row might look like this:

Smith:John M Control

So I would change it to:

John M Smith

That way you won't have to worry about case/control at all, since none of the samples in the AADR have case/control settings. Btw, the 'M' is the sex column, not a middle-name initial.
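The same edit can be scripted when several relatives are in the file. A minimal sketch, assuming every converted row has exactly three whitespace-separated fields with the family ID and individual ID joined by a colon, as in the example above; rows that don't match (such as AADR samples) are left untouched. "Lopez:Sofia" and "I0001" are hypothetical rows:

```python
def tidy_ind_row(line):
    """Turn 'FAMILY:INDIV SEX Case/Control' into 'INDIV SEX FAMILY'; leave other rows alone."""
    fields = line.split()
    if len(fields) != 3:
        return line
    sample, sex, status = fields
    if ":" not in sample or status.lower() not in ("case", "control"):
        return line  # e.g. AADR rows, or rows already tidied
    family, indiv = sample.split(":", 1)
    return f"{indiv} {sex} {family}"


rows = ["Smith:John M Control", "Lopez:Sofia F Case", "I0001 M England_LIA.AG"]
print([tidy_ind_row(r) for r in rows])
# ['John M Smith', 'Sofia F Lopez', 'I0001 M England_LIA.AG']
```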
