Admixtools admixtools2 TUTORIAL for WINDOWS.

I only added the outputformat parameter to the main 9 parameters; however, I have always placed it between the 2nd fileset and the merged fileset, in other words on the 7th line. Also, I didn't have any empty lines, so the file was just 10 lines with no blank lines in between.
You could try that, and if it fails again, try converting your 23andMe file to Plink again but add the --keep-allele-order flag, then redo the rest and see if that helps.
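
For anyone unsure where --keep-allele-order goes: Plink does not care about the order of flags on the command line, so it can sit anywhere after `plink`. Here is a minimal sketch that just builds and prints such a command. The raw file name, the output prefix and the Smith/John IDs are made-up placeholders, and it assumes Plink 1.9 with its `--23file` import.

```python
import shlex


def plink_23andme_command(raw_file, out_prefix, family_id="Smith", individual_id="John"):
    """Build a Plink 1.9 command that converts a 23andMe raw file to .bed/.bim/.fam."""
    return [
        "plink",
        "--23file", raw_file, family_id, individual_id,
        "--keep-allele-order",  # keep the allele order as read instead of letting Plink swap it
        "--make-bed",
        "--out", out_prefix,
    ]


# Hypothetical file names; paste the printed line into your terminal.
print(shlex.join(plink_23andme_command("genome_John_Smith.txt", "mydata")))
```

The printed line can be run as-is once the placeholder names are replaced with your own.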

Other than that I'm running out of ideas. Some of these issues can be like finding a needle in a haystack. Originally I couldn't get Eigensoft to compile on my Mac until I found a post where someone showed the code that needed editing.

If all else fails, you might have to use convertf to convert the AADR to Plink format and then merge in Plink. I found v62.0 to be more stable than previous releases, so I was able to merge in Plink without trouble, although it is a longer procedure.
Thanks bro, what does the command with --keep-allele-order look like? Where do I put it (which position)?

About the mergeit file, do you have it like this?

geno1: mydata.geno
snp1: mydata.snp
ind1: mydata.ind
geno2: v62.0_1240k_public.geno
snp2: v62.0_1240k_public.snp
ind2: v62.0_1240k_public.ind
outputformat: PACKEDANCESTRYMAP
genooutfilename: mydata_merged_with_1240k.geno
snpoutfilename: mydata_merged_with_1240k.snp
indoutfilename: mydata_merged_with_1240k.ind

I didn't know you were using a Mac, but I agree: I haven't seen much info for macOS, so I bet it was harder for you to do all this.

Uff bro, I tried to convert the v62.0 AADR to Plink as I described in post #297. I didn't get "Killed" or a core dump, and my virtual machine didn't crash, but after 4 hours it was still processing and seemed stuck, so I wasn't sure whether there had been an error by then:

Code:
jalisciense@vbox:~/bin> convertf -p par.EIGENSTRAT.PED
parameter file: par.EIGENSTRAT.PED
genotypename: v62.0_1240k_public.geno
snpname: v62.0_1240k_public.snp
indivname: v62.0_1240k_public.ind
outputformat: PACKEDPED
genotypeoutname: v62_1240k.bed
snpoutname: v62_1240k.bim
indivoutname: v62_1240k.fam
## convertf version: 8600
read 1073741824 bytes
read 2147483648 bytes
read 3221225472 bytes
read 4294967296 bytes
read 5368709120 bytes
read 5435121304 bytes
packed geno read OK
end of inpack
before compress: snps: 1233013 indivs: 17629
after compress: snps: 1233013 indivs: 17629


Btw, I was thinking that the "Killed" message I get when I try to merge the files could mean that I don't have enough RAM. Maybe installing a previous version of the AADR would work (not v54, but v52 or v44, for example)? What do you think?
 
I googled the “Killed” message in Linux, and yes, it is most likely an out-of-memory problem. You might be able to tweak some settings on your computer to use the memory more efficiently.
There are a couple of options: either try merging with the v62.0_HO set, which is a smaller file, or use convertf to subset the 1240k dataset to the populations you want. Just add this parameter to the convertf parfile:
poplistname: yourfile.txt
yourfile.txt should consist of a list of populations (one per line), where the populations are the labels in the last column of the input .ind file. Only the samples with listed labels will be output.
Then use mergeit to merge your file with the subset.
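
A rough back-of-the-envelope check of the memory idea, using the SNP and individual counts from the convertf log above. It assumes the packed format stores 2 bits per genotype, with each SNP row padded to a whole byte; whether the tools really hold the whole matrix in RAM at once is also an assumption.

```python
def packed_geno_bytes(n_snps, n_indivs):
    """Size of a packed genotype matrix: 2 bits per genotype, one byte-padded row per SNP."""
    bytes_per_snp = (n_indivs + 3) // 4  # 4 genotypes fit in one byte, rounded up
    return n_snps * bytes_per_snp


size = packed_geno_bytes(1233013, 17629)
print(size)                    # 5435121304
print(round(size / 1024**2))   # the same size in MiB
```

The byte count is the same as the last "read ... bytes" line in the log, so the 1240k genotype data alone is around 5.4 GB. That leaves very little headroom on a virtual machine with 6451 MB of RAM once the operating system and the second dataset are counted.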
 
I gave my virtual machine the maximum I could before it crashes (6451 MB of RAM out of 8000 MB, and 6 processors out of 8).

I always thought the HO set was heavier because it has more samples than the regular ones lol

Btw, is it necessary to download the AADR.anno file, or can I just download the AADR.snp, AADR.geno and AADR.ind?

About this option:

"or use convertf to subset the 1240k dataset to the populations you want. Just add this parameter to the convertf parfile:
poplistname: yourfile.txt
yourfile.txt should consist of a list of populations (1/line) where populations are the labels in the last column of the input .ind file. Only the samples with listed labels will be output."

Could you provide me with an example pls? I understand better if I see it :)
 
The .anno file is just information on the samples. It's now in .xlsx format.

Here's an example of a convertf par file for subsetting:

genotypename: v62.0_1240k_public.geno
snpname: v62.0_1240k_public.snp
indivname: v62.0_1240k_public.ind
outputformat: PACKEDANCESTRYMAP
genotypeoutname: v62.0_1240k_public_subset.geno
snpoutname: v62.0_1240k_public_subset.snp
indivoutname: v62.0_1240k_public_subset.ind
poplistname: poplist_keep.txt

Here is what's listed in the poplist file:

Denmark_Zealand_Medieval_Saxon.AG
England_LIA.AG
France_GrandEst_IA1.SG
France_GrandEst_IA2.SG
France_HautsDeFrance_IA2.SG
France_Occitanie_IA2.SG
France_SouthEast_IA2.AG
Germany_EarlyMedieval_Saxon.AG
Germany_Drantum_Medieval_Saxon.AG
Germany_Dunum_Medieval_Saxon.AG
Germany_Issendorf_EarlyMedieval_Saxon.AG
Germany_Liebenau_EarlyMedieval_Saxon.AG
Germany_Schortens_EarlyMedieval_Saxon.AG
Ireland_Kilteasheen_EarlyMedieval_AngloSaxon_Norman.AG
Ireland_Viking.SG
Norway_Viking.SG
Scotland_Viking.SG
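
If typing the labels by hand is tedious, a poplist like the one above could be put together with a short script. This is only a sketch: the three sample IDs are invented for illustration, and the keywords are arbitrary; the one thing taken from the thread is that the population label is the last column of the .ind file.

```python
def population_labels(ind_lines):
    """Return the distinct labels from the last column of .ind lines, in first-seen order."""
    seen = []
    for line in ind_lines:
        fields = line.split()
        if fields and fields[-1] not in seen:
            seen.append(fields[-1])
    return seen


def select_labels(labels, keywords):
    """Keep only the labels that contain at least one of the keywords."""
    return [label for label in labels if any(k in label for k in keywords)]


# Hypothetical .ind lines (sample ID, sex, population label).
example_ind = [
    "SAMPLE001 M England_LIA.AG",
    "SAMPLE002 F Norway_Viking.SG",
    "SAMPLE003 M Norway_Viking.SG",
]
keep = select_labels(population_labels(example_ind), ["Viking", "Saxon"])
print("\n".join(keep))  # one population per line, ready to save as poplist_keep.txt
```

With the real file you would pass `open("v62.0_1240k_public.ind")` instead of the example list and write the result to poplist_keep.txt.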
 
Thanks bro, you are very nice.

But I finally managed to merge my data with the AADR!!! I downloaded the v62.0_HO files and, after giving my virtual machine the maximum RAM and processors, this time the process was able to "stay alive" and not be "killed" lol

fa3TeJq.jpg



So thank you so much for your help bro, you were very patient with me. Without you I would not be able to model myself in qpAdm!
 
Great to hear. Yeah, the HO set has more samples but fewer SNPs. If you ever want to do an analysis with the higher-quality samples from the 1240k set, you can use the subset option in convertf.
 
I used individuals whose labels don't end in .HO but in .DG, .SG, etc. So do the samples I used have the same SNPs as the same samples in the 1240k set? Is it just the ones that end in .HO that have about 500k SNPs?

Thanks. So when I merge them, do I have to add the parameter poplistname: poplist_keep.txt again in the merge_param.par file too, like this?

geno1: mydata.geno
snp1: mydata.snp
ind1: mydata.ind
geno2: v62.0_1240k_public.geno
snp2: v62.0_1240k_public.snp
ind2: v62.0_1240k_public.ind
poplistname: poplist_keep.txt
outputformat: PACKEDANCESTRYMAP
genooutfilename: mydata_merged_with_1240k.geno
snpoutfilename: mydata_merged_with_1240k.snp
indoutfilename: mydata_merged_with_1240k.ind
 
No, only in the convertf par file. Then you have to merge your individual file with the AADR subset file, not with the full AADR.

Although the ancient samples have the same labels in both datasets, the ones in the HO set have been merged with the Human Origins array, which leaves only about half the number of SNPs for those samples.
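
To make the second step concrete, here is a small sketch that writes out a mergeit par file pointing at the subset rather than the full AADR. The file prefixes follow the examples in this thread, the output prefix is a made-up name, and there is no poplistname line because the subsetting has already been done by convertf.

```python
def mergeit_parfile(prefix1, prefix2, out_prefix, outputformat="PACKEDANCESTRYMAP"):
    """Return the text of a 10-line mergeit par file with outputformat on the 7th line."""
    lines = [
        f"geno1: {prefix1}.geno",
        f"snp1: {prefix1}.snp",
        f"ind1: {prefix1}.ind",
        f"geno2: {prefix2}.geno",
        f"snp2: {prefix2}.snp",
        f"ind2: {prefix2}.ind",
        f"outputformat: {outputformat}",
        f"genooutfilename: {out_prefix}.geno",
        f"snpoutfilename: {out_prefix}.snp",
        f"indoutfilename: {out_prefix}.ind",
    ]
    return "\n".join(lines) + "\n"  # no blank lines between parameters


# Second fileset is the convertf subset, not v62.0_1240k_public itself.
print(mergeit_parfile("mydata", "v62.0_1240k_public_subset", "mydata_merged_with_subset"), end="")
```

Save the printed text as merge_param.par and run mergeit on it as before.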
 
I see. Well, I'll find out when I do it, but it is clearer now, and if I can I will test with the higher-quality 1240k samples.

Just a few questions if you do not mind:

1. How do I change my name, and why does it say "control"?

7bGQl9T.jpg


When I try to change it to my real name (I'll just use Smith John as an example) like this:

Smith : John M John

I get this error:

oefqNbV.jpg


2. I want to merge my family into the same merged v62 HO dataset file (so we will all be together there). Do you recommend that, or should I make a different merged v62 HO file for each person?

3. If the people I want to merge are female, do I have to change any number in their .fam file?

Lopez Sofia 0 0 1 -9

Or do I have to change a parameter in another file?
 
That error is a Linux basic: it's because of the spaces. The columns are separated by whitespace, so a name with spaces in it is read as extra columns. Instead of "John Smith" try "John_Smith" or "JohnSmith", and just modify the outer column, the "control" one.
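
A quick way to see the problem is to count the columns the way a whitespace-splitting reader would. This is only an illustration in Python, not what the Admixtools programs actually run; the 3-column layout (ID, sex, label) is the .ind format discussed in this thread.

```python
def ind_fields(line):
    """Split an .ind line on whitespace, the same way a column reader would."""
    return line.split()


def is_valid_ind_line(line):
    """An .ind line should have exactly 3 columns: sample ID, sex, population label."""
    return len(ind_fields(line)) == 3


print(ind_fields("Smith : John M John"))         # 5 columns, so the line is rejected
print(is_valid_ind_line("Smith : John M John"))  # False
print(is_valid_ind_line("Smith:John M John_Smith"))  # True
```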
 
It's working now, thanks bro.
 
You can merge them all into the same v62_HO set. For the females you can manually change the 3rd number (sex, the 5th column of the .fam file) to 2 (female).
If you want, you can set yourself and your other family members to 'case' instead of 'control'. The equivalent in Plink is the 4th number (the 6th column of the .fam file): 2 for case, 1 for control; 'case' is equivalent to 'target' and 'control' is equivalent to 'reference'. However, the settings of these columns won't affect most of what you'll be doing.
 
Third number... so it would be like this?

Lopez Sofia 0 0 2 1

So the -9 should always be changed to 1, then?

I see, it's good to know about control and target then.
 
0 0 2 2, as I suggest setting yourself and your family as ‘case’.
But it doesn’t affect most analyses. -9 is undefined, 1 is control and 2 is case. When using convertf you must change all the -9 values to either 1 or 2.
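
If you have several family members to fix, the same edit can be scripted instead of done by hand. A minimal sketch, assuming the standard 6-column .fam layout (family ID, individual ID, father, mother, sex, phenotype) and using the example line from this thread:

```python
def set_sex_and_phenotype(fam_line, sex, phenotype):
    """Return a .fam line with the 5th (sex) and 6th (phenotype) columns replaced."""
    fields = fam_line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns in a .fam line, got {len(fields)}")
    fields[4] = str(sex)        # 1 = male, 2 = female
    fields[5] = str(phenotype)  # 1 = control, 2 = case, -9 = undefined
    return " ".join(fields)


print(set_sex_and_phenotype("Lopez Sofia 0 0 1 -9", sex=2, phenotype=2))
# Lopez Sofia 0 0 2 2
```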
 
Which kinds of analysis would be affected, and which would not, if I didn't change the 1 to 2? I just finished merging my dad's, my grandpa's and a friend's raw data, so I'd like to know.
 
I don't think any of the tasks that you would do in Admixtools would be affected. If they were, it would be mentioned in the manual.

However, because I often use ADMIXTURE, I do all the QC work in Plink. There, if I filter for HWE, the filter will by default ignore case individuals.
 
I see. Normally I wouldn't mind doing it again; the problem is that I need more RAM each time, so I'm afraid I won't be able to merge any more of my family's raw data. That's why I'd rather merge the ones I haven't done yet now, just in case.

I asked ChatGPT, and it told me that writing 1 instead of 2 would affect analyses of genetic variation and phenotypes, but not qpAdm or PCA. Do you know if that is true?
 
You can change the .fam or .ind file anytime before or after merging. Merging doesn't set the case/control selection in concrete, so you don't need to merge again if you change your mind.

When you use convertf to convert your Plink files to PAM (PACKEDANCESTRYMAP), it adds your family ID to the front of your individual ID with a colon, and then puts 'case' or 'control' in the 3rd column (instead of your family ID) depending on whether you had the 6th column in Plink format set to 2 or 1. So I recommend conforming to the PAM/Eigenstrat format: get rid of case/control altogether, replace it with your family ID, and delete your family ID from the front of the individual ID.

So when converted to PAM/Eigenstrat your individual sample row might look like this:

Smith:John M Control

So I would change it to:

John M Smith

That way you won't have to worry about case/control at all since none of the samples in the AADR have case/control settings. Btw the 'M' is the sex column, not a middle name initial.
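
The same change can be sketched as a small function, in case you have several rows to fix. It only touches rows that look like converted Plink samples (a colon in the ID and case/control in the last column) and leaves everything else alone; the SAMPLE001 row in the test call is an invented stand-in for an AADR sample.

```python
def relabel_ind_line(line):
    """Turn 'Family:Individual SEX Case/Control' into 'Individual SEX Family'."""
    sample_id, sex, label = line.split()
    if ":" in sample_id and label.lower() in ("case", "control"):
        family_id, individual_id = sample_id.split(":", 1)
        return f"{individual_id} {sex} {family_id}"
    return f"{sample_id} {sex} {label}"  # rows without case/control are kept as they are


print(relabel_ind_line("Smith:John M Control"))       # John M Smith
print(relabel_ind_line("SAMPLE001 F England_LIA.AG"))  # unchanged (hypothetical row)
```

Since the .ind file is plain text and separate from the .geno file, this can be done after merging, as long as the number and order of rows stay the same.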
 