Admixtools admixtools2 TUTORIAL for WINDOWS.

Jalisciense · Oct 26, 2024

I finally got it running, thanks to a friend's help.

So I'm officially a qpAdm user now lol

**Important! How we solved it! Read it below!**

If you are in the same situation I was in, give your virtual machine more RAM. The 5187 MB I was giving it was insufficient, and the maximum I tried, 7051 MB, was too much and crashed my laptop. You need to calibrate it until it works; for me that was 6000 MB of RAM.

My laptop has 8000 MB of RAM (8 GB).

Celtion · Oct 27, 2024

Jalisciense said:
And when I am using the Terminal Super User Mode:

Idk what is wrong...

Okay, I just realised you are using the original AdmixTools. Most people here are using AdmixTools2, which is a more streamlined version. AT2 also runs in R instead of the shell. All the info you need to install and use it is here: https://uqrmaie1.github.io/admixtools/index.html

Celtion · Oct 27, 2024

Glad you got it sorted out. If you find it too slow to run you might want to move over to AdmixTools2.

Jalisciense · Oct 27, 2024

Thanks bro. AdmixTools1 was my friend's recommendation, but yeah, I'll think about downloading AdmixTools2. I'm just too tired right now to install qpAdm and get it working again after days of trying lol

And thanks for the link you provided me above.

Jalisciense · Oct 27, 2024

Btw bro, I have some questions, if you would be so kind as to answer them xdd

1. When I select an individual, it automatically selects all the others that share the same name. How do I select just one individual (or certain individuals) and not the rest?

2. How do I change their names?

For example: Mexico_Colonial_European to Spanish_Conquistador

3. How do I make new labels that bring together individuals who have different names/labels but are genetically in the same cluster?

For example:

*Iberian IA (the new label I want to make)
That label would contain these samples:
- Spain_Roman_oLocal
- Spain_IA
- Spain_Hellenistic_oLocal
- Spain_LIA
- Spain_Tartessian_EIA

What is the maximum number of samples I can put under that label?

4. Can I put .SG, .DG and .WGA samples in the same run, on the left or right or both? Or do they all have to be one type?

Celtion · Oct 28, 2024

To be honest, I haven't used AdmixTools2 (or AT1) yet; I just know some things about how programs like these run. (I've mostly been working with ADMIXTURE up to now.)
However, to change the population and/or sample names, you edit the .ind file if you are using EIGENSTRAT or PACKEDANCESTRYMAP format, or the .fam file if you are using PLINK format files. You should be able to open them in a text editor. Just make sure not to change the order of the samples in the file, and when you change a sample ID and/or family/group name, keep the same column positions.
.SG, .DG and .WGA just refer to the type of DNA sequencing the sample underwent, so you can generally use them anywhere. However, some of the others here have noticed that the program has a bias toward .SG samples and might weight those more heavily.
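A minimal sketch of the kind of .ind edit described above, assuming an EIGENSTRAT-style `.ind` file with three whitespace-separated columns (sample ID, sex, population label). The file name `demo.ind` and the sample IDs are made up for illustration; the labels are the ones from the questions. It only rewrites the third column and never reorders rows. Giving a single sample its own label is one way to pick it out from others that share a group name. Labels should not contain spaces, hence `Iberian_IA`.

```bash
# Work in a scratch directory with a tiny, made-up .ind file
# (columns: sample ID, sex, population label).
workdir=$(mktemp -d)
cd "$workdir"
cat > demo.ind <<'EOF'
I0001 M Mexico_Colonial_European
I0002 F Spain_IA
I0003 M Spain_LIA
I0004 F Spain_Roman_oLocal
I0005 M Spain_IA
EOF

# Rewrite only column 3; the row order is left untouched.
awk '
  # rename a label
  $3 == "Mexico_Colonial_European" { $3 = "Spanish_Conquistador" }
  # give one individual its own label so it can be selected alone
  $1 == "I0005" { $3 = "Spain_IA_single" }
  # pool several existing labels under one new label
  $3 == "Spain_IA" || $3 == "Spain_LIA" || $3 == "Spain_Roman_oLocal" { $3 = "Iberian_IA" }
  { print $1, $2, $3 }
' demo.ind > demo_relabelled.ind

cat demo_relabelled.ind
```

The output is written to a new file so the original stays available as a backup. The output columns are separated by single spaces, which I would expect the tools to accept since they split on whitespace, but that is an assumption worth checking on your own data.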

baeticvs · Nov 13, 2024

Hi.
I installed qpAdm and got it running without too many problems, but merging my files with the AADR dataset is proving impossible. First I tried merging the files directly in EIGENSTRAT format (with mergeit) but didn't get very far. Then I read here about converting both the AADR dataset and your personal files to PLINK format to merge them, and I followed the @Jovialis tutorial (post #201). That worked, but I hit another problem when converting the merged files back to EIGENSTRAT (as far as I know, qpAdm only works with this format unless you are using AT2): it leaves me out!
This is the output:
[Attachment 16915]
Any idea why this happens? Maybe my original raw DNA file has poor coverage (MyHeritage, 600k SNPs).
[Attachment 16916]
^ Here is the problem, I believe.

Jovialis · Nov 13, 2024

Personally, I don't know how to convert back from PLINK to EIGENSTRAT; every time I try, I get a massively bloated file. I've asked around and haven't found anyone who knows either. I would love to figure it out so I could start making PCAs with smartpca in the EIGENSOFT suite, which is what academic studies use.

How big is the resulting file that is produced? Mine is usually 4x bigger than the original geno file, but the other files are the same size.

However, I haven't encountered the issue of it not recognizing mine.

If you figure this out, or if someone knows, please share.

Jovialis · Nov 13, 2024

I ran it through ChatGPT; see if this helps:

The warnings in your PLINK output point to three issues:

### 1. **Het. Haploid Genotypes**
- **Warning:** "10996 het. haploid genotypes present."
- **Explanation:** Some genotypes in haploid regions are called heterozygous in individuals who should carry only one allele there (males on the X or Y chromosome, or anyone on the mitochondrial DNA). PLINK treats these as errors since haploid regions should only have one allele.
- **Solution:**
  - Ensure that the sex of each individual is correctly specified in your `.fam` file.
  - You can exclude the problematic variants or correct these genotypes. Use the following command:
    ```bash
    plink --bfile [your_file] --set-hh-missing --make-bed --out corrected_data
    ```
    This will set heterozygous haploid genotypes to missing.

### 2. **Nonmissing Nonmale Y Chromosome Genotypes**
- **Warning:** "Nonmissing nonmale Y chromosome genotype(s) present."
- **Explanation:** Genotypes for the Y chromosome are present in individuals who are not male (as per their sex information in the `.fam` file). PLINK expects only males to have Y chromosome data.
- **Solution:**
  - Check for errors in the `.fam` file's sex designation. Ensure that:
    - `1` = male
    - `2` = female
    - `0` = unknown
  - If there are data inconsistencies, you can filter or correct them:
    ```bash
    plink --bfile [your_file] --filter-males --make-bed --out males_only
    ```
    Be aware that `--filter-males` keeps only the samples coded as male and drops everyone else, so use it only if that is what you want. As far as the PLINK 1.9 documentation describes it, `--set-hh-missing` (above) should also erase nonmale Y chromosome calls without removing any samples.

### 3. **Phenotype Warnings**
- **Warning:** "16398 are cases and 0 are controls. (1 phenotype is missing.)"
- **Explanation:** The `.fam` file contains only case phenotypes (coded as `2`) and no controls (`1`), plus one sample with a missing phenotype. This is likely incorrect unless you are analyzing a dataset with no controls.
- **Solution:** If this is unintentional, check the phenotype column (the sixth column) in your `.fam` file. You might need to correct missing or mislabeled phenotypes.


baeticvs · Nov 13, 2024

I think the only way to convert from PLINK to EIGENSTRAT is the one described here: https://github.com/roberta-davidson/ADMIXTURE-smartPCA-PLINK-and-EIGENSOFT?tab=readme-ov-file#readme
I also get a massive .geno file: it goes from 4.7 GB to 18.8 GB.

Jovialis · Nov 13, 2024

Apparently it is unavoidable when converting from PLINK to EIGENSTRAT.

Still, the larger file size shouldn't affect the results.
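A possible explanation for the roughly 4x growth reported above (4.7 GB to 18.8 GB): to my knowledge the AADR `.geno` files are distributed in PACKEDANCESTRYMAP format, which packs four genotypes into each byte, while an EIGENSTRAT `.geno` file is plain text with one character per genotype. The sketch below only does that arithmetic. The sample and SNP counts are illustrative placeholders rather than figures from a particular release, and the packed estimate ignores the small file header.

```bash
# Illustrative counts (assumptions, not taken from a specific dataset).
n_ind=16399      # number of individuals
n_snp=597000     # number of SNPs

# EIGENSTRAT text: one character per genotype plus a newline per SNP row.
text_bytes=$(( (n_ind + 1) * n_snp ))

# Packed binary: 2 bits per genotype, i.e. 4 genotypes per byte,
# rounded down here for a rough estimate.
packed_bytes=$(( ((n_ind + 3) / 4) * n_snp ))

ratio=$(( text_bytes / packed_bytes ))

echo "EIGENSTRAT (text):   $(( text_bytes / 1000000 )) MB"
echo "PACKEDANCESTRYMAP:   $(( packed_bytes / 1000000 )) MB"
echo "text / packed ratio: ${ratio}"
```

If this is the cause, setting `outputformat: PACKEDANCESTRYMAP` instead of `outputformat: EIGENSTRAT` in the convertf parameter file should produce a `.geno` of about the original size. It would also fit the observation that a text `.geno` looks like a different file type from the binary one.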

baeticvs · Nov 13, 2024

Jovialis said:

Finally have the process documented from using my HG19-aligned 23andme txt file produced last year from when I processed it from FASTQ

Code:

```bash
# Step 1: Convert the 23andMe file to PLINK binary format.
plink --23file /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked_23andMe_V3.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked

# Step 2a: Extract SNPs from the Jovialis dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --write-snplist --out /mnt/d/UbuntuJovialisHome/jovialis_snp_list

# Step 2b: Extract SNPs from the AADR (v62.0_HO_public) dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --write-snplist --out /mnt/d/UbuntuJovialisHome/v62_snp_list

# Step 3: Find common SNPs between the two datasets (Jovialis and AADR).
comm -12 <(sort /mnt/d/UbuntuJovialisHome/jovialis_snp_list.snplist) <(sort /mnt/d/UbuntuJovialisHome/v62_snp_list.snplist) > /mnt/d/UbuntuJovialisHome/common_snps.txt

# Step 4a: Filter the Jovialis dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_common_snps

# Step 4b: Filter the AADR dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/v62_common_snps

# Step 5: Attempt an initial merge and identify problematic SNPs (multiallelic or inconsistent strand).
# If the merge fails, PLINK writes the offending SNP IDs to v62_Jovialis_merged-merge.missnp.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62_common_snps --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged

# Step 6: Flip problematic SNPs in the Jovialis dataset (fix strand inconsistencies).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --flip /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_flipped

# Step 7: Exclude remaining problematic SNPs from the Jovialis dataset.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_flipped --exclude /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered

# Step 8: Filter the Jovialis dataset to keep only SNPs present in AADR.
# NOTE: Jovialis_PLINK_binary is not produced by any earlier step in this listing.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_PLINK_binary --extract /mnt/d/UbuntuJovialisHome/v62.0_HO_public.bim --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered_for_AADR

# Step 9: Perform the final merge, ensuring all SNPs from AADR are kept and only SNPs matching AADR from Jovialis are merged.
# NOTE: Jovialis_filtered_cleaned is likewise not produced by any earlier step in this listing.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_filtered_cleaned --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final

# Step 10a: Check SNP frequency in the final merged dataset to verify all SNPs from AADR were retained.
plink --bfile /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final --freq --out snp_check

# Step 10b: Check SNP frequency in the original AADR dataset for comparison.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --freq --out aadr_check
```

What does your .fam file say after the conversion (Step 1)?
I get this, i.e. no phenotype:
FAM001 ID001 0 0 1 -9

baeticvs · Nov 14, 2024

OK, so apparently converting from PLINK to EIGENSTRAT isn't possible; you get an unusable "plain text document" .geno file no matter what.
This is what a proper-looking .geno file looks like:

(the system recognizes it as a different file type)

However, this is what you get when converting from PLINK to EIGENSTRAT with convertf:

Maybe the problem is in the .bed file I generate when merging, though.

Edit: I figured it out.

Jovialis · Nov 14, 2024

baeticvs said:
What does your .fam file say after the conversion (Step 1)?
I get this, i.e. no phenotype:
FAM001 ID001 0 0 1 -9

I think you can manually change it to 1. Have you tried using Visual Studio Code? It is a free code editor, and it would be more suitable for modifying those files than Notepad.

baeticvs · Nov 14, 2024

That's what I did: I changed it manually to 2.
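For anyone who prefers not to edit by hand, here is a minimal sketch of the same fix with awk. It assumes a standard six-column PLINK `.fam` file (family ID, sample ID, father, mother, sex, phenotype). The thread does not show why convertf skipped the sample; the working assumption, based on the fix reported above, is that a missing phenotype (`-9`) in the sixth column makes it ignore that individual. The file names and the second row are made up for illustration.

```bash
# Scratch directory with a tiny, made-up .fam file.
workdir=$(mktemp -d)
cd "$workdir"
cat > demo.fam <<'EOF'
FAM001 ID001 0 0 1 -9
Spain_IA I0002 0 0 2 1
EOF

# Replace a missing phenotype (-9) in column 6 with 1.
# All other columns and the row order stay exactly as they were.
awk '{ if ($6 == "-9") $6 = 1; print $1, $2, $3, $4, $5, $6 }' demo.fam > demo_fixed.fam

cat demo_fixed.fam
```

The result goes to a new file so the original is kept as a backup. PLINK expects the `.fam` to share its prefix with the `.bed` and `.bim` files, so the fixed file has to be copied back under the original name before converting.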

baeticvs · Nov 14, 2024

I ended up installing AT2 so I could use the PLINK files instead. Does all this look alright?

I picked random samples; this was just to see if it worked. As you can see, the sample labels are numbers (the first column of the .fam file).

The .fam file and its first column:

Can I edit the numbers and replace them with their corresponding names?
EDIT: Just tried it. Yes, you can edit the .fam file (I believe it's fine as long as you don't mix up or change the order of the samples).
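With thousands of samples, editing the first column by hand gets tedious. A rough sketch of doing it with a lookup table instead, assuming a six-column `.fam` file and a hypothetical two-column file `labels.txt` (sample ID, desired label) that you would build yourself, for example from the AADR `.ind` file. All file names and IDs here are invented for illustration.

```bash
# Scratch directory with made-up inputs.
workdir=$(mktemp -d)
cd "$workdir"
cat > demo.fam <<'EOF'
1 I0001 0 0 1 1
2 I0002 0 0 2 1
3 ID001 0 0 1 1
EOF

# Hypothetical lookup table: sample ID -> population label.
cat > labels.txt <<'EOF'
I0001 Spain_IA
I0002 Spain_LIA
EOF

# First pass (labels.txt): remember the label for each sample ID.
# Second pass (demo.fam): overwrite column 1 when the sample ID has a label.
# Rows without a label are printed unchanged; row order is preserved.
awk 'NR == FNR { label[$1] = $2; next }
     ($2 in label) { $1 = label[$2] }
     { print $1, $2, $3, $4, $5, $6 }' labels.txt demo.fam > demo_named.fam

cat demo_named.fam
```

Matching on the sample ID rather than on the line number means the labels still land on the right rows even if the lookup table is in a different order from the `.fam` file.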

baeticvs · Nov 14, 2024

lockdownboredom said:
Can you use MyHeritage raw data with this program?

Yes you can

baeticvs · Nov 16, 2024

I have so many questions... but I already spammed this thread enough

Jovialis · Nov 16, 2024

Feel free to ask as many as you want. This thread is for learning. I wish I could answer all of them, but I'm still learning as well.

Right now I'm trying to properly make a pca in smartpca, but having a hell of a time.

baeticvs · Nov 16, 2024

Jovialis said:
Feel free to ask as many as you want. This thread is for learning. I wish I could answer all of them, but I'm still learning as well.

Hmm, do you have the same number of SNPs in your merged files across the different formats? My merged v54.1.p1_HO.bim file has 597k SNPs, which is pretty good considering my raw DNA data from MyHeritage had a bit over 600k SNPs. But the .snp file I got from merging with v54.1.p1_1240K and converting back to EIGENSTRAT only has 175k.

Right now I'm trying to properly make a pca in smartpca, but having a hell of a time.

But I saw a thread of yours on how to make a PCA with AI!
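On the SNP-count question above, a quick way to compare a PLINK `.bim` with an EIGENSTRAT `.snp` file is to count their rows and list the IDs present in one but not the other. This is only a sketch on made-up files: it assumes the SNP ID is column 2 of the `.bim` and column 1 of the `.snp`, which is the usual layout of those formats.

```bash
# Scratch directory with tiny, made-up files.
workdir=$(mktemp -d)
cd "$workdir"

# PLINK .bim columns: chromosome, SNP ID, genetic pos, physical pos, allele 1, allele 2
cat > merged.bim <<'EOF'
1 rs1 0 1000 A G
1 rs2 0 2000 C T
1 rs3 0 3000 A C
1 rs4 0 4000 G T
EOF

# EIGENSTRAT .snp columns: SNP ID, chromosome, genetic pos, physical pos, ref, alt
cat > merged.snp <<'EOF'
rs1 1 0.0 1000 A G
rs3 1 0.0 3000 A C
EOF

# Pull out the SNP IDs and sort them so comm can compare the two lists.
awk '{ print $2 }' merged.bim | sort > bim_ids.txt
awk '{ print $1 }' merged.snp | sort > snp_ids.txt

echo "SNPs in .bim: $(wc -l < bim_ids.txt)"
echo "SNPs in .snp: $(wc -l < snp_ids.txt)"

# IDs that are in the .bim but missing from the .snp.
comm -23 bim_ids.txt snp_ids.txt > only_in_bim.txt
cat only_in_bim.txt
```

Looking at which IDs dropped out (for example, whether they cluster on particular chromosomes) may hint at where in the merge or conversion they were lost.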
