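My workflow so far:
Find fragments of marker genes in unassembled metagenomes > download and assemble the metagenomes > recover the gene neighborhoods/genomes of interest
Now, from the "depth" of the assembled contigs (I use MEGAHIT for assembly), I have a rough estimate of how abundant these genes are. I would like to know whether there is a more thorough/correct way to do this. I want to compare the abundance of specific genes a) between samples from the same study and b) between different studies. I think the size of each individual metagenome should be taken into account in both cases, but point b) may add further difficulties, such as different sequencing technologies. Your insights are much appreciated.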
I would avoid using assemblies to answer this question, as there's no guarantee that you will be able to assemble your genes of interest. You can, however, estimate their abundance even if they are relatively rare.
I understand your question to be one of estimating the abundance of either some specific genes (e.g. butyrate metabolism genes) or all genes in a microbial community across multiple samples for comparative purposes. In other words, it is not 16S or marker gene analysis for the purpose of estimating organismal abundance, which is a rather different problem (though in that case I would still not use an assembly).
A more standard workflow is:
Some examples of how this has been done are here, here, here. I am sure that there are more recent/relevant references but I haven't been following the field closely in the last few years.
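To make the point about metagenome size from the question concrete, here is a minimal sketch of one common normalization, reads per kilobase per million reads (RPKM). It assumes you already have, for each sample, the number of reads assigned to each gene of interest and the total number of reads sequenced; how those counts are obtained is not covered here. The sample names, gene name, counts, and lengths are all hypothetical and only there for illustration.

```python
def rpkm(read_count, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total read count must be positive")
    # Divide by gene length in kb, then by library size in millions of reads.
    return read_count / (gene_length_bp / 1_000) / (total_reads / 1_000_000)


# Hypothetical inputs: gene lengths in bp, and per-sample read counts.
gene_lengths = {"geneX": 1_000}
samples = {
    "sample_A": {"total_reads": 20_000_000, "gene_counts": {"geneX": 400}},
    "sample_B": {"total_reads": 5_000_000, "gene_counts": {"geneX": 100}},
}

# Normalize every gene in every sample by gene length and library size.
abundances = {
    name: {
        gene: rpkm(count, gene_lengths[gene], info["total_reads"])
        for gene, count in info["gene_counts"].items()
    }
    for name, info in samples.items()
}

for name, per_gene in abundances.items():
    print(name, per_gene)
```

In this made-up example the raw counts differ fourfold, but so do the library sizes, so the normalized abundance is the same in both samples. This only corrects for sequencing depth and gene length; it does not address differences between studies such as sequencing technology or read length, which the question raises under point b).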