How can I restrict the search space when querying GEO?



I am using the following command to retrieve GEO DataSets entries for daf-2 in C. elegans that use high-throughput sequencing:

esearch -db gds -query "daf-2" | efilter -query "expression profiling by high throughput sequencing [DataSet Type] AND Caenorhabditis elegans [organism]"

However, among the results I get back, I noticed this one:

23. Impaired Insulin-/IGF1-Signaling Extends Life Span by Promoting Mitochondrial L-Proline Catabolism to Induce a Transient ROS-Signal
(Submitter supplied) Transcriptome profiling of three models with impaired insulin/IGF1 signaling. 1. Deep sequencing of endogenous mRNA from Caenorhabditis elegans N2 var. Bristol (wildtype) and daf-2(e1370) mutant; 2. Deep sequencing  of endogenous mRNA from murine embryonic fibroblasts (MEF)  wildtype and irs1-/- knockout; 3. Deep sequencing of endogenous mRNA from murine embryoinic fibroblast (MEF) insr+/- -lox and insr+/- knockout  Jena Centre for Systems Biology of Ageing - JenAge (www.jenage.de)
Organism:   Mus musculus; Caenorhabditis elegans
Type:       Expression profiling by high throughput sequencing
Platforms: GPL11002 GPL13776 14 Samples
FTP download: GEO (CSV) ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE36nnn/GSE36041/
SRA Run Selector: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA151765
Series      Accession: GSE36041 ID: 200036041

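As you can see, the Organism field lists both Mus musculus and Caenorhabditis elegans. How can I restrict the search so that I only get entries that are exclusively Caenorhabditis elegans? A simple approach would be to include NOT Mus musculus in the query, but that would of course still let other organisms through.

Another way I thought of to solve this is to write a script that does the extra filtering with regular expressions, but I would like to know whether there is a simpler solution using the E-utilities themselves.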


You can use the following query to first search GEO for the GEO Series of interest, then find all linked SRA runs, and finally apply a second filter so that only the SRA runs satisfying another set of criteria are kept:

esearch \
  -db gds \
  -query "daf-2[all fields] \
    AND expression profiling by high throughput sequencing [DataSet Type] \
    AND Caenorhabditis elegans [organism]" \
  | elink \
  -db gds -target sra \
  | esearch \
  -query "(#3) \
    AND Caenorhabditis elegans [organism]" \
  | efetch -format runinfo

Here, I am combining two independent queries by using the (#3) term in the second query. More information about this is in the "Combining Independent Queries" subsection of the "Searching and Filtering" section of the Entrez Direct documentation.
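If you would rather keep working with the GEO records themselves, the post-filtering script mentioned in the question can be quite small. The following is only a rough sketch, not part of Entrez Direct: `keep_single_organism` is a hypothetical helper name, and it assumes the records arrive as plain-text summaries laid out like the one in the question (blank-line-separated records, each with an `Organism:` line). The two demo records are made-up placeholders, not real GEO entries.

```shell
#!/bin/sh
# Hypothetical helper (not an EDirect command): keep only the records whose
# Organism line names exactly the species given as the first argument.
# Assumption: records are separated by blank lines and contain an
# "Organism:" line, as in the summary text shown in the question.
keep_single_organism() {
  awk -v want="$1" '
    # Paragraph mode: one record per blank-line-separated block,
    # one field per line inside the block.
    BEGIN { RS = ""; ORS = "\n\n"; FS = "\n" }
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^Organism:/) {
          org = $i
          sub(/^Organism:[ \t]*/, "", org)   # drop the label and padding
          sub(/[ \t]+$/, "", org)            # drop trailing whitespace
          if (org == want) print             # exact match only, so "A; B" is rejected
          break
        }
      }
    }
  '
}

# Demo with two made-up placeholder records (not real GEO entries).
cat <<'EOF' | keep_single_organism "Caenorhabditis elegans"
1. Placeholder single-species record
Organism:   Caenorhabditis elegans
Type:       Expression profiling by high throughput sequencing

2. Placeholder mixed-species record
Organism:   Mus musculus; Caenorhabditis elegans
Type:       Expression profiling by high throughput sequencing
EOF
```

Only the first placeholder record should survive, because the comparison is against the whole Organism value rather than a substring. In practice you would pipe the text output of your own esearch/efilter/efetch pipeline into the function in place of the here-document.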