Help popup.
Motifs discovered by STREME in MEME motif format.
STREME results in XML format.
STREME outputs a tab-separated values (TSV) file ('sequences.tsv') containing one line for each sequence with a site whose score passes the motif's match threshold for each motif discovered by STREME. The lines are grouped by motif, and groups are separated by a line starting with the character "#". The first line in the file contains the (tab-separated) names of the fields. The names and meanings of each of the fields are described in the table below.
field | name | contents |
---|---|---|
1 | motif_ID | The name of the motif uses the IUPAC codes for nucleotides or proteins. Letters representing multiple nucleotides are used in nucleotide motif positions where several nucleotides are favored. The name of the motif is <index>-<consensus>, where <index> is the rank of the motif according to P-value or Score, and <consensus> is an approximation of the motif by an IUPAC sequence. |
2 | motif_ALT_ID | The alternate name of the motif is STREME-<index>, where <index> is the rank of the motif according to P-value or Score. |
3 | motif_P-value | The p-value of the motif based on applying the appropriate statistical test to the test set sequences. It is not adjusted for the number of motifs reported by STREME. If STREME reports a single motif, then the p-value is an accurate estimate of the statistical significance of the motif as long as the length distributions of the positive and negative sequences are essentially the same. However, if STREME reports more than one motif, the p-value does NOT completely account for multiple testing, and you should use the E-value for assessing whether a motif is truly statistically significant. |
3 | motif_Score | The Score is the unadjusted p-value of the motif based on the appropriate test applied to the training set sequences. Since the Score is not adjusted for multiple tests, it cannot be used to determine the statistical significance of the motif. |
4 | seq_ID | The ID of the sequence. |
5 | seq_Score | The seq_Score of a sequence is its maximum motif match score over all sequence positions. The motif match score of a position in a sequence is computed by summing the appropriate entry from each column of the position-dependent scoring matrix that represents the motif. |
6 | seq_Class | Whether the sequence is a true positive, 'tp', or a false positive, 'fp'. |
7 | is_holdout? | Whether the sequence was in the holdout set, '1', or not, '0'. |
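The 'sequences.tsv' layout described above (a header line, tab-separated rows, and "#" lines between motif groups) can be read with a few lines of Python. The following is a minimal sketch, not part of STREME; the sample rows and sequence IDs are made up for illustration, and the function name `read_sequences_tsv` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the layout described above (NOT real STREME output).
SAMPLE = (
    "motif_ID\tmotif_ALT_ID\tmotif_P-value\tseq_ID\tseq_Score\tseq_Class\tis_holdout?\n"
    "1-AYCCACCGTHYGYB\tSTREME-1\t6.4e-012\tseq_17\t14.2\ttp\t0\n"
    "1-AYCCACCGTHYGYB\tSTREME-1\t6.4e-012\tseq_3_shuf\t11.9\tfp\t1\n"
    "#\n"
    "2-ACGCACGCAHGYG\tSTREME-2\t1.5e-002\tseq_8\t12.5\ttp\t0\n"
)


def read_sequences_tsv(handle):
    """Group rows by motif_ID, skipping blank lines and '#' separator lines."""
    data_lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    by_motif = defaultdict(list)
    for row in reader:
        row["seq_Score"] = float(row["seq_Score"])
        row["is_holdout?"] = row["is_holdout?"] == "1"
        by_motif[row["motif_ID"]].append(row)
    return dict(by_motif)


groups = read_sequences_tsv(io.StringIO(SAMPLE))
for motif_id, rows in groups.items():
    n_tp = sum(r["seq_Class"] == "tp" for r in rows)
    print(motif_id, len(rows), "rows,", n_tp, "true positives")
```

To read a real file, pass an open file handle (for example `open("sequences.tsv", newline="")`) instead of the in-memory sample.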
The name of the motif uses the IUPAC codes for nucleotides or proteins. Letters representing multiple nucleotides are used in nucleotide motif positions where several nucleotides are favored. The name of the motif is <index>-<consensus>, where <index> is the rank of the motif according to P-value or Score, and <consensus> is an approximation of the motif by an IUPAC sequence.
Read more about the MEME Suite's use of the IUPAC alphabets.
The sequence logo of the motif. The rules for constructing logos are given in the Description section of the documentation for the MEME Suite utility ceqlogo.
The sequence logo of the reverse complement motif. The rules for constructing logos are given in the Description section of the documentation for the MEME Suite utility ceqlogo.
Click on the blue symbol below to reveal detailed information about the motif.
Click on the blue symbol below to reveal options allowing you to submit this motif to another MEME Suite motif analysis program, to download this motif in various text formats, or to download a sequence "logo" of this motif in PNG or EPS format.
This plot shows the positional distribution of the best match to the motif in the positive training sequences. Only matches with scores of at least the score threshold are considered. The plot is smoothed with a triangular function whose width is 5% of the maximum positive training sequence length. The position of the dotted vertical line indicates whether the sequences were aligned on their left ends, centers, or right ends.
This histogram shows the distribution of the number of matches to the motif in the positive training sequences with at least one match. Only matches with scores of at least the score threshold are considered.
The number of positive sequences matching the motif (percentage).
The number of training set positive sequences matching the motif / the number of training set positive sequences.
Note these counts are made after erasing sites that match previously found motifs.
The number of training set positive sequences matching the motif.
Note these counts are made after erasing sites that match previously found motifs.
The number of training set negative sequences matching the motif / the number of training set negative sequences.
Note these counts are made after erasing sites that match previously found motifs.
The number of test set positive sequences matching the motif / the number of test set positive sequences.
Note these counts are made after erasing sites that match previously found motifs.
The number of test set positive sequences matching the motif.
Note these counts are made after erasing sites that match previously found motifs.
The number of test set negative sequences matching the motif / the number of test set negative sequences.
Note these counts are made after erasing sites that match previously found motifs.
The mean distance from the center of the best match to the sequence center, averaged over all training set sequences with a match.
The mean distance from the center of the best match to the sequence center, averaged over all test set sequences with a match.
The Score is the unadjusted p-value of the motif based on the appropriate test applied to the training set sequences. Since the Score is not adjusted for multiple tests, it cannot be used to determine the statistical significance of the motif.
To determine whether a motif is statistically significant, you should use the value in the E-value column. If there is no E-value column, that means that either the positive or negative hold-out set would have been too small (fewer than 5 sequences). For very small sequence sets, it is not practical for STREME to compute an accurate E-value. In that case, you can determine if your motif is significant by running STREME twenty or more times on shuffled versions of your positive dataset, and seeing if the Score is always larger than the Score obtained using the original sequences. If you have installed the MEME Suite on your own computer, you can make shuffled sequence datasets using the MEME Suite command-line utility fasta-shuffle-letters.
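The comparison step of the shuffling check described above can be sketched as follows. This only covers the bookkeeping after the runs are done: the Scores are hypothetical placeholders that you would collect by hand from your own STREME runs, and the function name `shuffle_check` is an assumption. The empirical p-value uses the common (r + 1) / (n + 1) convention, which is not something the STREME documentation prescribes.

```python
def shuffle_check(original_score, shuffled_scores):
    """Compare the Score from the real data with Scores from shuffled data.

    Returns (all_larger, empirical_p), where all_larger is True when every
    shuffled Score is strictly larger (less significant) than the original.
    """
    n_as_good = sum(s <= original_score for s in shuffled_scores)
    all_larger = n_as_good == 0
    # (r + 1) / (n + 1) estimator: an assumption, not STREME's own method.
    empirical_p = (n_as_good + 1) / (len(shuffled_scores) + 1)
    return all_larger, empirical_p


# Hypothetical Scores: one run on the original data, twenty on shuffled data.
original = 3.0e-6
shuffled = [0.21, 0.048, 0.5, 0.33, 0.09, 0.61, 0.12, 0.27, 0.4, 0.18,
            0.07, 0.55, 0.3, 0.22, 0.45, 0.15, 0.38, 0.11, 0.26, 0.5]

all_larger, empirical_p = shuffle_check(original, shuffled)
print(all_larger, round(empirical_p, 4))
```

With twenty shuffled runs, the smallest empirical p-value this estimator can return is 1/21, which falls just under 0.05.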
The statistical test used in computing the Score is either the Fisher Exact Test, the Binomial Test, or the Cumulative Bates distribution. (See Inputs and Settings for the particular test being used.) The Fisher Exact Test and the Binomial Test both estimate the enrichment of the motif in the positive sequences compared to the negative sequences. (The Binomial Test is used when the positive and negative sequences have different average lengths.) The Cumulative Bates distribution measures the tendency of the motif to be near the center of the input sequences.
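To make the idea of enrichment concrete, here is a minimal sketch of a generic one-sided Fisher Exact Test on counts of matching sequences. It is a textbook formulation using the hypergeometric distribution, not STREME's implementation; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher exact p-value for enrichment in the positives.

    Probability of seeing at least pos_hits matching positive sequences
    when the matches are spread at random over all sequences.
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    denom = comb(total, pos_total)
    upper = min(hits, pos_total)
    numer = sum(
        comb(hits, k) * comb(total - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return numer / denom


# Hypothetical counts: 40 of 200 positives match, 20 of 200 negatives match.
p = fisher_enrichment_pvalue(40, 200, 20, 200)
print(f"{p:.3g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch short and avoids any third-party dependency; for very large sequence sets a log-space implementation would be faster.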
The p-value of the motif based on applying the appropriate statistical test to the test set sequences. It is not adjusted for the number of motifs reported by STREME.
If STREME reports a single motif, then the p-value is an accurate estimate of the statistical significance of the motif as long as the length distributions of the positive and negative sequences are essentially the same. However, if STREME reports more than one motif, the p-value does NOT completely account for multiple testing, and you should use the E-value for assessing whether a motif is truly statistically significant.
The statistical test used in computing the p-value is either the Fisher Exact Test, the Binomial Test, or the Cumulative Bates distribution. (See Inputs and Settings at the bottom of this document for the particular test being used.) The Fisher Exact Test and the Binomial Test both measure the enrichment of the motif in the positive test sequences compared to the negative test sequences. (The Binomial Test is used when the positive and negative sequences have different average lengths.) The Cumulative Bates distribution measures the tendency of the motif to be near the center of the sequences.
The E-value is an accurate estimate of the statistical significance of the motif as long as the length distributions of the positive and negative sequences are essentially the same. The E-value is the p-value multiplied by the number of motifs reported by STREME. It is an estimate of the number of motifs that would be found with enrichment as high as this motif in shuffled versions of your positive sequences.
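The relationship between the p-value and the E-value stated above is a single multiplication, sketched below. The p-values are the rounded values displayed for the seven motifs in this report, so the computed E-values may differ from the displayed ones in the last digit; the function name `e_value` is hypothetical.

```python
def e_value(p_value, n_motifs_reported):
    """E-value = p-value multiplied by the number of motifs reported."""
    return p_value * n_motifs_reported


# Rounded p-values as displayed for the seven motifs in this report.
p_values = [6.4e-12, 1.5e-2, 3.0e-2, 3.3e-2, 1.7e-1, 2.5e-1, 5.0e-1]

for p in p_values:
    print(f"p = {p:.1e}  ->  E = {e_value(p, len(p_values)):.1e}")
```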
The score threshold for determining if a potential site is a match to the motif. The same threshold is applied when determining matches in the training and test sequences. The threshold is in bits.
The match score of a position in a sequence is determined by converting the motif to a base-2 log-odds matrix using the formula log2(prob[a][i]/background[a]). Here, prob[a][i] is the probability of the letter 'a' at position 'i' of the motif, and background[a] is the probability of the letter 'a' according to the background.
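The log-odds conversion and the summing of one entry per motif column can be sketched as follows. This is a minimal illustration under stated assumptions: the two-column motif and uniform background are made up, the matrix is indexed as `prob[i][a]` (column first) rather than `prob[a][i]`, all probabilities are assumed to be non-zero, and only the given strand is scanned.

```python
from math import log2


def log_odds_matrix(prob, background):
    """Convert per-column letter probabilities to base-2 log-odds scores."""
    return [{a: log2(col[a] / background[a]) for a in col} for col in prob]


def best_match_score(seq, matrix):
    """Maximum match score (in bits) over all positions of seq, or None
    if the sequence is shorter than the motif."""
    width = len(matrix)
    if len(seq) < width:
        return None
    return max(
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


# Hypothetical two-column motif and uniform background.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
prob = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
]

matrix = log_odds_matrix(prob, background)
score = best_match_score("TAGC", matrix)
print(score)  # best window is "AG": 1 bit + 1 bit
```

A site would then count as a match when its score is at least the motif's score threshold.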
The names of the files containing the positive (primary) and negative (control) sequences input to STREME.
If you did not provide a file containing the negative (e.g., control) sequences, STREME created them using N-order shuffling. 0-order shuffling preserves 1-mer frequencies (i.e., the letter frequencies), 1-order shuffling preserves 2-mer frequencies, etc.
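The statement that 0-order shuffling preserves letter frequencies can be checked with a short sketch. It only demonstrates the 0-order case on a made-up sequence; higher-order shuffles, which must preserve longer k-mer counts, need a different algorithm that is not shown here, and this is not the code STREME uses.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_order0(seq, rng):
    """0-order shuffle: permute the letters, preserving 1-mer counts."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


rng = random.Random(0)  # fixed seed so the example is repeatable
seq = "ACGTACGGTTCAACGT"  # made-up sequence
shuffled = shuffle_order0(seq, rng)

print(kmer_counts(seq, 1) == kmer_counts(shuffled, 1))  # letter counts preserved
print(kmer_counts(seq, 2) == kmer_counts(shuffled, 2))  # 2-mer counts usually are not
```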
The name of the alphabet of the sequences.
The number of sequences.
The total length of the sequences.
The name of the alphabet symbol.
The frequency of the alphabet symbol in the negative sequences.
The frequency of the alphabet symbol as defined by the background model.
For further information on how to interpret these results please access
https://meme-suite.org/meme/doc/streme.html.
To get a copy of the MEME software please access
https://meme-suite.org.
If you use STREME in your research, please cite the following paper:
Timothy L. Bailey, "STREME: accurate and versatile sequence motif discovery", Bioinformatics, Mar. 24, 2021.
Motif | P-value | E-value | Sites |
---|---|---|---|
1-AYCCACCGTHYGYB | 6.4e-012 | 4.5e-011 | 749 (37.5%) |
2-ACGCACGCAHGYG | 1.5e-002 | 1.1e-001 | 68 (3.4%) |
3-CCAGMAGRGGGCR | 3.0e-002 | 2.1e-001 | 48 (2.4%) |
4-AGGAAG | 3.3e-002 | 2.3e-001 | 264 (13.2%) |
5-AWTWAWT | 1.7e-001 | 1.2e+000 | 471 (23.6%) |
6-TAATCCCAGCAC | 2.5e-001 | 1.7e+000 | 43 (2.1%) |
7-CCACAGGGTGGC | 5.0e-001 | 3.5e+000 | 45 (2.3%) |

Stopped because 3 consecutive motifs exceeded the p-value threshold (0.05).
STREME ran for 32.68 seconds.
Role | Source | Alphabet | Sequence Count | Total Size |
---|---|---|---|---|
Positive (primary) Sequences | GSM4160247-ETO--BTZ_WO_meme-chip/seqs-centered | DNA | 2000 | 200000 |
Negative (control) Sequences | 2-Order Shuffled Positive Sequences | DNA | 2000 | 200000 |
Source: built from the negative (control) sequences
Order: 2 (only order-0 shown)

Name | Freq. | Bg. | Symbol | | Symbol | Bg. | Freq. | Name |
---|---|---|---|---|---|---|---|---|
Adenine | 0.252 | 0.252 | A | ~ | T | 0.252 | 0.252 | Thymine |
Cytosine | 0.248 | 0.248 | C | ~ | G | 0.248 | 0.248 | Guanine |