Motifs discovered by STREME in MEME motif format.

STREME results in XML format.

STREME outputs a tab-separated values (TSV) file ('sequences.tsv') that contains, for each motif discovered by STREME, one line for each sequence with a site whose score passes the motif's match threshold. The lines are grouped by motif, and groups are separated by a line starting with the character "#". The first line in the file contains the (tab-separated) names of the fields. The names and meanings of the fields are described in the table below.

Field | Name | Contents
1 | motif_ID | The name of the motif uses the IUPAC codes for nucleotides or proteins. Letters representing multiple nucleotides are used in nucleotide motif positions where several nucleotides are favored. The name of the motif is <index>-<consensus>, where <index> is the rank of the motif according to P-value or Score, and <consensus> is an approximation of the motif by an IUPAC sequence.
2 | motif_ALT_ID | The alternate name of the motif is STREME-<index>, where <index> is the rank of the motif according to P-value or Score.
3 | motif_P-value | The p-value of the motif based on applying the appropriate statistical test to the test set sequences. It is not adjusted for the number of motifs reported by STREME. If STREME reports a single motif, then the p-value is an accurate estimate of the statistical significance of the motif as long as the length distributions of the positive and negative sequences are essentially the same. However, if STREME reports more than one motif, the p-value does NOT completely account for multiple testing, and you should use the E-value for assessing whether a motif is truly statistically significant.
3 | motif_Score | (Appears in place of motif_P-value when STREME reports Scores rather than p-values.) The Score is the unadjusted p-value of the motif based on the appropriate test applied to the training set sequences. Since the Score is not adjusted for multiple tests, it cannot be used to determine the statistical significance of the motif.
4 | seq_ID | The ID of the sequence.
5 | seq_Score | The seq_Score of a sequence is its maximum motif match score over all sequence positions. The motif match score of a position in a sequence is computed by summing the appropriate entry from each column of the position-dependent scoring matrix that represents the motif.
6 | seq_Class | Whether the sequence is a true positive, 'tp', or a false positive, 'fp'.
7 | is_holdout? | Whether the sequence was in the holdout set, '1', or not, '0'.
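A minimal Python sketch of reading sequences.tsv, assuming the layout described above: a header line of field names that does not itself start with "#", one tab-separated row per matching sequence, and "#" lines between motif groups. The function name read_sequences_tsv and the example rows are hypothetical and not part of the MEME Suite.

```python
import csv
import io
from collections import defaultdict


def read_sequences_tsv(text):
    """Group the rows of a STREME sequences.tsv file by motif_ID.

    Assumes the header line does not start with '#'; every line that does
    start with '#' (the motif-group separators) is skipped, as are blanks.
    """
    kept = [line for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(kept)),
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    by_motif = defaultdict(list)
    for row in reader:
        by_motif[row["motif_ID"]].append(row)
    return dict(by_motif)


# Hypothetical two-motif example (not real STREME output).
example = (
    "motif_ID\tmotif_ALT_ID\tmotif_P-value\tseq_ID\tseq_Score\tseq_Class\tis_holdout?\n"
    "1-ACGTG\tSTREME-1\t1.0e-05\tseqA\t9.1\ttp\t0\n"
    "1-ACGTG\tSTREME-1\t1.0e-05\tseqB\t8.4\tfp\t1\n"
    "#\n"
    "2-TTAAW\tSTREME-2\t2.0e-03\tseqC\t7.2\ttp\t0\n"
)
groups = read_sequences_tsv(example)
for motif_id, rows in groups.items():
    true_pos = sum(r["seq_Class"] == "tp" for r in rows)
    print(motif_id, len(rows), "rows,", true_pos, "true positives")
```

Keeping whole rows as dictionaries means fields such as seq_Class and is_holdout? can be filtered on directly, for example to count true positives per motif as above.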


Read more about the MEME Suite's use of the IUPAC alphabets.
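This page does not say how STREME decides which letters are "favored" at a motif position. The sketch below therefore uses a hypothetical rule: every nucleotide with probability at least min_prob (0.25 by default) is included, and the resulting set is mapped to its standard IUPAC nucleotide code. It then builds a name of the <index>-<consensus> form. The threshold and the function names are illustrative assumptions.

```python
# Standard IUPAC nucleotide codes, keyed by the set of nucleotides each represents.
IUPAC_DNA = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(pwm, min_prob=0.25):
    """pwm: one dict per motif position, mapping 'A', 'C', 'G', 'T' to probabilities.

    Hypothetical rule (not necessarily STREME's): a nucleotide is "favored"
    if its probability is at least min_prob; if none qualifies, the most
    probable nucleotide is used.
    """
    letters = []
    for column in pwm:
        favored = frozenset(b for b, p in column.items() if p >= min_prob)
        if not favored:
            favored = frozenset(max(column, key=column.get))
        letters.append(IUPAC_DNA[favored])
    return "".join(letters)


def motif_name(index, pwm):
    """Build a name of the form <index>-<consensus>."""
    return f"{index}-{iupac_consensus(pwm)}"


pwm = [
    {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},
    {"A": 0.50, "C": 0.05, "G": 0.40, "T": 0.05},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
print(motif_name(1, pwm))  # prints 1-ARN
```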

Click on the blue symbol below to reveal detailed information about the motif.

Click on the blue symbol below to reveal options allowing you to submit this motif to another MEME Suite motif analysis program, to download this motif in various text formats, or to download a sequence "logo" of this motif in PNG or EPS format.

Supported Programs
Tomtom
Tomtom is a tool for searching for similar known motifs. [manual]
MAST
MAST is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs. [manual]
FIMO
FIMO is a tool for searching biological sequence databases for sequences that contain one or more known motifs. [manual]
GOMo
GOMo is a tool for identifying possible roles (Gene Ontology terms) for DNA binding motifs. [manual]
SpaMo
SpaMo is a tool for inferring possible transcription factor complexes by finding motifs with enriched spacings. [manual]

This plot shows the positional distribution of the best match to the motif in the positive training sequences. Only matches with scores at least equal to the score threshold are considered. The plot is smoothed with a triangular function whose width is 5% of the maximum positive training sequence length. The position of the dotted vertical line indicates whether the sequences were aligned on their left ends, centers, or right ends.
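A sketch of triangular smoothing applied to per-position counts of best-match positions. The text above only gives the 5% figure, so two details are assumptions: the 5% is taken as the full base width of the triangle, and positions beyond either end of the sequence are treated as zero counts. The function name triangular_smooth is hypothetical.

```python
def triangular_smooth(counts, max_len, width_frac=0.05):
    """Smooth per-position counts with a triangular kernel.

    Assumptions: the kernel's full base width is about width_frac * max_len
    positions (at least 3), and out-of-range positions count as zero.
    """
    half = max(1, round(width_frac * max_len / 2))
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(d) for d in offsets]  # peak in the middle, linear fall-off
    total = sum(weights)
    smoothed = []
    for i in range(len(counts)):
        acc = 0.0
        for d, w in zip(offsets, weights):
            j = i + d
            if 0 <= j < len(counts):
                acc += w * counts[j]
        smoothed.append(acc / total)
    return smoothed


# Hypothetical counts: 9 best matches, all at position 5, with 100-long sequences.
counts = [0] * 10
counts[5] = 9
print(triangular_smooth(counts, max_len=100))
```

Dividing by the total kernel weight keeps the area under the curve equal to the number of matches, except where the kernel runs off either end.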

This histogram shows the distribution of the number of matches to the motif in the positive training sequences that have at least one match. Only matches with scores at least equal to the score threshold are considered.
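A small sketch of the counting behind this histogram. It assumes the candidate-site scores of each positive training sequence are already available. How STREME treats overlapping sites is not stated here, so the sketch counts every scored site. The function name matches_per_sequence is hypothetical.

```python
from collections import Counter


def matches_per_sequence(site_scores, threshold):
    """Histogram of match counts over sequences with at least one match.

    site_scores: one list of candidate-site scores (in bits) per positive
    training sequence. A site is a match if its score >= threshold.
    """
    per_seq = [sum(1 for s in scores if s >= threshold) for scores in site_scores]
    return Counter(n for n in per_seq if n >= 1)  # sequences with 0 matches are excluded


# Hypothetical scores for four sequences and an 8.0-bit threshold.
hist = matches_per_sequence([[5.0, 9.2, 10.1], [3.0], [8.0], []], threshold=8.0)
print(sorted(hist.items()))  # prints [(1, 1), (2, 1)]
```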

The number of positive sequences matching the motif (percentage).

The number of training set positive sequences matching the motif / the number of training set positive sequences.

Note these counts are made after erasing sites that match previously found motifs.

The number of training set positive sequences matching the motif.

Note these counts are made after erasing sites that match previously found motifs.

The number of training set negative sequences matching the motif / the number of training set negative sequences.

Note these counts are made after erasing sites that match previously found motifs.

The number of test set positive sequences matching the motif / the number of test set positive sequences.

Note these counts are made after erasing sites that match previously found motifs.

The number of test set positive sequences matching the motif.

Note these counts are made after erasing sites that match previously found motifs.

The number of test set negative sequences matching the motif / the number of test set negative sequences.

Note these counts are made after erasing sites that match previously found motifs.

The mean distance from the center of the best match to the sequence center, averaged over all training set sequences with a match.

The mean distance from the center of the best match to the sequence center, averaged over all test set sequences with a match.
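A sketch of the mean distance-to-center (DTC) calculation for one sequence set. It assumes 0-based coordinates, a match center of start + (width - 1) / 2, a sequence center of (length - 1) / 2, and "distance" meaning the absolute difference. STREME's exact coordinate conventions are not stated here. The function name is hypothetical.

```python
def mean_distance_to_center(best_matches):
    """best_matches: (seq_length, match_start, motif_width) for each sequence with a match.

    Coordinates are assumed 0-based; the distance is absolute, in positions.
    """
    if not best_matches:
        return float("nan")  # no sequence has a match
    total = 0.0
    for seq_len, start, width in best_matches:
        match_center = start + (width - 1) / 2
        seq_center = (seq_len - 1) / 2
        total += abs(match_center - seq_center)
    return total / len(best_matches)


# Hypothetical: one centered match and one match at the left end of a 100-long sequence.
print(mean_distance_to_center([(100, 45, 10), (100, 0, 10)]))
```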

The Score is the unadjusted p-value of the motif based on the appropriate test applied to the training set sequences. Since the Score is not adjusted for multiple tests, it cannot be used to determine the statistical significance of the motif.

To determine whether a motif is statistically significant, use the value in the E-value column. If there is no E-value column, either the positive or the negative hold-out set would have been too small (fewer than 5 sequences); for very small sequence sets, it is not practical for STREME to compute an accurate E-value. In that case, you can determine whether your motif is significant by running STREME twenty or more times on shuffled versions of your positive dataset and checking whether the Score from the shuffled runs is always larger than the Score from the original sequences. If you have installed the MEME Suite on your own computer, you can make shuffled sequence datasets with the MEME Suite command-line utility fasta-shuffle-letters.
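The shuffled-run check above can be summarized in a few lines. This sketch assumes you have collected the best (smallest) Score reported in each shuffled run. Because a Score is a p-value, smaller means stronger enrichment. The add-one empirical p-value it also returns is a common permutation-test convention added here for illustration; STREME does not report it. The function name is hypothetical.

```python
def shuffled_run_check(original_score, shuffled_scores, min_runs=20):
    """Compare the original Score with Scores from runs on shuffled positives.

    Returns (always_larger, empirical_p): always_larger is True when every
    shuffled Score exceeds the original; empirical_p = (count + 1) / (runs + 1),
    where count is the number of shuffled Scores <= the original Score.
    """
    if len(shuffled_scores) < min_runs:
        raise ValueError(f"need at least {min_runs} shuffled runs")
    as_good = sum(1 for s in shuffled_scores if s <= original_score)
    return as_good == 0, (as_good + 1) / (len(shuffled_scores) + 1)


# Hypothetical Scores from 20 shuffled runs.
print(shuffled_run_check(1e-9, [1e-4] * 20))
```

With 20 shuffled runs, the smallest achievable empirical p-value is 1/21 (about 0.048), which is one reason to use at least twenty runs.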

The statistical test used in computing the Score is the Fisher Exact Test, the Binomial Test, or the Cumulative Bates distribution. (See Inputs and Settings for the particular test being used.) The Fisher Exact Test and the Binomial Test both measure the enrichment of the motif in the positive sequences compared to the negative sequences. (The Binomial Test is used when the positive and negative sequences have different average lengths.) The Cumulative Bates distribution measures the tendency of the motif to be near the center of the input sequences.
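For the Fisher Exact Test, a one-sided p-value for enrichment on a 2x2 table (positive vs. negative sequences, matching vs. not matching) can be computed from hypergeometric terms. This is a generic sketch of the test itself. Whether STREME builds its table from exactly these counts is an assumption. The Binomial Test and the Cumulative Bates distribution are not shown, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_enrichment(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher exact test p-value for enrichment in the positives.

    P(X >= pos_hits), where X is hypergeometric: pos_total sequences drawn
    from pos_total + neg_total, of which pos_hits + neg_hits match the motif.
    """
    total_hits = pos_hits + neg_hits
    total = pos_total + neg_total
    denom = comb(total, pos_total)
    upper = min(total_hits, pos_total)
    num = sum(comb(total_hits, x) * comb(total - total_hits, pos_total - x)
              for x in range(pos_hits, upper + 1))
    return num / denom  # exact integer arithmetic until this final division


# Hypothetical counts: 30 of 100 positives match versus 10 of 100 negatives.
print(fisher_exact_enrichment(30, 100, 10, 100))
```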

The p-value of the motif based on applying the appropriate statistical test to the test set sequences. It is not adjusted for the number of motifs reported by STREME.

If STREME reports a single motif, then the p-value is an accurate estimate of the statistical significance of the motif as long as the length distributions of the positive and negative sequences are essentially the same. However, if STREME reports more than one motif, the p-value does NOT completely account for multiple testing, and you should use the E-value for assessing whether a motif is truly statistically significant.

The statistical test used in computing the p-value is the Fisher Exact Test, the Binomial Test, or the Cumulative Bates distribution. (See Inputs and Settings at the bottom of this document for the particular test being used.) The Fisher Exact Test and the Binomial Test both measure the enrichment of the motif in the positive test sequences compared to the negative test sequences. (The Binomial Test is used when the positive and negative sequences have different average lengths.) The Cumulative Bates distribution measures the tendency of the motif to be near the center of the sequences.

The E-value is an accurate estimate of the statistical significance of the motif as long as the length distributions of the positive and negative sequences are essentially the same. The E-value is the p-value multiplied by the number of motifs reported by STREME. It is an estimate of the number of motifs that would be found with enrichment as high as this motif in shuffled versions of your positive sequences.
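The E-value arithmetic is simple enough to show directly. This sketch assumes the input list holds the p-value of every motif STREME reported, so its length is the number of motifs reported. The p-values in the example are hypothetical. With three motifs, a p-value of 0.04 becomes an E-value of 0.12.

```python
def e_values(p_values):
    """E-value = motif p-value x number of motifs reported (one p-value per reported motif)."""
    n_motifs = len(p_values)
    return [p * n_motifs for p in p_values]


# Hypothetical p-values for three reported motifs.
print(e_values([1e-5, 2e-3, 0.04]))
```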

The score threshold for determining if a potential site is a match to the motif. The same threshold is applied when determining matches in the training and test sequences. The threshold is in bits.

The match score of a position in a sequence is determined by converting the motif to a base-2 log-odds matrix using the formula log2(prob[a][i]/background[a]). Here, prob[a][i] is the probability of the letter 'a' at position 'i' of the motif, and background[a] is the probability of the letter 'a' according to the background.
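A sketch of the log-odds conversion and of seq_Score as the maximum match score over all positions. The matrix is indexed [position][letter] here, which is the transpose of prob[a][i] in the text. The sketch assumes non-zero probabilities, because log2(0) is undefined; how STREME avoids zeros is not shown. It also scores only the given strand. Function names and the example motif are hypothetical.

```python
import math


def log_odds_matrix(prob, background):
    """Base-2 log-odds matrix: entry [i][a] = log2(prob[i][a] / background[a]).

    prob: one dict per motif position (letter -> non-zero probability).
    """
    return [{a: math.log2(col[a] / background[a]) for a in col} for col in prob]


def seq_score(seq, lom):
    """Maximum match score (in bits) over all positions of seq, given strand only."""
    width = len(lom)
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        # Sum the matrix entry for the observed letter in each motif column.
        score = sum(lom[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
prob = [
    {"A": 0.5, "C": 1 / 6, "G": 1 / 6, "T": 1 / 6},
    {"A": 1 / 6, "C": 0.5, "G": 1 / 6, "T": 1 / 6},
]
lom = log_odds_matrix(prob, background)
print(seq_score("GACT", lom))  # prints 2.0 (the "AC" window: 1 bit + 1 bit)
print(seq_score("GACT", lom) >= 1.5)  # compared with a hypothetical 1.5-bit threshold
```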

The names of the files containing the positive (primary) and negative (control) sequences input to STREME.

If you did not provide a file containing the negative (i.e., control) sequences, STREME created them by N-order shuffling of the positive sequences. 0-order shuffling preserves 1-mer frequencies (i.e., the letter frequencies), 1-order shuffling preserves 2-mer frequencies, and so on.
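To make "N-order shuffling preserves (N+1)-mer frequencies" concrete, this sketch implements only the 0-order case, a plain letter permutation, and checks that 1-mer counts are unchanged. Higher orders need more involved algorithms; a dedicated tool such as the MEME Suite's fasta-shuffle-letters is the practical route. Function names are hypothetical.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_order0(seq, rng):
    """0-order shuffle: permute the letters, preserving 1-mer (letter) counts only."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


rng = random.Random(0)  # fixed seed for repeatability
original = "ACGTTGCAAACGT"
shuffled = shuffle_order0(original, rng)
print(kmer_counts(original, 1) == kmer_counts(shuffled, 1))  # prints True
```

Comparing kmer_counts(original, 2) with kmer_counts(shuffled, 2) will usually show differences, which is why 2-order shuffling (preserving 3-mers, as in the settings below) needs a different method.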

The name of the alphabet of the sequences.

The number of sequences.

The total length of the sequences.

The name of the alphabet symbol.

The frequency of the alphabet symbol in the negative sequences.

The frequency of the alphabet symbol as defined by the background model.

Details

Train Positives 
Train Positives 
Train Negatives 
Train DTC 
Score 
Test Positives 
Test Positives 
Test Negatives 
Test DTC 
P-value 
Match Threshold 

For further information on how to interpret these results, see https://meme-suite.org/meme/doc/streme.html.
To get a copy of the MEME Suite software, see https://meme-suite.org.

If you use STREME in your research, please cite the following paper:
Timothy L. Bailey, "STREME: accurate and versatile sequence motif discovery", Bioinformatics, Mar. 24, 2021. [full text]

Discovered Motifs

Motif 
P-value 
E-value 
Sites 
More 
Submit/Download 
Positional Distribution 
Matches per Sequence 
Stopped because 3 consecutive motifs exceeded the p-value threshold (0.05).
STREME ran for 32.68 seconds.

Inputs & Settings

Sequences

Role | Source | Alphabet | Sequence Count | Total Size
Positive (primary) Sequences | GSM4160247-ETO--BTZ_WO_meme-chip/seqs-centered | DNA | 2000 | 200000
Negative (control) Sequences | 2-Order Shuffled Positive Sequences | DNA | 2000 | 200000

Background Model

    Source: built from the negative (control) sequences

    Order: 2 (only order-0 shown)
Name | Freq. | Bg. | Symbols | Bg. | Freq. | Name
Adenine | 0.252 | 0.252 | A~T | 0.252 | 0.252 | Thymine
Cytosine | 0.248 | 0.248 | C~G | 0.248 | 0.248 | Guanine
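The order-0 frequencies above are strand-symmetric: A equals T, and C equals G. This sketch shows one way such values can arise, assuming each letter is counted on both the given strand and its reverse complement. STREME's exact procedure, including any pseudocounts and the order-2 part of the model, is not shown. The function name is hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def order0_background(seqs, both_strands=True):
    """Order-0 DNA letter frequencies; non-ACGT letters are ignored.

    With both_strands=True, each letter also counts toward its complement,
    which is equivalent to adding the reverse-complement strand.
    """
    counts = Counter()
    for seq in seqs:
        for ch in seq.upper():
            if ch in COMPLEMENT:
                counts[ch] += 1
                if both_strands:
                    counts[COMPLEMENT[ch]] += 1
    total = sum(counts.values())
    return {a: counts[a] / total for a in "ACGT"}


# Hypothetical sequences.
print(order0_background(["AAAC", "GGTA"]))
```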

Other Settings

Strand Handling: Both the given and reverse complement strands are processed.
Objective Function: Differential Enrichment
Statistical Test: Fisher Exact Test
Minimum Motif Width: 6
Maximum Motif Width: 15
Sequence Shuffling: Negative sequences are positives shuffled preserving 3-mer frequencies.
Test Set: 10% of the input sequences were randomly assigned to the test set.
Word Evaluation: Up to 25 words of each width from 6 to 15 were evaluated to find seeds.
Seed Refinement: Up to 4 seeds of each width from 6 to 15 were further refined.
Refinement Iterations: Up to 20 iterations were allowed when refining a seed.
Random Number Seed: 0
Total Length: The total length of each sequence set was limited to 4.00e+6.
Maximum Motif p-value: Stop when the p-value is greater than 0.05 for 3 consecutive motifs.
Maximum Motifs to Find: No maximum number of motifs.
Maximum Run Time: No maximum running time.
STREME version
5.5.2 (Release date: Sun Jan 29 10:33:12 2023 -0800)
Reference
Timothy L. Bailey, "STREME: accurate and versatile sequence motif discovery", Bioinformatics, Mar. 24, 2021. [full text]
Command line