Motifs discovered by STREME in MEME motif format.

STREME results in XML format.

STREME outputs a tab-separated values (TSV) file ('sequences.tsv'). For each motif discovered by STREME, the file contains one line for every sequence that has a site whose score passes that motif's match threshold. The lines are grouped by motif, and groups are separated by a line starting with the character "#". The first line of the file contains the tab-separated field names, whose meanings are described in the table below.

field	name	contents
1	motif_ID	The name of the motif uses the IUPAC codes for nucleotides or proteins. Letters representing multiple nucleotides are used in nucleotide motif positions where several nucleotides are favored. The name of the motif is <index>-<consensus>, where <index> is the rank of the motif according to P-value or Score, and <consensus> is an approximation of the motif by an IUPAC sequence.
2	motif_ALT_ID	The alternate name of the motif is STREME-<index>, where <index> is the rank of the motif according to P-value or Score.
3	motif_P-value	The p-value of the motif based on applying the appropriate statistical test to the test set sequences. It is not adjusted for the number of motifs reported by STREME. If STREME reports a single motif, then the p-value is an accurate estimate of the statistical significance of the motif as long as the length distributions of the positive and negative sequences are essentially the same. However, if STREME reports more than one motif, the p-value does NOT completely account for multiple testing, and you should use the E-value for assessing whether a motif is truly statistically significant.
3	motif_Score	The Score is the unadjusted p-value of the motif based on the appropriate test applied to the training set sequences. Since the Score is not adjusted for multiple tests, it cannot be used to determine the statistical significance of the motif.
4	seq_ID	The ID of the sequence.
5	seq_Score	The seq_Score of a sequence is its maximum motif match score over all sequence positions. The motif match score of a position in a sequence is computed by summing the appropriate entry from each column of the position-dependent scoring matrix that represents the motif.
6	seq_Class	Whether the sequence is a true positive, 'tp', or a false positive, 'fp'.
7	is_holdout?	Whether the sequence was in the holdout set, '1', or not, '0'.
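
The layout described above (a header line, rows grouped by motif, and "#" separator lines) can be read with a few lines of Python. This is a minimal sketch rather than an official parser: the function name `read_streme_sequences` and the sample rows are made up for illustration, and the sample header assumes the third field is named motif_P-value.

```python
import csv
import io
from collections import defaultdict

def read_streme_sequences(handle):
    """Group the rows of a STREME sequences.tsv by motif_ID (illustrative helper)."""
    # Skip blank lines and the '#' lines that separate motif groups.
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    by_motif = defaultdict(list)
    for row in reader:
        row["seq_Score"] = float(row["seq_Score"])
        row["is_holdout?"] = row["is_holdout?"] == "1"
        by_motif[row["motif_ID"]].append(row)
    return dict(by_motif)

# Made-up sample data, not taken from a real STREME run.
sample = (
    "motif_ID\tmotif_ALT_ID\tmotif_P-value\tseq_ID\tseq_Score\tseq_Class\tis_holdout?\n"
    "1-ACGTAC\tSTREME-1\t1.0e-5\tseqA\t12.5\ttp\t0\n"
    "1-ACGTAC\tSTREME-1\t1.0e-5\tseqB\t9.75\tfp\t1\n"
    "#\n"
    "2-TTGACA\tSTREME-2\t2.0e-3\tseqC\t8.0\ttp\t0\n"
)
groups = read_streme_sequences(io.StringIO(sample))
for motif_id, rows in groups.items():
    print(motif_id, [r["seq_ID"] for r in rows])
```

The helper never touches the third column by name, so it should work unchanged if that column is motif_Score. For a real file, pass `open("sequences.tsv", newline="")` instead of the `io.StringIO` sample.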

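The seq_Score definition in the table (the maximum, over all positions, of the summed per-column matrix entries) can be illustrated as follows. This is a sketch under stated assumptions: the matrix `toy_pssm` is invented, only the given strand is scanned, and none of STREME's other scoring settings are modeled.

```python
def seq_score(sequence, pssm):
    """Maximum motif match score over all positions of `sequence`.

    `pssm` is a list with one dict per motif column, mapping letter -> score.
    Returns None if the sequence is shorter than the motif.
    """
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        # Match score at this position: one entry from each matrix column.
        score = sum(col[sequence[start + i]] for i, col in enumerate(pssm))
        if best is None or score > best:
            best = score
    return best

# Invented 3-column matrix that favors the word "ACG".
toy_pssm = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
]
print(seq_score("TTACGT", toy_pssm))  # 6.0, from the match "ACG" at offset 2
```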

Role	Source	Alphabet	Sequence Count	Total Size
Positive (primary) Sequences	GSM4160247-ETO--BTZ_WO_meme-chip/seqs-centered	DNA	2000	200000
Negative (control) Sequences	2-Order Shuffled Positive Sequences	DNA	2000	200000

Name	Freq.	Bg.				Bg.	Freq.	Name
Adenine	0.252	0.252	A	~	T	0.252	0.252	Thymine
Cytosine	0.248	0.248	C	~	G	0.248	0.248	Guanine
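
In the table above, each nucleotide is paired with its complement and the paired letters share a value (0.252 for A/T, 0.248 for C/G). One way such symmetric values can arise is by averaging each letter's frequency with that of its complement. The sketch below assumes that approach; it is not taken from STREME's source, and the function name and `both_strands` flag are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def letter_frequencies(sequences, both_strands=True):
    """Nucleotide frequencies, optionally averaged with the complement's frequency."""
    counts = Counter()
    for seq in sequences:
        # Ignore anything that is not A, C, G or T (e.g. N).
        counts.update(ch for ch in seq.upper() if ch in COMPLEMENT)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T letters found")
    freqs = {b: counts[b] / total for b in "ACGT"}
    if both_strands:
        # Assumption: symmetrize so that A == T and C == G.
        freqs = {b: (freqs[b] + freqs[COMPLEMENT[b]]) / 2 for b in "ACGT"}
    return freqs

freqs = letter_frequencies(["AAAC", "GTTA"])
print(freqs)
```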

Strand Handling	Both the given and reverse complement strands are processed.
Objective Function	Differential Enrichment
Statistical Test	Fisher Exact Test
Motif Selection Criterion	Output motif with the lowest p-value on the training set each round.
Minimum Motif Width	6
Maximum Motif Width	15
Sequence Shuffling	Negative sequences are positives shuffled preserving 3-mer frequencies.
Test Set	10% of the input sequences were randomly assigned to the test set.
Word Evaluation	Up to 25 words of each width from 6 to 15 were evaluated to find seeds.
Seed Refinement	Up to 4 seeds of each width from 6 to 15 were further refined.
Refinement Iterations	Up to 20 iterations were allowed when refining a seed.
Minimum Score	Match scoring was truncated if a match longer than 5 scored less than 0.
Refinement Match Subsets	A new motif was created from the optimal set of matches each refinement iteration.
Minimum Palindrome Ratio	0.85
Maximum Palindrome Edit Distance	5
Print Candidate Motifs?	No.
Random Number Seed	0
Total Length	The total length of each sequence set was limited to 4.00e+6.
Maximum Motif p-value	Stop when the p-value is greater than 0.05 for 3 consecutive motifs.
Maximum Motifs to Find	No maximum number of motifs.
Maximum Run Time	No maximum running time.
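
The settings above name Differential Enrichment as the objective and the Fisher Exact Test as the statistical test. As a rough illustration of what such a test computes, the sketch below evaluates a one-sided Fisher exact p-value from counts of positive and negative sequences that contain a motif site. The function name, the argument names, and the choice of a one-sided tail are assumptions made for this example; STREME's own computation may differ in detail. It uses `math.comb`, which requires Python 3.8 or later.

```python
from math import comb

def fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher exact test: probability of seeing at least `pos_hits`
    positive sequences with a site, given the row and column totals."""
    hits = pos_hits + neg_hits      # sequences with a site, both sets
    total = pos_total + neg_total   # all sequences, both sets
    denom = comb(total, pos_total)
    upper = min(hits, pos_total)
    # Sum the hypergeometric probabilities of the observed and more extreme tables.
    tail = sum(
        comb(hits, k) * comb(total - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / denom

# Toy counts: 3 of 4 positives and 1 of 4 negatives contain a site.
p = fisher_enrichment_pvalue(3, 4, 1, 4)
print(p)  # 17/70, about 0.243
```

Exact integer arithmetic keeps the sketch free of rounding problems for set sizes like the 2000 sequences per class listed above, at the cost of speed.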

Supported Programs

Tomtom	Find similar motifs in published libraries or a library you supply.
FIMO	Find motif occurrences in sequence data.
MAST	Rank sequences by affinity to groups of motifs.
GOMo	Identify possible roles (Gene Ontology terms) for motifs.
SpaMo	Find other motifs that are enriched at specific close spacings which might imply the existence of a complex.

STREME

Sensitive, Thorough, Rapid, Enriched Motif Elicitation

Discovered Motifs

Inputs & Settings

Sequences

Background Model

Other Settings

STREME version

Reference

Command line