Java Code Examples for htsjdk.tribble.annotation.Strand#POSITIVE

The following examples show how to use htsjdk.tribble.annotation.Strand#POSITIVE.
Example 1
Source File: FuncotatorUtilsUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License
@DataProvider
Object[][] provideForTestGetStrandCorrectedAllele() {
    return new Object[][] {
            { Allele.create("A"),       Strand.POSITIVE, Allele.create("A") },
            { Allele.create("AA"),      Strand.POSITIVE, Allele.create("AA") },
            { Allele.create("AAT"),     Strand.POSITIVE, Allele.create("AAT") },
            { Allele.create("AATT"),    Strand.POSITIVE, Allele.create("AATT") },
            { Allele.create("AATTG"),   Strand.POSITIVE, Allele.create("AATTG") },
            { Allele.create("AATTGC"),  Strand.POSITIVE, Allele.create("AATTGC") },
            { Allele.create("AATTGCG"), Strand.POSITIVE, Allele.create("AATTGCG") },
            { Allele.create("A"),       Strand.NEGATIVE, Allele.create("T") },
            { Allele.create("AA"),      Strand.NEGATIVE, Allele.create("TT") },
            { Allele.create("AAT"),     Strand.NEGATIVE, Allele.create("ATT") },
            { Allele.create("AATT"),    Strand.NEGATIVE, Allele.create("AATT") },
            { Allele.create("AATTG"),   Strand.NEGATIVE, Allele.create("CAATT") },
            { Allele.create("AATTGC"),  Strand.NEGATIVE, Allele.create("GCAATT") },
            { Allele.create("AATTGCG"), Strand.NEGATIVE, Allele.create("CGCAATT") },
    };
}
 
Example 2
Source File: ProteinChangeInfo.java    From gatk with BSD 3-Clause "New" or "Revised" License
private void initializeForDeletion(final int alignedCodingSequenceAlleleStart, final Strand strand, final String referenceProteinSequence, final String alternateProteinSequence, final boolean indelIsBetweenCodons, final int numAltAminoAcids, final int numRefAminoAcids) {
    final int proteinChangeStartIndex;

    // We render the protein change differently if it's a deletion directly between two codons:
    if (indelIsBetweenCodons) {
        // Because we're in between codons / have full codons deleted, we can use the start position of the
        // variant:
        // Add 1 to account for the required leading base when on + strands:
        proteinChangeStartIndex = ((alignedCodingSequenceAlleleStart-1) / AminoAcid.CODON_LENGTH) + (strand == Strand.POSITIVE ? 1 : 0);

        aaStartPos = proteinChangeStartIndex + 1;
        aaEndPos = aaStartPos + numRefAminoAcids - 1;
        refAaSeq = referenceProteinSequence.substring(proteinChangeStartIndex, proteinChangeStartIndex + numRefAminoAcids);
        altAaSeq = "";
    }
    else {
        // To start with, we fill in the information naively corresponding to the potentially
        // changed amino acid sequence:
        proteinChangeStartIndex = ((alignedCodingSequenceAlleleStart - 1) / AminoAcid.CODON_LENGTH);

        // If we're on the - strand, we need to grab 1 fewer amino acid from the end of the sequence:
        final int endOffset = strand == Strand.POSITIVE ? 1 : 0;

        aaStartPos = proteinChangeStartIndex + 1;
        aaEndPos = aaStartPos + numRefAminoAcids + endOffset;

        refAaSeq = referenceProteinSequence.substring(proteinChangeStartIndex, aaEndPos);
        altAaSeq = alternateProteinSequence.substring(proteinChangeStartIndex, aaStartPos + numAltAminoAcids + endOffset);

        // Trim our state for this deletion:
        trimDeletionProteinChangeVariables();
    }

    // Check to make sure we have any alt amino acids left:
    if ( altAaSeq.isEmpty() ) {
        // We have actually just deleted a set of amino acids.
        // Just set the end to the start to complete the deletion:
        aaEndPos = aaStartPos;
    }
}
 
Example 3
Source File: SegmentExonUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License
private static String determineSegmentOverlapDirection(final Strand strand, final boolean isSegmentStart) {
    if (isSegmentStart ^ (strand == Strand.POSITIVE)) {
        return AND_BELOW_STR;
    } else {
        return AND_ABOVE_STR;
    }
}
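
The XOR in Example 3 folds a four-row truth table into a single condition. The sketch below spells that table out. The `Strand` enum is a local stand-in for the htsjdk one, and the two string values are placeholders, because the real values of `AND_BELOW_STR` and `AND_ABOVE_STR` are not shown in the example.

```java
final class OverlapDirectionSketch {
    // Local stand-in for htsjdk.tribble.annotation.Strand (assumption: two values suffice here).
    enum Strand { POSITIVE, NEGATIVE }

    // Placeholder values; the real constants are not shown in the example.
    static final String AND_BELOW_STR = "below";
    static final String AND_ABOVE_STR = "above";

    static String direction(final Strand strand, final boolean isSegmentStart) {
        // XOR is true when exactly one of the two conditions holds.
        return (isSegmentStart ^ (strand == Strand.POSITIVE)) ? AND_BELOW_STR : AND_ABOVE_STR;
    }

    public static void main(final String[] args) {
        // Print every combination of strand and segment end.
        for (final Strand s : Strand.values()) {
            for (final boolean start : new boolean[] { true, false }) {
                System.out.println(s + ", isSegmentStart=" + start + " -> " + direction(s, start));
            }
        }
    }
}
```

Segment start on the positive strand and segment end on the negative strand both map to the "above" value; the other two combinations map to "below".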
 
Example 4
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Get the strand-corrected (reverse complemented) {@link Allele} for the given {@link Allele} and {@link Strand}.
 * @param allele The {@link Allele} to correct for strandedness.
 * @param strand The {@link Strand} on which the given {@code allele} lies.  Must be valid as per {@link #assertValidStrand(Strand)}
 * @return The {@link Allele} with sequence corrected for strand.
 */
public static Allele getStrandCorrectedAllele(final Allele allele, final Strand strand) {
    assertValidStrand(strand);

    if ( strand == Strand.POSITIVE ) {
        return Allele.create(allele, false);
    }
    else {
        return Allele.create(ReadUtils.getBasesReverseComplement(allele.getBases()), false);
    }
}
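
To see what strand correction does to the bases themselves, here is a minimal sketch that uses plain strings instead of `Allele` and does not call GATK's `ReadUtils`. The class name, the local `Strand` enum, and the helper names are all illustrative assumptions. The inputs are taken from the data provider in Example 1.

```java
final class StrandCorrectionSketch {
    // Local stand-in for htsjdk.tribble.annotation.Strand (assumption).
    enum Strand { POSITIVE, NEGATIVE }

    // Complement a single upper-case base; anything else (e.g. N) is returned unchanged.
    static char complement(final char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return base;
        }
    }

    // Walk the sequence backwards, complementing each base.
    static String reverseComplement(final String bases) {
        final StringBuilder sb = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            sb.append(complement(bases.charAt(i)));
        }
        return sb.toString();
    }

    // Positive strand: bases are returned as given. Negative strand: reverse complement.
    static String strandCorrect(final String bases, final Strand strand) {
        return strand == Strand.POSITIVE ? bases : reverseComplement(bases);
    }

    public static void main(final String[] args) {
        System.out.println(strandCorrect("AATTG", Strand.POSITIVE)); // AATTG
        System.out.println(strandCorrect("AATTG", Strand.NEGATIVE)); // CAATT
        System.out.println(strandCorrect("AATT", Strand.NEGATIVE));  // AATT (palindromic)
    }
}
```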
 
Example 5
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Get the bases around the given ref allele in the correct direction of the strand for this variant.
 * The number of bases before and after the variant is specified by {@code referenceWindow}.
 * The result will be trimmed down by 1 base in the event that the variant is an indel in order to account for
 * the required preceding base in VCF format.
 *
 * ASSUMES: that the given {@link ReferenceContext} is already centered on the variant location.
 *
 * @param refAllele The reference {@link Allele} for the variant.
 * @param altAllele The alternate {@link Allele} for the variant.
 * @param reference The {@link ReferenceContext} for the variant, with the current window centered on the variant with no padding around it.
 * @param strand The {@link Strand} on which the variant occurs.
 * @param referenceWindow The number of bases on either side of the variant to add to the resulting string.
 * @return A {@link StrandCorrectedReferenceBases} of bases of length either {@code referenceWindow} * 2 + |ref allele| OR {@code referenceWindow} * 2 + |ref allele| - 1 , corrected for strandedness.
 */
public static StrandCorrectedReferenceBases createReferenceSnippet(final Allele refAllele, final Allele altAllele, final ReferenceContext reference, final Strand strand, final int referenceWindow ) {

    // Make sure our window in the reference includes only our variant:
    Utils.validate(
            (reference.numWindowLeadingBases() == reference.numWindowTrailingBases()) && (reference.numWindowLeadingBases() == 0),
            "Reference must have no extra bases around the variant.  Found: " + reference.numWindowLeadingBases() + " before / " + reference.numWindowTrailingBases() + " after"
    );

    // We must add bases to the start to adjust for indels because of the required preceding base in VCF format:
    final int indelStartBaseAdjustment = GATKVariantContextUtils.isIndel(refAllele, altAllele) ? 1 : 0;

    final int start = reference.getWindow().getStart() - referenceWindow + indelStartBaseAdjustment;
    final int end   = reference.getWindow().getEnd() + referenceWindow;

    // Calculate the interval from which to get the reference:
    final SimpleInterval refBasesInterval = new SimpleInterval(reference.getWindow().getContig(), start, end);

    // Get the reference bases for this interval.
    final byte[] referenceBases = reference.getBases(refBasesInterval);

    // Get the bases in the correct direction:
    if ( strand == Strand.POSITIVE ) {
        return new StrandCorrectedReferenceBases(referenceBases, strand);
    }
    else {
        return new StrandCorrectedReferenceBases(ReadUtils.getBasesReverseComplement(referenceBases), strand);
    }
}
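
The interval arithmetic in Example 5 can be checked in isolation. The sketch below reproduces only the start/end calculation with plain integers; the class and method names are hypothetical, and it assumes the window covers exactly the reference allele, as the Javadoc requires.

```java
final class ReferenceSnippetWindowSketch {
    // Returns {start, end} (1-based, inclusive) of the padded interval.
    static int[] paddedInterval(final int windowStart, final int windowEnd,
                                final int referenceWindow, final boolean isIndel) {
        // Indels carry one leading context base in VCF, so the start moves right by one.
        final int indelStartBaseAdjustment = isIndel ? 1 : 0;
        final int start = windowStart - referenceWindow + indelStartBaseAdjustment;
        final int end = windowEnd + referenceWindow;
        return new int[] { start, end };
    }

    static int length(final int[] interval) {
        return interval[1] - interval[0] + 1;
    }

    public static void main(final String[] args) {
        // SNP: ref allele of length 1 at position 100, 10 bases of padding.
        final int[] snp = paddedInterval(100, 100, 10, false);
        System.out.println(snp[0] + "-" + snp[1] + " length " + length(snp));     // 90-110 length 21

        // Indel: ref allele of length 2 at 100-101, 10 bases of padding.
        final int[] indel = paddedInterval(100, 101, 10, true);
        System.out.println(indel[0] + "-" + indel[1] + " length " + length(indel)); // 91-111 length 21
    }
}
```

The two lengths match the two cases in the `@return` description: `referenceWindow * 2 + |ref allele|` for the SNP and `referenceWindow * 2 + |ref allele| - 1` for the indel.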
 
Example 6
Source File: FuncotatorUtilsUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License
@DataProvider
Object[][] provideDataForTestCreateSpliceSiteCodonChange() {

    return new Object[][] {
            {1000, 5, 1000, 1500, Strand.POSITIVE, 0, "c.e5-0"},
            {1000, 4, 1, 1500, Strand.POSITIVE,    0, "c.e4+500"},
            {1000, 3, 500, 1500, Strand.POSITIVE,  0, "c.e3-500"},

            {1000, 5, 1000, 1500, Strand.NEGATIVE, 0, "c.e5+0"},
            {1000, 4, 1, 1500, Strand.NEGATIVE,    0, "c.e4-500"},
            {1000, 3, 500, 1500, Strand.NEGATIVE,  0, "c.e3+500"},

            {1000, 5, 1500, 500, Strand.NEGATIVE,  0, "c.e5+500"},

            {1000, 5, 1000, 1500, Strand.POSITIVE, 1, "c.e5+1"},
            {1000, 4, 1, 1500, Strand.POSITIVE,    2, "c.e4+502"},
            {1000, 3, 500, 1500, Strand.POSITIVE,  3, "c.e3-497"},

            {1000, 5, 1000, 1500, Strand.NEGATIVE, 4, "c.e5+4"},
            {1000, 4, 1, 1500, Strand.NEGATIVE,    5, "c.e4-495"},
            {1000, 3, 500, 1500, Strand.NEGATIVE,  6, "c.e3+506"},

            {1000, 5, 1500, 500, Strand.NEGATIVE,  7, "c.e5+507"},

            {1000, 5, 1000, 1500, Strand.POSITIVE, -1, "c.e5-1"},
            {1000, 4, 1, 1500, Strand.POSITIVE,    -2, "c.e4+498"},
            {1000, 3, 500, 1500, Strand.POSITIVE,  -3, "c.e3-503"},

            {1000, 5, 1000, 1500, Strand.NEGATIVE, -4, "c.e5-4"},
            {1000, 4, 1, 1500, Strand.NEGATIVE,    -5, "c.e4-505"},
            {1000, 3, 500, 1500, Strand.NEGATIVE,  -6, "c.e3+494"},

            {1000, 5, 1500, 500, Strand.NEGATIVE,  -7, "c.e5+493"},
    };
}
 
Example 7
Source File: FuncotatorUtilsUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License
@DataProvider
Object[][] provideReferenceAndExonListForGatkExceptions() {

    return new Object[][] {
            {
                    new ReferenceContext(new ReferenceFileSource(TEST_REFERENCE), new SimpleInterval(TEST_REFERENCE_CONTIG, TEST_REFERENCE_START, TEST_REFERENCE_END)),
                    Collections.singletonList(
                            new SimpleInterval("2", TEST_REFERENCE_START + 500, TEST_REFERENCE_START + 550)
                    ),
                    Strand.POSITIVE
            },
            {
                    new ReferenceContext(new ReferenceFileSource(TEST_REFERENCE), new SimpleInterval(TEST_REFERENCE_CONTIG, TEST_REFERENCE_START, TEST_REFERENCE_END)),
                    Collections.singletonList(
                            new SimpleInterval("2", TEST_REFERENCE_START + 500, TEST_REFERENCE_START + 550)
                    ),
                    Strand.NEGATIVE
            },
            {
                    new ReferenceContext(new ReferenceFileSource(TEST_REFERENCE), new SimpleInterval(TEST_REFERENCE_CONTIG, TEST_REFERENCE_START, TEST_REFERENCE_END)),
                    Collections.singletonList(
                            new SimpleInterval("2", TEST_REFERENCE_START + 500, TEST_REFERENCE_START + 550)
                    ),
                    Strand.NONE
            },
    };
}
 
Example 8
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Checks to see whether a given indel location occurs on a codon boundary.
 * That is, whether the given indel location is not within a codon but is cleanly between two adjacent codons.
 * NOTE: ASSUMES that there is a leading base that is not part of the indel prepended to the indel string for context.
 * @param codingSequenceAlleleStart The start position of the variant in the coding sequence.
 * @param alignedCodingSequenceAlleleStart The start position of the first codon containing part of the variant in the coding sequence.
 * @param refAllele A {@link String} containing the bases in the reference allele for the given variant. Must not be {@code null}.
 * @param strand The {@link Strand} on which the current codon resides.  Must not be {@code null}.  Must not be {@link Strand#NONE}.
 * @return {@code true} if the given indel cleanly occurs between two adjacent codons; {@code false} otherwise.
 */
public static boolean isIndelBetweenCodons(final int codingSequenceAlleleStart,
                                            final int alignedCodingSequenceAlleleStart,
                                            final String refAllele,
                                            final Strand strand) {
    Utils.nonNull(refAllele);
    assertValidStrand(strand);

    if ( strand == Strand.POSITIVE ) {
        // Check normally for positive strands:
        final int codonOffset = codingSequenceAlleleStart - alignedCodingSequenceAlleleStart;
        return (((codonOffset + refAllele.length()) % AminoAcid.CODON_LENGTH) == 0);
    }
    else {
        // This is a little strange. Because we're on the reverse strand, the bases will be inserted in
        // the opposite direction from what we expect (even though the coordinates are now in transcription order).
        // Because of this, we only need to check whether we're at the start of a codon to decide whether the indel falls between codons.
        //
        // For example:
        //
        //   1:1249193 C->CGCAA
        //
        //           insertion here
        //                 |
        //                 V
        //   +   ...aa|agt|ctt|gcg|ga...
        //   -   ...tt|tca|gaa|cgc|ct...
        //
        //                 |
        //                  \___
        //                  |   |
        //                  V   V
        //   +   ...aaa|gtc|GCA|Att|gcg|ga...
        //   -   ...ttt|cag|CGT|Taa|cgc|ct...
        //
        // If we inserted them on the + strand the bases would be inserted between codons, but because we're on
        // the - strand, the bases are inserted within the `aag` codon.

        return (codingSequenceAlleleStart == alignedCodingSequenceAlleleStart);
    }
}
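
The two branches of Example 8 reduce to simple integer checks. The sketch below restates them without the GATK validation helpers. The local `Strand` enum and the constant `CODON_LENGTH = 3` are stand-ins for `Strand` and `AminoAcid.CODON_LENGTH`, and the sample coordinates are made up for illustration.

```java
final class CodonBoundarySketch {
    // Local stand-in for htsjdk.tribble.annotation.Strand (assumption).
    enum Strand { POSITIVE, NEGATIVE }

    // Stand-in for AminoAcid.CODON_LENGTH; a codon is three bases.
    static final int CODON_LENGTH = 3;

    static boolean isIndelBetweenCodons(final int codingSequenceAlleleStart,
                                        final int alignedCodingSequenceAlleleStart,
                                        final String refAllele,
                                        final Strand strand) {
        if (strand == Strand.POSITIVE) {
            // The ref allele (including its leading base) must end exactly on a codon boundary.
            final int codonOffset = codingSequenceAlleleStart - alignedCodingSequenceAlleleStart;
            return ((codonOffset + refAllele.length()) % CODON_LENGTH) == 0;
        }
        // Negative strand: the allele must start exactly where its codon starts.
        return codingSequenceAlleleStart == alignedCodingSequenceAlleleStart;
    }

    public static void main(final String[] args) {
        // Insertion whose single leading base is the last base of a codon (offset 2 + length 1 = 3).
        System.out.println(isIndelBetweenCodons(3, 1, "C", Strand.POSITIVE)); // true
        // Leading base is the middle base of a codon (offset 1 + length 1 = 2).
        System.out.println(isIndelBetweenCodons(2, 1, "C", Strand.POSITIVE)); // false
        // Negative strand: start coincides with the aligned codon start.
        System.out.println(isIndelBetweenCodons(4, 4, "C", Strand.NEGATIVE)); // true
    }
}
```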
 
Example 9
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License
private static String getCodonChangeStringForFrameShifts(final SequenceComparison seqComp, final boolean isInsertion, int alignedCodonStart, int alignedCodonEnd, String refCodon) {

        // Requires:
        //     seqComp.getCodingSequenceAlleleStart()
        //     seqComp.getStrand()
        //     seqComp.getTranscriptCodingSequence()

        // Some special deletion adjustments to account for the VCF leading base problem.
        //
        if ( ((seqComp.getCodingSequenceAlleleStart() % AminoAcid.CODON_LENGTH) == 0) ) {
            // If we're an insertion and we start at the beginning of a codon, we have to adjust for
            // the preceding base in the input VCF.  This means we have to skip the first or last AminoAcid.CODON_LENGTH bases in the aligned
            // codon, because we have included them to grab the aligned data for the preceding base.
            // Note: This is not the aligned coding sequence position, but the raw coding sequence position:
            if ( isInsertion ) {
                if ( seqComp.getStrand() == Strand.POSITIVE ) {
                    // Skip the first AminoAcid.CODON_LENGTH bases in the aligned codon:
                    alignedCodonStart += AminoAcid.CODON_LENGTH;
                    alignedCodonEnd += AminoAcid.CODON_LENGTH;
                    // Get the next bases in the coding sequence:
                    // TODO: Make sure this won't fail for insertions at the end of a transcript!
                    refCodon = seqComp.getTranscriptCodingSequence()
                            .getBaseString()
                            // Subtract 1 because we're 1-based for genomic coordinates:
                            .substring(alignedCodonStart-1, alignedCodonEnd)
                            .toLowerCase();
                }
            }
            // If we're a deletion and we start at the beginning of a codon, we have to adjust for
            // the preceding base in the input VCF.  This means we have to skip the first or last AminoAcid.CODON_LENGTH bases in the aligned
            // codon, because we have included them to grab the aligned data for the preceding base.
            // Note: This is not the aligned coding sequence position, but the raw coding sequence position:
            else {
                if ( seqComp.getStrand() == Strand.POSITIVE ) {
                    // Skip the first AminoAcid.CODON_LENGTH bases in the aligned codon:
                    alignedCodonStart += AminoAcid.CODON_LENGTH;
                    refCodon = refCodon.substring(AminoAcid.CODON_LENGTH);
                }
                else {
                    // Skip the last AminoAcid.CODON_LENGTH bases in the aligned codon:
                    alignedCodonEnd -= AminoAcid.CODON_LENGTH;
                    refCodon = refCodon.substring(0, refCodon.length() - AminoAcid.CODON_LENGTH);
                }
            }
        }

        return String.format(CODON_CHANGE_FORMAT_STRING, alignedCodonStart, alignedCodonEnd, refCodon + "fs");
    }
 
Example 10
Source File: EnsemblGtfCodecUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License
private GencodeGtfFeature createEcoliEnsemblGene(final int startingFeatureOrder, final String geneId, final String geneName, final int geneStart,
                                         final int geneEnd, final String transcriptId, final String transcriptName,
                                         final String exonId, final int cdsStart, final int cdsEnd) {

    // Placeholder for constant extra data:
    final String geneAnonymousOptionalFields = " gene_source \"ena\";";
    final String transcriptAnonymousOptionalFields = geneAnonymousOptionalFields + " transcript_source \"ena\";";

    int featureOrderNum = startingFeatureOrder;

    GencodeGtfFeatureBaseData data;

    data = new GencodeGtfFeatureBaseData(EnsemblGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum++, ECOLI_CONTIG_NAME, GencodeGtfFeature.ANNOTATION_SOURCE_ENA, GencodeGtfFeature.FeatureType.GENE,
            geneStart, geneEnd, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.DOT, geneId, null, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, null, null, null, -1, null, null,
            null,
            geneAnonymousOptionalFields);
    final GencodeGtfGeneFeature gene = (GencodeGtfGeneFeature)GencodeGtfFeature.create(data);
    gene.setUcscGenomeVersion(ECOLI_UCSC_GENOME_VERSION);

    data = new GencodeGtfFeatureBaseData(EnsemblGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum++, ECOLI_CONTIG_NAME, GencodeGtfFeature.ANNOTATION_SOURCE_ENA, GencodeGtfFeature.FeatureType.TRANSCRIPT,
            geneStart, geneEnd, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.DOT, geneId, transcriptId, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, transcriptName, -1, null, null,
            null,
            transcriptAnonymousOptionalFields
    );
    final GencodeGtfTranscriptFeature transcript = (GencodeGtfTranscriptFeature) GencodeGtfFeature.create(data);
    transcript.setUcscGenomeVersion(ECOLI_UCSC_GENOME_VERSION);

    data = new GencodeGtfFeatureBaseData(EnsemblGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum++, ECOLI_CONTIG_NAME, GencodeGtfFeature.ANNOTATION_SOURCE_ENA, GencodeGtfFeature.FeatureType.EXON,
            geneStart, geneEnd, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.DOT, geneId, transcriptId, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, transcriptName, 1, exonId, null,
            null,
            transcriptAnonymousOptionalFields
    );
    final GencodeGtfExonFeature exon = (GencodeGtfExonFeature) GencodeGtfFeature.create(data);
    exon.setUcscGenomeVersion(ECOLI_UCSC_GENOME_VERSION);

    data = new GencodeGtfFeatureBaseData(EnsemblGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum++, ECOLI_CONTIG_NAME, GencodeGtfFeature.ANNOTATION_SOURCE_ENA, GencodeGtfFeature.FeatureType.CDS,
            cdsStart, cdsEnd, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.ZERO, geneId, transcriptId, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, transcriptName, 1, null, null,
            Collections.singletonList(new GencodeGtfFeature.OptionalField<>("protein_id", transcriptId)),
            transcriptAnonymousOptionalFields
    );
    final GencodeGtfCDSFeature cds = (GencodeGtfCDSFeature) GencodeGtfFeature.create(data);
    cds.setUcscGenomeVersion(ECOLI_UCSC_GENOME_VERSION);

    data = new GencodeGtfFeatureBaseData(EnsemblGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum++, ECOLI_CONTIG_NAME, GencodeGtfFeature.ANNOTATION_SOURCE_ENA, GencodeGtfFeature.FeatureType.START_CODON,
            cdsStart, cdsStart+2, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.ZERO, geneId, transcriptId, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, transcriptName, 1, null, null,
            null,
            transcriptAnonymousOptionalFields
    );
    final GencodeGtfStartCodonFeature startCodon = (GencodeGtfStartCodonFeature) GencodeGtfFeature.create(data);
    startCodon.setUcscGenomeVersion(ECOLI_UCSC_GENOME_VERSION);

    data = new GencodeGtfFeatureBaseData(EnsemblGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum, ECOLI_CONTIG_NAME, GencodeGtfFeature.ANNOTATION_SOURCE_ENA, GencodeGtfFeature.FeatureType.STOP_CODON,
            cdsEnd+1, cdsEnd+3, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.ZERO, geneId, transcriptId, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, transcriptName, 1, null, null,
            null,
            transcriptAnonymousOptionalFields
    );
    final GencodeGtfStopCodonFeature stopCodon = (GencodeGtfStopCodonFeature) GencodeGtfFeature.create(data);
    stopCodon.setUcscGenomeVersion(ECOLI_UCSC_GENOME_VERSION);

    // Aggregate the Features as they should be:
    exon.setStartCodon(startCodon);
    exon.setStopCodon(stopCodon);
    exon.setCds(cds);
    transcript.addExon(exon);
    gene.addTranscript(transcript);

    return gene;
}
 
Example 11
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Get the coding sequence-aligned allele based on stop and start position.
 * @param codingSequence Coding sequence from which the allele should be derived.  Must not be {@code null}.
 * @param alignedAlleleStart Start position of the allele (1-indexed, inclusive).  Must not be {@code null}.  Must be > 0.
 * @param alignedAlleleStop Stop position of the allele (1-indexed, inclusive).  Must not be {@code null}.  Must be > 0.
 * @param strand {@link Strand} on which the allele is coded.  Must not be {@code null}.  Must not be {@link Strand#NONE}.
 * @return The {@link String} representation of the allele.
 */
private static String getAlignedAlleleSequence(final String codingSequence,
                                               final Integer alignedAlleleStart,
                                               final Integer alignedAlleleStop,
                                               final Strand strand) {
    Utils.nonNull(codingSequence);
    Utils.nonNull(alignedAlleleStart);
    ParamUtils.isPositive( alignedAlleleStart, "Genome positions must be > 0." );
    Utils.nonNull(alignedAlleleStop);
    ParamUtils.isPositive( alignedAlleleStop, "Genome positions must be > 0." );
    assertValidStrand( strand );

    // Get our indices:
    // Subtract 1 because we're 1-based.
    int start = alignedAlleleStart - 1;
    int end = alignedAlleleStop;

    final String alignedAlleleSeq;

    if ( strand == Strand.POSITIVE ) {

        if ( end > codingSequence.length() ) {
            throw new TranscriptCodingSequenceException("Gencode transcript ends at position " + end + " but codingSequence is only " + codingSequence.length() + " bases long!");
        }
        else {
            alignedAlleleSeq = codingSequence.substring(start, end);
        }
    }
    else {
        // Negative strand means we need to reverse complement and go from the other end:
        start = codingSequence.length() - alignedAlleleStop;
        end = codingSequence.length() - alignedAlleleStart + 1;

        if ( end > codingSequence.length() ) {
            throw new TranscriptCodingSequenceException("Gencode transcript ends at position " + end + " but codingSequence is only " + codingSequence.length() + " bases long!");
        }
        else {
            alignedAlleleSeq = ReadUtils.getBasesReverseComplement(codingSequence.substring(start, end).getBytes());
        }
    }

    return alignedAlleleSeq;
}
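
The index arithmetic in Example 11 is easier to follow with a concrete sequence. The sketch below keeps the same start/end calculations but drops the null and bounds checks, uses a local `Strand` enum, and supplies its own reverse-complement helper rather than GATK's `ReadUtils`; all of those are assumptions made for illustration, as is the nine-base sample sequence.

```java
final class AlignedAlleleSketch {
    // Local stand-in for htsjdk.tribble.annotation.Strand (assumption).
    enum Strand { POSITIVE, NEGATIVE }

    // Reverse the sequence and complement each upper-case base.
    static String reverseComplement(final String bases) {
        final StringBuilder sb = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            final char b = bases.charAt(i);
            sb.append(b == 'A' ? 'T' : b == 'T' ? 'A' : b == 'C' ? 'G' : b == 'G' ? 'C' : b);
        }
        return sb.toString();
    }

    // alignedAlleleStart / alignedAlleleStop are 1-based and inclusive.
    static String alignedAllele(final String codingSequence,
                                final int alignedAlleleStart,
                                final int alignedAlleleStop,
                                final Strand strand) {
        if (strand == Strand.POSITIVE) {
            // Convert the 1-based inclusive start to a 0-based index; the stop is already exclusive.
            return codingSequence.substring(alignedAlleleStart - 1, alignedAlleleStop);
        }
        // Negative strand: count from the other end, then reverse complement.
        final int start = codingSequence.length() - alignedAlleleStop;
        final int end = codingSequence.length() - alignedAlleleStart + 1;
        return reverseComplement(codingSequence.substring(start, end));
    }

    public static void main(final String[] args) {
        final String cds = "ATGGCCTAA";
        System.out.println(alignedAllele(cds, 1, 3, Strand.POSITIVE)); // ATG
        System.out.println(alignedAllele(cds, 1, 3, Strand.NEGATIVE)); // TTA (reverse complement of TAA)
        System.out.println(alignedAllele(cds, 4, 6, Strand.NEGATIVE)); // GGC (reverse complement of GCC)
    }
}
```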
 
Example 12
Source File: FuncotatorUtilsUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License
@DataProvider
Object[][] provideDataForGetStartPositionInTranscript() {

    final List<? extends Locatable> exons_forward = Arrays.asList(
            new SimpleInterval("chr1", 10,19),
            new SimpleInterval("chr1", 30,39),
            new SimpleInterval("chr1", 50,59),
            new SimpleInterval("chr1", 70,79),
            new SimpleInterval("chr1", 90,99)
    );

    final List<? extends Locatable> exons_backward = Arrays.asList(
            new SimpleInterval("chr1", 90,99),
            new SimpleInterval("chr1", 70,79),
            new SimpleInterval("chr1", 50,59),
            new SimpleInterval("chr1", 30,39),
            new SimpleInterval("chr1", 10,19)
    );

    // And a spot-check test from real data:
    final List<? extends Locatable> spot_check_exons = Arrays.asList(
            new SimpleInterval("3", 178916614, 178916965),
            new SimpleInterval("3", 178917478, 178917687),
            new SimpleInterval("3", 178919078, 178919328),
            new SimpleInterval("3", 178921332, 178921577),
            new SimpleInterval("3", 178922291, 178922376),
            new SimpleInterval("3", 178927383, 178927488),
            new SimpleInterval("3", 178927974, 178928126),
            new SimpleInterval("3", 178928219, 178928353),
            new SimpleInterval("3", 178935998, 178936122)
    );

    return new Object[][] {
            { new SimpleInterval("chr1", 1, 1),     exons_forward, Strand.POSITIVE, -1 },
            { new SimpleInterval("chr1", 25, 67),   exons_forward, Strand.POSITIVE, -1 },
            { new SimpleInterval("chr1", 105, 392), exons_forward, Strand.POSITIVE, -1 },
            { new SimpleInterval("chr1", 10, 10),   exons_forward, Strand.POSITIVE,  1 },
            { new SimpleInterval("chr1", 99, 99),   exons_forward, Strand.POSITIVE, 50 },
            { new SimpleInterval("chr1", 50, 67),   exons_forward, Strand.POSITIVE, 21 },
            { new SimpleInterval("chr1", 67, 75),   exons_forward, Strand.POSITIVE, -1 },
            { new SimpleInterval("chr1", 91, 97),   exons_forward, Strand.POSITIVE, 42 },

            { new SimpleInterval("chr1", 1, 1),     exons_backward, Strand.NEGATIVE, -1 },
            { new SimpleInterval("chr1", 25, 67),   exons_backward, Strand.NEGATIVE, -1 },
            { new SimpleInterval("chr1", 105, 392), exons_backward, Strand.NEGATIVE, -1 },
            { new SimpleInterval("chr1", 10, 10),   exons_backward, Strand.NEGATIVE, 50 },
            { new SimpleInterval("chr1", 99, 99),   exons_backward, Strand.NEGATIVE,  1 },
            { new SimpleInterval("chr1", 50, 67),   exons_backward, Strand.NEGATIVE, -1 },
            { new SimpleInterval("chr1", 67, 75),   exons_backward, Strand.NEGATIVE, 15 },

            // Spot check:
            { new SimpleInterval("3", 178936090, 178936096), spot_check_exons, Strand.POSITIVE, 1632 }
    };
}
 
Example 13
Source File: FuncotatorUtils.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Get the position (1-based, inclusive) of the given {@link VariantContext} start relative to the transcript it appears in.
 * The transcript is specified by {@code exons}.
 * @param variant The {@link VariantContext} of which to find the start position in the given transcript (must not be {@code null}).
 * @param exons {@link List} of {@link Locatable}s representing the exons in the transcript in which the given {@code variant} occurs.
 * @param strand The {@link Strand} on which the {@code variant} occurs.
 * @return The start position (1-based, inclusive) of the given {@code variant} in the transcript in which it appears.
 */
public static int getTranscriptAlleleStartPosition(final VariantContext variant,
                                                   final List<? extends Locatable> exons,
                                                   final Strand strand) {
    Utils.nonNull(variant);
    Utils.nonNull(exons);
    assertValidStrand(strand);

    // Set up our position variable:
    int position;

    // NOTE: We don't need to worry about UTRs in here - all UTRs occur somewhere in an exon in GENCODE.

    // Filter the elements by whether they come before the variant in the transcript and
    // then sort them by their order in the transcript:
    final List<Locatable> sortedFilteredExons;
    if ( strand == Strand.POSITIVE) {
        sortedFilteredExons = exons.stream()
                .filter(e -> e.getStart() <= variant.getStart())
                .sorted(Comparator.comparingInt(Locatable::getStart))
                .collect(Collectors.toList());

        // We are guaranteed that the variant occurs in the last element of sortedFilteredExons because of the sorting.
        // Add 1 to position to account for inclusive values:
        position = variant.getStart() - sortedFilteredExons.get(sortedFilteredExons.size()-1).getStart() + 1;
    }
    else {
        sortedFilteredExons = exons.stream()
                .filter(e -> e.getEnd() >= variant.getStart())
                .sorted(Comparator.comparingInt(Locatable::getStart).reversed())
                .collect(Collectors.toList());

        // We are guaranteed that the variant occurs in the last element of sortedFilteredExons because of the sorting.
        // Add 1 to position to account for inclusive values:
        position = sortedFilteredExons.get(sortedFilteredExons.size()-1).getEnd() - variant.getStart() + 1;
    }

    // Add up the lengths of all exons before the last one:
    final int numExonsBeforeLast = sortedFilteredExons.size() - 1;
    for ( int i = 0; i < numExonsBeforeLast; ++i ) {
        final Locatable exon = sortedFilteredExons.get(i);

        // Add 1 to position to account for inclusive values:
        position += exon.getEnd() - exon.getStart() + 1;
    }

    return position;
}
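
The filter-sort-sum logic of Example 13 can be tried out without any GATK types. In the sketch below each exon is a plain `{start, end}` pair and the variant is reduced to its start coordinate; the class name, the `EXONS` list, and the local `Strand` enum are illustrative assumptions. The five exon intervals are borrowed from the `chr1` lists in Example 12. Like the original, the sketch does not check that the variant actually lies inside an exon, and it fails on an empty filtered list.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TranscriptPositionSketch {
    // Local stand-in for htsjdk.tribble.annotation.Strand (assumption).
    enum Strand { POSITIVE, NEGATIVE }

    // Exons as {start, end}, 1-based inclusive; five exons of ten bases each.
    static final List<int[]> EXONS = Arrays.asList(
            new int[] { 10, 19 }, new int[] { 30, 39 }, new int[] { 50, 59 },
            new int[] { 70, 79 }, new int[] { 90, 99 });

    static int position(final int variantStart, final List<int[]> exons, final Strand strand) {
        final List<int[]> kept;
        int position;
        if (strand == Strand.POSITIVE) {
            // Keep exons starting at or before the variant, in ascending genomic order.
            kept = exons.stream()
                    .filter(e -> e[0] <= variantStart)
                    .sorted(Comparator.comparingInt((int[] e) -> e[0]))
                    .collect(Collectors.toList());
            position = variantStart - kept.get(kept.size() - 1)[0] + 1;
        } else {
            // Keep exons ending at or after the variant, in descending genomic order.
            kept = exons.stream()
                    .filter(e -> e[1] >= variantStart)
                    .sorted(Comparator.comparingInt((int[] e) -> e[0]).reversed())
                    .collect(Collectors.toList());
            position = kept.get(kept.size() - 1)[1] - variantStart + 1;
        }
        // Add the full length of every exon that precedes the last one in transcript order.
        for (int i = 0; i < kept.size() - 1; i++) {
            position += kept.get(i)[1] - kept.get(i)[0] + 1;
        }
        return position;
    }

    public static void main(final String[] args) {
        System.out.println(position(99, EXONS, Strand.POSITIVE)); // 50: last base of the fifth exon
        System.out.println(position(50, EXONS, Strand.POSITIVE)); // 21: first base of the third exon
        System.out.println(position(99, EXONS, Strand.NEGATIVE)); // 1: first transcribed base
        System.out.println(position(10, EXONS, Strand.NEGATIVE)); // 50: last transcribed base
    }
}
```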
 
Example 14
Source File: GencodeFuncotationFactory.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Create a {@link GencodeFuncotation} for a {@code variant} that occurs in a coding region in a given {@code exon}.
 * @param variant The {@link VariantContext} for which to create a {@link GencodeFuncotation}.
 * @param altAllele The {@link Allele} in the given {@code variant} for which to create a {@link GencodeFuncotation}.
 * @param reference The {@link ReferenceContext} for the current data set.
 * @param transcript The {@link GencodeGtfTranscriptFeature} in which the given {@code variant} occurs.
 * @param exon The {@link GencodeGtfExonFeature} in which the given {@code variant} occurs.
 * @return A {@link GencodeFuncotation} containing information about the given {@code variant} given the corresponding {@code exon}.
 */
private GencodeFuncotation createCodingRegionFuncotationForProteinCodingFeature(final VariantContext variant,
                                                                                final Allele altAllele,
                                                                                final ReferenceContext reference,
                                                                                final GencodeGtfTranscriptFeature transcript,
                                                                                final GencodeGtfExonFeature exon) {

    // Get the list of exons by their locations so we can use them to determine our location in the transcript and get
    // the transcript code itself:
    final List<? extends Locatable> exonPositionList = getSortedCdsAndStartStopPositions(transcript);

    // NOTE: Regardless of strandedness, we always report the alleles as if they appeared in the forward direction.
    final GencodeFuncotation.VariantType variantType =
            getVariantType(variant.getReference(),
                    altAllele);

    // Setup the "trivial" fields of the gencodeFuncotation:
    final GencodeFuncotationBuilder gencodeFuncotationBuilder = createGencodeFuncotationBuilderWithTrivialFieldsPopulated(variant, altAllele, transcript);

    // Set the exon number:
    gencodeFuncotationBuilder.setTranscriptExonNumber(exon.getExonNumber());

    // Set our version:
    gencodeFuncotationBuilder.setVersion(version);

    // Set up our SequenceComparison object so we can calculate some useful fields more easily
    // These fields can all be set without knowing the alternate allele:
    final SequenceComparison sequenceComparison = createSequenceComparison(variant, altAllele, reference, transcript, exonPositionList, transcriptIdMap, transcriptFastaReferenceDataSource, true);

    // Set our transcript positions:
    setTranscriptPosition(variant, altAllele, sequenceComparison.getTranscriptAlleleStart(), gencodeFuncotationBuilder);

    // Set the reference context with the bases from the sequence comparison
    // NOTE: The reference context is ALWAYS from the + strand, so we need to reverse-complement our bases back in the - case:
    if ( sequenceComparison.getStrand() == Strand.POSITIVE ) {
        gencodeFuncotationBuilder.setReferenceContext(sequenceComparison.getReferenceBases());
    }
    else {
        gencodeFuncotationBuilder.setReferenceContext(ReadUtils.getBasesReverseComplement(sequenceComparison.getReferenceBases().getBytes()));
    }

    // Set the GC content and the cDNA change:
    gencodeFuncotationBuilder.setGcContent(sequenceComparison.getGcContent())
                             .setcDnaChange(FuncotatorUtils.getCodingSequenceChangeString(
                                     sequenceComparison.getCodingSequenceAlleleStart(),
                                     sequenceComparison.getReferenceAllele(),
                                     sequenceComparison.getAlternateAllele(),
                                     sequenceComparison.getStrand(),
                                     sequenceComparison.getExonStartPosition(),
                                     sequenceComparison.getExonEndPosition(),
                                     sequenceComparison.getAlleleStart()
                             ));

    //==============================================================================================================
    // Set the Codon and Protein changes and the Variant Classification
    // but only if we have the sequence information to do so.
    // NOTE: This should always be true in this method, but we need to have this if statement just in case it is not.
    //       A warning will have been generated in createSequenceComparison if the sequenceComparison does not have
    //       coding sequence information.
    if ( sequenceComparison.hasSequenceInfo() ) {
        final String codonChange = FuncotatorUtils.getCodonChangeString(sequenceComparison, exon.getStartCodon());
        final String proteinChange = FuncotatorUtils.renderProteinChangeString(sequenceComparison, exon.getStartCodon());

        gencodeFuncotationBuilder.setCodonChange(codonChange)
                                 .setProteinChange(proteinChange);

        // Set the Variant Classification:
        final GencodeFuncotation.VariantClassification varClass = createVariantClassification(variant, altAllele, variantType, exon, transcript.getExons().size(), sequenceComparison);
        final GencodeFuncotation.VariantClassification secondaryVarClass;
        gencodeFuncotationBuilder.setVariantClassification(varClass);
        if ( varClass == GencodeFuncotation.VariantClassification.SPLICE_SITE ) {
            secondaryVarClass = getVariantClassificationForCodingRegion(variant, altAllele, variantType, sequenceComparison);
            gencodeFuncotationBuilder.setSecondaryVariantClassification(secondaryVarClass);
        }
    }
    else {
        // Set the variant classification here.
        // We should have sequence information but we don't... this is not good, but we have to put something here:
        gencodeFuncotationBuilder.setVariantClassification( convertGeneTranscriptTypeToVariantClassification(exon.getGeneType()) );
    }

    // Set our data source name:
    gencodeFuncotationBuilder.setDataSourceName(getName());

    return gencodeFuncotationBuilder.build();
}
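
The strand check in this example exists because transcript-oriented bases on the negative strand must be reverse-complemented to get back to the forward-strand reference. A minimal sketch of that conversion follows. It does not use GATK's `ReadUtils`; the local `Strand` enum, the class name, and the method names are all hypothetical stand-ins, and only upper-case A, C, G, and T are complemented (other symbols such as N pass through unchanged).

```java
// Illustrative sketch only; this is not the GATK ReadUtils implementation.
final class ReferenceContextSketch {

    enum Strand { POSITIVE, NEGATIVE } // local stand-in for htsjdk.tribble.annotation.Strand

    /** Complements a single upper-case base; anything else (e.g. 'N') is returned unchanged. */
    static char complement(final char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return base;
        }
    }

    /** Walks the input from its last base to its first, complementing each base. */
    static String reverseComplement(final String bases) {
        final StringBuilder sb = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            sb.append(complement(bases.charAt(i)));
        }
        return sb.toString();
    }

    /** Returns the bases as they read on the + strand. */
    static String toForwardStrand(final String transcriptOrientedBases, final Strand strand) {
        return strand == Strand.POSITIVE
                ? transcriptOrientedBases
                : reverseComplement(transcriptOrientedBases);
    }
}
```

The same operation accounts for the expected values in Example 1, where `AATTG` on the negative strand becomes `CAATT`.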
 
Example 15
Source File: GencodeFuncotationFactoryUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License
    @DataProvider
    Object[][] provideForCreateDefaultFuncotationsOnProblemVariant() {

//        variant, altAllele, gtfFeature, reference, transcript, version

        final SimpleInterval variantInterval =  new SimpleInterval("chr3", 178921515, 178921517);
        final Allele refAllele = Allele.create("GCA", true);
        final Allele altAllele = Allele.create("TTG");
        final VariantContext variant = new VariantContextBuilder(
                FuncotatorReferenceTestUtils.retrieveHg19Chr3Ref(),
                variantInterval.getContig(),
                variantInterval.getStart(),
                variantInterval.getEnd(),
                Arrays.asList(refAllele, altAllele)
        ).make();

        final String versionString = "VERSION";
        final String dataSourceName = "TEST_GENCODE_NAME";

        // ======================
        // Create the GencodeGtfFeature:
        GencodeGtfFeatureBaseData data;

        data = new GencodeGtfFeatureBaseData(GencodeGtfCodec.GTF_FILE_TYPE_STRING, 1, variantInterval.getContig(), GencodeGtfFeature.ANNOTATION_SOURCE_ENSEMBL, GencodeGtfFeature.FeatureType.GENE,
                variantInterval.getStart()-2000, variantInterval.getEnd()+2000, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.DOT, "TEST_GENE1", null, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
                null, "TEST_GENE", null, null, null, -1, null, GencodeGtfFeature.LocusLevel.AUTOMATICALLY_ANNOTATED, null, null);
        final GencodeGtfGeneFeature gene = (GencodeGtfGeneFeature)GencodeGtfFeature.create(data);

        // ======================

        data = new GencodeGtfFeatureBaseData(GencodeGtfCodec.GTF_FILE_TYPE_STRING, 2, variantInterval.getContig(), GencodeGtfFeature.ANNOTATION_SOURCE_ENSEMBL, GencodeGtfFeature.FeatureType.TRANSCRIPT,
                variantInterval.getStart()-1000, variantInterval.getEnd()+1000, Strand.POSITIVE, GencodeGtfFeature.GenomicPhase.DOT, "TEST_GENE1", "TEST_TRANSCRIPT1", GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
                null, "TEST_GENE", GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, "TEST_TRANSCRIPT1", -1, null, GencodeGtfFeature.LocusLevel.AUTOMATICALLY_ANNOTATED,
                Collections.emptyList(),
                null
        );
        final GencodeGtfTranscriptFeature transcript1 = (GencodeGtfTranscriptFeature) GencodeGtfFeature.create(data);
        gene.addTranscript(transcript1);

        // ======================

        return new Object[][] {
                {
                    variant,
                    variant.getAlleles().get(1),
                    gene,
                    new ReferenceContext( refDataSourceHg19Ch3, variantInterval ),
                    gene.getTranscripts().get(0),
                    versionString,
                    dataSourceName,
                    new GencodeFuncotationBuilder()
                            .setDataSourceName(dataSourceName)
                            .setHugoSymbol(gene.getGeneName())
                            .setChromosome(variant.getContig())
                            .setStart(variant.getStart())
                            .setEnd(variant.getEnd())
                            .setVariantClassification(GencodeFuncotation.VariantClassification.COULD_NOT_DETERMINE)
                            .setVariantType(GencodeFuncotation.VariantType.TNP)
                            .setRefAllele(variant.getReference())
                            .setTumorSeqAllele2(variant.getAlternateAllele(0).getBaseString())
                            .setGenomeChange("g.chr3:178921515_178921517GCA>TTG")
                            .setAnnotationTranscript(transcript1.getTranscriptName())
                            .setStrand(gene.getGenomicStrand())
                            .setReferenceContext("TATAAATAGTGCACTCAGAATAA")
                            .setGcContent(0.3399503722084367)
                            .setLocusLevel(Integer.valueOf(gene.getLocusLevel().toString()))
                            // This is OK because there are no exons:
                            .setTranscriptLength(0)
                            .setVersion(versionString)
                            .setGeneTranscriptType(transcript1.getTranscriptType())
                            .setNcbiBuild("TEST")
                            .build()
                },
        };
    }
 
Example 16
Source File: DataProviderForExampleGencodeGtfGene.java    From gatk with BSD 3-Clause "New" or "Revised" License
private static GencodeGtfExonFeature createStopCodonExon(final int exonStart, final String contig, final int lengthExons,
                                                         final AtomicInteger featureOrderNum, final String geneName,
                                                         final int exonNum, final int length3pUtr, final Strand codingDirection) {

    final int CODON_LENGTH = 3;

    // Exon is created with room for the 3' UTR:
    final GencodeGtfFeatureBaseData data = new GencodeGtfFeatureBaseData(GencodeGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum.getAndIncrement(), contig, GencodeGtfFeature.ANNOTATION_SOURCE_ENSEMBL, GencodeGtfFeature.FeatureType.EXON,
            exonStart, exonStart + lengthExons + length3pUtr - 1, codingDirection, GencodeGtfFeature.GenomicPhase.DOT, "TEST_GENE1", "TEST_TRANSCRIPT1", GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, "TEST_TRANSCRIPT1", exonNum, "TEST_EXON_" + exonNum, GencodeGtfFeature.LocusLevel.AUTOMATICALLY_ANNOTATED,
            Collections.emptyList(),
            null
    );
    final GencodeGtfExonFeature exon = (GencodeGtfExonFeature) GencodeGtfFeature.create(data);

    final int cdsStart = codingDirection == Strand.POSITIVE ?  exon.getGenomicStartLocation() : exon.getGenomicStartLocation() + length3pUtr + CODON_LENGTH - 1;
    final int cdsEnd = codingDirection == Strand.POSITIVE ?  exon.getGenomicEndLocation() - length3pUtr - CODON_LENGTH : exon.getGenomicEndLocation();

    final GencodeGtfFeatureBaseData tmpCdsMinusStop = new GencodeGtfFeatureBaseData(GencodeGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum.getAndIncrement(), contig, GencodeGtfFeature.ANNOTATION_SOURCE_ENSEMBL, GencodeGtfFeature.FeatureType.CDS,
            cdsStart, cdsEnd, codingDirection, GencodeGtfFeature.GenomicPhase.DOT, "TEST_GENE1", "TEST_TRANSCRIPT1", GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, "TEST_TRANSCRIPT1", exon.getExonNumber(), exon.getExonId(), GencodeGtfFeature.LocusLevel.AUTOMATICALLY_ANNOTATED,
            Collections.emptyList(),
            null
    );
    final GencodeGtfCDSFeature cds1 = (GencodeGtfCDSFeature) GencodeGtfFeature.create(tmpCdsMinusStop);

    final int stopCodonStart = codingDirection == Strand.POSITIVE ? cds1.getGenomicEndLocation() + 1 : cds1.getGenomicStartLocation() - CODON_LENGTH;
    final int stopCodonEnd = codingDirection == Strand.POSITIVE ? cds1.getGenomicEndLocation() + CODON_LENGTH : cds1.getGenomicStartLocation() - 1;

    final GencodeGtfFeatureBaseData tmpStopCodon = new GencodeGtfFeatureBaseData(GencodeGtfCodec.GTF_FILE_TYPE_STRING, featureOrderNum.getAndIncrement(), contig, GencodeGtfFeature.ANNOTATION_SOURCE_ENSEMBL, GencodeGtfFeature.FeatureType.STOP_CODON,
            stopCodonStart, stopCodonEnd, codingDirection, GencodeGtfFeature.GenomicPhase.DOT, "TEST_GENE1", "TEST_TRANSCRIPT1", GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING,
            null, geneName, GencodeGtfFeature.GeneTranscriptType.PROTEIN_CODING, null, "TEST_TRANSCRIPT1", exon.getExonNumber(), exon.getExonId(), GencodeGtfFeature.LocusLevel.AUTOMATICALLY_ANNOTATED,
            Collections.emptyList(),
            null
    );
    final GencodeGtfStopCodonFeature stopCodon1 = (GencodeGtfStopCodonFeature) GencodeGtfFeature.create(tmpStopCodon);

    exon.setCds(cds1);
    exon.setStopCodon(stopCodon1);

    return exon;
}
 
Example 17
Source File: MapBlocks.java    From varsim with BSD 2-Clause "Simplified" License
public Collection<ReadMapBlock> liftOverGenomeInterval(final GenomeInterval interval, final int minIntervalLength) {
    final Collection<ReadMapBlock> readMapBlocks = new ArrayList<>();

    final ChrString chromosome = interval.chromosome;
    final int start = interval.start + 1; //convert to 1-based start
    final int end = interval.end; //1-based end

    if (!blockIntervalTree.containsKey(chromosome)) {
        return readMapBlocks;
    }

    final List<MapBlock> subset = findIntersectingBlocks(blockIntervalTree, chromosome, start, end);

    log.trace("Going to lift over " + interval);

    // since intervalOffset is only used for MapBlock, which is 1-based, we need to set the beginning offset to 1
    int intervalOffset = 1;
    for (MapBlock b : subset) {

            int srcStart = Math.max(start, b.srcLoc.location);
            int srcEnd = Math.min(end, b.srcLoc.location + b.size - 1);
            int lengthOfInterval = srcEnd - srcStart + 1;
            final int lengthOfIntervalOnRead = b.blockType != MapBlock.BlockType.DEL ? lengthOfInterval : 0;
            final int lengthOfIntervalOnRef = b.blockType != MapBlock.BlockType.INS ? lengthOfInterval : 0;

            log.trace("intervalStart = " + srcStart + " intervalEnd = " + srcEnd + " lengthOfInterval = " + lengthOfInterval);

            // lengthOfIntervalOnRef is length of the lifted over interval
            if (lengthOfIntervalOnRef < minIntervalLength) {
                log.trace("Skipping block " + b + " since the overlap is too small ( < " + minIntervalLength + ")");
            } else {
                // 0-based start, 1-based end
                final GenomeInterval liftedInterval = new GenomeInterval();
                liftedInterval.chromosome = b.dstLoc.chromosome;
                liftedInterval.feature = b.blockType;
                if (b.direction == 0) {
                    liftedInterval.start = b.dstLoc.location + srcStart - b.srcLoc.location - 1;
                    liftedInterval.end = liftedInterval.start + lengthOfIntervalOnRef;
                    liftedInterval.strand = interval.strand;
                } else {
                    liftedInterval.start = b.dstLoc.location + (b.srcLoc.location + b.size - 1 - srcEnd) - 1;
                    liftedInterval.end = liftedInterval.start + lengthOfIntervalOnRef;
                    liftedInterval.strand = interval.strand == Strand.POSITIVE ? Strand.NEGATIVE : Strand.POSITIVE;
                }

                readMapBlocks.add(new ReadMapBlock(intervalOffset, intervalOffset + lengthOfIntervalOnRead - 1, liftedInterval));
            }
            intervalOffset += lengthOfIntervalOnRead;
    }
    return readMapBlocks;
}
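
The coordinate arithmetic in the two branches above can be isolated into a small sketch. It covers only the case where the overlap has the same length on the source and the destination (no insertion or deletion blocks), and it mirrors the formulas in the snippet: a 1-based inclusive source interval is mapped to a 0-based start and 1-based end on the destination, and the strand is flipped for reversed blocks. The class, the method names, and the local `Strand` enum are hypothetical stand-ins, not part of varsim or htsjdk.

```java
// Illustrative sketch only; names are hypothetical and not part of varsim.
final class LiftOverSketch {

    enum Strand { POSITIVE, NEGATIVE } // local stand-in for htsjdk.tribble.annotation.Strand

    /** A reversed block swaps the strand of anything lifted through it. */
    static Strand flip(final Strand strand) {
        return strand == Strand.POSITIVE ? Strand.NEGATIVE : Strand.POSITIVE;
    }

    /**
     * Lifts the 1-based inclusive source interval [srcStart, srcEnd], which must lie inside the block,
     * to the destination. Returns {start, end} with a 0-based start and a 1-based end.
     */
    static int[] liftOverlap(final int blockSrcStart, final int blockSize, final int blockDstStart,
                             final boolean reversed, final int srcStart, final int srcEnd) {
        final int length = srcEnd - srcStart + 1;
        final int start;
        if (!reversed) {
            // Same orientation: keep the distance from the block start.
            start = blockDstStart + (srcStart - blockSrcStart) - 1;
        } else {
            // Reversed block: the distance from the source block END becomes the distance from the destination start.
            start = blockDstStart + (blockSrcStart + blockSize - 1 - srcEnd) - 1;
        }
        return new int[] { start, start + length };
    }
}
```

For a block of size 50 that maps source position 100 to destination position 1000, the source interval 110 to 119 lands near the start of the destination block in the forward case and near its end in the reversed case.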
 
Example 18
Source File: FuncotatorUtilsUnitTest.java    From gatk with BSD 3-Clause "New" or "Revised" License
    @DataProvider
    Object[][] provideDataForGetAlignedCodingSequenceAllele() {

        final String seq = "ATGAAAGGGGTGCCTATGCTAGATAGACAGATAGTGTGTGTGTGTGTGCGCGCGCGCGCGCGTTGTTAG";

        //CTA ACA ACG CGC GCG CGC GCG CAC ACA CAC ACA CAC TAT CTG TCT ATC TAG CAT AGG CAC CCC TTT CAT

//        final Allele refAllele,
//        final Integer refAlleleStart,

        return new Object[][] {
                { seq,  1, 3,  Allele.create("ATG", true), 1, Strand.POSITIVE, "ATG" },
                { seq,  4, 6,  Allele.create("AAA", true), 4, Strand.POSITIVE, "AAA" },
                { seq,  7, 9,  Allele.create("GGG", true), 7, Strand.POSITIVE, "GGG" },
                { seq, 10, 12, Allele.create("GTG", true), 10, Strand.POSITIVE, "GTG" },
                { seq, 13, 15, Allele.create("CCT", true), 13, Strand.POSITIVE, "CCT" },
                { seq, 16, 18, Allele.create("ATG", true), 16, Strand.POSITIVE, "ATG" },
                { seq, 19, 21, Allele.create("CTA", true), 19, Strand.POSITIVE, "CTA" },
                { seq,  1,  6, Allele.create("ATGAAA", true), 1, Strand.POSITIVE, "ATGAAA" },
                { seq,  4,  9, Allele.create("AAAGGG", true), 4, Strand.POSITIVE, "AAAGGG" },
                { seq,  7, 12, Allele.create("GGGGTG", true), 7, Strand.POSITIVE, "GGGGTG" },
                { seq, 10, 15, Allele.create("GTGCCT", true), 10, Strand.POSITIVE, "GTGCCT" },
                { seq, 13, 18, Allele.create("CCTATG", true), 13, Strand.POSITIVE, "CCTATG" },
                { seq, 16, 21, Allele.create("ATGCTA", true), 16, Strand.POSITIVE, "ATGCTA" },
                { seq, 19, 24, Allele.create("CTAGAT", true), 19, Strand.POSITIVE, "CTAGAT" },
                { seq, 1, seq.length(), Allele.create(seq, true), 1, Strand.POSITIVE, seq },

                { seq,  1, 3,  Allele.create("CTA", true), 1, Strand.NEGATIVE, "CTA" },
                { seq,  4, 6,  Allele.create("ACA", true), 4, Strand.NEGATIVE, "ACA" },
                { seq,  7, 9,  Allele.create("ACG", true), 7, Strand.NEGATIVE, "ACG" },
                { seq, 10, 12, Allele.create("CGC", true), 10, Strand.NEGATIVE, "CGC" },
                { seq, 13, 15, Allele.create("GCG", true), 13, Strand.NEGATIVE, "GCG" },
                { seq, 16, 18, Allele.create("CGC", true), 16, Strand.NEGATIVE, "CGC" },
                { seq, 19, 21, Allele.create("GCG", true), 19, Strand.NEGATIVE, "GCG" },
                { seq,  1,  6, Allele.create("CTAACA", true), 1, Strand.NEGATIVE, "CTAACA" },
                { seq,  4,  9, Allele.create("ACAACG", true), 4, Strand.NEGATIVE, "ACAACG" },
                { seq,  7, 12, Allele.create("ACGCGC", true), 7, Strand.NEGATIVE, "ACGCGC" },
                { seq, 10, 15, Allele.create("CGCGCG", true), 10, Strand.NEGATIVE, "CGCGCG" },
                { seq, 13, 18, Allele.create("GCGCGC", true), 13, Strand.NEGATIVE, "GCGCGC" },
                { seq, 16, 21, Allele.create("CGCGCG", true), 16, Strand.NEGATIVE, "CGCGCG" },
                { seq, 19, 24, Allele.create("GCGCAC", true), 19, Strand.NEGATIVE, "GCGCAC" },
                { seq, 1, seq.length(), Allele.create(ReadUtils.getBasesReverseComplement( seq.getBytes() ), true), 1, Strand.NEGATIVE, ReadUtils.getBasesReverseComplement( seq.getBytes() ) },
        };
    }
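
The negative-strand rows in this data provider are consistent with reading the sequence in transcript orientation: reverse-complement the whole coding sequence first, then take the 1-based inclusive slice. The sketch below reproduces that idea for codon-aligned windows only. It is not the GATK method under test; the class name, the method names, and the local `Strand` enum are hypothetical.

```java
// Illustrative sketch only; this is not the GATK method exercised by the data provider above.
final class StrandSliceSketch {

    enum Strand { POSITIVE, NEGATIVE } // local stand-in for htsjdk.tribble.annotation.Strand

    /** Reverses the sequence, then complements A/C/G/T in place; other symbols are left as-is. */
    static String reverseComplement(final String bases) {
        final StringBuilder out = new StringBuilder(bases).reverse();
        for (int i = 0; i < out.length(); i++) {
            final int idx = "ACGT".indexOf(out.charAt(i));
            if (idx >= 0) {
                out.setCharAt(i, "TGCA".charAt(idx));
            }
        }
        return out.toString();
    }

    /** Returns bases [start, end] (1-based, inclusive) of the sequence as read in transcript orientation. */
    static String slice(final String codingSequence, final int start, final int end, final Strand strand) {
        final String oriented = (strand == Strand.POSITIVE)
                ? codingSequence
                : reverseComplement(codingSequence);
        return oriented.substring(start - 1, end); // convert the 1-based start to a 0-based index
    }
}
```

Applied to the `seq` string above, the window 1 to 3 is the start of the sequence on the positive strand and the reverse complement of its last three bases on the negative strand.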
 
Example 19
Source File: StrandCorrectedAllele.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Create a new {@link StrandCorrectedAllele} object with the given bases.
 * {@code bases} are assumed to be on the {@link Strand#POSITIVE} strand.
 * @param bases {@link String} of bases to use as the allele.
 * @param isRef {@code true} iff the given allele is a reference allele.  {@code false} otherwise.
 * @return A new {@link StrandCorrectedAllele} object containing the given bases.
 */
public static StrandCorrectedAllele create(final String bases, final boolean isRef) {
    return new StrandCorrectedAllele(bases, isRef, Strand.POSITIVE);
}
 
Example 20
Source File: GencodeFuncotationFactory.java    From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Determines whether the provided variant is in the 3' flanking region of the provided transcript. A variant
 * is in the 3' flanking region if it overlaps any of the threePrimeFlankSize bases after the 3' end of the
 * transcript, but does not overlap any part of the transcript itself.
 *
 * @param variant variant to check
 * @param transcript transcript whose flanking regions to consider
 * @param threePrimeFlankSize size in bases of the 3' flanking region
 * @return true if the variant overlaps the 3' flanking region and does not overlap the transcript itself,
 *         otherwise false
 */
@VisibleForTesting
static boolean isThreePrimeFlank(final VariantContext variant, final GencodeGtfTranscriptFeature transcript, final int threePrimeFlankSize) {
    return (transcript.getGenomicStrand() == Strand.POSITIVE && isInTranscriptRightFlank(variant, transcript, threePrimeFlankSize)) ||
           (transcript.getGenomicStrand() == Strand.NEGATIVE && isInTranscriptLeftFlank(variant, transcript, threePrimeFlankSize));
}
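
The bodies of `isInTranscriptRightFlank` and `isInTranscriptLeftFlank` are not shown here, so the following is only a sketch of what the Javadoc describes, written against plain 1-based inclusive coordinates. The 3' flank is to the right of a positive-strand transcript and to the left of a negative-strand one. The class name, the method signatures, and the local `Strand` enum are hypothetical, and the treatment of a non-positive flank size is an assumption.

```java
// Illustrative sketch only; the real GATK helper methods may differ in detail.
final class FlankSketch {

    enum Strand { POSITIVE, NEGATIVE } // local stand-in for htsjdk.tribble.annotation.Strand

    /** True if the two 1-based inclusive intervals share at least one base. */
    static boolean overlaps(final int aStart, final int aEnd, final int bStart, final int bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    static boolean isThreePrimeFlank(final int variantStart, final int variantEnd,
                                     final int transcriptStart, final int transcriptEnd,
                                     final Strand strand, final int threePrimeFlankSize) {
        // ASSUMPTION: a flank of size zero or less never matches.
        if (threePrimeFlankSize <= 0) {
            return false;
        }
        // A variant that touches the transcript itself is not a flank variant.
        if (overlaps(variantStart, variantEnd, transcriptStart, transcriptEnd)) {
            return false;
        }
        if (strand == Strand.POSITIVE) {
            // 3' end is the right edge: flank covers the bases immediately after transcriptEnd.
            return overlaps(variantStart, variantEnd, transcriptEnd + 1, transcriptEnd + threePrimeFlankSize);
        }
        // 3' end is the left edge: flank covers the bases immediately before transcriptStart.
        return overlaps(variantStart, variantEnd, transcriptStart - threePrimeFlankSize, transcriptStart - 1);
    }
}
```

For a transcript at 1000 to 2000 with a flank size of 100, a variant at position 2050 is in the 3' flank only on the positive strand, and a variant at position 950 only on the negative strand.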