Java Code Examples for org.deeplearning4j.models.word2vec.wordstore.VocabCache#elementAtIndex()

The following examples show how to use org.deeplearning4j.models.word2vec.wordstore.VocabCache#elementAtIndex().
Example 1
Source File: WordVectorSerializer.java    From deeplearning4j with Apache License 2.0
/**
 * Saves the specified SequenceVectors model to the target OutputStream.
 *
 * @param vectors SequenceVectors model
 * @param factory SequenceElementFactory implementation for your objects
 * @param stream  target output stream
 * @param <T>     element type stored in the model
 */
public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors,
                                                                    @NonNull SequenceElementFactory<T> factory, @NonNull OutputStream stream) throws IOException {
    WeightLookupTable<T> lookupTable = vectors.getLookupTable();
    VocabCache<T> vocabCache = vectors.getVocab();

    try (PrintWriter writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(stream, StandardCharsets.UTF_8)))) {

        // the VectorsConfiguration is written first
        writer.write(vectors.getConfiguration().toEncodedJson());

        // then the elements follow, one per line
        for (int x = 0; x < vocabCache.numWords(); x++) {
            T element = vocabCache.elementAtIndex(x);
            String json = factory.serialize(element);
            double[] vector = lookupTable.vector(element.getLabel()).dup().data().asDouble();
            ElementPair pair = new ElementPair(json, vector);
            writer.println(pair.toEncodedJson());
            writer.flush();
        }
    }
}
 
Example 2
Source File: VocabHolder.java    From deeplearning4j with Apache License 2.0
public INDArray getSyn0Vector(Integer wordIndex, VocabCache<VocabWord> vocabCache) {
    if (!workers.contains(Thread.currentThread().getId()))
        workers.add(Thread.currentThread().getId());

    VocabWord word = vocabCache.elementAtIndex(wordIndex);

    if (!indexSyn0VecMap.containsKey(word)) {
        synchronized (this) {
            if (!indexSyn0VecMap.containsKey(word)) {
                indexSyn0VecMap.put(word, getRandomSyn0Vec(vectorLength.get(), wordIndex));
            }
        }
    }

    return indexSyn0VecMap.get(word);
}
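
Example 2 initializes each vector lazily with a check, lock, re-check sequence. As a rough alternative sketch, and assuming the backing map can be a ConcurrentHashMap (the field declaration is not shown above), the same once-per-key initialization can be expressed with computeIfAbsent. The class LazyVectors, the method vectorFor, the String key, and the zero-filled vector are all hypothetical stand-ins, not DL4J code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not part of deeplearning4j.
class LazyVectors {
    // ConcurrentHashMap.computeIfAbsent runs the mapping function
    // atomically, at most once per key.
    private final ConcurrentHashMap<String, double[]> vectors = new ConcurrentHashMap<>();
    // Counts how many vectors were actually created.
    private final AtomicInteger initCount = new AtomicInteger();
    private final int vectorLength;

    LazyVectors(int vectorLength) {
        this.vectorLength = vectorLength;
    }

    /** Returns the vector for the word, creating a zero-filled one on first access. */
    double[] vectorFor(String word) {
        return vectors.computeIfAbsent(word, w -> {
            initCount.incrementAndGet();
            return new double[vectorLength];
        });
    }

    /** Number of vectors created so far. */
    int initializations() {
        return initCount.get();
    }
}
```

Repeated calls to vectorFor with the same word return the same array instance, so callers never observe two different vectors for one key.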
 
Example 3
Source File: WordVectorSerializer.java    From deeplearning4j with Apache License 2.0
/**
 * Saves the vocab cache to the provided OutputStream.
 * Note: only the vocab content is saved, so this is mostly suitable for BagOfWords/TF-IDF vectorizers.
 *
 * @param vocabCache vocab cache to save
 * @param stream     target output stream
 * @throws IOException
 */
public static void writeVocabCache(@NonNull VocabCache<VocabWord> vocabCache, @NonNull OutputStream stream)
        throws IOException {
    try (PrintWriter writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(stream, StandardCharsets.UTF_8)))) {
        // saving general vocab information
        writer.println("" + vocabCache.numWords() + " " + vocabCache.totalNumberOfDocs() + " " + vocabCache.totalWordOccurrences());

        for (int x = 0; x < vocabCache.numWords(); x++) {
            VocabWord word = vocabCache.elementAtIndex(x);
            writer.println(word.toJSON());
        }
    }
}
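
Examples 1 and 3 share one pattern: walk the indices from 0 to numWords() - 1 and fetch each element with elementAtIndex(). The following is a minimal, JDK-only sketch of that pattern, so it can be run without deeplearning4j on the classpath. The interface IndexedVocab, the class VocabDumpSketch, and the methods dump, fromList, and dumpToString are hypothetical stand-ins; only the two method names numWords and elementAtIndex are taken from the examples above.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of deeplearning4j.
class VocabDumpSketch {

    /** Stand-in for the two VocabCache methods used in the examples above. */
    interface IndexedVocab<T> {
        int numWords();
        T elementAtIndex(int index);
    }

    /** Writes a header line with the element count, then one element per line. */
    public static <T> void dump(IndexedVocab<T> vocab, OutputStream stream) {
        try (PrintWriter writer = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(stream, StandardCharsets.UTF_8)))) {
            writer.println(vocab.numWords());
            for (int i = 0; i < vocab.numWords(); i++) {
                writer.println(vocab.elementAtIndex(i));
            }
        }
    }

    /** Adapts a list to the stand-in interface; index order follows list order. */
    public static <T> IndexedVocab<T> fromList(List<T> words) {
        return new IndexedVocab<T>() {
            @Override public int numWords() { return words.size(); }
            @Override public T elementAtIndex(int index) { return words.get(index); }
        };
    }

    /** Convenience wrapper that dumps into memory and returns the text. */
    public static String dumpToString(List<String> words) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        dump(fromList(words), out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(dumpToString(Arrays.asList("the", "cat", "sat")));
    }
}
```

Because the try-with-resources block closes the writer, buffered output is flushed once at the end, so no flush call is needed inside the loop in this sketch.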