Java Code Examples for org.apache.lucene.index.Fields#terms()

The following examples show how to use org.apache.lucene.index.Fields#terms(). Each example is extracted from an open-source project; the source file and license are noted above each snippet.
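
Fields#terms(String) returns the Terms for a single field, or null when the field has no indexed terms, so callers should null-check the result before iterating. As a minimal sketch of the common pattern (the reader and field name are assumptions, and MultiFields.getFields applies to Lucene 5.x-7.x):

import java.io.IOException;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class TermsLister {

  // Prints every term of the given field, assuming an open IndexReader.
  public static void listTerms(IndexReader reader, String field) throws IOException {
    Fields fields = MultiFields.getFields(reader); // merged view over all segments
    if (fields == null) {
      return; // reader has no indexed fields
    }
    Terms terms = fields.terms(field); // null if the field has no indexed terms
    if (terms == null) {
      return;
    }
    TermsEnum termsEnum = terms.iterator();
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
      System.out.println(term.utf8ToString() + " docFreq=" + termsEnum.docFreq());
    }
  }
}
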
Example 1
Source File: DfsOnlyRequest.java    From Elasticsearch with Apache License 2.0
public DfsOnlyRequest(Fields termVectorsFields, String[] indices, String[] types, Set<String> selectedFields) throws IOException {
    super(indices);

    // build a search request with a query of all the terms
    final BoolQueryBuilder boolBuilder = boolQuery();
    for (String fieldName : termVectorsFields) {
        if ((selectedFields != null) && (!selectedFields.contains(fieldName))) {
            continue;
        }
        Terms terms = termVectorsFields.terms(fieldName);
        TermsEnum iterator = terms.iterator();
        while (iterator.next() != null) {
            String text = iterator.term().utf8ToString();
            boolBuilder.should(QueryBuilders.termQuery(fieldName, text));
        }
    }
    // wrap a search request object
    this.searchRequest = new SearchRequest(indices).types(types).source(new SearchSourceBuilder().query(boolBuilder));
}
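
Here terms(fieldName) is not null-checked because fieldName comes from iterating the term-vector Fields itself; a defensive null check, as in the codec examples below, would still be safer, since a Fields iterator can claim a field that has no terms (see the "ghost fields" comment in Example 15).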
 
Example 2
Source File: TokenSources.java    From lucene-solr with Apache License 2.0
/**
 * A convenience method that first tries to get a {@link TokenStreamFromTermVector} for the
 * specified docId, then falls back to using the passed-in
 * {@link org.apache.lucene.document.Document} to retrieve the TokenStream.
 * This is useful when you already have the document but would prefer to use
 * the vector first.
 *
 * @param reader The {@link org.apache.lucene.index.IndexReader} from which to try
 *        to get the vector
 * @param docId The docId of the document to retrieve
 * @param field The field to retrieve from the document
 * @param document The document to fall back on
 * @param analyzer The analyzer to use for creating the TokenStream if the
 *        vector doesn't exist
 * @return The {@link org.apache.lucene.analysis.TokenStream} for the
 *         {@link org.apache.lucene.index.IndexableField} on the
 *         {@link org.apache.lucene.document.Document}
 * @throws IOException if there was an error loading the term vector or document
 */
@Deprecated // maintenance reasons LUCENE-6445
public static TokenStream getAnyTokenStream(IndexReader reader, int docId,
    String field, Document document, Analyzer analyzer) throws IOException {
  TokenStream ts = null;

  Fields vectors = reader.getTermVectors(docId);
  if (vectors != null) {
    Terms vector = vectors.terms(field);
    if (vector != null) {
      ts = getTokenStream(vector);
    }
  }

  // No token info stored so fall back to analyzing raw content
  if (ts == null) {
    ts = getTokenStream(document, field, analyzer);
  }
  return ts;
}
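
A brief usage sketch for this method (the reader, docId, and analyzer are assumed to be in scope; the field name is hypothetical):

Document doc = reader.document(docId);
TokenStream ts = TokenSources.getAnyTokenStream(reader, docId, "body", doc, analyzer);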
 
Example 3
Source File: TokenSources.java    From lucene-solr with Apache License 2.0
/**
 * A convenience method that tries a number of approaches to get a token
 * stream. The cost of discovering that there are no term vectors in the index
 * is minimal (1,000 invocations still register 0 ms), so this "lazy" approach
 * is acceptable.
 * 
 * @return null if the field was not stored correctly
 * @throws IOException If there is a low-level I/O error
 */
@Deprecated // maintenance reasons LUCENE-6445
public static TokenStream getAnyTokenStream(IndexReader reader, int docId,
    String field, Analyzer analyzer) throws IOException {
  TokenStream ts = null;

  Fields vectors = reader.getTermVectors(docId);
  if (vectors != null) {
    Terms vector = vectors.terms(field);
    if (vector != null) {
      ts = getTokenStream(vector);
    }
  }

  // No token info stored so fall back to analyzing raw content
  if (ts == null) {
    ts = getTokenStream(reader, docId, field, analyzer);
  }
  return ts;
}
 
Example 4
Source File: TokenSources.java    From lucene-solr with Apache License 2.0
/**
 * Returns a {@link TokenStream} with positions and offsets constructed from
 * field term vectors. If the field has no term vectors, or offsets are not
 * included in the term vector, returns null. See {@link #getTokenStream(org.apache.lucene.index.Terms)}
 * for an explanation of what happens when positions aren't present.
 *
 * @param reader the {@link IndexReader} to retrieve term vectors from
 * @param docId the document to retrieve termvectors for
 * @param field the field to retrieve termvectors for
 * @return a {@link TokenStream}, or null if offsets are not available
 * @throws IOException If there is a low-level I/O error
 *
 * @see #getTokenStream(org.apache.lucene.index.Terms)
 */
@Deprecated // maintenance reasons LUCENE-6445
public static TokenStream getTokenStreamWithOffsets(IndexReader reader, int docId,
                                                    String field) throws IOException {

  Fields vectors = reader.getTermVectors(docId);
  if (vectors == null) {
    return null;
  }

  Terms vector = vectors.terms(field);
  if (vector == null) {
    return null;
  }

  if (!vector.hasOffsets()) {
    return null;
  }
  
  return getTokenStream(vector);
}
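
The explicit hasOffsets() check is what distinguishes this method from the getAnyTokenStream variants above: instead of falling back to re-analysis, it returns null and leaves it to the caller to decide what to do when offsets were not indexed into the term vector.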
 
Example 5
Source File: BlockTermsWriter.java    From lucene-solr with Apache License 2.0
@Override
public void write(Fields fields, NormsProducer norms) throws IOException {

  for(String field : fields) {

    Terms terms = fields.terms(field);
    if (terms == null) {
      continue;
    }

    TermsEnum termsEnum = terms.iterator();

    TermsWriter termsWriter = addField(fieldInfos.fieldInfo(field));

    while (true) {
      BytesRef term = termsEnum.next();
      if (term == null) {
        break;
      }

      termsWriter.write(term, termsEnum, norms);
    }

    termsWriter.finish();
  }
}
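
The null check on fields.terms(field) matters in codec writers: the Fields iterator may list a field that turns out to have no terms (the "ghost fields" case called out in the comment in Example 15), so the writer skips such fields instead of failing.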
 
Example 6
Source File: UniformSplitTermsWriter.java    From lucene-solr with Apache License 2.0
@Override
public void write(Fields fields, NormsProducer normsProducer) throws IOException {
  BlockWriter blockWriter = new BlockWriter(blockOutput, targetNumBlockLines, deltaNumLines, blockEncoder);
  ByteBuffersDataOutput fieldsOutput = new ByteBuffersDataOutput();
  int fieldsNumber = 0;
  for (String field : fields) {
    Terms terms = fields.terms(field);
    if (terms != null) {
      TermsEnum termsEnum = terms.iterator();
      FieldInfo fieldInfo = fieldInfos.fieldInfo(field);
      fieldsNumber += writeFieldTerms(blockWriter, fieldsOutput, termsEnum, fieldInfo, normsProducer);
    }
  }
  writeFieldsMetadata(fieldsNumber, fieldsOutput);
  CodecUtil.writeFooter(dictionaryOutput);
}
 
Example 7
Source File: MutatableAction.java    From incubator-retired-blur with Apache License 2.0
private IterableRow getIterableRow(String rowId, IndexSearcherCloseable searcher) throws IOException {
  IndexReader indexReader = searcher.getIndexReader();
  BytesRef rowIdRef = new BytesRef(rowId);
  List<AtomicReaderTermsEnum> possibleRowIds = new ArrayList<AtomicReaderTermsEnum>();
  for (AtomicReaderContext atomicReaderContext : indexReader.leaves()) {
    AtomicReader atomicReader = atomicReaderContext.reader();
    Fields fields = atomicReader.fields();
    if (fields == null) {
      continue;
    }
    Terms terms = fields.terms(BlurConstants.ROW_ID);
    if (terms == null) {
      continue;
    }
    TermsEnum termsEnum = terms.iterator(null);
    if (!termsEnum.seekExact(rowIdRef, true)) {
      continue;
    }
    // need atomic read as well...
    possibleRowIds.add(new AtomicReaderTermsEnum(atomicReader, termsEnum));
  }
  if (possibleRowIds.isEmpty()) {
    return null;
  }
  return new IterableRow(rowId, getRecords(possibleRowIds));
}
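
This and the other Blur examples target the older Lucene 4.x API, where each leaf exposes an AtomicReader with a fields() method and a TermsEnum is obtained via terms.iterator(reuse); in later Lucene versions AtomicReader became LeafReader, iterator() lost its reuse parameter, and seekExact dropped its useCache flag.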
 
Example 8
Source File: IndexSizeEstimator.java    From lucene-solr with Apache License 2.0
private void estimateTermVectors(Map<String, Object> result) throws IOException {
  log.info("- estimating term vectors...");
  Map<String, Map<String, Object>> stats = new HashMap<>();
  for (LeafReaderContext leafReaderContext : reader.leaves()) {
    LeafReader leafReader = leafReaderContext.reader();
    Bits liveDocs = leafReader.getLiveDocs();
    for (int docId = 0; docId < leafReader.maxDoc(); docId += samplingStep) {
      if (liveDocs != null && !liveDocs.get(docId)) {
        continue;
      }
      Fields termVectors = leafReader.getTermVectors(docId);
      if (termVectors == null) {
        continue;
      }
      for (String field : termVectors) {
        Terms terms = termVectors.terms(field);
        if (terms == null) {
          continue;
        }
        estimateTermStats(field, terms, stats, true);
      }
    }
  }
  result.put(TERM_VECTORS, stats);
}
 
Example 9
Source File: SecureAtomicReaderTestBase.java    From incubator-retired-blur with Apache License 2.0
private int getTermWithSeekCount(Fields fields, String field) throws IOException {
  Terms terms = fields.terms(field);
  if (terms == null) {
    // no terms indexed for this field
    return 0;
  }
  TermsEnum termsEnum = terms.iterator(null);
  SeekStatus seekStatus = termsEnum.seekCeil(new BytesRef(""));
  if (seekStatus == SeekStatus.END) {
    return 0;
  }
  System.out.println(termsEnum.term().utf8ToString());
  int count = 1;
  while (termsEnum.next() != null) {
    count++;
  }
  return count;
}
 
Example 10
Source File: DocumentVisibilityFilter.java    From incubator-retired-blur with Apache License 2.0
@Override
public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws IOException {
  AtomicReader reader = context.reader();
  List<DocIdSet> list = new ArrayList<DocIdSet>();

  Fields fields = reader.fields();
  if (fields == null) {
    // segment has no indexed fields, show nothing.
    return DocIdSet.EMPTY_DOCIDSET;
  }
  Terms terms = fields.terms(_fieldName);
  if (terms == null) {
    // if field is not present then show nothing.
    return DocIdSet.EMPTY_DOCIDSET;
  }
  TermsEnum iterator = terms.iterator(null);
  BytesRef bytesRef;
  DocumentVisibilityEvaluator visibilityEvaluator = new DocumentVisibilityEvaluator(_authorizations);
  while ((bytesRef = iterator.next()) != null) {
    if (isVisible(visibilityEvaluator, bytesRef)) {
      DocIdSet docIdSet = _filterCacheStrategy.getDocIdSet(_fieldName, bytesRef, reader);
      if (docIdSet != null) {
        list.add(docIdSet);
      } else {
        // Do not use acceptDocs because we want the acl cache to be version
        // agnostic.
        DocsEnum docsEnum = iterator.docs(null, null);
        list.add(buildCache(reader, docsEnum, bytesRef));
      }
    }
  }
  return getLogicalOr(list);
}
 
Example 11
Source File: BlurUtil.java    From incubator-retired-blur with Apache License 2.0
private static void applyFamily(OpenBitSet bits, String family, AtomicReader atomicReader, int primeDocRowId,
    int numberOfDocsInRow, Bits liveDocs) throws IOException {
  Fields fields = atomicReader.fields();
  if (fields == null) {
    return;
  }
  Terms terms = fields.terms(BlurConstants.FAMILY);
  if (terms == null) {
    // no family field indexed in this segment
    return;
  }
  TermsEnum iterator = terms.iterator(null);
  BytesRef text = new BytesRef(family);
  int lastDocId = primeDocRowId + numberOfDocsInRow;
  if (iterator.seekExact(text, true)) {
    DocsEnum docs = iterator.docs(liveDocs, null, DocsEnum.FLAG_NONE);
    int doc = primeDocRowId;
    while ((doc = docs.advance(doc)) < lastDocId) {
      bits.set(doc - primeDocRowId);
    }
  }
}
 
Example 12
Source File: BaseReadMaskFieldTypeDefinitionTest.java    From incubator-retired-blur with Apache License 2.0
private void checkTerms(IndexSearcher searcher, String fieldName) throws IOException {
  IndexReader reader = searcher.getIndexReader();
  for (AtomicReaderContext context : reader.leaves()) {
    AtomicReader atomicReader = context.reader();
    Fields fields = atomicReader.fields();
    if (fields == null) {
      continue;
    }
    Terms terms = fields.terms(fieldName);
    if (terms == null) {
      continue;
    }
    TermsEnum iterator = terms.iterator(null);
    BytesRef bytesRef = iterator.next();
    if (bytesRef != null) {
      System.out.println(bytesRef.utf8ToString());
      fail("There are only restricted terms for this field [" + fieldName + "]");
    }
  }
}
 
Example 13
Source File: TermFreq.java    From SourcererCC with GNU General Public License v3.0
private void dummy() throws IOException {
    Fields fields = MultiFields.getFields(this.reader);
    Terms terms = fields.terms("tokens");
    if (terms == null) {
        return;
    }
    TermsEnum iterator = terms.iterator(null);
    BytesRef byteRef;
    while ((byteRef = iterator.next()) != null) {
        // decode the term bytes as UTF-8 rather than the platform default charset
        String term = byteRef.utf8ToString();
        // the Term's field must match the field whose terms are being iterated
        Term termInstance = new Term("tokens", term);
        long termFreq = this.reader.totalTermFreq(termInstance);
        this.TermFreqMap.put(term, termFreq);
        System.out.println(termFreq);
    }
}
 
Example 14
Source File: SecureAtomicReaderTestBase.java    From incubator-retired-blur with Apache License 2.0
private int getTermCount(Fields fields, String field) throws IOException {
  Terms terms = fields.terms(field);
  if (terms == null) {
    return 0;
  }
  TermsEnum termsEnum = terms.iterator(null);
  int count = 0;
  while (termsEnum.next() != null) {
    count++;
  }
  return count;
}
 
Example 15
Source File: CompletionFieldsConsumer.java    From lucene-solr with Apache License 2.0
@Override
public void write(Fields fields, NormsProducer norms) throws IOException {
  delegateFieldsConsumer.write(fields, norms);

  for (String field : fields) {
    CompletionTermWriter termWriter = new CompletionTermWriter();
    Terms terms = fields.terms(field);
    if (terms == null) {
      // this can happen from ghost fields, where the incoming Fields iterator claims a field exists but it does not
      continue;
    }
    TermsEnum termsEnum = terms.iterator();

    // write terms
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
      termWriter.write(term, termsEnum);
    }

    // store lookup, if needed
    long filePointer = dictOut.getFilePointer();
    if (termWriter.finish(dictOut)) {
      seenFields.put(field, new CompletionMetaData(filePointer,
          termWriter.minWeight,
          termWriter.maxWeight,
          termWriter.type));
    }
  }
}
 
Example 16
Source File: MoreLikeThis.java    From lucene-solr with Apache License 2.0
/**
 * Find words for a more-like-this query former.
 *
 * @param docNum the id of the lucene document from which to find terms
 */
private PriorityQueue<ScoreTerm> retrieveTerms(int docNum) throws IOException {
  Map<String, Map<String, Int>> field2termFreqMap = new HashMap<>();
  for (String fieldName : fieldNames) {
    final Fields vectors = ir.getTermVectors(docNum);
    final Terms vector;
    if (vectors != null) {
      vector = vectors.terms(fieldName);
    } else {
      vector = null;
    }

    // field does not store term vector info
    if (vector == null) {
      Document d = ir.document(docNum);
      IndexableField[] fields = d.getFields(fieldName);
      for (IndexableField field : fields) {
        final String stringValue = field.stringValue();
        if (stringValue != null) {
          addTermFrequencies(new StringReader(stringValue), field2termFreqMap, fieldName);
        }
      }
    } else {
      addTermFrequencies(field2termFreqMap, vector, fieldName);
    }
  }

  return createQueue(field2termFreqMap);
}
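
Note the fallback pattern: retrieveTerms prefers the term vector when one is stored and re-analyzes the stored field values only when it is not, mirroring the getAnyTokenStream examples above.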
 
Example 17
Source File: STUniformSplitTermsWriter.java    From lucene-solr with Apache License 2.0
private TermIteratorQueue<FieldTerms> createFieldTermsQueue(Fields fields, List<FieldMetadata> fieldMetadataList) throws IOException {
  TermIteratorQueue<FieldTerms> fieldQueue = new TermIteratorQueue<>(fieldMetadataList.size());
  for (FieldMetadata fieldMetadata : fieldMetadataList) {
    Terms terms = fields.terms(fieldMetadata.getFieldInfo().name);
    if (terms != null) {
      FieldTerms fieldTerms = new FieldTerms(fieldMetadata, terms.iterator());
      if (fieldTerms.nextTerm()) {
        // There is at least one term for the field.
        fieldQueue.add(fieldTerms);
      }
    }
  }
  return fieldQueue;
}
 
Example 18
Source File: FSTTermsWriter.java    From lucene-solr with Apache License 2.0
@Override
public void write(Fields fields, NormsProducer norms) throws IOException {
  for(String field : fields) {
    Terms terms = fields.terms(field);
    if (terms == null) {
      continue;
    }
    FieldInfo fieldInfo = fieldInfos.fieldInfo(field);
    boolean hasFreq = fieldInfo.getIndexOptions().compareTo(IndexOptions.DOCS_AND_FREQS) >= 0;
    TermsEnum termsEnum = terms.iterator();
    TermsWriter termsWriter = new TermsWriter(fieldInfo);

    long sumTotalTermFreq = 0;
    long sumDocFreq = 0;
    FixedBitSet docsSeen = new FixedBitSet(maxDoc);

    while (true) {
      BytesRef term = termsEnum.next();
      if (term == null) {
        break;
      }
          
      BlockTermState termState = postingsWriter.writeTerm(term, termsEnum, docsSeen, norms);
      if (termState != null) {
        termsWriter.finishTerm(term, termState);
        sumTotalTermFreq += termState.totalTermFreq;
        sumDocFreq += termState.docFreq;
      }
    }

    termsWriter.finish(hasFreq ? sumTotalTermFreq : -1, sumDocFreq, docsSeen.cardinality());
  }
}
 
Example 19
Source File: SecureAtomicReader.java    From incubator-retired-blur with Apache License 2.0
private Terms getReadMaskTerms(Fields in, String field) throws IOException {
  return in.terms(field + FilterAccessControlFactory.READ_MASK_SUFFIX);
}
 
Example 20
Source File: TokenSources.java    From lucene-solr with Apache License 2.0
/**
 * Get a token stream by un-inverting the term vector. This method returns null if {@code tvFields} is null
 * or if the field has no term vector, or if the term vector doesn't have offsets.  Positions are recommended on the
 * term vector but aren't strictly required.
 *
 * @param field The field to get term vectors from.
 * @param tvFields from {@link IndexReader#getTermVectors(int)}. Possibly null. For performance, this instance should
 *                 be re-used for the same document (e.g. when highlighting multiple fields).
 * @param maxStartOffset Terms with a startOffset greater than this aren't returned.  Use -1 for no limit.
 *                       Suggest using {@link Highlighter#getMaxDocCharsToAnalyze()} - 1
 * @return a token stream from term vectors. Null if no term vectors with the right options.
 */
public static TokenStream getTermVectorTokenStreamOrNull(String field, Fields tvFields, int maxStartOffset)
    throws IOException {
  if (tvFields == null) {
    return null;
  }
  final Terms tvTerms = tvFields.terms(field);
  if (tvTerms == null || !tvTerms.hasOffsets()) {
    return null;
  }
  return new TokenStreamFromTermVector(tvTerms, maxStartOffset);
}
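
A brief usage sketch (the reader, docId, and field name are assumptions):

Fields tvFields = reader.getTermVectors(docId); // may be null if no vectors were stored
TokenStream ts = TokenSources.getTermVectorTokenStreamOrNull("body", tvFields, -1);
if (ts == null) {
  // no suitable term vector; fall back to re-analyzing the stored content
}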