private static IBits DocsWithFieldCacheEntry_CreateValue(IndexReader reader, Entry entryKey, bool setDocsWithField /* ignored */)
{
    var field = entryKey.field;
    FixedBitSet res = null;
    var terms = new TermsEnumCompatibility(reader, field);
    var maxDoc = reader.MaxDoc;

    var term = terms.Next();
    if (term != null)
    {
        int termsDocCount = terms.GetDocCount();
        Debug.Assert(termsDocCount <= maxDoc);
        if (termsDocCount == maxDoc)
        {
            // Fast case: all docs have this field:
            return new MatchAllBits(maxDoc);
        }

        while (true)
        {
            if (res == null)
            {
                // lazy init
                res = new FixedBitSet(maxDoc);
            }

            var termDocs = reader.TermDocs(term);
            while (termDocs.Next())
            {
                res.Set(termDocs.Doc);
            }

            term = terms.Next();
            if (term == null)
            {
                break;
            }
        }
    }

    if (res == null)
    {
        return new MatchNoBits(maxDoc);
    }

    int numSet = res.Cardinality();
    if (numSet >= maxDoc)
    {
        // The cardinality of the BitSet is maxDoc if all documents have a value.
        Debug.Assert(numSet == maxDoc);
        return new MatchAllBits(maxDoc);
    }

    return res;
}
public ShapeFieldCache<T> GetCache(IndexReader reader)
{
    lock (locker)
    {
        ShapeFieldCache<T> idx;
        if (sidx.TryGetValue(reader, out idx) && idx != null)
        {
            return idx;
        }

        //long startTime = System.CurrentTimeMillis();
        //log.fine("Building Cache [" + reader.MaxDoc() + "]");

        idx = new ShapeFieldCache<T>(reader.MaxDoc, defaultSize);
        var count = 0;
        var tec = new TermsEnumCompatibility(reader, shapeField);

        var term = tec.Next();
        while (term != null)
        {
            var shape = ReadShape(term);
            if (shape != null)
            {
                var docs = reader.TermDocs(new Term(shapeField, tec.Term().Text));
                while (docs.Next())
                {
                    idx.Add(docs.Doc, shape);
                    count++;
                }
            }
            term = tec.Next();
        }

        sidx.Add(reader, idx);
        tec.Close();

        //long elapsed = System.CurrentTimeMillis() - startTime;
        //log.fine("Cached: [" + count + " in " + elapsed + "ms] " + idx);
        return idx;
    }
}
public override DocIdSet GetDocIdSet(IndexReader reader)
{
    var result = new FixedBitSet(reader.MaxDoc());
    var fields = reader.GetFieldNames(IndexReader.FieldOption.ALL);

    if (fields == null || fields.Count == 0)
    {
        return result;
    }

    String lastField = null;
    TermsEnumCompatibility termsEnum = null;
    foreach (Term term in terms)
    {
        var f = term.Field();
        if (!f.Equals(lastField))
        {
            var termsC = new TermsEnumCompatibility(reader, f);
            if (termsC.Term() == null)
            {
                return result;
            }
            termsEnum = termsC;
            lastField = f;
        }

        if (terms != null)
        {
            // TODO this check doesn't make sense, decide which variable it's supposed to be for
            Debug.Assert(termsEnum != null);
            if (termsEnum.SeekCeil(term.Text()) == TermsEnumCompatibility.SeekStatus.FOUND)
            {
                termsEnum.Docs(result);
            }
        }
    }
    return result;
}
public override DocIdSet GetDocIdSet(IndexReader reader)
{
    var result = new FixedBitSet(reader.MaxDoc);
    var fields = reader.GetFieldNames(IndexReader.FieldOption.ALL);

    if (fields == null || fields.Count == 0)
    {
        return result;
    }

    String lastField = null;
    TermsEnumCompatibility termsEnum = null;
    foreach (Term term in terms)
    {
        if (!term.Field.Equals(lastField))
        {
            var termsC = new TermsEnumCompatibility(reader, term.Field);
            if (termsC.Term() == null)
            {
                return result;
            }
            termsEnum = termsC;
            lastField = term.Field;
        }

        if (terms != null)
        {
            // TODO this check doesn't make sense, decide which variable it's supposed to be for
            Debug.Assert(termsEnum != null);
            if (termsEnum.SeekCeil(term.Text) == TermsEnumCompatibility.SeekStatus.FOUND)
            {
                termsEnum.Docs(result);
            }
        }
    }
    return result;
}
public override DocIdSet GetDocIdSet(Index.IndexReader reader /*, Bits acceptDocs*/)
{
    var bits = new OpenBitSet(reader.MaxDoc);
    var terms = new TermsEnumCompatibility(reader, fieldName);
    var term = terms.Next();
    if (term == null)
        return null;
    Node scanCell = null;

    //cells is treated like a stack. LinkedList conveniently has bulk add to beginning. It's in sorted order so that we
    //  always advance forward through the termsEnum index.
    var cells = new LinkedList<Node>(
        grid.GetWorldNode().GetSubCells(queryShape));

    //This is a recursive algorithm that starts with one or more "big" cells, and then recursively dives down into the
    // first such cell that intersects with the query shape. It's a depth first traversal because we don't move onto
    // the next big cell (breadth) until we're completely done considering all smaller cells beneath it. For a given
    // cell, if it's *within* the query shape then we can conveniently short-circuit the depth traversal and
    // grab all documents assigned to this cell/term. For an intersection of the cell and query shape, we either
    // recursively step down another grid level or we decide heuristically (via prefixGridScanLevel) that there aren't
    // that many points, and so we scan through all terms within this cell (i.e. the term starts with the cell's term),
    // seeing which ones are within the query shape.
    while (cells.Count > 0)
    {
        Node cell = cells.First.Value;
        cells.RemoveFirst();
        var cellTerm = cell.GetTokenString();
        var seekStat = terms.Seek(cellTerm);
        if (seekStat == TermsEnumCompatibility.SeekStatus.END)
            break;
        if (seekStat == TermsEnumCompatibility.SeekStatus.NOT_FOUND)
            continue;
        if (cell.GetLevel() == detailLevel || cell.IsLeaf())
        {
            terms.Docs(bits);
        }
        else
        {
            //any other intersection

            //If the next indexed term is the leaf marker, then add all of them
            var nextCellTerm = terms.Next();
            Debug.Assert(nextCellTerm.Text.StartsWith(cellTerm));
            scanCell = grid.GetNode(nextCellTerm.Text, scanCell);
            if (scanCell.IsLeaf())
            {
                terms.Docs(bits);
                term = terms.Next(); //move pointer to avoid potential redundant addDocs() below
            }

            //Decide whether to continue to divide & conquer, or whether it's time to scan through terms beneath this cell.
            // Scanning is a performance optimization trade-off.
            bool scan = cell.GetLevel() >= prefixGridScanLevel; //simple heuristic

            if (!scan)
            {
                //Divide & conquer
                var lst = cell.GetSubCells(queryShape);
                for (var i = lst.Count - 1; i >= 0; i--) //add to beginning
                {
                    cells.AddFirst(lst[i]);
                }
            }
            else
            {
                //Scan through all terms within this cell to see if they are within the queryShape. No seek()s.
                for (var t = terms.Term(); t != null && t.Text.StartsWith(cellTerm); t = terms.Next())
                {
                    scanCell = grid.GetNode(t.Text, scanCell);
                    int termLevel = scanCell.GetLevel();
                    if (termLevel > detailLevel)
                        continue;
                    if (termLevel == detailLevel || scanCell.IsLeaf())
                    {
                        //TODO should put more thought into implications of box vs point
                        Shape cShape = termLevel == grid.GetMaxLevels() ? scanCell.GetCenter() : scanCell.GetShape();
                        if (queryShape.Relate(cShape) == SpatialRelation.DISJOINT)
                            continue;

                        terms.Docs(bits);
                    }
                } //term loop
            }
        }
    } //cell loop

    return bits;
}