
Showing posts with the label Lucene

Sitecore: Extract Indexed Content of Media Files using MediaItemContentExtractor

Here is something in addition to my previous post regarding indexing associated content. Here is a common scenario: your custom index configuration is set up to crawl all the content for your website, which your site search (keyword search) then uses to fetch search results. In addition to your content item crawlers, you add a crawler for Media Library items, and Sitecore does a great job of indexing PDF, DOCX, DOC, etc. files automatically, provided you have a valid IFilter installed. Your search now also shows file items as search results. Now consider the following scenario: one of the lookup fields on your page points to a file in the media library, and the new requirement is to show the page item in the search results when the search phrase matches the content of the associated file. Solution (Lucene & Solr): Create a computed field called "related_content" that stores the crawled content of the associated file and extend the query to now se

RESOLVED: Solr Exceptions - Document contains at least one immense term in field

If you implemented Solr with Sitecore using Solr 5.x, you may run into the following error when indexing extremely large content in string fields: org.apache.solr.common.SolrException: Exception writing document id <xxxuniqueid> to the index; possible analysis error. Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="<xxxfieldname>" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[10, 9, 9, 10, 32, 32, 32, 32, 32, 32, 82, 84, 82, 83, 32, 70, 97, 99, 105, 108, 105, 116, 121, 32, 10, 32, 32, 32, 32, 84]...', original message: bytes can be at most 32766 in length; got <intgreaterthan32766> Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got <intgreaterthan32766> This is due to the fa
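
As a rough aid for diagnosing this error, the sketch below decodes the byte prefix that the message prints and checks a value against the 32766-byte limit quoted in it. It is not the fix from the post; the class and method names (`ImmenseTermCheck`, `decodePrefix`, `exceedsLimit`) are hypothetical, and it assumes the field value is indexed as one untokenized term.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ImmenseTermCheck {
    // Limit quoted in the error message above.
    public static final int MAX_TERM_BYTES = 32766;

    // Turns the numbers printed as "prefix of the first immense term"
    // back into text, treating them as UTF-8 byte values.
    public static String decodePrefix(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // True when the UTF-8 encoding of the value is longer than the limit.
    public static boolean exceedsLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length > MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        // Prefix copied from the error message above.
        int[] prefix = {10, 9, 9, 10, 32, 32, 32, 32, 32, 32, 82, 84, 82, 83, 32,
                        70, 97, 99, 105, 108, 105, 116, 121, 32, 10, 32, 32, 32, 32, 84};
        // Collapse whitespace so the decoded text fits on one line.
        System.out.println(decodePrefix(prefix).replaceAll("\\s+", " ").trim());

        // Build a 40000-character value to stand in for a very large field.
        char[] big = new char[40000];
        Arrays.fill(big, 'x');
        System.out.println(exceedsLimit(new String(big)));
    }
}
```

Decoding the prefix shows which document text produced the oversized term, which helps locate the offending field value in Sitecore before deciding how to index it.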

Sitecore: Indexing Associated Content

Setting up indexes and indexing content for site search in Sitecore is a pretty straightforward task, and there is an extensive knowledge base put together by the community with various examples. One useful construct we often find ourselves implementing for keyword search is setting up a computed field. I typically use this construct to index any additional content referenced by a page, typically content blocks, promos, and callouts added to the page via presentation details. The Need: Index content externally referenced by a page item. Solve: Create a computed field to index TextField and HtmlText type fields of referenced items. This implementation fetches all the renderings for the current item's presentation for the default device and checks their datasource items for indexable content. As a suggestion, check whether the current item inherits from certain page templates; otherwise, skip the execution. Step 1: Create a class that implements the IComputedIndexField interface as shown below: pu

Exception Info: Lucene.Net.Index.MergePolicy+MergeException

While trying to convert items into buckets, my w3wp process started crashing randomly. Looking at the Event Viewer I saw the following error: Application: w3wp.exe Framework Version: v4.0.30319 Description: The process was terminated due to an unhandled exception. Exception Info: Lucene.Net.Index.MergePolicy+MergeException Stack:    at Lucene.Net.Index.ConcurrentMergeScheduler.HandleMergeException(System.Exception)    at Lucene.Net.Index.ConcurrentMergeScheduler+MergeThread.Run()    at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)    at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)    at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)    at System.Threading.ThreadHelper.ThreadStart() I assumed the issue was r