Blocking Items From Being Indexed

JSolr provides configuration options that block the crawler from indexing certain Joomla! data. They work in a similar fashion to the robots.txt file used to exclude web pages from being indexed by a search engine.

Unlike a search engine, which works with web directories and pages, a JSolr robots file can be configured to block various types of data: sections, categories and articles in the case of com_content, or products for a component such as VirtueMart.

To edit the JSolr robots file, browse to JSolr Index->Robots from the administration's top navigation.

JSolrIndex is bundled with a number of crawler plugins. The components that can currently be crawled are:

  • com_content
  • com_newsfeeds
  • com_virtuemart

Each of these plugins can be blocked from indexing certain information using the commands below. Each command takes the form:

component_name;rule_name=list_of_values

If list_of_values contains more than one value, separate each value with a comma. In the examples below, n stands for any additional value.
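To make the command format concrete, here is a minimal Python sketch that splits one command into its three parts. It is illustrative only and is not taken from JSolr's own code: the names Rule and parse_rule are hypothetical, and the choice to trim surrounding whitespace and keep values as strings is an assumption.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """One parsed command (hypothetical type, not part of JSolr)."""
    component: str     # e.g. "com_content"
    name: str          # e.g. "article"
    values: List[str]  # kept as strings, in the order given


def parse_rule(line: str) -> Rule:
    """Split 'component_name;rule_name=list_of_values' into its parts."""
    # Everything before the first ';' is the component name.
    component, sep, rest = line.strip().partition(";")
    if not sep or not component.strip():
        raise ValueError(f"missing component_name: {line!r}")

    # Everything between ';' and the first '=' is the rule name.
    name, sep, raw_values = rest.partition("=")
    if not sep or not name.strip():
        raise ValueError(f"missing rule_name or '=': {line!r}")

    # Values are comma-separated; empty entries are dropped (an assumption).
    values = [v.strip() for v in raw_values.split(",") if v.strip()]
    return Rule(component.strip(), name.strip(), values)
```

For example, parse_rule("com_content;article=1,2,3") returns a Rule with component "com_content", name "article" and values ["1", "2", "3"].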

com_content

To block an article:

com_content;article=1,2,3,n

To block a category:

com_content;category=1,2,3,n

To block entire sections:

com_content;section=1,2,3,n

com_newsfeeds

To block a newsfeed:

com_newsfeeds;newsfeed=1,2,3,n

To block a category:

com_newsfeeds;category=1,2,3,n

com_virtuemart

To block a VirtueMart product:

com_virtuemart;product=1,2,3,n
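
The commands listed above can also be generated rather than typed by hand. The Python sketch below builds a command string from a component, a rule name and a list of values, and rejects combinations that this page does not document. The names KNOWN_RULES and format_rule are hypothetical, and the table reflects only the commands described here, not necessarily everything JSolr accepts.

```python
from typing import Iterable

# Component/rule pairs documented on this page (not an official JSolr list).
KNOWN_RULES = {
    "com_content": {"article", "category", "section"},
    "com_newsfeeds": {"newsfeed", "category"},
    "com_virtuemart": {"product"},
}


def format_rule(component: str, rule_name: str, values: Iterable[object]) -> str:
    """Build a 'component_name;rule_name=list_of_values' command string."""
    if rule_name not in KNOWN_RULES.get(component, set()):
        raise ValueError(f"undocumented rule: {component};{rule_name}")

    # Convert each value to text and join with commas, as the format requires.
    joined = ",".join(str(v) for v in values)
    if not joined:
        raise ValueError("at least one value is required")
    return f"{component};{rule_name}={joined}"
```

For example, format_rule("com_newsfeeds", "newsfeed", [4, 9]) returns "com_newsfeeds;newsfeed=4,9", which can then be pasted into the robots file.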