JSolr provides configuration options that block the crawler from indexing certain Joomla! data. They work in a similar fashion to the robots.txt file used to exclude web pages from being indexed by a search engine.
Unlike robots.txt, which works with web directories and pages, the JSolr robots file can be configured to block various types of data, such as sections, categories and articles in the case of com_content, or products for a component such as VirtueMart.
To edit the JSolr robots file, browse to JSolr Index->Robots from the administration's top navigation.
JSolrIndex is bundled with a number of crawler plugins; the components that can currently be crawled are:
Each of these plugins can be blocked from indexing certain information by using commands of the following form:
component_name;rule_name=list_of_values
If list_of_values contains more than one value, separate each value with a comma.
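The command format above can be illustrated with a small parser. This is only a sketch in Python and is not part of JSolr: the function name parse_rule is hypothetical, values are kept as strings, and the tolerance for surrounding whitespace is an assumption rather than documented JSolr behaviour.

```python
def parse_rule(line):
    """Split 'component_name;rule_name=list_of_values' into its three parts.

    Hypothetical helper for illustration only; not JSolr's own parser.
    """
    component, sep, rest = line.strip().partition(";")
    if not sep or not component.strip():
        raise ValueError("expected 'component_name;rule_name=list_of_values'")

    rule, sep, values = rest.partition("=")
    if not sep or not rule.strip():
        raise ValueError("expected 'rule_name=list_of_values' after ';'")

    # Values are comma-separated; empty entries are dropped (an assumption).
    value_list = [v.strip() for v in values.split(",") if v.strip()]
    return component.strip(), rule.strip(), value_list


print(parse_rule("com_content;article=1,2,3"))
# ('com_content', 'article', ['1', '2', '3'])
```

A command with a single value, such as com_virtuemart;product=42, yields a one-element list under the same sketch.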
To block an article:
com_content;article=1,2,3,n
To block a content category:
com_content;category=1,2,3,n
To block entire sections:
com_content;section=1,2,3,n
To block a newsfeed:
com_newsfeeds;newsfeed=1,2,3,n
To block a newsfeed category:
com_newsfeeds;category=1,2,3,n
To block a VirtueMart product:
com_virtuemart;product=1,2,3,n
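To show how a set of commands like those above might be consulted, here is a rough Python sketch. It is not JSolr code: load_rules and is_blocked are hypothetical names, one command per line is an assumption, and the check is an exact match only (whether blocking a category also blocks the items inside it is not covered here).

```python
from collections import defaultdict


def load_rules(text):
    """Collect blocked values per (component, rule) pair.

    Assumes one command per line; blank lines are skipped.
    """
    blocked = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        component, _, rest = line.partition(";")
        rule, _, values = rest.partition("=")
        key = (component.strip(), rule.strip())
        blocked[key].update(v.strip() for v in values.split(",") if v.strip())
    return blocked


def is_blocked(blocked, component, rule, item_id):
    """Return True if item_id is listed for the given component and rule."""
    return str(item_id) in blocked.get((component, rule), set())


rules = load_rules(
    "com_content;article=1,2,3\n"
    "com_newsfeeds;category=7\n"
    "com_virtuemart;product=42\n"
)

print(is_blocked(rules, "com_content", "article", 2))      # True
print(is_blocked(rules, "com_content", "article", 9))      # False
print(is_blocked(rules, "com_virtuemart", "product", 42))  # True
```

Keying the lookup on the component and rule name together keeps the two category rules (com_content and com_newsfeeds) separate.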