- Character or text – indexable at field level (to support query), at full text level (to support search) or at both levels (for search and query)
- Numbers – indexable as integers or floats
- Times and Dates
The policy file is also used to identify fields for faceted navigation and uses an inheritance model to combine simplicity with maximum flexibility.
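The text does not spell out how the inheritance model works, so the following is only a minimal sketch under one plausible reading: settings declared on a parent element apply to its children unless a child overrides them. The policy layout, the field paths and the setting names (`index`, `facet`) are hypothetical and are not Clusterpoint's actual policy syntax.

```python
# Hypothetical policy: keys are field paths, values are partial settings.
# None of these names come from Clusterpoint; they only illustrate inheritance.
POLICY = {
    "document": {"index": "text", "facet": False},
    "document/title": {"index": "both"},
    "document/meta": {"index": "query"},
    "document/meta/source": {"facet": True},
}


def resolve(path, policy=POLICY):
    """Merge settings from the root down to `path`; the nearest setting wins."""
    settings = {}
    parts = path.split("/")
    for depth in range(1, len(parts) + 1):
        ancestor = "/".join(parts[:depth])
        settings.update(policy.get(ancestor, {}))
    return settings


# A field with no entry of its own inherits everything from its parent.
print(resolve("document/body"))
# A deeper field inherits "index" from document/meta and sets its own "facet".
print(resolve("document/meta/source"))
```

Under this reading, only the exceptions need to be written down, which is one way a policy could stay short while still allowing per-field control.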
Clusterpoint’s ranking index is used to maximise relevance in search and query results. Rankings are stored in the index, are dynamic and are calculated from your definitions of relative importance. The two key ranking controls are:
- Tag weighting – used to differentiate between fields (or tags) based on relative importance or significance of a particular field
- Document rating – used to identify documents or document groups with a relatively higher level of importance (or provenance) than others
The key to controlling ranking in Clusterpoint is making a few soft decisions about which content is likely to matter most to your application or your users. These decisions are not hard-wired: they can be changed later and can even be overridden at run time. However, the more guidance the policy file provides, the better the downstream performance.
Clusterpoint’s tag weights are specified in the index policy file. Hybrid indexation uses these values to calculate a weight for each word in an indexed field based on density. These weights are then combined to calculate and store the final weight for each word at the field and document level. For example, tag weights would be used to tell Clusterpoint that records with the search term in the title of a document are likely to be more relevant to you than records with the search term in the body. In other words, title beats body.
Rating is specified in the index policy file by identifying a numeric field in the incoming document to use as input to the rating process. Rating can also be specified in the document itself using an embedded keyword. For example, document rating would be used to tell Clusterpoint that records coming from Bloomberg should always be considered more relevant than anything coming from some guy's personal blog on finance. Document ratings act as the arbiter when two documents come out with the same tag weightings, much like the Chairman's casting vote.
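The interplay of the two ranking controls can be illustrated with a small sketch. It is not Clusterpoint's actual algorithm: the weight values, the density formula (occurrences divided by field length) and the names `TAG_WEIGHTS`, `field_score`, `document_score` and `rank` are all assumptions made for illustration. It shows only the idea described above: tag weight scales a word's density within a field, field weights are combined per document, and the document rating breaks ties.

```python
from collections import Counter

# Hypothetical tag weights, as they might be declared in an index policy.
TAG_WEIGHTS = {"title": 3.0, "body": 1.0}


def field_score(text, term, tag_weight):
    """Weight of `term` in one field: tag weight times term density."""
    words = text.lower().split()
    if not words:
        return 0.0
    density = Counter(words)[term.lower()] / len(words)
    return tag_weight * density


def document_score(doc, term):
    """Combine the per-field weights into a document-level weight."""
    return sum(
        field_score(doc.get(field, ""), term, weight)
        for field, weight in TAG_WEIGHTS.items()
    )


def rank(docs, term):
    """Order by tag-weighted score; the document rating breaks ties."""
    return sorted(
        docs,
        key=lambda d: (document_score(d, term), d.get("rating", 0)),
        reverse=True,
    )


docs = [
    {"id": "blog", "title": "bond markets", "body": "a note on yields", "rating": 1},
    {"id": "wire", "title": "bond markets", "body": "a note on yields", "rating": 5},
    {"id": "misc", "title": "a note on yields", "body": "bond markets", "rating": 1},
]
print([d["id"] for d in rank(docs, "bond")])
```

Here "wire" and "blog" score identically on tag weights because the term sits in the title of both, so the higher rating decides the order. "misc" comes last because the term appears only in its body.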
The policy file also provides options for further control at the field level including:
- Ordering – pre-orders fields in the index to speed searches and queries requesting the first N results
- Hiding, Highlighting and Snippeting – controls presentation of field contents in your search results
- Stemming language – sets the stemming language for searches on words and related words
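To see why pre-ordering helps "first N" requests, consider the minimal sketch below. It is an illustration of the general technique, not a description of Clusterpoint's index structures; the `OrderedIndex` class and its methods are hypothetical. Entries are kept sorted at index time, so a request for the first N results reads them in order and stops early instead of sorting every match at query time.

```python
import bisect
from itertools import islice


class OrderedIndex:
    """Toy index that keeps (field_value, doc_id) pairs sorted on insert."""

    def __init__(self):
        self._entries = []  # always sorted by field_value, then doc_id

    def add(self, field_value, doc_id):
        # The ordering cost is paid once, when the document is indexed.
        bisect.insort(self._entries, (field_value, doc_id))

    def first_n(self, n, predicate=lambda doc_id: True):
        # Walk entries in stored order and stop as soon as n matches are found.
        matches = (doc_id for _, doc_id in self._entries if predicate(doc_id))
        return list(islice(matches, n))


index = OrderedIndex()
index.add(2020, "b")
index.add(2018, "a")
index.add(2022, "c")
print(index.first_n(2))
```

The trade-off in this sketch is slower inserts in exchange for cheaper reads, which suits workloads where most requests only want the top few results.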
Changes in indexation policy can be applied quickly and efficiently via the policy file without any need to make changes to your application code. Behaviours defined in the policy file can be overridden in search and query requests at run-time for maximum flexibility.
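The run-time override behaviour can be pictured as layering request parameters over policy defaults. The sketch below assumes that model; the setting names (`stemming`, `highlight`, `snippet_length`) and the `effective_settings` helper are hypothetical and do not reflect Clusterpoint's request syntax.

```python
from collections import ChainMap

# Hypothetical defaults, standing in for behaviours defined in the policy file.
POLICY_DEFAULTS = {"stemming": "en", "highlight": True, "snippet_length": 120}


def effective_settings(request_overrides):
    """Look up each setting in the request first, then fall back to the policy."""
    return dict(ChainMap(request_overrides, POLICY_DEFAULTS))


# One request switches highlighting off; every other setting comes from the policy.
print(effective_settings({"highlight": False}))
```

Because the defaults are never modified, an override in this sketch affects only the request that carries it.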