Step 1: Ranking XML Data Structure

Clusterpoint Server software provides a mechanism to rank any custom XML / JSON database structure for search relevancy, using Relevancy Weights.

You can assign relative weights in the range from 0% to 100% to any part or item of your custom XML / JSON documents (readers familiar with enterprise search systems will recognize the concept). Any JSON data object is stored internally as an XML data object with the same hierarchical structure and content, so customers can freely use either XML or JSON through the Clusterpoint API; rankings are applied in a single uniform way, based on the internal XML representation of the stored data objects. The illustrations that follow use the underlying XML data format.
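
To make the idea of a JSON object mirrored as an XML tree more concrete, here is a minimal Python sketch. It is not Clusterpoint's actual conversion: the function name `json_to_xml`, the root element name `document`, and the use of a repeated `item` element for JSON arrays are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(name, value):
    """Build an XML element that mirrors the hierarchy of a JSON value.

    Hypothetical helper for illustration only, not a Clusterpoint API.
    """
    elem = ET.Element(name)
    if isinstance(value, dict):
        # Each JSON key becomes a child element with the same name.
        for key, child in value.items():
            elem.append(json_to_xml(key, child))
    elif isinstance(value, list):
        # Assumption: each array entry becomes a repeated <item> element.
        for child in value:
            elem.append(json_to_xml("item", child))
    else:
        # Scalars become the text content of the element.
        elem.text = str(value)
    return elem


doc = json.loads(
    '{"Title": "London guide", '
    '"Address": {"City": "London", "Street": "Baker Street"}}'
)
root = json_to_xml("document", doc)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because both forms share the same tree shape, a weight attached to a field such as `Address` can in principle be applied the same way whichever format the data arrived in.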

Here is a sample query and a sample document illustrating how XML structure ranking works:

[Figure: Enterprise search relevancy with customizable weights and document parts ranking]

Unlike SQL queries, which generate flat relevancy for all results matching a query, Clusterpoint information ranking technology lets customers define flexible custom ordering rules for their result sets and apply those rules only to query terms that match terms in ranked XML document parts. By changing the relative ranking of items in their XML structure, they can increase or decrease the relative importance of a specific XML item until the resulting rule set produces the ordering that their customers and end-users consider most relevant.

Weights are widely used in the enterprise search world, mostly to rank query terms, and the concept should be familiar to most software developers who work with enterprise search tools. We took a similar approach and integrated a weight-based ranking system into our XML-only document database, providing the capacity to rank data beyond what relational systems offer. Specifically, we rank the XML structure itself by relative weights, which makes more sense for an XML-only database.

Many enterprise search systems require rather complex programming to assign weights to query terms. That work has to be done in application software, and it differs from system to system. We went the opposite way: a small, well-described XML configuration file of ranking rules for an entire XML database (storage), called the Document Policy, is all that is needed to define ranking rules for any database. Application code stays independent of search ranking peculiarities, and you no longer need to program ranking logic and algorithms into query support software on each platform, which saves development time.

To make weight-based relevancy algorithms easy to understand in the Clusterpoint database architecture, we use the familiar concept of a percentage as the relative weight value (from 0% to 100%), which you can assign to any named XML field or area in your custom XML documents.

For example, a 'Title' tag can be ranked at 100% and a 'Notes' tag at 10%. When searching for the query term 'London', results with 'London' hits in the 'Title' tag are grouped first and considered the most relevant (100% relevancy), while hits in the 'Notes' field form the next, lower-ranked group (10% relevancy) within the search results.
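
The 'Title' / 'Notes' example can be sketched in a few lines of Python. This is an illustration of the ordering idea only, not Clusterpoint's implementation: the sample documents, the `relevancy` and `search` helpers, and the choice to take the highest weight when a term hits several fields are all assumptions.

```python
# Weights from the example above, in percent.
FIELD_WEIGHTS = {"Title": 100, "Notes": 10}

# Hypothetical sample documents.
documents = [
    {"id": 1, "Title": "Paris travel guide", "Notes": "Includes a day trip to London"},
    {"id": 2, "Title": "London travel guide", "Notes": "Updated yearly"},
    {"id": 3, "Title": "Rome travel guide", "Notes": "No UK content"},
]


def relevancy(doc, term):
    """Return the highest weight among fields containing the term, or None."""
    term = term.lower()
    hits = [
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if term in doc.get(field, "").lower().split()
    ]
    # Assumption: a document hit in several fields takes the highest weight.
    return max(hits) if hits else None


def search(docs, term):
    """Return (relevancy, document id) pairs, most relevant group first."""
    scored = [(relevancy(d, term), d["id"]) for d in docs]
    matched = [(score, doc_id) for score, doc_id in scored if score is not None]
    return sorted(matched, key=lambda pair: -pair[0])


results = search(documents, "London")
print(results)  # [(100, 2), (10, 1)]
```

Document 2 matches in 'Title' and lands in the 100% group, document 1 matches only in 'Notes' and lands in the 10% group, and document 3 does not match at all.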

This makes sense, because you need to tell the database server how to order and group search results relative to each other, and relative relevancy percentages solve this problem elegantly. They make ranking an XML data structure a straightforward exercise: assign a higher percentage to the XML tags whose matching search hits should place documents higher in your search results. A higher relevancy weight for a particular XML data field tells Clusterpoint Server to rank documents containing that item higher. If search hits match this item, all documents with query term hits in it are grouped higher and assigned the relevancy of this XML data item. In our sample document it could look like this:

[Figure: XML database structure relevance ranking]

Another way to describe the meaning of a Relevancy weight value is that it tells which XML data fields in your database matter more to you in search than others.

An XML document is a tree-like data object, so you can assign a single parent-field relevancy that applies to all of its child fields (sub-fields in the XML structure) without explicitly defining a weight for each sub-field. This parent-child relationship extends weighting to all sub-parts of the XML data structure, for example the parts of an address or of a person's name and initials, which may be designed, or may have been historically structured, as multi-part fields in your database.
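
The parent-to-child inheritance described above can be sketched as a simple tree walk. This is a minimal illustration under stated assumptions, not the Clusterpoint algorithm: the path-keyed `EXPLICIT_WEIGHTS` table, the `DEFAULT_WEIGHT` of 0 for unweighted fields, and the sample element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical explicit weights (percent), keyed by element path.
EXPLICIT_WEIGHTS = {"document/Title": 100, "document/Address": 40}
DEFAULT_WEIGHT = 0  # Assumption: fields with no rule anywhere above them get 0%.


def effective_weights(elem, path="", inherited=DEFAULT_WEIGHT, out=None):
    """Map every element path to its own weight or the one inherited from its parent."""
    if out is None:
        out = {}
    path = f"{path}/{elem.tag}" if path else elem.tag
    # An explicit rule wins; otherwise the parent's effective weight applies.
    weight = EXPLICIT_WEIGHTS.get(path, inherited)
    out[path] = weight
    for child in elem:
        effective_weights(child, path, weight, out)
    return out


root = ET.fromstring(
    "<document><Title>Office</Title>"
    "<Address><City>London</City><Street>Baker Street</Street></Address>"
    "</document>"
)
weights = effective_weights(root)
for field_path, weight in weights.items():
    print(f"{field_path}: {weight}%")
```

Here only `Address` carries a rule, yet `City` and `Street` both end up at 40% without being listed individually.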

Relevancy weight values for any custom XML data structure items are defined in the Clusterpoint database Document Policy configuration file, a small customer-configurable XML file in which the customer defines all rules for ranking their own XML data structure items relative to each other. Each database storage can have a completely different Document Policy, which is the application-specific configuration of that database's indexing and search rules, reflected in the Clusterpoint Index.

A complete rule set for a particular customer XML database (a Document Policy file) defines custom search ordering rules for your XML database structure. It lists search results in the exact grouping order that a specific application requires from the customer's business logic point of view.

Please note that different XML data items (fields) can have the same relevancy in the Clusterpoint architecture, which is useful for grouping search results from multiple fields into the same group. For example, if the 'Title', 'Keywords' and 'Abstract' fields are assigned the same Relevancy weight, a search hit in any of them places the document in the same result group. You can start building advanced search logic without application software changes, just by configuring the Document Policy file for your custom XML database.
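
The effect of giving several fields the same weight can be sketched as follows. The weight values, the hit list, and the `group_by_weight` helper are hypothetical and serve only to show how equal weights merge hits from different fields into one result group.

```python
from itertools import groupby

# Hypothetical weights (percent): three fields share one value.
FIELD_WEIGHTS = {"Title": 80, "Keywords": 80, "Abstract": 80, "Footer": 0}

# Hypothetical search hits as (document id, field where the term matched).
hits = [
    ("doc-a", "Footer"),
    ("doc-b", "Keywords"),
    ("doc-c", "Title"),
    ("doc-d", "Abstract"),
]


def group_by_weight(field_hits):
    """Group hits by the weight of the matching field, highest weight first."""
    ranked = sorted(field_hits, key=lambda hit: -FIELD_WEIGHTS[hit[1]])
    return [
        (weight, [doc_id for doc_id, _ in group])
        for weight, group in groupby(ranked, key=lambda hit: FIELD_WEIGHTS[hit[1]])
    ]


groups = group_by_weight(hits)
print(groups)  # [(80, ['doc-b', 'doc-c', 'doc-d']), (0, ['doc-a'])]
```

Hits in 'Title', 'Keywords' and 'Abstract' fall into a single 80% group, while the 'Footer' hit forms a separate, lower group.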

For example, to rank search hits in document titles higher than hits in the Main text or the Footer, the sample above applies the highest relevancy weight only to the Title field. If the sample query terms w1, w2 or w3 match content in the Title field, those documents are grouped first (relevancy 80%). The next group contains documents with query term matches in the Main text (relevancy from 20% to 50%), and last come documents with matches in the Footer field (0% relevancy in our sample).

Essentially, the Clusterpoint database Information Ranking architecture is independent of application software, which translates directly into savings in application development and maintenance effort.

The Document Policy configuration file that describes information ranking for a particular custom database in the Clusterpoint XML-only database is an open XML file. Other cross-platform customer applications can also read it to "learn" the database's information ranking rules and match them on different search platforms. There are Clusterpoint API commands to read and modify the Document Policy file directly from customer application software.


© Clusterpoint Ltd. 2006-2012. All rights reserved