Step 3: Ranking of Documents

Clusterpoint Server software provides another crucial mechanism for overall information ranking: the Document Rate attribute of each data object (each XML document stored in a Clusterpoint database). Document Rate lets customers design their own document ranking algorithms. Some examples follow.

Figure: Document position ranking with your own custom, programmable information ranking algorithm. Use any formula to achieve the most relevant search results.
Technically, Document Rate is again a very simple concept in the Clusterpoint DBMS architecture: it is just a custom integer value. It can be added to a customer database as an XML tag <rate>integer_value</rate> in each database document. Alternatively, the customer can name a different tag of the XML document structure in the Document policy configuration file, and the database server software will read the document ranking values from that tag. Clusterpoint Server uses this value when building the Clusterpoint Index: during the indexing phase it pre-orders all references to database documents so that, at search time, documents with higher Document Rate values come first, all other rankings being equal.
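The pre-ordering idea can be sketched in a few lines of Python. This is only an illustration and not Clusterpoint code or its API: the sample documents, the <document>, <id> and <title> tags, the function names, and the choice to treat a missing rate as 0 are all assumptions. Only the <rate> tag comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; only the <rate> tag is taken from the text.
DOCS = [
    "<document><id>a</id><title>London hotels</title><rate>120</rate></document>",
    "<document><id>b</id><title>London travel guide</title><rate>4500</rate></document>",
    "<document><id>c</id><title>London weather</title><rate>980</rate></document>",
]


def document_rate(xml_text, rate_tag="rate"):
    """Return the integer rate of one XML document.

    Assumption: a document without the rate tag is treated as rate 0.
    """
    root = ET.fromstring(xml_text)
    value = root.findtext(rate_tag)
    return int(value) if value is not None else 0


def preorder_by_rate(docs, rate_tag="rate"):
    """Return document ids ordered so that higher rates come first."""
    parsed = [
        (ET.fromstring(doc).findtext("id"), document_rate(doc, rate_tag))
        for doc in docs
    ]
    # sorted() is stable, so documents with equal rates keep their input order.
    ranked = sorted(parsed, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked]


ordered_ids = preorder_by_rate(DOCS)
print(ordered_ids)
```

The rate_tag parameter stands in for the custom tag name that the text says can be set in the Document policy configuration file. The format of that file is not shown here.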

Note that ranking documents against each other is very application specific: each database may need a completely different, custom document ranking algorithm. For example, in a database of scientific publications, the document rank could be the number of citations of each publication. In a database of news articles, documents can be ranked by a simple time stamp (most users want news ordered by date and time anyway). In a database of search advertising, where each customer ad is programmed to match thousands of potential keywords or phrases, the document rank can be the pay-per-click price that the advertiser decides to pay for a higher position among other advertisements. This is illustrated in the picture above.

There can be 2^32 (about 4.3 billion) different Document Rate values, and they can be calculated by the customer's own algorithm for each Clusterpoint database. It can be an openly published algorithm, or a secret "formula" reflecting the customer's confidential business rules, business model, or competitive thinking.
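As a rough illustration of such customer-side algorithms, the three examples given earlier (citations, time stamps, pay-per-click prices) could each be turned into an integer rate as sketched below. All function names are hypothetical. The clamping to an unsigned 32-bit range is an assumption based on the stated number of distinct values, not a documented storage format.

```python
from datetime import datetime, timezone

# Assumption: a rate must fit an unsigned 32-bit integer (2^32 distinct values).
MAX_RATE = 2**32 - 1


def clamp_rate(value):
    """Force a computed score into the assumed 0..MAX_RATE integer range."""
    return max(0, min(int(value), MAX_RATE))


def rate_from_citations(citation_count):
    """Scientific publications: more citations give a higher rate."""
    return clamp_rate(citation_count)


def rate_from_timestamp(published):
    """News articles: newer items give a higher rate (seconds since the Unix epoch).

    Expects a timezone-aware datetime.
    """
    return clamp_rate(published.timestamp())


def rate_from_ppc_cents(price_in_cents):
    """Search advertising: a higher pay-per-click price gives a higher rate."""
    return clamp_rate(price_in_cents)


older = rate_from_timestamp(datetime(2012, 1, 1, tzinfo=timezone.utc))
newer = rate_from_timestamp(datetime(2012, 6, 1, tzinfo=timezone.utc))
print(newer > older)
```

Each function only produces the integer. Writing it into the rate tag of the document and re-indexing is left to the application.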

More than 4 billion documents can be uniquely ranked per single database within the same Relevancy Weight group. Altogether, this gives the capacity to uniquely rank more than 400 billion database items for instant positioning and ordering of search results in web applications, without the performance loss characteristic of legacy SQL databases.
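The arithmetic behind these figures can be checked quickly. The 32-bit width follows from the number of distinct rate values. The count of 100 Relevancy Weight groups is an assumption, inferred from weights being expressed as percentages and from the "more than 400 billion" total. It is not stated explicitly in this section.

```python
RATE_BITS = 32
RATE_VALUES = 2**RATE_BITS  # distinct Document Rate values per weight group

# Assumption: 100 Relevancy Weight groups (weights expressed as percentages).
WEIGHT_GROUPS = 100

TOTAL_RANK_POSITIONS = RATE_VALUES * WEIGHT_GROUPS

print(f"{RATE_VALUES:,}")           # 4,294,967,296
print(f"{TOTAL_RANK_POSITIONS:,}")  # 429,496,729,600
```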

Whenever Relevancy Weights are equal for documents found by a search, the Document Rate determines the position of each document within that relevancy group in the search results. If the Document Rate is calculated and assigned by a customer application, it is theoretically possible to custom sort information within each group of relevant results using about 4 billion distinct rate values. Customers can develop their own very "intelligent" algorithms for Document Rate calculation, with virtually unlimited ways to group and order (organize) the database information at search.

For example, consider our sample query 'London' against a database that is an Internet index. The Document Rate can be calculated and assigned to each data object as an XML tag containing the number of links pointing to that particular web page with the text 'London'. Clusterpoint Server will then return all documents that match 'London' in the Title tag grouped at the top (Relevancy Weight 100%) and sorted by Document Rate within that group. In this way, the very first search results in such a database are meaningful and useful for end users: the most popular web sites about the city of London, tourism in London, London culture, etc. appear on the first results page. Because these pages are the most linked, and the number of links normally determines the popularity of a web page on the Internet, the very simple query 'London' against the Internet index database quickly provides the expected answers.
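The resulting order for the 'London' example amounts to a two-level sort: first by Relevancy Weight, then by Document Rate within each weight group. The sketch below shows only that ordering rule, not how Clusterpoint Server implements it. The Hit class, the URLs, the link counts, and the 40% weight for a non-title match are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    relevancy_weight: int  # percent; 100 when 'London' matched in the Title tag
    rate: int              # Document Rate, here a hypothetical inbound link count


def order_results(hits):
    """Sort by Relevancy Weight first, then by Document Rate, both descending."""
    return sorted(hits, key=lambda h: (h.relevancy_weight, h.rate), reverse=True)


# Hypothetical search hits for the query 'London'.
hits = [
    Hit("example.org/blog-post-mentioning-london", 40, 90000),
    Hit("example.org/london-tourism", 100, 52000),
    Hit("example.org/small-london-shop", 100, 35),
    Hit("example.org/london-city-guide", 100, 310000),
]

ordered = order_results(hits)
for hit in ordered:
    print(hit.relevancy_weight, hit.rate, hit.url)
```

Note that the heavily linked blog post still ranks last: its Document Rate only competes with other documents in the same Relevancy Weight group.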

As a result of this document ranking capacity, customers can instantly access the most relevant data from any database, ordered by application-specific document rankings that you design at the rule level according to your own information ranking preferences. Once this is implemented, any search query against those databases, even ones containing many billions of XML documents or requiring complex information ranking and ordering rules, will be answered by Clusterpoint Server within a sub-second time frame. This capacity is absent in legacy SQL-based solutions, as the relational database architecture was never designed for it. With the Clusterpoint database platform you can start building instantly responsive web applications that outperform legacy SQL databases at search by several orders of magnitude whenever ranking-based search functionality is used.


© Clusterpoint Ltd. 2006-2012. All rights reserved