Select .. Order By


Clusterpoint allows to order matched documents using an arbitrary JavaScript expression passed into an ORDER BY clause. The expression is evaluated in the context of each document and return any numeric, string or date value. By default ORDER BY returns results in descending order.

As an example results can be ordered by single field:

SELECT * FROM product
ORDER BY price

Or by an arbitrary expression, like combining value of the price field with a length of name string field:

SELECT * FROM product
ORDER BY Math.sqrt(price) / (1.0 + name.split(" ").length)

Also if the computation becomes too complex to conveniently inline into WHERE clause, the function can be defined at the beginning of the JS/SQL statement and referenced in any of the clauses:

function discount(category) {
discounts = {
"Bicycles" : 0.10,
"Bike": 0.25}
if (category in discounts) {
return discounts[category]
}
return 0;
}
SELECT category, price, discount(category), price * discount(category) FROM product
ORDER BY price * discount(category)

Relevance Ranking

In case you used text query as part of your matching criteria, it is natural to use relevance ranking. In this case Clusterpoint orders results based on how relevant document is in respect to specific text query.

In this case RELEVANCE() function can be simply used in the ORDER BY clause:

SELECT * FROM product
WHERE CONTAINS("Mountain Bike")
ORDER BY RELEVANCE() DESC

Relevance is a measure of the accuracy of the search result, which is calculated according to:

  1. the weight interval of the document part in which the search term appears
  2. the number of times the search term appears in the document
  3. the distance between the search terms in the document, if multiple words are being searched for

The weight interval can be customized using Data Model to best reflect your document structure. A document part with a higher weight interval than other document parts mean that this part is considered more important than other parts. For example, the document's title is more important than the document's text.

Note: fields you use for RELEVANCE() should have "Full Text" index in Data Model. By default Relevance Ranking is set to "1 - 100", you can change Relevance values in Data Model for each field separately.

If you want to get Relevance value for each document, simply use RELEVANCE() in SELECT query:

SELECT *, RELEVANCE() 
FROM product
WHERE CONTAINS("Mountain Bike")
ORDER BY RELEVANCE() DESC


Relevance calculation algorithm

The Clusterpoint Server relevance calculation algorithm consists of two parts:

  1. calculating the weights of individual words (performed when storing documents to the Collection)
  2. calculating the relevance of the document (performed when searching for documents in the Collection)

Calculating the weight for each word in a document

In each document part, the weight of each word is calculated according to the weight interval of the document part the word occurs in.

The weight for a word in a document part is the minimum value of the following:

  • minimum value of the weight interval of the document part plus the number of times the word occurs in the document part
  • maximum value of the weight interval of the document part

The weight interval minimum and maximum can be the same value. In that case, for all words in such document part, no matter how often they appear, the weight in the document part is the same: the weight of the document part.

The maximum value of the weights of a word in all document parts is then assigned as the weight of the word in the document.

Calculating the relevance of a document

When searching documents in the Collection, the relevance of the document according to the search request is calculated as follows:

  • The weights of all search terms in a document are summed.
  • Relevance is calculated by multiplying the total weight with a value that represents the distance between the search terms in the document: the greater the distance, the smaller this value.

Example:

Document structure

{
"_id":"document id",
"Heading": "w1 w1",
"Description": "w1 w2 w2 w2 w3",
"Note": "w1 w1 w1 w1 w2 w3 w3"
}
A document consists of three document parts: Heading, Description, and Note. Each document part contains words w1, w2, and w3 and has its own weight interval, as described in a picture below.

Collection Data Model example

The following weights for mentioned fields are defined in Data Model:

Document part Minimum Maximum
Heading   80
Description 20 50
Note 10 12


To ensure more precise relevance calculation, the weight values can be assigned to different document parts when creating a custom document policy.

Calculating weight for each document

First, the weights of words are calculated in each part of the document:

  • w1(heading)=min(80+2,80)=80,
    w1(description)=min(20+1,50)=21,
    w1(note)=min(10+4,12)=12
  • w2(heading)=0,
    w2(description)=min(20+3,50)=23,
    w2(note)=min(10+1,12)=11
  • w3(heading)=0,
    w3(description)=min(20+1,50)=21,
    w3(note)=min(10+2,12)=12

Then, the weights of words in the entire document are calculated:

  • w1d=max(w1(heading),w1(description),w1(note))=max(80,21,12)=80
  • w2d=max(w2(heading),w2(description),w2(note))=max(0,23,11)=23
  • w3d=max(w3(heading),w3(description),w3(note))=max(0,21,12)=21

Finally, the relevance of the document is calculated:

wtotal=w1d + w2d + w3d = 80+23+21 = 124

Relevance = wtotal * d

Calculation for document relevance according to weight:

Calculating weight for each document


Distance in Geographic Coordinates

You can order results by distance using DISTANCE functions. The function has two variants based on either polar coordinates on the sphere or planar coordinates.

Use DISTANCE_GEO for coordinates on the sphere:

DISTANCE_GEO(lat1, long1, lat2, long2)

Assuming long, lat pairs are given in degrees, this calculates distance on earth in meters.

Use DISTANCE_2D for numeric coordinates on the plain:

DISTANCE_2D(x1, y1, x2, y2)

Complete example:

SELECT * FROM places
ORDER BY DISTANCE_GEO(-0.12, 51.5, lat, long);