Main Concepts

Clusterpoint is a next-generation data management and computing infrastructure designed for the era of cloud computing, in which the physical platform is a massive cluster of commodity hardware.

Clusterpoint facilitates highly parallel computing and the distribution of data. This section gives a high-level overview of how Clusterpoint handles data and enables highly parallel computing with JS/SQL.

In Clusterpoint we use the following data storage hierarchy: Document > Collection > Database

A Document is the basic unit of data in a Clusterpoint database. It is a self-contained unit composed of hierarchically organized fields that are stored together and often queried together. You can think of a Document as an object, or an instance of a class, that your application or service operates on. Documents can be stored in JSON or XML format.

Documents are organized and stored in Collections, where each Collection stores objects of the same type or structure. We recommend combining related Collections under one Database. The concept of a Collection is similar to that of a table in a relational database, but unlike an SQL database, Clusterpoint does not enforce a strict data schema.

A Database consists of Collections and presents related Collections as one group.
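To make the hierarchy concrete, the following is a minimal sketch that models it with plain JavaScript objects. It is only an illustration of the concepts, not Clusterpoint's API or storage format; the names `shop`, `products`, `orders` and all field names are assumptions invented for the example.

```javascript
// Hypothetical names throughout: "shop", "products", "orders" and the field
// names are illustrative only and do not come from Clusterpoint itself.

// Two Documents in the same Collection. The second one carries an extra
// nested field, which is acceptable because no strict schema is enforced.
const products = [
  { id: "p1", name: "Kettle", price: 25 },
  { id: "p2", name: "Toaster", price: 40, dimensions: { width: 30, height: 20 } },
];

// A Database groups related Collections under one name.
const shop = {
  name: "shop",
  collections: { products, orders: [] },
};

// A Document is self-contained, so it can be serialized on its own to JSON,
// one of the two formats mentioned above.
const asJson = JSON.stringify(shop.collections.products[1]);
```

Note that the nested `dimensions` field travels with its Document, which reflects the idea that fields stored together are often queried together.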

Query language: JS/SQL

Documents are accessed using JS/SQL (JavaScript/SQL) statements executed against Clusterpoint. JS/SQL is based on the idea that, instead of defining a proprietary query language, we access objects using plain JavaScript organized in an SQL-like structure. Seen from the opposite angle, it looks like SQL statements with arbitrary JavaScript embedded in them. This allows powerful, extensible transformations to be performed on the data right in the database.
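The idea of JavaScript arranged in an SQL-like structure can be sketched in plain JavaScript. The example below is not JS/SQL syntax and does not call Clusterpoint; `select` is a hypothetical in-memory helper, and the `books` data is invented, so treat it only as an analogy for how arbitrary JavaScript can serve as the filter and the projection of a query.

```javascript
// Toy in-memory stand-in for a query engine. `select` is a hypothetical
// helper written for this example; it is not part of Clusterpoint.
function select({ from, where = () => true, project = (doc) => doc, limit = Infinity }) {
  // FROM -> WHERE -> LIMIT -> SELECT list, mirroring the clauses of an SQL statement.
  return from.filter(where).slice(0, limit).map(project);
}

// Invented sample Documents.
const books = [
  { title: "Dune", price: 12, tags: ["sci-fi"] },
  { title: "Emma", price: 8, tags: ["classic"] },
  { title: "Neuromancer", price: 15, tags: ["sci-fi", "cyberpunk"] },
];

const result = select({
  from: books,
  // The filter is ordinary JavaScript evaluated against each Document.
  where: (doc) => doc.tags.includes("sci-fi") && doc.price > 10,
  // The projection can transform the data, not just pick fields.
  project: (doc) => ({
    title: doc.title.toUpperCase(),
    priceWithTax: Math.round(doc.price * 1.2),
  }),
});
```

Here `result` holds two transformed entries, one for each science fiction book priced above 10.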

Data Model

At data insertion time, Clusterpoint discovers data types and creates indices that make data access efficient. At query time, JavaScript code accesses documents as if they were objects in JavaScript memory; however, Clusterpoint makes sure that access to specific fields goes through the indices. Clusterpoint thus offers the flexibility of arbitrary JavaScript while minimizing the I/O and computing resources needed to fulfill requests.
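The insert-time behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Clusterpoint's actual implementation: the class `ToyCollection` and its methods are hypothetical, type discovery is reduced to JavaScript's `typeof`, and only scalar top-level fields are indexed.

```javascript
// Toy model only: class and method names are hypothetical and the indexing
// strategy is deliberately simplistic.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.fieldTypes = new Map(); // field name -> Set of discovered type names
    this.indexes = new Map();    // field name -> Map(value -> document positions)
  }

  // At insertion time: record the type of each field and update its index.
  insert(doc) {
    const position = this.docs.push(doc) - 1;
    for (const [field, value] of Object.entries(doc)) {
      if (!this.fieldTypes.has(field)) this.fieldTypes.set(field, new Set());
      this.fieldTypes.get(field).add(typeof value);

      // Nested objects, arrays and null are skipped in this sketch.
      if (value === null || typeof value === "object") continue;

      if (!this.indexes.has(field)) this.indexes.set(field, new Map());
      const index = this.indexes.get(field);
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(position);
    }
  }

  // At query time: an equality lookup reads the index instead of scanning
  // every stored document.
  findEqual(field, value) {
    const index = this.indexes.get(field);
    const positions = index ? index.get(value) || [] : [];
    return positions.map((p) => this.docs[p]);
  }
}

const users = new ToyCollection();
users.insert({ name: "Ann", age: 31 });
users.insert({ name: "Bob", age: 31, city: "Riga" });
const matches = users.findEqual("age", 31);
```

The second insert introduces a `city` field that the first Document lacks, and the index for it is created on the fly, which mirrors the schema-free model described earlier.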