Clusterpoint NoSQL Database Server: Simplify database design, management and search!

Automatic Full Content Indexing

Clusterpoint Server automatically creates and maintains a complete database index, the Clusterpoint Index, which is used for high-performance data access and search across all of your stored documents.
 
Our indexing model differs radically from the relational SQL world.  In relational database systems, indexes have to be specified explicitly for fast search, with primary and secondary keys applied in an application-specific way across a normalized multi-table database structure.  Getting the indexing system right takes a lot of effort and often painstaking attention to technical detail from the database architects developing SQL systems.  Because such an indexing system is rather complex, it is prone to many database administration and software development errors.  It is often quite expensive to maintain knowledge and application software versioning over the life cycle of a relational database, and one of the key reasons for this cost is the underlying complexity of the legacy database indexing system.  Relational indexing requires a customized design of all database entities and relationships.  It very often starts as a custom design of the database and its indexing model conceived by one person, and it quickly becomes outdated or overly sophisticated as the database is developed and goes into real-life production and maintenance.  Most legacy relational databases gradually become hard to use and understand for the application developers who follow.  We believe that this legacy indexing model is aging: it is based on an SQL database architecture concept developed more than 30 years ago.  The hardware capacity and processing power available today also make this model obsolete.  It is simply too complex to design, maintain and use efficiently, and it requires complex software and complex knowledge to manage all that complexity.

Unlike relational databases, Clusterpoint Server indexes everything that goes into our XML database: all data items, texts, dates, numbers, strings, relations etc.  It also uses a special type of index data storage, which we developed to implement a generic and fully scalable indexing system for a massively distributed database architecture, potentially running across a large number of hardware servers (thousands of servers, each storing only part of the total database).  Clusterpoint Index is a full database content index and takes advantage of modern hardware capacity: abundant disk storage and RAM, cheap CPU power and ubiquitous networking.  Today the cost of hardware and storage is small compared to the cost of software development, integration and maintenance.  To reduce database management complexity, we simplified database indexing as far as possible: index everything that may be searched for.  Clusterpoint Index is designed for high-speed search in any database structure.  It is independent of the database structure and therefore much simpler to use and manage than a fixed-structure relational SQL index.  It eliminates the need for application software to "know" about the index structure and therefore cuts a lot of index-specific programming effort from database software developers' agendas.  DBAs also have far fewer problems to handle with a single uniform database index than with the often bizarre indexing systems of the SQL world, which they need to support separately for each particular database.

Clusterpoint Index is updated in real time whenever new documents are stored in Clusterpoint storage or existing ones are modified or deleted.


Graph-driven 'Atomic' Data Indexing

At the database system architecture level, Clusterpoint Index was designed by "marrying" the mathematical concepts of graph database indexing with our custom inverted index, used for full-text search, and with traditional Btrieve-type indexes, used for structural index elements that require traditional sorting methods.

Figure: Clusterpoint graph database index ranked for relevance
The picture above illustrates the general concept of how the Clusterpoint database indexing system works.  Your XML documents are stored in the Document repository in their original XML format.  We do not change them.  Clusterpoint Server uses the content of your XML database to create and maintain a fast-access, RAM-based Vocabulary and a disk-based Clusterpoint Index.  For each database storage on a particular cluster node, the resulting index is completely autonomous and serves only to provide ultra-fast access to the documents stored on that particular cluster node.

This database storage and indexing architecture provides high resilience against service unavailability in massively clustered environments.  Even if some hardware fails, the failure affects only a small part of the total database, which may be temporarily unavailable, so your database services are never lost completely.  This architecture also allows the database storage on each cluster node to be mirrored into as many identical copies as necessary within a cluster, which keeps database services available even when individual cluster nodes fail.

Organized By Information Ranking

The internal structure of Clusterpoint Index is engineered using a set of methods and algorithms that produce a unique 'atomic' index from the database content stored on each cluster node.  Clusterpoint Server software indexes and stores every smallest item of information present in the documents in the XML database storage managed by that particular cluster node: words, values, emails, strings, numbers, dates, relations, XML tags etc., along with the required relationships and Information Ranking attributes.  That is why we call this an 'atomic' indexing model.
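
To make the idea of indexing every smallest item more concrete, here is a minimal sketch in Python.  It is not Clusterpoint's implementation: the function name `atomic_index`, the tokenization rule and the `(kind, value)` key layout are assumptions made only for illustration.  It maps each XML tag and each token of element text to the documents and element paths where it occurs.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def atomic_index(docs):
    """Build a toy 'atomic' index from {doc_id: xml_text}.

    Every tag name and every token of element text becomes a key that
    points to the set of (doc_id, element_path) pairs where it occurs.
    Hypothetical layout; tail text after child elements is ignored.
    """
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        stack = [(root, "/" + root.tag)]
        while stack:
            elem, path = stack.pop()
            index[("tag", elem.tag)].add((doc_id, path))
            # Keep '@', '.' and '-' inside tokens so emails and dates stay whole.
            for raw in re.findall(r"[\w@.\-]+", (elem.text or "").lower()):
                token = raw.strip(".-")
                if token:
                    index[("term", token)].add((doc_id, path))
            for child in elem:
                stack.append((child, path + "/" + child.tag))
    return dict(index)


docs = {
    "d1": "<doc><title>Red apple</title><email>ann@example.com</email></doc>",
    "d2": "<doc><title>Green apple</title><year>2010</year></doc>",
}
index = atomic_index(docs)
print(sorted(index[("term", "apple")]))
```

With these two sample documents, the term "apple" points to the title element of both documents, while the email address and the year each point to a single element.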

This 'atomicity' of the Clusterpoint database index, together with Information Ranking, allows a database of any size to be partitioned into as many parts as necessary in a distributed hardware cluster while still guaranteeing ultra-fast and relevant search, using just the most basic elements of our 'atomic' index as search terms.  This 'atomic' index forms the foundation of the Clusterpoint database platform's high-performance indexing and search mechanism.

We have also developed lightning-fast database querying algorithms that work with that 'atomic' index, combined with your XML documents, in a totally different way, one that is orders of magnitude more efficient at relevant information retrieval and search than any relational software is capable of.

Efficiency is achieved through the Information Ranking mechanism, which lets you assign relevancy to and rank the information in your total database content, customizing your database search algorithms for the exact data grouping and ordering that your users consider "the most relevant".  This customization is crucial for a good user search experience, particularly in large databases, where a search query can match thousands or even millions of results, producing overwhelming result sets and frustrating multi-page browsing.

The overall organization of the Clusterpoint 'atomic' index elements can be illustrated as a huge graph of interconnected "atoms" (all database index elements) ranked for relevance:

Figure: Model of Clusterpoint database index ranking for relevance
In this model (please forgive us if you did not find physics interesting), our database index organization can be explained by analogy with the grouping and ordering of chemical elements in the Periodic Table of Elements, where each atom has a mass and a chemical energy.  Imagine Clusterpoint DBMS as a chemical laboratory that takes matter (all database content), splits it into basic chemical elements, and organizes and groups them in the Clusterpoint Index for fast access and search, much as chemical elements are grouped in the Periodic Table of Elements.  Clusterpoint DBMS also provides a mechanism for assigning your own customized atomic mass (XML data structure relevancy weight) and chemical energy (document rate) as ranking attributes to all of your index 'atoms'.  We provide the engine that uses that mechanism to instantly retrieve the most massive and most energetic 'atoms' and everything made of them (your XML documents), sorted and grouped for meaningful search.
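
The two ranking attributes in this analogy can be sketched as follows.  This is only an illustration under stated assumptions: the text does not say how Clusterpoint combines the structure weight ("mass") with the document rate ("energy"), so the product used here, the `rank` function and its parameter names are all hypothetical.

```python
def rank(postings, path_weights, doc_rates, default_weight=1.0):
    """Order documents matching one search term, best first.

    postings     -- iterable of (doc_id, element_path) hits for the term
    path_weights -- assumed relevancy weight per XML path (the 'mass')
    doc_rates    -- assumed per-document rate (the 'energy')
    The score weight * rate is an illustrative choice, not Clusterpoint's formula.
    """
    scores = {}
    for doc_id, path in postings:
        weight = path_weights.get(path, default_weight)
        score = weight * doc_rates.get(doc_id, 1.0)
        # A document matched in several places keeps its best score.
        scores[doc_id] = max(scores.get(doc_id, 0.0), score)
    # Sort by descending score, breaking ties by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


postings = [("d1", "/doc/title"), ("d2", "/doc/body"), ("d3", "/doc/title")]
path_weights = {"/doc/title": 10.0, "/doc/body": 1.0}
doc_rates = {"d1": 1.0, "d2": 50.0, "d3": 2.0}
print(rank(postings, path_weights, doc_rates))  # ['d2', 'd3', 'd1']
```

In this example a highly rated document with a match in the body (d2) outranks lower-rated documents with a match in the title, which shows how the two attributes interact under the assumed formula.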

Technical details on how to apply Clusterpoint information ranking can be found in the section Information Ranking.

As a result, a database system based on Clusterpoint Server can instantly find any content in any custom database through a simple, user-friendly query mechanism with Internet-style ad hoc query terms, and return results sorted by your own customizable relevance.  With Clusterpoint DBMS you can design your own unique ranking system for your valuable business databases.  By controlling your databases through your own custom-ranked search, you can provide a high level of customer satisfaction and a great search experience that others would struggle to match.  You can even protect the custom Information Ranking rules defined for a Clusterpoint database with patents and as a trade secret.  Our Document Policy file, which describes your ranking rules, is a structured rule set designed uniquely for your particular XML database storage and can therefore be protected as a trade secret.

Our customizable database index ranking system simplifies database search and makes it extremely powerful and fast in any database.  It is capable of always bringing the most relevant database search results to the first web page, sorting and grouping your database information according to your own business needs.  This is in stark contrast to some closed and proprietary information ranking systems available on the Internet, where you depend on someone else to organize your information.

To use another analogy, Clusterpoint indexing technology makes it possible to instantly "find any needle in a very large haystack", where the "haystack" is the complete content of your database.  In fact, it can also find "a needle in thousands of haystacks", executing a sub-second database search query across a large cluster of servers without the performance penalties characteristic of legacy database architectures.  There is even more search power: to continue the haystack analogy, Clusterpoint database technology can find not only "a single needle" but instantly find "all needles from all haystacks (cluster nodes) and deliver them sorted according to their weight and length, heavier and longer needles first".  We also supply a ranking mechanism for assigning a custom weight and length to each particular needle relative to the other needles.

We hope that the two analogies above illustrate how Clusterpoint Index and the related Information Ranking work.  For technical details about database ranking and index customization, please see the section Information Ranking.

Energy Efficient Indexing Model

Technically, the Clusterpoint indexing mechanism creates a fast, pre-sorted disk index for your XML database, which improves the overall efficiency of your IT system by requiring less CPU and less disk access across your IT infrastructure.  The Clusterpoint indexing model is very efficient for the write-seldom, read-many database usage patterns that dominate the web-driven IT industry today.

It eliminates the repetitive sorting and grouping of data for each query that an SQL server performs, a legacy method that requires loading large index files, or substantial parts of them, into memory for efficient data sorting and processing.  Clusterpoint Index does not require constant swapping of large index files from disk to memory to achieve fast search queries.  By eliminating the need to read tens or hundreds of megabytes of index data from disk per query, as in SQL databases, Clusterpoint Server radically reduces the workload on database server disk subsystems, by several orders of magnitude.  This significantly reduces the energy footprint, since less powerful servers are needed to manage large databases.  Freed server capacity can be reused for other purposes or switched off.

Figure: Clusterpoint software green technology
In the Clusterpoint 'atomic' index architecture, most index data retrieval is sequential and packetized into small 5K-10K data transfers between cluster nodes.  In essence, for each search term Clusterpoint Server performs direct disk access to find the respective 'atom' with its tree of ranked attribute "leaves" pointing to the matching documents.  Probably only a few sectors of data are read from disk per search term per cluster node.  Because all "leaves" are already sorted according to the relevance defined by the Information Ranking rules, only a minimum of disk operations is required per server to return the most relevant data to the cluster node that initiated the query.  Most of the time the Clusterpoint Server system takes to answer a particular query is spent waiting on network data transfers while all the 5K-10K packets are received and merged and the final result set is delivered to the requesting web application.  Responding to a search query typically takes around 0.2 seconds in the Clusterpoint architecture, even with disk access performed on multiple networked cluster nodes.
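
The benefit of pre-sorted "leaves" can be illustrated with a small Python sketch.  When every node returns its hits already ordered by relevance, the initiating node only needs a streaming merge and can stop after the first page of results.  The `(score, doc_id)` tuple format and the function name `merge_top_k` are assumptions for this example, not part of Clusterpoint's protocol.

```python
import heapq
from itertools import islice


def merge_top_k(node_results, k):
    """Merge per-node hit lists that are already sorted by descending score.

    node_results -- list of lists of (score, doc_id), one list per cluster node
    Returns the k best hits overall without re-sorting the full result set.
    """
    # heapq.merge expects ascending keys, so negate the score.
    merged = heapq.merge(*node_results, key=lambda hit: -hit[0])
    return list(islice(merged, k))


node_a = [(0.90, "a1"), (0.40, "a2")]
node_b = [(0.80, "b1"), (0.70, "b2")]
node_c = [(0.95, "c1")]
print(merge_top_k([node_a, node_b, node_c], 3))
# [(0.95, 'c1'), (0.9, 'a1'), (0.8, 'b1')]
```

Because the merge is lazy, only as many hits are consumed from each node's list as are needed to fill the requested page.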

For heavily used databases, those savings also translate into substantially fewer disk input/output operations, fewer and smaller disk buffering and caching needs, and big savings in processing power requirements.  Add to those savings the database search transactions avoided because unproductive multi-page browsing is no longer needed.  With Clusterpoint database software the most relevant data is almost always available on the first web page, so multi-page browsing of results is not necessary.  Less browsing activity also reduces the load on your corporate web servers and application servers and your network traffic volumes (commonly SSL-encrypted, which takes an extra toll on CPUs), and further reduces resource consumption within all of your core IT systems.  Finally, instantly responsive corporate databases cut the time your employees spend waiting for search results and the time computing resources at their workplaces sit idle.  Cutting transaction volume and making every corporate database search fast and relevant with Clusterpoint software contributes to overall business productivity and work efficiency.

Taking into account all of the above about energy efficiency, we have firm grounds to believe that the Clusterpoint database server software platform delivers a much "greener", more power-saving data management technology than any SQL technology could.

Linear Ability to Scale Out Indexing

The Clusterpoint database indexing model is independent of the cluster network configuration and of the total number of cluster nodes.  It enables our customers to add new servers to the cluster incrementally as their database size or usage grows.  Customers can flexibly increase their database storage capacity by scaling out the cluster with new servers, or reconfigure the cluster's operational capacity by moving parts of the database between servers, without negatively affecting the performance of their database operations.  This flexibility makes our software an ideal fit for modern cloud environments: all clustering setup and operations for a database can be performed by system administrators, without involving application and database software programmers.  In fact, there is no need to change application software at all, in contrast to many cluster systems that require database partitioning logic to be implemented in application software as well.

Figure: Clusterpoint indexing linear cluster advantage
In a massively clustered environment, many time-consuming database tasks, such as indexing or reindexing, can easily be split among the large number of servers available in the Clusterpoint distributed database architecture, yielding productivity gains proportional to the number of servers in the cluster.
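
As a rough illustration of splitting an indexing job among servers, the following Python sketch assigns each document to one node so that every node can index its own share independently.  The hash-modulo placement and the names `node_for` and `partition` are assumptions chosen for brevity; they are not a description of how Clusterpoint places data, and a plain modulo scheme like this one would move documents whenever the node count changes.

```python
import hashlib


def node_for(doc_id, num_nodes):
    """Pick a node for a document using a stable hash (illustrative scheme)."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(doc_ids, num_nodes):
    """Split document ids into one work list per node."""
    shards = [[] for _ in range(num_nodes)]
    for doc_id in doc_ids:
        shards[node_for(doc_id, num_nodes)].append(doc_id)
    return shards


doc_ids = ["doc-%d" % i for i in range(1000)]
shards = partition(doc_ids, 4)
print([len(shard) for shard in shards])
```

Each node then indexes only the documents in its own list, so with evenly sized shards the wall-clock time for a full reindex shrinks roughly in proportion to the number of nodes.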

Clusterpoint database architecture is designed to scale efficiently to hundreds and even thousands of computers in a single cluster.  With hundreds of underlying servers, some database software adjustments may be needed to match the network configuration and avoid network-related bottlenecks, yet our database software platform was designed at the system architecture level with this generic capacity to scale out linearly.  We welcome partners who can suggest projects in which we can test-drive the software on massive data sets requiring such scalability, and who can provide hardware resources.  Please see also Partnerships.

Below are our sample scalable database application projects, for which we can also provide the prototype application software to our partners.

Use Cases of Scalable Applications

The linear scale-out ability of Clusterpoint DBMS starts to be affected when a large number of extra servers all use the same network switching infrastructure in a linearly connected way.  Network transaction times among cluster nodes, although individually very short, start to affect the overall performance of a distributed database when a large number of cluster nodes are linearly connected.

To achieve a much greater scalable database infrastructure capacity and still maintain the same ultra-fast performance, you may need to set up a custom networking infrastructure among hundreds of hardware nodes so that a minimum of network traffic is switched over the same hardware links.

For example, you can set up a hierarchical server farm (connected in a tree topology), minimizing the number of traffic hops from the top-level cluster nodes to the bottom-level cluster nodes.  Clusterpoint Server software is architected and engineered in such a way that it is possible to customize it easily to your specific hierarchical network switching infrastructure, and so to build and efficiently operate really massive, instantly searchable databases such as Internet indexes, huge library and document archives, billions of tweets etc.
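
A short calculation shows why a tree topology keeps hop counts low.  The sketch below assumes a complete tree in which every node has the same number of children; the fan-out of 10 and the function name `tree_levels` are illustrative assumptions, not Clusterpoint settings.

```python
def tree_levels(num_servers, fan_out):
    """Smallest number of levels of a complete tree that holds num_servers nodes.

    Level 1 is the single top node; each node has fan_out children.
    """
    if num_servers < 1 or fan_out < 1:
        raise ValueError("num_servers and fan_out must be positive")
    levels, capacity, width = 0, 0, 1
    while capacity < num_servers:
        capacity += width   # add the nodes of the next level
        width *= fan_out    # the level below is fan_out times wider
        levels += 1
    return levels


def hops_top_to_bottom(num_servers, fan_out):
    """Network hops from the top node to a bottom-level node."""
    return tree_levels(num_servers, fan_out) - 1


print(tree_levels(1000, 10), hops_top_to_bottom(1000, 10))  # 4 3
```

Under these assumptions, 1000 servers fit in 4 levels, so a query travels 3 hops from the top node to any bottom-level node, whereas a linear chain of 1000 servers would need up to 999 hops.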

Otherwise, without taking into account the effect of the network switching "fabric", it would not be possible to guarantee sub-second search times for terabyte- and petabyte-size databases.

We have built several scalable demonstration applications on top of our database platform that show the scalability and performance of Clusterpoint DBMS in demanding computing environments.

Sample Application No 1: Global Internet Search Platform

For example, Clusterpoint DBMS software, together with our prototype Global Internet Search Platform application, can be used as a cohesive, interoperable and fully scalable database and application software solution for ambitious Web search infrastructure projects that require massively scalable and entirely searchable Internet search index capacity.  Clusterpoint technology can deliver the necessary scalability and performance.  We would be glad to provide our interested customers with a streamlined, robust, all-inclusive Internet search solution that scales: the Global Internet Search Platform application, based on scalable Clusterpoint DBMS data storage.

The key advantage of the Global Internet Search Platform software is simplicity: no integration of architecturally and conceptually different systems is required.  Most often the integration costs of putting together disparate systems are too high, and managing them requires a lot of attention and effort, which very often, and famously, results in spectacular failures.

We invite interested parties to try Clusterpoint database technology for their Internet search projects to see the difference.

Figure: Clusterpoint Internet search engine platform software
Here is how our technology solution, illustrated above, is designed to work for a national or global Internet search project:

Step 1:
The Internet Crawler application (a key part of our Global Internet Search Platform) is launched from all cluster nodes for automatic link spidering, downloading Internet information into a distributed Clusterpoint DBMS database.

Step 2:
All downloaded data is stored in the Clusterpoint cluster storage (database), which spans multiple hard-wired servers interconnected in a tree-like network topology designed with a maximum of 4 or 5 levels of networking 'hops' between servers on any two different levels of the cluster hierarchy.

Step 3:
Step 1 is repeated, re-crawling the Internet, this time and on all subsequent crawls applying customer-defined Information Ranking rules to all database objects simply by rewriting them.  Initially, two full crawls of the Internet are necessary, because the customer-defined Information Ranking algorithm depends on previously collected statistics for the full database.  The database collected during Step 1 is at first built from zero, and statistically meaningful and correct data is not available until the full data set has been crawled and collected at least once.  All subsequent crawls and incremental index updates further improve Information Ranking if it is based on a recurring statistical recalculation algorithm, for example one that takes into account aggregated totals from previously calculated statistics for each database object of interest.  After a minimum of two full Internet crawls, the resulting data set is relevantly indexed from the point of view of the search application user; it is then possible to apply an even more finely tuned algorithm with which index quality improves with each subsequent crawling round.

Step 4:
The resulting Internet search index database (with all original content downloaded, cached and stored in the Clusterpoint database) is used to operate a large-scale Internet search service.  With appropriate hardware and networking infrastructure, the system can potentially store billions of database objects and make them searchable with simple ad hoc queries from end users, with query response times of a small fraction of a second.  With Clusterpoint DBMS our customers can provide search results based on their own relevancy algorithms, using whatever ranking formulas they consider competitive.
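
The recurring statistical recalculation mentioned in Step 3 can be sketched as follows.  The example assumes, purely for illustration, that the statistic of interest is a per-object count collected on each crawl (for instance inbound links) and that the rate is its running average; the text does not specify the statistic or the formula, and the names `update_stats` and `rate` are hypothetical.

```python
def update_stats(previous, crawl_counts):
    """Fold one crawl's per-object counts into the aggregated totals.

    previous     -- {object_id: (total_count, crawls_seen)} from earlier crawls
    crawl_counts -- {object_id: count observed in the current crawl}
    Returns a new dict; the previous statistics are left untouched.
    """
    updated = dict(previous)
    for object_id, count in crawl_counts.items():
        total, crawls = updated.get(object_id, (0, 0))
        updated[object_id] = (total + count, crawls + 1)
    return updated


def rate(stats, object_id):
    """Running average of the count over all crawls that saw the object."""
    total, crawls = stats[object_id]
    return total / crawls


first = update_stats({}, {"a": 10, "b": 2})
second = update_stats(first, {"a": 20, "b": 4, "c": 6})
print(rate(second, "a"), rate(second, "b"), rate(second, "c"))  # 15.0 3.0 6.0
```

After the first crawl the statistics exist but rest on a single observation; from the second crawl onwards each new round refines the aggregated totals, which mirrors the reasoning for requiring at least two full crawls.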

You can read more about the Global Internet Search Application in the Products / Applications section of our Web site.

Sample Application No 2: Clusterpoint Network Traffic Surveillance System Application

Another sample application that illustrates our database technology is Clusterpoint Network Traffic Surveillance System (NTSS) - a scalable network traffic capturing, storage and search database application, which runs on top of Clusterpoint DBMS.

Following customer feedback, we have set up a dedicated product Web site for NTSS, as it quickly turned from a prototype scalable database application into a full-fledged commercial software product that solves customer problems in the area of IT and business security and compliance.  In fact, the NTSS application makes it possible to create and maintain a fully and instantly searchable database of corporate network traffic.

Please visit the Clusterpoint NTSS product Web site to learn more.
