Clusterpoint NoSQL Database Server: Simplify database design, management and search!

Benchmarking NoSQL Databases

Vendor performance benchmarks are seldom regarded as objective or truthful criteria for deciding whether a product suits a customer's needs.  The performance of Clusterpoint Server software, like that of all database server software, depends on a multitude of factors, the key ones being:

  • CPU hardware (processor speed, number of processors, number of CPU cores)
  • hardware architecture (64-bit or 32-bit)
  • free RAM
  • speed of disk storage subsystem
  • networking speed among cluster nodes in a distributed cluster database
  • organization of a particular cluster configuration
  • type of database objects to be performance-tested
  • size of database objects
  • number of database objects
  • database software configuration and tuning parameters (e.g., size of pre-read buffers, types of indexing rules etc.)

Many other factors affect overall benchmark results, depending on application-specific features: whether you want batch upload during indexing, advanced stemming queries, facets at search time, and so on.  It is not possible to find a single 'magic bullet' for all customer use cases.  Each use case may be completely different and depends heavily on the database content and the functionality the customer needs.  This is especially true for new and emerging NoSQL and document-oriented database platforms, which we believe are destined to dominate database market trends in the IT industry.  There are also few standard industry benchmarks in this new NoSQL market.

We therefore decided that the best way to prove the viability of our database platform technology for your needs is to offer the software as a free download for test driving and evaluation.  Please see Download.

Sample Reference Test Database

We recommend that all our customers download the Clusterpoint software and supply a large sample data set from a typical customer database.  The more data a Clusterpoint Server is given to handle, the more objective the performance results will be.  A good sample data set contains at least a few million XML documents; more is better.  Customers are encouraged to build their own prototypes to test Clusterpoint software and to submit their results to us, which we may publish if we see that they are relevant to other customers too.

If you are interested in performance testing Clusterpoint software on your hardware, we can also help you quickly prototype the application functionality you need on the Clusterpoint Server database platform.  You can then objectively compare Clusterpoint benchmarks with your own tests of other software systems.  The tests can also be performed directly on your hardware, in your IT environment, where you can install our prototype application and Clusterpoint software to be sure that the production system will match your performance expectations.

We can normally also sign a mutual NDA covering your sample database data, for your peace of mind.

Please send inquiries about prototype testing to Clusterpoint Support Email.

Content Indexing Speed Benchmarks

Since Clusterpoint is a NoSQL document-oriented database with built-in full text search, we expect that one specific area of customer interest is how the Clusterpoint database performs compared with enterprise search tools.  To give you a ballpark figure for the performance of Clusterpoint software in configurations where it can fully replace existing SQL server technology plus an integrated enterprise search tool, please see below the performance tests we ran using Clusterpoint and some search tools popular among developers.

For the tests we used the following publicly available sample database: the OHSUMED test collection, available at http://trec.nist.gov/data/t9_filtering.html.

The OHSUMED test collection is a set of 348,566 references from MEDLINE, the on-line medical information database, consisting of titles and/or abstracts from 270 medical journals over a five-year period (1987-1991). The available fields are title, abstract, MeSH indexing terms, author, source, and publication type. The National Library of Medicine has agreed to make the MEDLINE references in the test database available for experimentation, restricted to the following conditions (we re-publish them to be sure everyone knows):

  • The data will not be used in any non-experimental clinical, library, or other setting.
  • Any human users of the data will explicitly be told that the data is incomplete and out-of-date.

This database is also technically characterized by:

  • Document count: 196,403
  • Data size (XML): 302 MB

For hardware we used the following commodity machine, which was more than 2 years old:

  • CPU (2 cores): Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz
  • HDD: SAMSUNG HD161HJ SATA 150GB 7200rpm
  • RAM: 4GB
  • OS: openSUSE 11.3 (x86_64) 

Here are ballpark figures for raw data indexing speed, which we expanded with a few extra software-configured clustering configurations to show the scalability of the Clusterpoint database platform.


Clusterpoint XML database indexing performance scalability

Solid color bars show bare-bones indexing benchmarks without any special clustering configuration or tuning. We simply loaded each software product with data from the sample database at raw indexing speed and measured the total time taken.
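A measurement of this kind can be sketched in a few lines of Python. This is only an illustrative harness, not the tooling used for the figures here: `index_document` is a hypothetical callable standing in for whatever loader the system under test provides, and the documents are assumed to be available as byte strings.

```python
import time
from typing import Callable, Iterable


def measure_indexing_throughput(
    documents: Iterable[bytes],
    index_document: Callable[[bytes], None],
) -> dict:
    """Feed every document to index_document; report total time and MB/s."""
    total_bytes = 0
    count = 0
    start = time.perf_counter()
    for doc in documents:
        index_document(doc)  # hypothetical loader for the system under test
        total_bytes += len(doc)
        count += 1
    elapsed = time.perf_counter() - start
    megabytes = total_bytes / (1024 * 1024)
    return {
        "documents": count,
        "megabytes": megabytes,
        "seconds": elapsed,
        "mb_per_second": megabytes / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in data and a stand-in "indexer" that just stores the document.
    sample = [b"<doc><title>example</title></doc>"] * 1000
    stored = []
    print(measure_indexing_throughput(sample, stored.append))
```

Running the same harness against each candidate system, with the same documents and on the same hardware, keeps the comparison like for like.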

The solid color bars show that in raw indexing tasks Clusterpoint software, indexing XML at 5 MB per second, could outperform or match two of the fastest enterprise search systems used by the open source community.

Please note that our test hardware is fairly basic and somewhat outdated (3-4 years old); tests on the latest hardware could easily double or triple the indexing speed per server for all software platforms.  Please also note that we compared entirely different software categories, ours being an XML database server and the others dedicated enterprise search (full text index building) tools, on one very specific and narrow function: indexing.  Clusterpoint does much more, including Information Ranking setup.

In our tests, the green tiled bars show ballpark figures for different clustering scalability factors of Clusterpoint software, taking into account how many 'virtual' cluster nodes are set up per server and how many CPU cores each server has. With Clusterpoint Server software it is possible to configure multiple cluster nodes to run on the same hardware server as separate server software instances, each running in RAM and servicing its own part of the cluster database storage (they run as virtual cluster nodes, in parallel on the same hardware server).  This allows performance to scale further, as each node can use its own CPU core and memory space, sharing only the common disk storage subsystem.

The tiled benchmarks illustrate that the overall indexing speed scaling factor is about 1.5 times per core on a single server.  This factor can be affected if the same hardware is shared with other applications.  Too many cores would also decrease it, because disk storage becomes the bottleneck sooner. You can take this multi-core scalability factor into account when building your hardware setup and calculating how to match the hardware configuration to your overall database performance requirements.

Please note that we did not test the classic multi-server configuration with a single database node per server, as indexing performance scales linearly in all such configurations with Clusterpoint (e.g., 2 servers complete indexing in half the time, 3 servers in a third of the time, and so on).
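As a rough planning aid, the linear multi-server claim can be turned into a small calculation. The sketch below assumes ideal linear scaling exactly as stated above and ignores the per-core factor; the function name and parameters are illustrative, and the example uses the 302 MB sample size and the 5 MB per second single-server figure quoted in this section.

```python
def estimate_indexing_seconds(
    data_mb: float, mb_per_second: float, servers: int = 1
) -> float:
    """Estimate indexing time, assuming throughput scales linearly with servers."""
    if data_mb < 0 or mb_per_second <= 0 or servers < 1:
        raise ValueError("data_mb must be >= 0, mb_per_second > 0, servers >= 1")
    return data_mb / (mb_per_second * servers)


if __name__ == "__main__":
    # 302 MB of XML at 5 MB/s on one server, then on two and three servers.
    for n in (1, 2, 3):
        print(n, "server(s):", round(estimate_indexing_seconds(302, 5, n), 1), "s")
```

Real clusters add network and coordination overhead, so treat the result as a lower bound on the time to expect and confirm it with a test on your own data.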

With Clusterpoint Server database platform software you can start building your own flexibly scalable private-cloud database storage infrastructure.  It provides performance and storage scalability through configurable software clustering options: capacity grows linearly with the number of servers and CPUs, and it can also scale with the number of CPU cores.  Please also read Clustering.

Full Text Search Speed Benchmarks

As with indexing, search performance must be tested on a customer sample database.  Search is always application specific, and it is probably meaningless to test abstract search operations in a database without first setting out all the requirements for a particular test and without providing statistically representative database content.

General ballpark figures for full-text search performance with Clusterpoint can be expressed as response times for ad hoc search queries of the kind typical on the Internet.  When common Internet-style search queries are executed against Clusterpoint database storage, in most cases we can guarantee sub-second response times.  Here are some basic categories of search queries and the response times delivered by Clusterpoint Server software:

  • <1 second: full text search with word stemming, facets, large data sets, with disk access;
  • <0.2 seconds: the most typical Internet-style keyword search (full text search), large data sets, with disk access;
  • <0.05 seconds: ad hoc full text search in memory-cached databases (smaller databases, e.g., a few hundred thousand records);
  • <0.005 seconds: multi-page browsing by search with pre-cached disk read data.

Please note that these are only very approximate figures, based mostly on customer feedback about the response times achieved with Clusterpoint database software in their particular custom applications.
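To check figures like these against your own application, response times can be collected with a simple timing loop. The following is a minimal sketch: `run_query` is a hypothetical callable that sends one query to the database and waits for the result, and the 95th percentile is computed with the nearest-rank method.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_response_times(
    run_query: Callable[[str], object], queries: Sequence[str]
) -> dict:
    """Time each query once and summarize the observed response times."""
    if not queries:
        raise ValueError("at least one query is required")
    timings = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)  # hypothetical client call to the system under test
        timings.append(time.perf_counter() - start)
    timings.sort()
    # Nearest-rank 95th percentile: smallest value covering 95% of samples.
    rank = max(1, math.ceil(0.95 * len(timings)))
    return {
        "count": len(timings),
        "median": statistics.median(timings),
        "p95": timings[rank - 1],
        "max": timings[-1],
    }


if __name__ == "__main__":
    # Stand-in query function; replace with a real client call.
    print(measure_response_times(lambda q: q.upper(), ["heart", "lung"] * 50))
```

Using queries drawn from real user logs, rather than synthetic ones, makes the summary more representative of production behavior.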

Please also note that these search response times are independent of total database size when the database is configured to run as a cluster.  This is a great advantage of the Clusterpoint database platform, which uses Clusterpoint Index and Clusterpoint Information Ranking to create a set of pre-sorted indexes.  These enable the database server software to perform ultra-fast Clusterpoint Search cluster-wide, no matter what the total database size is and no matter how many parts the database content is partitioned into across multiple hardware servers.  Please see also Clustering.

Please also note that search query response times are completely independent of the Information Ranking rules that a customer may configure to achieve the most relevant grouping and ordering of search results.  In many other systems, especially when SQL and enterprise search systems perform ranked queries, performance often drops when multiple ranking rules or complex data grouping and ordering are required.  With the Clusterpoint database platform, customers can design and implement as many ranking rules as necessary, applying tens of data relevance grouping and ordering rules per search query, without adversely affecting search response time.  The Clusterpoint database platform is designed for high-speed relevant search; please see also Clusterpoint Technology Overview.

To determine performance benchmarks for your particular application, the best method is to test it on your sample database; please see the sub-section Performance Prototyping on a Customer Sample Database.

Linear Multi-User Workload Scaling

For the number of simultaneous users performing search queries, the Clusterpoint architecture offers a very simple solution. Customers can easily set up a cluster configuration to accommodate as many users as necessary through full database replication into multiple copies in a cluster (database mirroring).

Using any standard web server or application server load balancing tool, the customer application can split the workload from all users equally among many fully working production copies of the Clusterpoint database.  This maintains the same sub-second search response times no matter how many users must be serviced.
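The equal split described above is what a round-robin policy provides. The sketch below illustrates the idea in application code; it is not part of Clusterpoint, the class name is illustrative, and the host names are hypothetical placeholders for mirror copies of the database.

```python
import itertools
from collections import Counter
from typing import Sequence


class RoundRobinMirrors:
    """Hand out mirror addresses in strict rotation so load is split evenly."""

    def __init__(self, mirrors: Sequence[str]):
        if not mirrors:
            raise ValueError("at least one mirror is required")
        self._cycle = itertools.cycle(list(mirrors))

    def next_mirror(self) -> str:
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical addresses of three mirror copies of the same database.
    balancer = RoundRobinMirrors(
        ["mirror-1.example.internal", "mirror-2.example.internal", "mirror-3.example.internal"]
    )
    # Count how many of 9 requests each mirror would receive.
    print(Counter(balancer.next_mirror() for _ in range(9)))
```

In practice the same policy is usually configured in an existing load balancer rather than written by hand; this class is not thread-safe as written.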

System administrators can configure as many mirror copies of the Clusterpoint database as necessary, using the Clusterpoint Manager utility web interface and the required number of hardware servers with Clusterpoint Server software installed. There is no need to change a single line of the customer application software to increase capacity.  There is no need to migrate the database structure or replicate data with supporting software.  The Clusterpoint database platform was designed to provide this database mirroring configurability transparently, and it can be carried out using only system administrator or DBA resources.  All synchronization of database updates is performed by Clusterpoint Server software running on the mirrored cluster nodes.  The database still appears as a single logical database to customer applications.

Reconfiguring the cluster database to run on N extra mirror copies will linearly increase the number of simultaneous users that can be serviced by a factor of N.
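For capacity planning, this linear relationship can be inverted to estimate how many copies a target user load requires. The sketch below rests on one reading of the claim: each copy serves the same number of simultaneous users, so N copies in total serve N times as many as one. The `users_per_copy` value is not given in this document and must come from your own measurements.

```python
def mirrors_needed(target_users: int, users_per_copy: int) -> int:
    """Return how many database copies are needed, assuming linear scaling."""
    if target_users < 0 or users_per_copy < 1:
        raise ValueError("target_users must be >= 0 and users_per_copy >= 1")
    # Integer ceiling division; at least one copy is always required.
    return max(1, -(-target_users // users_per_copy))


if __name__ == "__main__":
    # Hypothetical figures: one copy measured at 300 simultaneous users.
    for users in (250, 900, 1000):
        print(users, "users ->", mirrors_needed(users, 300), "copies")
```

Because the model is linear, the hardware cost per additional block of users stays fixed, which supports the fixed-cost planning approach.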

This is probably one of the key advantages of using the Clusterpoint database platform.  You can plan your database storage infrastructure using a fixed-cost model for capacity increases based on the number of users.  You can also upgrade capacity quickly, simply by adding extra servers and creating extra mirror copies of the production database at the data center level, without changing any customer application software.

Please see Clustering.