Clusterpoint NoSQL Database Server: Simplify database design, management and search!

Ad Hoc Search Simplicity

The most common Clusterpoint search query format has a simple syntax centered on ad hoc search.  We decided that the way people search for information on the Internet would also be the best way for them to search their databases.  Ad hoc search, pioneered for mass audiences by Internet search services, is today the de facto standard in the best web applications.

We have added query extensions that support structural search using XML's own data structure elements.  This makes the simplified query mechanism as flexible as XML itself and, for XML-structured ad hoc search, nearly as powerful and rich as what SQL database technology provides with its 'select' clause.  You can combine ad hoc Internet-style queries with ad hoc XML-structured search queries in the Clusterpoint database architecture.  People who know their XML database structure normally find this ad hoc search feature very useful.

The simplicity and efficiency of search in a Clusterpoint database is illustrated here by some sample query formats for a hypothetical 'CVs' database.  All of the following are valid queries in the Clusterpoint Server database architecture, ranging from simple keywords (Internet-style ad hoc search) to structural search in XML that narrows down results:


This sample code illustrates only the most trivial use cases of the Clusterpoint DBMS query syntax, to show how database search is simplified from both the end-user and the software application developer point of view.

More complex Clusterpoint queries can use XPath notation to address specific XML fields and different levels of XML nesting.

For developers using the Clusterpoint API, search queries can be expressed with more advanced conditional logic, combined with multi-level Boolean expressions, different filtering options etc.  There are more than 160 developer options available in the Clusterpoint API, most of which cover search and data retrieval.  This approach provides a versatile set of tools for programming typical data retrieval tasks for many databases.

By designing, storing and managing XML-only databases, developers do not need to learn the complexity of SQL or XQuery search syntax.  They can build their application code much faster and give their users the fast search experience that most people have come to expect on the Internet.  Everyone uses Internet search services today, and having grown up with instantly responsive search from global Internet services, it is only natural to ask for the same ease and relevance of search when accessing corporate databases.  People also prefer to stay within a single search framework instead of learning custom search methods for each particular database application.  Many legacy database systems still have a lot of search-related complexities and limitations that users find frustrating and difficult to remember.  Most of those problems arise from technical limitations on indexes (used for fast search in any database) and from arcane query syntax, which is often required even from web users when they use pre-programmed search forms in legacy databases.

Clusterpoint search is designed for the Internet era, making it possible to put fast ad hoc search functionality at the center of any corporate database application.  The Clusterpoint DBMS platform delivers both the search performance and the search relevance needed to match user expectations for search in corporate databases.  Please also see Advantages.

Integrated Fast Enterprise Search

One of the biggest problems of traditional legacy databases is that Internet-style full text ad hoc search (for example, by any known context somewhere in a database) is rather cumbersome, requires specialized SQL programming, and significantly slows down performance when the number of stored database objects starts to exceed a few million.  Application software developers therefore restrict ad hoc search functionality in large databases, especially for multi-user web applications, to prevent production SQL database servers from being overloaded with resource-consuming ad hoc search transactions.  In legacy databases, ad hoc free content search is largely incompatible with the relational indexing system.  In essence, any SQL database server that cannot find or build suitable pre-programmed indexes "scans" through all database records one by one and selects matches by comparing their text content with the user-provided ad hoc search terms.  A runaway free-format ad hoc search can often cause unacceptable delays for other applications.  It also overloads disk storage systems in the SQL world: often the complete database content needs to be delivered for ad hoc search query processing in legacy databases.
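The difference between a record-by-record scan and an index lookup can be sketched in a few lines of Python. This is only an illustration of the general idea, not Clusterpoint or SQL server code; the sample records and the helper names `scan_search`, `build_inverted_index` and `index_search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records: record id -> free text
records = {
    1: "senior java developer london",
    2: "database administrator riga",
    3: "java architect riga",
}

def scan_search(records, term):
    """Read every record and compare its words with the term.
    The work grows with the total number of records."""
    return sorted(rid for rid, text in records.items() if term in text.split())

def build_inverted_index(records):
    """Map each word to the set of record ids containing it.
    This is done once, at indexing time."""
    index = defaultdict(set)
    for rid, text in records.items():
        for word in text.split():
            index[word].add(rid)
    return index

def index_search(index, term):
    """Look the term up directly; no record text is read at query time."""
    return sorted(index.get(term, set()))

index = build_inverted_index(records)
```

Both functions return the same ids for the same term, but only `scan_search` has to touch every record for every query.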

Most crucially, even if hardware performance and capacity are sufficient to perform ad hoc database search, user-generated free-format queries can very often produce massive result sets, particularly if there are millions of objects in the database.  For example, if customers of a London-based news agency with a database of more than 10 million news articles search for 'London' as their ad hoc query, perhaps half of all documents will match the request.  Unless the application software restricts the use of full text ad hoc search, the end user will be overwhelmed with search results.  Situations where millions of articles contain the search term can easily occur.

To address those generic ad hoc search problems, where result sets can be comparable in size to the total number of objects in a database, the Clusterpoint Server database management platform was designed with a unique competitive advantage: a customizable Information Ranking mechanism for the best search relevancy.  It lets you store and manage all of your data in any custom XML format, search it effectively Internet-style, even with simple search keywords, and still get the most relevant data up front at ultra-fast sub-second speed, independently of the total database size.  The indexing technology we apply does not overload database servers, and with Clusterpoint Information Ranking you can always organize your database index so that it serves both fast and relevant ad hoc search.

Fast and always relevant rich enterprise search in any database, applying your own customized Information Ranking rules to your database search index, is one of the greatest and most powerful features of the Clusterpoint database platform and its indexing model.  Clusterpoint DBMS supports many enterprise search options that can benefit from ranked search; please see the table:

Clusterpoint Enterprise Search Options
Generic, built-in rich enterprise search functionality coupled with our unique Information Ranking mechanism is probably the most important competitive advantage of the Clusterpoint Server database platform.  With most other solutions you have to resort to separate, integrated enterprise search software to achieve higher relevance of search results.  That adds complexity, which everyone wants to avoid in IT systems today.  In particular, legacy enterprise search software struggles with database clustering, often requiring extra development effort to implement partitioning logic and algorithms at the application software level, again adding complexity to your application software and database systems and making them even more difficult to maintain.  We deliver a clean, straightforward platform solution: Clusterpoint Server XML database software, which has all the functionality built in.  You do not need to handle indexing and querying methods specifically in application software to achieve sub-second search for your databases: Clusterpoint Server automatically indexes the full content of every new or updated XML document by default, and fast full text search across the entire content of your database is available out of the box.  You can tune and customize this default indexing behavior through configuration files, applying your own indexing rules, information ranking and preferences for each particular Clusterpoint storage (a named database of all your XML documents).

Alongside the simplicity of automatically indexing the entire database content, we provide flexible and open customization of the Clusterpoint Index for the best search relevancy, using your own information ranking algorithm designed around your own business rules, your own corporate know-how and possibly your own trade secrets.  It makes your database search not only fast, but also very simple and productive for your customers.

Please read more about Clusterpoint Indexing and Information Ranking.

Ultra-fast Search Performance 

With Clusterpoint Server, you design Information Ranking rules based on your own ranking requirements and apply them to your XML database.  By changing these rules you can order or group your search result sets as your business needs require, adjusting the Clusterpoint database index organization model to your particular application.

As a result you will quickly achieve fast and relevant search in your databases, outperforming relational database systems at search speed by a factor of 10 to 1000.

Clusterpoint Fast Search Response Time Subsecond vs SQL
It will also be an information ranking model driven by your own application needs, delivering the best user experience when searching databases with easy-to-use Internet-style ad hoc queries.

Even for small databases that require complex information grouping and ordering rules, the Clusterpoint Server database search mechanism can yield significant competitive advantages in product quality and performance for interactive web applications.  The majority of today's web applications still group and order database search results using SQL functionality, applied through 'group by', 'order by' and 'join' clauses in SQL SELECT statements.  The performance problems with this approach start to show up when the database grows too big, or when several of those grouping and joining options must be combined.  For example, it is common in many web applications to use multiple 'joins' and different ordering and grouping rules to make the final search result set relevant from an end-user point of view.  SQL technology slows down dramatically when you start combining joined data sets from two or three databases (tables) of similar size, and that very often results in application waiting times of 10 or more seconds, even for relatively small databases (hundreds of thousands of records per table).  That is too long for the web application model: people on the Internet wait for a response for up to a few seconds on average, then 'vote with their feet' and go to another, more 'responsive' service provider.  The relational database system has a 30+ year old architecture and was never designed for efficient processing of this type of transaction, characteristic of the web IT environment.  In legacy databases, search result ordering and grouping is always performed repetitively on the fly, for each search request, using large index files and interrelated tables.  If you also want to add full-text search software to that complex data sorting, performance can suffer even more dramatically.
Therefore even relatively small databases that require complex data ordering and grouping rules for a web application start to slow down unacceptably when search is performed using traditional relational queries alongside traditional enterprise search software tools.

Clusterpoint Server solves this performance problem in a different way: you define your own custom Information Ranking rules, which are the algorithms that Clusterpoint Server software applies to sort and group search results, and it creates and maintains a custom pre-sorted Clusterpoint Index for your XML database.  No matter how complex your own relevancy-driven rule set for ordering and grouping results is, the Clusterpoint Index and the Clusterpoint ad hoc search facility will always provide sub-second search response times, even with complex data ordering and grouping that would often require tens of 'group by', 'order by' and 'join' clauses per query in the legacy SQL world.
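The general idea of a pre-sorted index can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual Clusterpoint Index: the documents, the rank scores and the function names `build_ranked_index` and `top_k` are all hypothetical. The point is that sorting happens once at indexing time, so a query only reads the head of an already ordered list instead of sorting every matching record on the fly.

```python
from collections import defaultdict
from itertools import islice

# Hypothetical documents: id -> (text, rank score derived from a business rule)
documents = {
    "a": ("london market news", 0.2),
    "b": ("london election results", 0.9),
    "c": ("paris fashion week", 0.5),
    "d": ("london weather", 0.6),
}

def build_ranked_index(documents):
    """Build one posting list per word, sorted by descending rank.
    The sorting cost is paid once, at indexing time."""
    postings = defaultdict(list)
    for doc_id, (text, rank) in documents.items():
        for word in set(text.split()):
            postings[word].append((rank, doc_id))
    for word in postings:
        postings[word].sort(reverse=True)  # highest rank first
    return postings

def top_k(postings, term, k):
    """Return the k best-ranked hits by reading only the head of the list."""
    return [doc_id for _, doc_id in islice(postings.get(term, []), k)]

ranked_index = build_ranked_index(documents)
```

With the sample data, a query for 'london' asking for two results reads just the first two entries of its posting list, however many other documents also match.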

We have customers who achieved a 1000-fold performance increase by replacing an existing integrated SQL and enterprise search software framework with a single Clusterpoint Server platform, for the same queries and the same search result sets, which required fairly complex business logic for sorting database search results.  Response times in their web applications dropped from as high as 5 to 10 seconds to 0.005 seconds.  Please also see our web site sections Customers and Performance.

Sub-second String Pattern Search

Clusterpoint DBMS has another advantage, which can be useful for many applications in language sciences, computer networking, anti-virus detection, bioinformatics, life sciences etc.

Clusterpoint Fast Pattern Search Database Software
The Clusterpoint Server indexing system was engineered using graph database concepts, providing storage of and access to all unique elements of the database content in computer RAM.  We simply call it the "atomic" index, where any element can be located instantly.

As a result, our database software platform can perform fast pattern-matching search by lookup in our "atomic" index.  It does not need to scan through disk storage to find matches.  In this way Clusterpoint DBMS can radically speed up many string or textual data processing applications where it is important to find patterns in text, for example by specific letters at a specific position in a string, by string wildcard templates, or by alternative "spellings" of a pattern, which enables search for substitutions and variations of letters in string-type data.

Unlike relational systems, which perform those types of searches very slowly and mostly by scanning through the database content, Clusterpoint DBMS delivers sub-second response times.  In addition, the number of string matching hits found is always returned.  If the number of hits in the database is above a certain configurable threshold, an estimated number of hits for the whole result set is calculated instead, indicating to end users that their search queries need to be refined.
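A wildcard lookup over an in-memory collection of unique terms can be sketched like this. It is a simplified Python illustration that uses a sorted list and binary search rather than the graph structure described above; the sample terms, the function names and the `refine_query` flag are hypothetical, and the sketch reports an exact count instead of estimating one.

```python
from bisect import bisect_left

# Hypothetical in-memory collection of unique terms, kept sorted
unique_terms = sorted({"ADQLTEEQIAEFKEAF", "ADQLTEEQIAEFKEGG", "ADQLSEEQ", "MKTAYIAK"})

def prefix_matches(terms, prefix):
    """Find all terms that start with prefix (a trailing-wildcard pattern).
    Binary search jumps to the first candidate, so terms that cannot
    match are never inspected."""
    start = bisect_left(terms, prefix)
    end = start
    while end < len(terms) and terms[end].startswith(prefix):
        end += 1
    return terms[start:end]

def search_report(terms, prefix, threshold=1000):
    """Return the hit count and flag result sets above the threshold,
    signalling that the query should be refined."""
    hits = prefix_matches(terms, prefix)
    return {"hits": len(hits), "refine_query": len(hits) > threshold}
```

For example, `prefix_matches(unique_terms, "ADQLTEEQIAEFKE")` corresponds to a pattern such as ADQLTEEQIAEFKE* and returns only the two terms that begin with that string.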

On top of this generic string pattern search by lookup into a RAM-based graph of occurrences, developers can use another Clusterpoint API feature set to extend the functionality described above with structured search.  The Clusterpoint API allows string pattern search to be freely combined with structured search, using Boolean AND, OR, NOT logic and multi-level parentheses (.. (..() ..)..).  This makes it possible to match not only strings but also specific database metadata or taxonomy, and in this way to build logically complex and flexible queries with multiple ways to narrow down results or expand them as necessary for a particular application.
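How nested Boolean conditions narrow or widen a result set can be illustrated with plain set operations. The sketch below is generic Python, not the Clusterpoint API or its query syntax: the tuple-based query form, the `evaluate` function and the sample index keys are assumptions made for the example.

```python
# Hypothetical index: a text term or a field condition -> matching document ids
condition_index = {
    "human": {1, 2, 5},
    "calmodulin": {2, 3, 5},
    "species=rat": {3, 5},
}
all_ids = {1, 2, 3, 4, 5}

def evaluate(query, index, all_ids):
    """Evaluate a nested query given either as a condition string
    or as a tuple (operator, operand, ...)."""
    if isinstance(query, str):
        return index.get(query, set())
    op, *args = query
    results = [evaluate(arg, index, all_ids) for arg in args]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    if op == "not":
        return all_ids - results[0]
    raise ValueError(f"unknown operator: {op}")
```

A query such as `("and", ("or", "human", "calmodulin"), ("not", "species=rat"))` mirrors the multi-level parentheses described above: the inner OR expands the candidate set and the outer AND with NOT narrows it again.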

Please remember that Clusterpoint DBMS can handle de-normalized data as well as encoded data with no performance loss, whether XML field values are cryptic codes or their human-readable text equivalents.  Our customers can therefore start building truly exciting and fast interactive web applications that provide cross-boundary search across unstructured, structured and semi-structured data.

When used as the core database platform, Clusterpoint DBMS delivers the speed and functionality needed for extremely fast and responsive web applications handling big databases.  It especially benefits people who work with this type of query daily, for example in scientific research and in the biomedical and pharma industries.

For example, you can quickly find all human entries in a FASTA database with a simple ad hoc search such as this sample query:

[ human ADQLTEEQIAEFKE* ]

which returns, for example:

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*

Please read more about Clusterpoint indexing structure and functionality in section Technology / Indexing.

Linear Search Scalability in Clusters 

Clusterpoint technology was designed from the ground up for building and operating scalable databases of massive size.  Our design concept has solved the problem of slow search performance in a distributed database architecture, essentially making search performance independent of the total data volume.


The Clusterpoint Server acts as a distributed database (or a grid database) software utilizing total power of all available processors, RAM and disk systems. 

The Clusterpoint software distributes computing tasks across a cluster of networked servers, providing instant and relevant data retrieval from very large databases.

Our customers can address the database management problem of ever-growing electronic information volumes by incrementally adding more database servers, running all parts of a distributed cluster database at the same performance level as a single database: with sub-second response times guaranteed for your search queries.

Please note that nearly linear search scalability in a massively clustered database configuration is achieved through our unique full content indexing system, which is based on Information Ranking: a method of creating and maintaining a scalable index that is ranked for output relevancy by the customer's own data ranking rules.

This ranking is crucial for creating a highly pre-sorted disk index for a Clusterpoint database, which can work thousands of times faster than any relational index at relevant search with complex and non-trivial business logic for exact grouping, ordering and positioning of results, even if the database contains billions of data objects and is distributed among a large number of cluster servers.  The resulting database search functionality delivers a nearly instant, sub-second search experience for our customers, regardless of how large the database is and how many cluster nodes it runs on.

As a result, customers can scale massive databases in a cost-efficient way and achieve radically better information search times in a cluster database.  Key performance indicators:

  • Outperform RDBMS in ad hoc queries (Internet-style full text queries), typically from 10 to 1000 times, when used in the Web application model;
  • Query response times around 0.2 seconds (average query time with disk access) across a multi-server cluster database;
  • Linear scalability addresses future data growth and performance needs cost-effectively: just add extra cluster nodes to accommodate growing database volumes or performance needs;
  • Solve information overload problem for large databases with millions and billions of data objects, by ranking output for most relevant results first (e.g., for most important customer requests in CRM, for most relevant ads in Internet advertising, for most relevant publications in archive of scientific articles etc.).
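One common way a ranked search can be spread across cluster nodes is scatter-gather: each node returns only its own best hits and a coordinator merges the short partial lists. The Python sketch below illustrates that general pattern under stated assumptions; it is not a description of Clusterpoint internals, and the shard contents and the `cluster_top_k` function are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical shards: each node holds its own hits as (rank, doc_id),
# already sorted best-first by its local pre-sorted index
shards = [
    [(0.9, "n1-a"), (0.4, "n1-b")],
    [(0.8, "n2-a"), (0.7, "n2-b")],
    [(0.5, "n3-a")],
]

def cluster_top_k(shards, k):
    """Take at most k hits from every shard, then merge the partial lists.
    The coordinator handles at most k * len(shards) entries,
    regardless of how many objects each shard stores."""
    partials = [shard[:k] for shard in shards]
    merged = heapq.merge(*partials, key=lambda hit: hit[0], reverse=True)
    return [doc_id for _, doc_id in islice(merged, k)]
```

Adding a node in this sketch adds one more short list to merge, which is why the work at query time stays small as the total data volume grows.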


Database Replication in Many Copies 

Clusterpoint also provides another dimension of scalability for search and for data storage and retrieval in general.  You can set up and automatically run multiple parallel copies of the same database in a cluster.

Database replication (database mirroring) into multiple active copies provides both safety and workload scalability with respect to the number of users.  You always have multiple synchronized copies of an identical database running on different hardware nodes.  If required, you can implement workload sharing on your application servers simply by querying multiple copies of the same database, based on your own application logic and preferences, effectively distributing the overall workload from all users across a cluster of servers.
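Application-side workload sharing across identical copies can be as simple as rotating through the list of mirrors. The following Python sketch shows one such policy (round-robin) as an assumption; the `ReplicaPool` class and the host names are hypothetical and no Clusterpoint client call is shown.

```python
from itertools import cycle

class ReplicaPool:
    """Hand out identical database copies in round-robin order.
    A hypothetical application-side helper."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = cycle(replicas)

    def next_replica(self):
        """Return the copy that should serve the next query."""
        return next(self._replicas)

# Hypothetical addresses of three synchronized copies of one database
pool = ReplicaPool(["db-copy-1", "db-copy-2", "db-copy-3"])
picked = [pool.next_replica() for _ in range(4)]
```

Because every copy holds the same data, any other policy, such as choosing the nearest data center, could replace the rotation without changing the queries themselves.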

scale search transactions through database mirroring
With Clusterpoint you can similarly establish and operate multiple copies of a clustered (striped) database working as a distributed data source for your application services, for example by putting identically configured clusters in different geographical locations or data centers and using Clusterpoint software to set up "remote mirrors" of the entire cluster database.

Please note that, from the application point of view, all database synchronization and replication of updates among the cluster databases are done transparently by Clusterpoint Server software.  It will still be a single logical database from a software developer point of view, with no need to change application software: not even a single line of it.

This level of automation in our underlying clustering software lets you grow your database-related computing capacity in your data center, or across all of your data centers, transparently and without changes to the application software framework.

Please also note that you are not required to design and maintain key sharding logic for replication and clustering at the application software level, or to depend on behind-the-scenes replication results.  Unlike with many key-value stores and even most document stores, when you choose Clusterpoint database software infrastructure as your data storage you will always know where and in how many copies your data is stored.  Clusterpoint provides a database platform where you can precisely control where your information is located and how many copies of your database are running in parallel, and where you can increase or decrease capacity in a controlled way.

With Clusterpoint DBMS, capacity management within a cluster can be done by your data center operator or DBA, without asking application developers to reprogram clustering logic, change key sharding principles or make any other modifications to your application software logic.  Just add, remove, join or cluster nodes and organize mirroring as your business needs require.

Emergency data management also becomes simple: you just need to monitor the "health" of your cluster/mirror system to see in a timely manner whether new hardware should be added to expand clustering capacity, or to detect hardware failures.

Malfunctioning cluster hardware can be quickly taken offline and replaced with spare hardware (hardware is expendable today), and Clusterpoint software will automatically synchronize all pending updates for the temporarily missing cluster node.

You can read more about full database automatic mirroring into multiple copies and other database clustering options in section Technology / Clustering.

Cluster-wide Search Options

For really massive databases (such as Internet indexes, social network data, life sciences data etc.), you can use any standard traffic routing, web server or application server load sharing tools to create a pool of Clusterpoint database clusters for your application.  Each cluster serves an identical copy of the clustered database in a particular data center, and all servers and data centers together serve a very large number of users simultaneously.  Please see a sample setup in the picture below.

Mirroring of cluster database in multiple copies
In some cases the automatic database partitioning that Clusterpoint Server performs by default may be unnecessary, for example if customers would like to store data grouped together by geography, group of people or any other higher-level criterion, and distribute it across servers according to their own business logic.

The Clusterpoint API also provides a mechanism for addressing particular cluster nodes with fully traditional, application-programmed database partitioning logic.  Customers can address Clusterpoint API commands to specific parts of the cluster only, down to the very basic database storage that serves as a part of the whole database on a particular hardware node in the cluster.  All database update, modification and search commands support this feature.  It makes it possible to build highly customized database storages containing application-defined subparts of the database, located on cluster nodes chosen by the customer.  Our customers can then either provide a single "global" search across all cluster nodes (the default Clusterpoint API behavior) or select only the cluster database parts (nodes) where "local" search is necessary and request search only in that part of the database.

In this way end-user search queries can be targeted only at selected custom parts of the cluster database when application logic requires it, for example querying only the data for a particular geography and ignoring all other cluster nodes, which contain data from other parts of the world.
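The difference between a "global" and a "local" search can be sketched with a toy cluster held in a dictionary. This Python example only illustrates the targeting idea; the node names, the sample documents and the `search` function with its `nodes` parameter are hypothetical and do not reproduce Clusterpoint API commands.

```python
# Hypothetical cluster: node name -> documents stored on that node,
# partitioned by geography according to application logic
cluster = {
    "eu": ["riga office lease", "london office lease"],
    "us": ["boston office lease"],
    "asia": ["tokyo warehouse lease"],
}

def search(cluster, term, nodes=None):
    """Search every node by default ("global" search), or only the
    named nodes ("local" search) when nodes is given."""
    targets = cluster if nodes is None else {name: cluster[name] for name in nodes}
    return [
        (name, doc)
        for name, docs in targets.items()
        for doc in docs
        if term in doc.split()
    ]
```

Calling `search(cluster, "office")` consults all three nodes, while `search(cluster, "office", nodes=["us"])` touches only the node holding the data for that geography.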

This search partitioning versatility in Clusterpoint DBMS, starting from fully automatic database partitioning, and ending with a single cluster node, provides flexibility and free cluster configurability to our customers, depending on their business model, security requirements or performance requirements.

You can read more about Clusterpoint API in section Documentation / Clusterpoint API.

Search Comparison to SQL / XQuery

Searching Clusterpoint databases is simple, easy and understandable for most users and customers, even people without special technical knowledge.  You can perform fast, powerful and versatile search in Clusterpoint databases without the complexity of SQL or XQuery syntax, finding both exact matches and good-enough matches.  Taking our example database of CVs (job offers etc.), with any Clusterpoint database you can search as broadly or as narrowly and precisely as your application requires:

Search without SQL or XQuery complexity
The XQuery language, envisioned for querying XML data, is used as an alternative to SQL by some database vendors today.  We believe that XQuery still has almost as steep a learning curve as SQL and requires as much specific training.  Maybe this is one of the reasons why XQuery never gained a large following among developers.  Many SQL vendors have also added native XML extensions to their flagship SQL databases, and people probably tend to avoid learning another similar language when SQL extensions can be used to handle XML.  We think both query languages are far too complex to be used efficiently for most searches in data stored in native XML format.  XQuery also lacks standards for full text search, which is essential functionality for modern web applications.  In our opinion, XQuery is probably better suited to handling exceptional use cases than the regular database query tasks common to the majority of database applications we use today.

Clusterpoint DBMS uses neither SQL nor XQuery for its database search mechanism, yet it provides the same search power and the same information pivoting, filtering, grouping and ordering as those languages for precise sorting of database search results.

Unlike in SQL and XQuery based systems, this powerful search and sorting functionality is achieved in a totally different technical way in the Clusterpoint database architecture: through the Clusterpoint Index and Information Ranking.  A Clusterpoint database also works at the same search speed independently of the total database size in the cluster.  It executes search queries in a fraction of a second, with almost none of the performance loss characteristic of SQL and XQuery database indexes, even when the database grows to billions of objects.

Having solved the database search performance problem as described above, we decided, for simplicity of database use and ease of understanding, to stay within common and simple XML concepts when querying a database as well.