Clusterpoint NoSQL Database Server: Simplify database design, management and search!

General Platform Features

  • Customizable index ranking for the best search relevance and linear database scalability: probably the most disruptive feature of our database software technology, it enables our customers to rank all content in their databases according to custom relevance rules at the database configuration level.  This search customization is delivered by Clusterpoint Server for any database out of the box, so that any search query returns results positioned, grouped and ordered by those customer rankings, in the most useful and relevant way for end users.  The ranked index also guarantees query response times of a fraction of a second, even when querying very large distributed (clustered) databases.  Once configured by the customer, the ranked indexing method is automatically applied to the full database content by Clusterpoint Server and automatically maintained for all database updates in real time.  We call this freely programmable content ranking mechanism "information ranking", as it makes search for information (data that is valuable and relevant for users) simple, fast and natively scalable in the cloud, an almost ideal indexing model for interactive web applications.  Please see Information Ranking;
  • Native data store clustering: distributed XML/JSON-only database architecture with built-in enterprise search functionality, including fast full text search; the Clusterpoint data storage engine and the built-in database search engine both support a generic clustering architecture, providing high speed indexing and search performance for XML / JSON data independently of the total database size.  The system can be scaled out gradually by adding hardware cluster nodes incrementally, without negatively affecting overall database system performance.  Please see Clustering;
  • Energy-efficient NoSQL database: designed for private and public cloud IT architectures that use inexpensive, networked commodity hardware with modest, energy-efficient computing requirements;
  • Elasticity: any Clusterpoint database can be effortlessly scaled out from a single server to a massively clustered system distributed among hundreds of servers (cluster nodes); with a hierarchical hardware/network setup it scales out even to thousands of machines and billions of database objects without losing fast sub-second ad hoc search performance.  Any increase of database storage capacity, or split of performance workload among mirror copies, can be performed by the operations department, flexibly adding new cluster nodes and scaling out database services incrementally at the data center level, without any changes to application software.  Unlike many other NoSQL products and cluster file systems with distributed data storage functionality, Clusterpoint DBMS is engineered to provide tightly controlled and planned clustering, with precise control of the computing and storage location of clustered database parts and database replication copies (mirroring); all clustering and mirroring options are managed and centrally controlled through Clusterpoint Manager, providing the level of exact control over database storage location and capacity planning that enterprise customers normally require.  Please see Management;
  • Cross-platform: store, access and search any custom data objects designed in industry standard schema-less XML data;
  • Key-based for storage, document-oriented for search: undivided XML / JSON data objects are stored by unique ID key, a machine-readable database model that is also easily understood by humans; for data storage and basic retrieval it works as a simple key-value data store, while for search it works as a structured data store that can run combined structured, semi-structured and unstructured search queries, using the customer's own data structure to select, narrow down or expand the set of data objects retrieved based on the customer's own criteria (much like an SQL SELECT statement does).  It effectively manages undivided XML objects, including completely de-normalized data, providing high speed search within the entire database, within a specific XML sub-structure, or a combination of both;
  • Fully indexed: instantly searchable XML/JSON-only database that automatically builds a full text search index; there is no need to integrate your legacy database software with enterprise search tools to achieve the same search speed and functionality;
  • Full-text search: an integrated enterprise search engine is built in as part of the database server software, with a rich set of search options and a web-developer-friendly API offering paging, hit prediction, faceted search, fuzzy search for "Did you mean that ...?" functionality and many other features; please see the subsection Database Search Features below, and section Technology / Search for how Clusterpoint search is implemented at the system level;
  • Open API: based on XML / JSON and HTTP/HTTPS, using REST design principles; the Clusterpoint API messaging protocol is openly published and interoperable with any programming environment;
  • Client-server database: the database server architecture fits the multi-tier server model (web, application and database servers), serving as the back-office data storage and management platform to store, access and search customer XML data at high speed in a distributed and massively scalable database architecture, and servicing customer web applications through the Clusterpoint API;
  • High performance database: the server software is written in fast C/C++ code and runs natively under the most popular server operating systems;
  • Distributed database: Clusterpoint Server software works as transparent cluster software; no master software is necessary, so cluster database operations have no single point of failure; a failed cluster node does not affect other cluster nodes, and depending on the clustering configuration the system can provide full database redundancy by running multiple production copies of the database in parallel on different hardware (database mirroring), or a high availability cluster database system where each of N cluster servers stores and manages 1/Nth of the total database content (database striping);
  • Native XML / JSON data storage: store any arbitrary XML- or JSON-formatted data object as a schema-less document in the database (that is why we call any database 'a storage'), retrieve it, update or delete it, and search and access it with the various Clusterpoint API commands.  XML and JSON are industry standards supported by most software platforms today, and very flexible data formats that can accommodate virtually any other data structure: texts, database records, tables, object hierarchies, linked data, even binary data files such as video and audio (through Base64 encoding inside XML tags);
  • Structure agnostic data storage: unlike legacy relational systems with a fixed data structure, Clusterpoint Server handles all XML documents the same way, irrespective of their internal structure; this makes it possible to store data objects of totally different structure in the same searchable database, applying just common metadata and unique identification of all documents to manage data storage.  It is also a flexible database model: you can change or expand any data objects as necessary for your applications, just by rewriting your XML documents into Clusterpoint after adding new fields or changing existing ones; there is no need to migrate the complete database due to structural changes as in the SQL world;
  • Full database replication support: run a production database as multiple fully synchronized copies in a cluster (we call them 'mirror databases'), where synchronization of updates among cluster nodes is taken care of by Clusterpoint Server software, with no need to program it into the application software; this is useful for building and running multiple identical, active online backup databases that can also be used for search and access load-sharing by different web applications (for example, a web portal with millions of users can split its search and data retrieval workload among as many database copies as necessary to serve all users at high performance);
  • Real-time updates: the database is updated in real time, just by sending custom XML / JSON documents over the Clusterpoint API protocol; the full-text index is automatically updated with each update transaction, avoiding the delays and full-text index inconsistencies that are a problem for many enterprise search systems;
  • Reliability: follows the ACID transaction model - all database transactions are "atomic" transactions working with complete and undivided customer XML / JSON documents as basic data objects.  A transaction is either fully written to the database, and in this way committed automatically, or it generates an error.  In the Clusterpoint database architecture there is no need to program the complex transaction management that multi-table relational systems require;
  • Parallel indexing/search: the core database server software is designed for contemporary hardware architecture using multiple processors and multi-core CPUs; the software is using multi-threading to effectively parallelize search and indexing processes; and is configurable for each database, to match available hardware capacity and customer performance requirements;
  • Virtualization support: the database server software is designed to run separate databases in separate memory address spaces, making it possible to run multiple Clusterpoint Server instances on the same hardware, each for its own database or application, with its own users and access security; this design also suits contemporary virtualized environments, where different customers can securely share the same hardware resources using virtual machines.  You can even set up an emulation of a large database cluster using just virtual machines as cluster nodes on the same hardware, and use the Clusterpoint Manager utility to configure and run the complete database as a distributed database - a great feature for testing and development;
  • Enterprise security: user authentication and authorization are based on access groups and roles; using the Clusterpoint Manager utility, each database storage can be assigned its own set of users with different privileges for database modification and search (down to particular API-level commands).  Enterprise LDAP and ADS are supported if necessary.  However, Clusterpoint Server is normally accessed only by DBAs, application software developers and application software over the API, and in most cases customer applications already handle end-user logins and authorization themselves (normally in an application server); the end-user authentication subsystem can therefore be switched off to increase the performance of customer web applications (for example, if the Clusterpoint database is used only at the third-tier data storage level, behind a web server and an application server that have their own access controls and user authentication logic);
  • Centralized management: the web-based Clusterpoint Manager utility is used for centralized administration and configuration of all hardware servers running Clusterpoint Server, enterprise-wide.  DBAs or system administrators can use any web browser to log in and remotely administer the whole Clusterpoint database platform infrastructure within an organization: all storages (databases), all servers (active databases served by Clusterpoint Server instances), and all clusters.  For cluster databases, Clusterpoint Manager provides single-click, cluster-wide configuration changes;
  • Rich API functionality: more than 140 database storage, access and search options are available to application developers through the Clusterpoint API.  The key API command set is relatively small and easy to learn, yet most API commands are highly customizable with extra options and attributes, which we constantly add and improve based on customer feedback.  Using the Clusterpoint API, our customers build full-fledged database applications, replacing more complex and time-consuming development for relational SQL-based systems with quick and agile development for web applications using only the Clusterpoint XML-only document database storage concept;
  • Combined queries: Clusterpoint Server is focused on database search: full-text, structured, semi-structured, numeric, date, geospatial search and so on.  You can combine any of these in a single query, in particular full text (ad hoc) and structured search, using the Clusterpoint API and Information Ranking, to quickly add instantly responsive, user-friendly, Internet-style search to your corporate databases.  With simple keywords as queries and ranked database information, the first web page of your database search application can always show the most relevant database information.  The simplicity, sub-second speed and relevance of database search help you achieve the best user search experience for your web applications using the Clusterpoint XML-only database platform;
  • Capacity: there is no specific limit on database size or on the number of XML / JSON data objects managed on the Clusterpoint database platform.  A Clusterpoint database can scale from hundreds of gigabytes on a single server to many petabytes in a large cluster.  Information Ranking is designed to scale to over 400 billion data items per database (a data item can be, for example, a Title field of a news article, a person name field in an address book, or a customer name in a CRM application); this is the number of unique values you can assign to your XML data items.  In practice, information ranking is relative and used mainly for instant relevance grouping of search results, which allows much higher scalability than 400 billion items for fast and effective search in large databases.  All other capacity limits are determined only by available disk space and memory;
  • Big data (large databases): given enough hardware servers, networked into a custom hard-wired tree-like network topology, the Clusterpoint database platform software scales up to thousands of servers.  It has been designed from the ground up for this kind of maximum scalability.  However, in clusters with a large number of linearly networked servers, network delays accumulate to an unacceptable level; to operate efficiently, the Clusterpoint software must be adapted to the specific tree hierarchy of hard-wired servers in such an infrastructure, minimizing network hops for the short data transfer transactions common in a Clusterpoint cluster by restricting Clusterpoint Server to communicate only within a single top-down tree segment of servers.  The resulting hardware and software configuration can still guarantee sub-second query times for a very big database split among thousands of servers;
  • Fast search:  sub-second query response times are standard and can be guaranteed to our customers for all Internet-style ad hoc queries in large cluster databases, when Information Ranking mechanism is activated; the core Clusterpoint database engine has been developed in fast C/C++ source code, and optimized for speed on modern hardware; it uses the best industry practice to speed up data retrieval such as in-memory caching, predictive read-ahead (speeding up all following transactions in multi-page browsing), pre-sorted indexes, automatic entire database indexing during updates, including creating and updating full text index, etc.;
  • Easy to integrate: there is no need to install or pay for database vendor client software; the Clusterpoint API is a fully open API based on simple XML and HTTP/HTTPS messaging.  Virtually any programming language or application development framework can start "speaking" with a Clusterpoint database server, using its built-in tools for web messaging and XML parsing.  Our customers do not need to learn a new programming language or application framework - they can use their own favorite or in-house software system to start developing or modifying their applications to work with the Clusterpoint XML-only database server.  This simplicity also guarantees complete interoperability with existing web applications: in most cases customers use Clusterpoint XML servers in a complementary way, alongside relational SQL database servers, integrating into their web applications only the functionality for fast-growing, scalable customer data sets, where SQL servers struggle to perform;
  • Multilingual database: handles data objects containing text in 160 languages, storing data in UTF-8; provides fast server-based code page translation between UTF-8 and national encodings; provides a language-specific, customizable word stemming and inflection configuration facility, and a facility to substitute synonyms in search queries from a customizable vocabulary for each storage; also provides API options for fast string matching by templates of word parts, such as letters, combinations of letters and word endings, implemented as an ultra-fast lookup in the full text index instead of a scan through all database objects;
  • In-memory caching: uses all available free RAM to minimize disk access;
  • Transaction logging: as database server software, Clusterpoint Server writes database transaction logs and error logs;
  • Customizable runtime environment for each database: for each XML database (storage) you can specify custom run-time performance attributes and configuration parameters in the Storage Configuration file, which Clusterpoint Server applies to that particular storage; examples include custom word delimiters for full text indexing, the maximum amount of RAM for data and index buffers (if it needs to be limited), the number of document records for predictive read-ahead (for better caching of disk data), and many other parameters.  The Storage Configuration is itself a small XML file and can be modified either from the Clusterpoint Manager administration application, or with a command-line text editor directly in the storage's directory (named after the storage) in the file system;
  • Customizable indexing policy for each database: for each XML / JSON database (storage) you can specify an application-specific configuration file of indexing and data sorting preferences, called a Document Policy file.  It is also a small XML file and can be modified in the same way as the Storage Configuration file, either from the Clusterpoint Manager administration application or with a command-line text editor directly in the storage's directory (named after the storage) in the file system.  The Document Policy file defines which XML tag uniquely identifies internally stored XML documents (the Document ID tag, by which the server software stores, finds and accesses documents using the unique string values you store in that tag).  It can also specify which XML tags are listed in search results by default.  Probably its most important role is to describe your own custom rules for applying the Information Ranking algorithm to the Clusterpoint Index, based on your XML / JSON data structure.  This is the mechanism for specifying your own database information ranking methods, delivering relevant and lightning fast sub-second searches even in massive databases, when users search Internet-style using simple ad hoc keywords or phrases.  Please see sections Indexing and Information Ranking for details;
  • Database mirroring: same as Full Database Replication, please see above;
  • Database striping: same as Distributed Database, please see above;
  • Supports binary data storage: can store complete binary-encoded files as parts of a Clusterpoint document in the storage and return them exactly as saved, unmodified;
  • Fast retrieval by Document identifier: any Clusterpoint document can be retrieved using its known unique document identifier (no Clusterpoint Index is needed); this can be useful during full database reindexing;
  • Document ID or document identifiers: can be any URL, unique file name, database primary key, custom sequence number, unique registration code, or other unique string value.  Used to identify and retrieve customer's original XML documents; also used in search results identifying matching documents, without reading XML document content from disk;
  • Freely extendable Clusterpoint Server functionality through customized Lua scripts: with server-side Lua scripting (Lua is an open source, free software programming language) our customers can design and develop extra functionality for Clusterpoint Server without waiting for the next Clusterpoint Server software release.  Lua is simple to learn, is one of the fastest scripting languages on the market, and compiles to fast, embeddable byte code.  Through configuration file plug-ins (we call them hooks), any customer Lua script can be made a part of the Clusterpoint Server C/C++ core engine.  Customers can hook into the server any Lua-driven custom functionality to be executed server side, before or after any Clusterpoint API command processed by the server: for example, to implement custom database triggers or stored procedures for their business applications, drive their own asynchronous messaging and alerting systems, develop notifications based on database search or update events, or even hook in their own external applications (e.g., artificial intelligence and machine learning, business analytics, reporting) as Lua plug-ins that are invoked server side or directly through the Clusterpoint API.  If you have not heard about Lua before, please note that it has long been a scripting language of choice for game developers and has recently been selected by Wikipedia as its future language for scripting wiki templates.  User script extensions are a very powerful Clusterpoint Server feature, yet please use them carefully and test any Lua scripts extensively before use - you can easily crash Clusterpoint Server with badly written Lua scripts.  Please see User Scripting.  Please note that we also offer development services for custom C/C++ functionality plug-ins that do what customer Lua scripts do.  This service involves translating the customer's Lua script code into C/C++ source code, integrating it properly with Clusterpoint Server transaction and error logging, extensively testing the custom C/C++ modules, and optimizing the new server extension code to run at the maximum performance the hardware, network and storage support; we then deliver a customized Clusterpoint Server software release to the customer who ordered the extensions; please see our Services.  We therefore encourage our customers to first develop and prototype their required server functionality extensions in Lua; once these are tested and proven in operation, they can use our custom development services to transform all or part of their Lua functionality into much faster C/C++ code, and we will deliver a customized Clusterpoint Server version for production use, with the required extra functionality built in and running at the maximum possible execution speed, without byte code induced performance limitations;
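The database striping and mirroring options described in the list above (each of N cluster nodes manages 1/Nth of the content; each mirror holds a full synchronized copy) can be pictured with a small toy model. This is only an illustrative sketch, not Clusterpoint code: the hash-based placement rule and the names `node_for` and `place` are assumptions made for the example, and the text does not say how documents are actually assigned to nodes.

```python
import hashlib


def node_for(doc_id: str, num_nodes: int) -> int:
    """Pick a stripe (cluster node index) for a Document ID.

    Hypothetical placement rule: a stable hash of the ID modulo N.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def place(doc_ids, num_nodes, num_mirrors):
    """Return {mirror: {node: [doc_id, ...]}} for a striped, mirrored layout.

    Striping: each document lives on exactly one node within a mirror.
    Mirroring: every mirror holds the same complete set of documents.
    """
    layout = {m: {n: [] for n in range(num_nodes)} for m in range(num_mirrors)}
    for doc_id in doc_ids:
        node = node_for(doc_id, num_nodes)
        for mirror in range(num_mirrors):
            layout[mirror][node].append(doc_id)
    return layout


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(1000)]
    layout = place(ids, num_nodes=4, num_mirrors=2)
    for node, docs in layout[0].items():
        print(f"mirror 0, node {node}: {len(docs)} documents")
```

In this model, adding a mirror adds read capacity without changing where any document lives inside a mirror, while adding a node changes `num_nodes` and therefore the stripe a document maps to; a real system would need a placement scheme that limits such reshuffling.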

Data Storage and Update Features

  • Document insertion: add new XML / JSON documents;
  • Auto increment of Document IDs: option to automatically increment Document ID for new database objects inserted;
  • Document replacement: modify existing XML / JSON document by a known Document ID;
  • Partial updates: modify existing XML / JSON document parts only, avoiding the need to transfer full documents to customer application software and then rewrite them back after modifications into Clusterpoint database;
  • Document updates: add new document or update existing one;
  • Document partial modification: replace specified XML / JSON parts of the document, without rewriting the whole documents;
  • Deletion of documents: delete a document from the database and index;
  • Deletes by search: combine versatile Clusterpoint 'search' command with delete operation, which is useful for database administration;
  • Document locking: API driven document locking supported for multi-user database update environments;
  • Database reindexing: force full database reindexing without reloading all XML / JSON documents, after hardware failures or Document Policy changes that affect the index;
  • Database emptying:  delete database content, saving Document Policy and Storage Configuration;
  • Database deletion: delete Clusterpoint storage and database permanently;
  • Document retrieval: retrieves the customer's originally stored XML / JSON document as is;
  • Document retrieval with policy attributes: retrieves the customer's originally stored document with the information ranking policy attributes that apply to each specific XML part, shown as extra XML attributes on each tag;
  • Document lookup: checks whether particular documents are present in the database without reading their full content, useful for performance reasons;
  • Document listing: retrieves document IDs only, without full document content, for performance and database administration needs;
  • Flexible document storage workload distribution: automatically distributes documents to the cluster nodes that are less used and have more free storage;
  • Programmable document distribution control in cluster: for applications requiring specific logic for distribution of document storage per cluster nodes, each cluster node can be addressed and accessed separately from Clusterpoint API;
  • Multiple-document updates: send multiple concatenated XML / JSON documents to the server in one API request (no separate HTTP request per document, which saves time); for example, group all update transactions for different objects in the database into a single HTTP request;
  • Fast document batch-uploads: option to upload massive amounts of data as batch files of concatenated XML documents, which are placed on the database server's local file system (e.g., over FTP or from an external storage device) and fed directly into Clusterpoint Server by the Clusterpoint command-line utility, for background loading and indexing without API transactions over the slower HTTP protocol; there can be thousands or tens of thousands of documents per batch file; each batch file is processed in the background, and the API command 'status' can be used to monitor the indexing status of a particular storage during such massive uploads (processing can take longer than transactional updates, and batch-uploaded documents are not available in the database until indexing of each document in the batch file has finished).
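The difference between document replacement and partial updates in the list above can be sketched with Python's standard XML library. This is a minimal in-memory illustration, not the Clusterpoint API: the `store` dictionary, the function names `insert_or_replace` and `partial_update`, and the use of an `<id>` tag as the Document ID tag are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Stand-in for a storage: Document ID -> serialized XML document.
store = {}


def insert_or_replace(xml_text: str, id_tag: str = "id") -> str:
    """Store a whole document under the value of its (assumed) ID tag."""
    root = ET.fromstring(xml_text)
    doc_id = root.findtext(id_tag)
    if doc_id is None:
        raise ValueError(f"document has no <{id_tag}> tag")
    store[doc_id] = xml_text
    return doc_id


def partial_update(doc_id: str, fragment_xml: str) -> None:
    """Replace or add only the top-level parts present in the fragment.

    Parts of the stored document that the fragment does not mention
    are left untouched, so the caller never sends the full document.
    """
    root = ET.fromstring(store[doc_id])
    fragment = ET.fromstring(fragment_xml)
    for new_child in fragment:
        old_child = root.find(new_child.tag)
        if old_child is not None:
            root.remove(old_child)
        root.append(new_child)
    store[doc_id] = ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    insert_or_replace(
        "<document><id>42</id><title>Old</title><body>text</body></document>"
    )
    partial_update("42", "<document><title>New</title></document>")
    print(store["42"])
```

The sketch replaces only the first matching top-level element and moves the updated part to the end of the document; a real server would presumably preserve element order and handle nested paths.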

Database Search Features

  • Full-text search engine: enterprise search functionality is a core part of the Clusterpoint database server software; it is integrated into the database storage and automatically builds a full text search index for any XML / JSON documents stored in a Clusterpoint storage;
  • Ultra-fast speed: up to 18,000 search transactions per minute (300 per second) using memory-cached data, and up to 1,800 search transactions per minute (30 per second) with disk access;
  • Full RAM use: uses all available RAM to minimize disk access;
  • Fast ad hoc search: sub-second response times using simple Internet-style database search queries; enter any known keyword as a query term and still get the most relevant results out of the database on the first web page of your application.  When customer Information Ranking is applied to Clusterpoint database indexing, this works without performance loss in massively distributed databases, with response times well under a second (<0.2 seconds with disk access, and <0.005 seconds with an in-memory database).  This is one of the most powerful features of Clusterpoint database technology.  Unlike some ranking systems, which are closed and proprietary, Clusterpoint Server software provides an open and fully flexible mechanism to customize your own database information ranking according to your own business rules; for more information please see our web site section Information Ranking;
  • Phrase search: use phrases as in Internet-style search; useful in many cases to quickly narrow down search results within a large database, where the phrase appears only within certain fields or parts of XML / JSON document;
  • Boolean search: combine your search queries with AND, OR, NOT logical operations;
  • Multi-level parentheses: combine ad hoc word and phrase search, structured search, Boolean expressions to develop and  execute powerful search queries with complex logic;
  • Wildcard support: use of word wildcards in queries;
  • Wildcard tuning options: option to configure wildcard expansion coverage for performance needs;
  • Stemming support: word stemming for multilingual data is supported, with a customizable module for programming word inflection rules for a particular language;
  • Stemming tuning options: option to configure stemming expansion coverage for performance needs;
  • Proximity search: search for terms within N words of another word, specifying the relative distance N; also works within XML structure;
  • Case support: search only for properly capitalized names, discriminating search results based on case;
  • Stop words detection: detects frequently used short words and automatically excludes them unless they are mandatory in the query, for performance tuning on large data sets;
  • Customizable delimiters: use of any special symbols in search terms if not specified as word separators;
  • Results grouping: results can be grouped by domain or zone, returning only the first N results per group with a link to the others;
  • Numeric and date search: combine any integer, date or float values with full text ad hoc queries;
  • Geospatial search: results sorting by distance for GPS-based location search applications, maps etc.;
  • Customizable ordering of results: for numeric range and date search in ascending, descending order;
  • Distributed search: search is performed on, and results merged from, the parts of the same database on all cluster nodes storing a cluster database;
  • Scalability without search performance loss:  scales for search in hundreds of millions of documents (in cluster mode);
  • Structured search in XML data fields or JSON attributes: search within specific XML / JSON document tag data ;
  • Unstructured search: search across all indexed data in all specified XML fields, using full text index;
  • Faceted search feature: narrows or expands search results by any categorized XML / JSON tags (facets) defined for a particular storage, and returns the actual facets with the number of hits per facet for use as extra navigation;
  • Predictive hit calculation: returns the approximate expected number of hits for huge databases, using statistics, to let users know how large a result set to expect after receiving the first web page with results;
  • Spell-check using actual database content:  misspelling and mistyping detection and correction using alternative words from the database vocabulary, enabling "Did you mean ...?" functionality for customers' corporate database web applications;
  • Option to configure spell-check: modify level and expansion coverage of alternatives;
  • Search using pre-sorted index: fast ad hoc search results using pre-sorted document ranking or query relevance at the index level, directly reading from disk, eliminating the need to sort data upon each search query;
  • Flexible information relevancy definition: search results ordering by data items relevance with flexible relevance definition mechanism for a database: using relative information weights and document rank values, making a foundation for a Clusterpoint database Information Ranking, which is applied during indexing;
  • Interval search: search within any interval of numeric values or dates;
  • Relevance filtering: search only within document title, content or other parts defined by relevance ranking, effectively filtering out irrelevant or less relevant data; 
  • Text snippets: returns snippets (small fragments of text) for search query results around hit terms, for required XML tags (for example, text articles);
  • Hits high-lighting in snippets:  search terms highlighting in text snippets;
  • Customized high-lighting: option to configure highlighting with specific start and end tags for better display;
  • XML / JSON formatted messaging: query and search results are either in XML or JSON, easy to parse and integrate;
  • Similar document search by content:  for databases containing text, finds other documents that statistically, with some probability, match a given text content;
  • Web-applications oriented API for efficient search:  makes it possible to build user-friendly search interfaces; to define the maximum number of documents per result page in every search query; to define the starting document number for multi-page result sets in a search query; to calculate and return the total number of hits in every page of multi-page result sets; to return the total query search time spent by the engine; to restrict the maximum number of documents in any search result for performance needs etc.;
  • Extensively documented search API:  provides extensive search results customization options;
  • Trouble-shooting search API: search API commands can be performed from Web based administration tool in Clusterpoint Manager, without programming;
  • Alerts on full content:  alert events triggered by content updates using full text filtering expressions;
  • Customizable programming of alerts: support for definition, modification and removal of full content matching alert filters, using Clusterpoint API to set up, delete or execute alert filtering for particular documents.
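
The text snippet and customized highlighting features listed above can be illustrated with a minimal Python sketch. This is not Clusterpoint code or the Clusterpoint API: the function name make_snippet, its parameters and the default tags are hypothetical, chosen only to show how a fragment around a hit term can be cut out and wrapped in configurable start and end tags.

```python
import re

def make_snippet(text, terms, radius=3, start_tag="<b>", end_tag="</b>"):
    """Return a fragment of `text` around the first hit, with hit terms tagged.

    Hypothetical helper for illustration; `radius` is the number of words
    kept on each side of the first hit.
    """
    words = text.split()
    wanted = {t.lower() for t in terms}

    def is_hit(word):
        # Compare case-insensitively, ignoring leading/trailing punctuation.
        return re.sub(r"^\W+|\W+$", "", word).lower() in wanted

    hits = [i for i, w in enumerate(words) if is_hit(w)]
    if not hits:
        return ""  # no hit term in this text, so no snippet
    first = max(0, hits[0] - radius)
    last = min(len(words), hits[0] + radius + 1)
    # Wrap every hit inside the window (the whole word, punctuation included).
    return " ".join(
        f"{start_tag}{w}{end_tag}" if is_hit(w) else w for w in words[first:last]
    )
```

For example, calling make_snippet(text, ["search"], radius=2) on a sentence containing "full text search for XML" keeps two words on each side of the hit and wraps "search" in the default tags; passing start_tag and end_tag switches to application-specific markup.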

Database Indexing Features

  • Clusterpoint Index:  core indexing mechanism for Clusterpoint Server database software, which is a combination of an inverted (full text) index, a B-trieve type index for storing numeric and date values, and a RAM-based graph-database tree-like index accommodating all unique database elements, including any strings, labels, numbers, dates, emails, relations, references and anything which can be partitioned into small elementary string elements; we sometimes call it an "atomic" index; we decided that there is no reason not to index everything possible, given the abundance and cost of storage space, CPU and memory;
  • Large-file protection:  there is no limit on the maximum database or index size (data is stored in 50MB container files).  One storage can span hundreds of gigabytes on a single computer.  The database and index scale to petabytes if data is distributed among a cluster of N computers;
  • Pre-sorted index: the system builds and maintains pre-sorted document ranking or query relevance at the index level, directly reading from disk, eliminating the need to sort data upon each search query and thus achieving extremely fast search performance even with complex data ordering and grouping rules;
  • Flexible information relevancy definition on the index level: Clusterpoint database indexing mechanism  is based on flexible and customizable relevance definition mechanism for a database content and parts of XML / JSON documents: using relative information weights and document rank values.  This makes a foundation for developing and deploying any custom Information Ranking, which is applied to Clusterpoint XML-/JSON-only database during indexing;
  • Full text index:  we automatically build the traditional full text index known from the enterprise search world for any XML data item that is present in the Clusterpoint database; it is used for ultra-fast search for anything which is known to be present in the database; even dates and numbers can be instantly found using just string notation;
  • XML/JSON structure indexes: we automatically build fielded or structured indexes for ALL tags in custom XML / JSON document; enabling search only within certain specified in search query XML / JSON tags;
  • Numeric and date indexes: we automatically build indexes for all numeric and date containing fields in any custom XML / JSON document; enabling interval search similar to SQL-world;
  • Virtual meta data indexes: we support XML / JSON markup syntax for creating special "virtual" tags, which are present in Clusterpoint Index, created from existing document XML / JSON field values, but not present in customer XML / JSON data structure; useful for different technical needs, such as combined search across several fields, or types of objects etc.
  • Hidden indexing: supports indexing of XML / JSON document parts as "hidden" document content.  Can store document specific XML / JSON tags content as hidden, which can be searched for, but are excluded from search result text snippets and are not part of the original document content; this is useful if customers create some consolidated XML data fields for customized search needs, but which they do not want to be included into snippets (e.g., technical data assisting to better find the content such as substitutions, abbreviations etc.);
  • Index exclusion:  Supports exclusion of XML tags from search index.  Customers can store document specific XML / JSON tag content for later result formatting or other needs which are not indexed from the content point of view (and are not searchable); useful also for saving disk space eliminating unnecessary full text indexing;
  • Customizable delimiters: flexible configuration of special symbols for separation of the smallest index elements in our "atomic" index structure
  • Customizable performance per storage: can specify custom indexing cache size and memory usage limits for each database for performance tuning needs;
  • Trouble-shooting indexing:  Clusterpoint API commands 'update', 'insert',  'delete', 'replace' and 'reindex', which also modify the database index, can be performed directly from the Web based administration tool using Clusterpoint Manager.
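
The idea behind the pre-sorted index described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Clusterpoint Index itself: the class PreSortedIndex, its methods, and the use of a single numeric rank per document are all hypothetical. The point it shows is that when posting lists are kept in rank order at indexing time, a query can return its top results by reading the front of the list, with no sorting at query time.

```python
import bisect
from collections import defaultdict

class PreSortedIndex:
    """Toy inverted index whose posting lists stay sorted by document rank."""

    def __init__(self):
        # term -> list of (-rank, doc_id); negating the rank makes the
        # ascending list order equal to "best-ranked document first".
        self.postings = defaultdict(list)

    def add(self, doc_id, rank, text):
        # Ranking is applied here, during indexing, not during search.
        for term in set(text.lower().split()):
            bisect.insort(self.postings[term], (-rank, doc_id))

    def search(self, term, limit=10):
        # The list is already in rank order, so the top hits are a plain slice.
        hits = self.postings.get(term.lower(), [])
        return [doc_id for _, doc_id in hits[:limit]]
```

In this sketch the sorting cost is paid once per update inside add, which mirrors the trade-off the feature list describes: more work during indexing in exchange for cheaper ordered reads during search.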

Centralized Management Features

  • Clusterpoint Manager Application:  a Web-server based administration, configuration, monitoring and access security management utility with an easy-to-use web user interface; any standards-compliant Web browser can be used to administer and control, remotely or locally, all computers on which Clusterpoint Server is installed;
  • Centralized management:  all Clusterpoint servers, databases and clusters across the corporate network can be managed centrally, with a single sign-in for authorized administrators;
  • Multiple administrator accounts: uses multiple password protected administrator accounts to access Clusterpoint Manager at different levels of DBA or sysadmins rights; access rights to work with Clusterpoint Manager can be limited to view and administer only particular storages, effectively partitioning access to databases based on security credentials, need to know, splitting development access from production etc.;
  • Database virtualization support: management of multiple named databases (storages) per single hardware server, for different customer applications, with different sets of users for each one, running them in parallel on the same hardware, yet administered and monitored by data center personnel through Clusterpoint Manager application;
  • Management of cluster databases:  supports management of clustered configurations of a named data storage (N database shards of the same name, each holding 1/Nth of the total database content on one of N cluster nodes, together making a single logical cluster storage), distributed among multiple hardware servers in a networked cluster;
  • Management of full database mirroring: supports management of mirroring of a same-name database into multiple copies, running on different hardware nodes in a cluster, with full database replication and updates performed automatically by Clusterpoint Server; additional database mirror copies can be created, removed, synchronized manually etc.;
  • Remote control of database services:  status control, startup and shutdown of named storage servers; each Clusterpoint database storage is being served by the dedicated Clusterpoint Server instance, running in RAM and securely separated from all other server instances servicing other storages;
  • User administration: multiple end-user API accounts with different access rights to Clusterpoint server storages, in groups of access rights, up to API command level for a particular storage and user, for example, restricting user only for database status monitoring, or search;
  • Centralized log file analysis and error handling:  for each storage all transaction log files and error files are centrally accessible through Clusterpoint Manager interface, can be searched, viewed and inspected;
  • Storage Configuration management:  configuration file options for each named document storage are managed in a user-friendly Web form that describes the meaning of the configuration parameters;
  • Document Policy management:  Document Policy configuration file options for each named document storage are managed in a user-friendly Web form that describes the meaning of the configuration parameters; the form is used to assign relative relevancy weights for parts of a custom XML document, to define the Document ID tag for identification and retrieval of stored XML documents, to assign indexing defaults for specific customer-defined XML tags etc.;
  • Cluster-wide configuration changes: enables one-click application of any Storage Configuration or Document Policy changes to a cluster storage distributed among a large number of cluster nodes; automates laborious and repetitive tasks for DBAs changing configurations in a massively clustered IT infrastructure environment;
  • Command line support:  all configuration files for storages (databases) are stored in easy to edit XML format, under the same name directory as a named storage; customers can use common command line tools to open and modify the Clusterpoint Server configuration for any storage, without using Clusterpoint Manager; still many folks prefer this way of system management;
  • Simplified location of all files:  the configuration files, all stored database documents and all indexes of each database (storage) are kept in their own disk directory with the same name as the named storage, making system administration easy and understandable, requiring just basic sysadmin skills and knowledge of basic file system operations;
  • Built-in web interface module for running API commands :  for executing individual Clusterpoint API commands in any storage directly from Clusterpoint Manager Web interface, without programming; useful for DBAs to quickly check data integrity, search functionality, database status and many other things;
  • Statistics Dashboard: for viewing and quick filtering of log files, viewing totals of types of transactions etc.;
  • SNMP management agent:  for Clusterpoint Server status checks using common standards-based network management systems.
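
The clustered storage layout described above (N shards, each holding 1/Nth of the content, together forming one logical storage) can be sketched in Python as follows. The source does not say how Clusterpoint Server assigns documents to shards or merges results, so everything here is an assumption made for illustration: the function names, the CRC32-modulo placement rule, and the representation of a document as a (doc_id, rank) pair.

```python
import heapq
import zlib
from itertools import islice

def shard_for(doc_id, shard_count):
    # Assumed placement rule: stable hash of the document ID modulo N.
    return zlib.crc32(doc_id.encode("utf-8")) % shard_count

def distribute(docs, shard_count):
    """Split (doc_id, rank) pairs across N shards; each doc lands on exactly one."""
    shards = [[] for _ in range(shard_count)]
    for doc_id, rank in docs:
        shards[shard_for(doc_id, shard_count)].append((doc_id, rank))
    return shards

def search_all(shards, limit):
    """Query every shard and merge the per-shard results, best rank first."""
    # Each shard answers with its own hits already ordered by rank.
    per_shard = [sorted(s, key=lambda d: -d[1]) for s in shards]
    # Merging sorted streams yields a globally ordered stream lazily,
    # so only the first `limit` hits are ever materialized.
    merged = heapq.merge(*per_shard, key=lambda d: -d[1])
    return [doc_id for doc_id, _ in islice(merged, limit)]
```

Because every shard returns results in the same order, the merge step never has to re-sort the combined result set, which is one plausible reason a design like this can add nodes without slowing down individual queries.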

Security and User Administration

  • User authorization: with user name and password;
  • Security partitioning: restrict user access for specific storages within corporate network;
  • Different access rights: each user access can be limited in every storage for only specific Clusterpoint API commands;
  • Encryption: option to encrypt traffic between client (application) and server using SSL;
  • Transaction Queue Protection: engine based filter for blocking denial-of-service attacks; sustaining heavy query workloads;
  • Audit log: all management operations by DBAs performed using Clusterpoint Manager, are logged;
  • Transaction log: all search queries and indexing transaction results and errors are logged for each storage;
  • Trouble-shooting support: use of unique identifiers and timestamps for debugging and tracking of transactions;
  • Automatic rotation of log files: to prevent data loss because of a too large log file size;
  • Chronological log files: log files are organized by dates to ease backup, debugging and administration tasks;
  • Database integrity controls: built-in automatic data integrity controls to prevent data loss in case of an unexpected server shutdown.
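
The per-storage, per-command access rights listed above amount to a lookup from user and storage to a set of permitted API commands. The following Python sketch illustrates that model only; the table layout, the user and storage names, the "status" command name and the function is_allowed are hypothetical and are not taken from Clusterpoint documentation.

```python
# Hypothetical access table: user -> storage -> set of allowed API commands.
ACCESS = {
    "monitor": {"products": {"status"}},
    "webapp": {"products": {"search", "status"}},
    "loader": {"products": {"insert", "update", "delete", "search"}},
}

def is_allowed(user, storage, command, access=ACCESS):
    """Return True only if `user` may run `command` on `storage`.

    Unknown users and storages fall through to an empty set, so the
    default is to deny.
    """
    return command in access.get(user, {}).get(storage, set())
```

With this table, the "monitor" account can check status on the "products" storage but cannot search it, and no account can touch a storage it is not listed for.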

Documentation and Code Samples

  • Developers Guide: an interactive wiki-based resource;
  • Community Forum: a searchable forum for problem-solving tips, technical support recommendations and answers to frequently asked questions;
  • Sample client code for Clusterpoint API: for C, Java, PHP, .NET, Perl.

Cross-platform Availability

  • Operating systems: installation package suitable for any Linux distribution, tested on most popular Linux distributions: RedHat, SuSE, Slackware, Debian, Mandrake.
  • Customization: optional: custom installation service for other Linux distributions available;
  • Out-of-the-box installation with OS: ISO image, i.e., Clusterpoint Server software installation package;
  • Free demo download:  fully functional evaluation software for test-driving Clusterpoint Server; a full-fledged cluster version distributed under the Clusterpoint Enterprise Evaluation License as a 60-day free trial; afterwards it can be upgraded to a permanent Enterprise license without re-installation, over the Internet, using product license activation keys supplied by us; the upgrade can be done using the free Clusterpoint Manager application;
  • Installation on customer hardware: turn-key installation solution on a customer hardware -  customer's choice of Linux distribution, Clusterpoint Server software and our technical support, including remote installation services, configuration and problem solving over Internet, please see section Technical Support;
  • Licensing of the source code of the Clusterpoint Server: the core Clusterpoint database engine was developed in C and C++ for portability across different operating systems and to achieve maximum speeds even on low-end hardware; can be ported to the requested operating system and custom usage requirements;
  • Cross-platform database storage:  Clusterpoint Server database storage files (configuration, document storage, index and log files, together making a complete named storage, all located under a directory of the same name) are cross-platform compatible between different operating systems: Linux, FreeBSD, MacOS, or Windows, and do not require database migration or reindexing.  All Clusterpoint XML-database storage files (data, index, log and configuration files) can simply be copied onto a server running the new operating system and served by Clusterpoint Server software compiled for and run natively on that operating system; this portability of database storage across platforms makes it possible to adjust hardware and system IT infrastructure to the customer's most productive and economical solution for Clusterpoint database management;
  • Uniform cross-platform licensing terms:  Clusterpoint Server is licensed per server only, and the same license can be applied to any hardware and any operating system;  although we have software products for different operating systems, it is not necessary to license Clusterpoint Server for a specific operating system; you can use different Clusterpoint Server software products for several OSes, for example, if you run Linux and MacOS servers in parallel and need both Clusterpoint Server products (compiled code for both OSes), you still pay only for the total number of servers;  in this way you can easily change operating system without paying extra for a database license for that particular operating system.

Hardware Requirements

  • Processor architecture: 64-bit; optionally Clusterpoint Server database engine software can be compiled for 32-bit systems to run on older hardware;
  • Minimum CPU speed: 1GHz  per server;
  • Recommended entry level CPU speed: 2GHz, multi-core, per server;
  • Minimum RAM: 512MB RAM per server;
  • Recommended entry level databases (up to 50GB) RAM: 2GB RAM per server;
  • Recommended for large (>50GB) datasets RAM:  > 4GB RAM per server;
  • Recommended disk subsystem: RAID 0/1 enabled SCSI or SATA hard disks or SSD, min. 7200 rpm;
  • Recommended disk subsystem for high-speed database I/O: SSD or high-speed HDD arrays;
  • Minimum networking support: a single Ethernet 10/100Mbps interface port;
  • Recommended networking capacity: 2 x Ethernet 100/1000 Mbps ports (can be used to additionally partition management and access security on different network segments, also useful for call-in remote diagnostics);
  • Recommended database configuration for very large (>500GB) datasets: distributed search database running on multiple servers in Clusterpoint Server cluster configuration.  N cluster nodes of inexpensive commodity server hardware can then each run 1/Nth of the database, and Clusterpoint Server software installed on all N cluster nodes will provide consolidated use of all CPU power, RAM and disk storage to a particular database;
  • Supports virtualization: Clusterpoint Server can run in any virtualized environment, and you can easily manage all your Clusterpoint Servers and databases in a cluster, setting up and operating a complete Clusterpoint database platform infrastructure under any virtual-machine based cloud IT infrastructure;
  • Uninterruptible Power Supply: recommended for all hardware servers, although Clusterpoint Server has built-in database integrity control and in most cases can automatically recover from equipment failures; the Clusterpoint Server core software is designed using many modern database performance optimization methods, such as buffering of data and index updates in RAM, pre-emptive reading of data and index into RAM for speeding up multi-page browsing of databases in web applications, and other advanced memory caching and disk buffering schemes; all database server instances servicing their particular storages must be started and shut down as servers, and having a UPS helps to protect database integrity; we are no different from any other database server system: you decide if you need extra protection with a UPS for your mission-critical database server systems.