Customizable index ranking for the best search relevance and linear database scalability:
probably the most disruptive feature of our database technology, it lets
our customers rank all content in their databases by custom relevance
rules at the database configuration level; Clusterpoint Server delivers
this search customization out of the box for any database, so that any
database search query always returns results positioned, grouped and
ordered by those customer rankings, in the most useful and meaningful
(relevant) way for end users; the ranked index also guarantees that
query response times stay at a fraction of a second even when querying
data from very large distributed (clustered) databases. Once the
customer has configured this ranked indexing method at the database
configuration level, Clusterpoint Server automatically applies it to
the full database content and maintains it for all database updates in
real time; we call this freely programmable database content ranking
mechanism "information ranking" because it makes searching for
information (data that is valuable and relevant for users) simple, fast
and natively scalable in cloud IT environments, an almost ideal
indexing model for interactive web applications. Please see
Information Ranking;
Native data store clustering:
distributed XML/JSON-only database architecture with built-in
enterprise search functionality, including fast full-text search; the
Clusterpoint data storage engine and the built-in database search
engine both support a generic clustering architecture, providing
high-speed indexing and search performance for XML / JSON data
independently of the total database size. The system can be scaled out
gradually by adding hardware cluster nodes incrementally, without
negatively affecting overall database system performance. Please see
Clustering;
Energy-efficient NoSQL database:
designed for private and public cloud IT architectures that use
inexpensive networked commodity hardware with modest, energy-efficient
computing requirements;
Elasticity:
any Clusterpoint database can be scaled out from a single server to a
massively clustered system distributed among hundreds of servers
(cluster nodes); with a hierarchical hardware/network setup it scales
out even to thousands of machines and billions of database objects
without losing fast sub-second ad hoc search performance. Any increase
in database storage capacity, or any split of performance workload
among mirror copies, can be performed by the operations department by
adding new cluster nodes and scaling out database services
incrementally at the data center level, without any changes to
application software. Unlike many other NoSQL products and cluster file
systems with distributed data storage functionality, Clusterpoint DBMS
is engineered to provide tightly controlled and planned clustering,
with precise control of the computing and storage location of clustered
database parts and database replication copies (mirroring); all
clustering and mirroring options are managed and centrally controlled
through Clusterpoint Manager, providing the level of exact control over
database storage location and database capacity planning that
enterprise customers normally require. Please see Management;
Cross-platform:
store, access and search any custom data objects expressed as
industry-standard, schema-less XML data;
Key-based for storage, document-oriented for search:
undivided XML / JSON data objects are stored by a unique ID key, a
machine-readable database model that is also easy for humans to
understand; for data storage and basic retrieval it works as a simple
key-value data store, while for search it works as a structured data
store that supports combined structured, semi-structured and
unstructured search queries, using the customer's own data structure to
select, narrow down or expand the set of data objects retrieved
according to the customer's own criteria (much like an SQL SELECT
statement does). It effectively manages undivided XML objects,
including completely de-normalized data, providing high-speed search
within the entire database, within a specific XML sub-structure, or a
combination of both;
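The split between key-based storage and document-oriented search can be pictured with a small in-memory sketch. This is only an illustration of the idea, not Clusterpoint's implementation or API: the class `TinyDocumentStore` and its methods are hypothetical names, and documents are plain Python dictionaries rather than XML / JSON.

```python
import re
from collections import defaultdict


class TinyDocumentStore:
    """Hypothetical toy store: key-value storage plus an inverted index."""

    def __init__(self):
        self.docs = {}                 # document ID -> whole, undivided document
        self.index = defaultdict(set)  # (field, word) -> set of document IDs

    def put(self, doc_id, doc):
        # Store the document as one unit under its unique key.
        # (Removing stale index entries on overwrite is omitted for brevity.)
        self.docs[doc_id] = doc
        for field, value in doc.items():
            for word in re.findall(r"\w+", str(value).lower()):
                self.index[(field, word)].add(doc_id)

    def get(self, doc_id):
        # Key-value style retrieval: no index lookup is involved.
        return self.docs[doc_id]

    def search(self, word, field=None):
        # Structured search when a field is given, full-text search otherwise.
        word = word.lower()
        if field is not None:
            return sorted(self.index.get((field, word), set()))
        hits = set()
        for (_, indexed_word), ids in self.index.items():
            if indexed_word == word:
                hits |= ids
        return sorted(hits)


store = TinyDocumentStore()
store.put("a1", {"title": "Red bicycle", "city": "Riga"})
store.put("a2", {"title": "Riga city guide", "city": "Riga"})
```

Here `store.get("a1")` behaves like a key-value lookup, `store.search("riga")` matches the word in any field, and `store.search("riga", "title")` restricts the match to one field.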
Fully indexed:
instantly searchable XML/JSON-only database that automatically builds a
full-text search index; there is no need to integrate your legacy
database software with enterprise search tools to achieve the same
search speed and functionality;
Full-text search:
an integrated enterprise search engine is built in as part of the
database server software, with a rich set of search options and a
web-developer-friendly API offering paging, hit prediction, faceted
search, fuzzy search for "Did you mean ...?" functionality and many
other features; please see the subsection Database Search Features
below, and section Technology / Search for how Clusterpoint search was
implemented at the system level;
Open API:
based on XML / JSON and HTTP/HTTPS, using REST design principles; the
Clusterpoint API messaging protocol is openly published and
interoperable with any programming environment;
Client-server database:
the database server architecture fits the multi-tier server model (web,
application and database servers), serving as the back-office data
storage and management platform that stores, accesses and searches
customer XML data at high speed in a distributed and massively scalable
database architecture, and serves customer web applications through the
Clusterpoint API;
High performance database:
the server software is written in fast C/C++ code and runs natively
under the most popular server operating systems;
Distributed database:
Clusterpoint Server works as transparent cluster software; no master
software is necessary, so cluster database operations have no single
point of failure; a failed cluster node does not affect other cluster
nodes. Depending on the clustering configuration, the cluster can
provide full database redundancy by running multiple production copies
of the database in parallel on different hardware (database mirroring),
or form a high-availability cluster database system in which each of N
cluster servers stores and manages 1/Nth of the total database content
(database striping);
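As a rough illustration of how striping and mirroring combine, the sketch below assigns each document to one of N stripes and then lists every mirror copy of that stripe. The hash-based placement rule and the function names `stripe_node` and `placement` are assumptions made for this example; the text above does not say how Clusterpoint actually chooses a node.

```python
import hashlib


def stripe_node(doc_id: str, node_count: int) -> int:
    """Pick the stripe (0 .. node_count - 1) for a document ID.

    Assumption: a stable hash spreads documents evenly, so each of the
    N stripes ends up holding roughly 1/Nth of the content.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def placement(doc_id: str, stripes: int, mirrors: int) -> list[tuple[int, int]]:
    """Return every (mirror, stripe) pair that holds a copy of the document."""
    stripe = stripe_node(doc_id, stripes)
    return [(mirror, stripe) for mirror in range(mirrors)]


# A cluster with 4 stripes and 2 mirror copies: each document lives on
# exactly one stripe, and that stripe exists once per mirror.
copies = placement("doc-42", stripes=4, mirrors=2)
```

With this layout a single failed node removes one copy of one stripe, while the same stripe in the other mirror can keep serving it.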
Native XML / JSON data storage:
can store any arbitrary XML- or JSON-formatted data object as a
schema-less XML/JSON document in the database (that is why we call any
database 'a storage'), retrieve it, update it or delete it, and search
and access it with the various Clusterpoint database server API
commands. XML and JSON are industry standards supported by most
software platforms today, and very flexible data formats that can
accommodate virtually any other data structure: texts, database
records, tables, object hierarchies, linked data, even binary data
files such as video and audio (through Base64 encoding inside XML
tags);
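Carrying binary data inside an XML document can be sketched as follows, assuming the encoding meant above is standard Base64. The tag names `document`, `id` and `attachment` and both helper functions are made up for this example; they are not a Clusterpoint document layout.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(doc_id: str, payload: bytes) -> str:
    """Build an XML document that carries binary data as Base64 text."""
    doc = ET.Element("document")
    ET.SubElement(doc, "id").text = doc_id
    ET.SubElement(doc, "attachment").text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(doc, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Recover the original bytes from the Base64 text of the attachment tag."""
    doc = ET.fromstring(xml_text)
    return base64.b64decode(doc.findtext("attachment"))


original = bytes([0, 1, 2, 255])
xml_text = wrap_binary("clip-1", original)
restored = unwrap_binary(xml_text)
```

The round trip returns the bytes unchanged; the cost is size, since Base64 text is about one third larger than the binary it encodes.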
Structure agnostic data storage:
unlike legacy relational systems with a fixed data structure,
Clusterpoint Server handles all XML documents the same way,
irrespective of their internal structure; this makes it possible to
store data objects of totally different structure in the same
searchable database, using just common metadata and unique
identification of all documents to manage data storage; it is also a
flexible database model: you can change or expand any data objects as
your applications require, just by rewriting your XML documents into
Clusterpoint after adding new fields or changing existing ones. There
is no need to migrate the complete database after structural changes,
as in the SQL world;
Full database replication support:
run a production database as multiple fully synchronized copies in a
cluster (we call them 'mirror databases'), where Clusterpoint Server
takes care of synchronizing updates among cluster nodes, so there is no
need to program it into the application software; this can be useful
for building and running multiple identical, active online backup
databases that can also be used for sharing search and access load
among different web applications (for example, a web portal with
millions of users can split search and data retrieval workload among as
many database copies as necessary to serve all users at high
performance);
Real-time updates:
the database is updated in real time, just by sending custom XML / JSON
documents over the Clusterpoint API protocol; the full-text index is
updated automatically with each update transaction, avoiding the delays
and full-text index inconsistencies that are a problem for many
enterprise search systems;
Reliability:
follows the ACID transaction model: all database transactions are
"atomic" transactions working with complete and undivided customer XML
/ JSON documents as basic data objects. A transaction is either fully
written to the database, and thereby committed automatically, or it
generates an error. In the Clusterpoint database architecture there is
no need to program the complex transaction management that multi-table
relational systems require;
Parallel indexing/search:
the core database server software is designed for contemporary hardware
architectures with multiple processors and multi-core CPUs; the
software uses multi-threading to parallelize search and indexing
processes, and is configurable for each database to match the available
hardware capacity and customer performance requirements;
Virtualization support:
the database server software platform was designed to run separate
databases in separate memory address spaces, making it possible to run
multiple Clusterpoint Server instances on the same hardware, each for
its own database or application, with its own users and access
security; this design also suits contemporary virtualized environments,
where different customers can securely share the same hardware resource
using virtual machines; one can even set up an emulation of a large
database cluster using just virtual machines as cluster nodes on the
same hardware, and use the Clusterpoint Manager utility to configure
and run the complete database as a distributed database, which is a
great feature for testing and development;
Enterprise Security:
user authentication and authorization are based on access groups and
roles; using the Clusterpoint Manager utility, each database storage
can be assigned its own set of users, with different privileges for
database modification and search (down to particular API-level
commands); enterprise LDAP and ADS are supported if necessary. However,
Clusterpoint Server is normally accessed only by DBAs, application
software developers and application software over the API, and in most
cases the customer applications (normally an application server)
already handle end-user logins and authorization; the end-user
authentication subsystem can therefore also be switched off to increase
the performance of customer web applications (for example, if the
Clusterpoint database is used only at the third-tier data storage
level, behind a web server and application server that have their own
access controls and user authentication logic);
Centralized management:
the web-based user interface utility Clusterpoint Manager is used for
centralized administration and configuration of all servers running
Clusterpoint Server enterprise-wide. DBAs or system administrators can
use any web browser to log in and remotely administer the whole
Clusterpoint database platform infrastructure within an organization:
all storages (databases), all servers (active databases served by
Clusterpoint Server instances), and all clusters. For cluster
databases, Clusterpoint Manager provides single-click cluster-wide
configuration changes;
Rich API functionality:
there are more than 140 database storage, access and search options
available to application developers through our Clusterpoint API. The
key API command set is relatively small and easy to learn, yet most API
commands are highly customizable with extra options and attributes,
which we constantly add and improve based on customer feedback. Using
the Clusterpoint API, our customers build full-fledged database
applications, replacing more complex and time-consuming development on
relational SQL-based systems with quick and agile web application
development that uses only the Clusterpoint XML-only document database
storage concept;
Combined queries:
Clusterpoint Server is focused on database search: you can run
full-text, structured, semi-structured, numeric, date and geospatial
searches, among others. You can combine any search queries, in
particular full-text (ad hoc) and structured search, using the
Clusterpoint API and Information Ranking, to quickly add instantly
responsive, user-friendly, Internet-style search functionality to your
corporate databases. With simple search keywords as queries and ranked
database information, the first web page of your database search
application can always show the most relevant database information.
The simplicity, sub-second speed and relevance of database search give
your web applications the best user search experience on the
Clusterpoint XML-only database platform;
Capacity:
there is no specific limit on database size or on the number of XML /
JSON data objects managed on the Clusterpoint database platform. A
Clusterpoint database can scale from hundreds of gigabytes on a single
server to many petabytes in a large cluster; Information Ranking is
designed to scale to over 400 billion data items per database (a data
item can be, for example, a Title field of a news article, a person
name field in an address book, or a customer name in a CRM
application), which is the number of unique rank values you can assign
to your XML data items. In practice, information ranking is relative
and is used mainly for instant relevance grouping of search results,
which allows much higher scalability than 400 billion items for fast
and effective search in large databases; all other capacity limits are
determined only by the available disk space and memory;
Big data (large databases):
given enough hardware servers, networked into a custom hard-wired
tree-like network topology, the Clusterpoint database platform software
scales up to thousands of servers; it has been designed from the ground
up for this kind of maximum scalability. However, in clusters with a
large number of linearly networked servers, network delays accumulate
to unacceptable levels, so for such configurations to operate
efficiently the Clusterpoint software must be adapted to the specific
tree hierarchy of hard-wired servers in the networked IT
infrastructure: Clusterpoint Server is restricted to communicating only
within a single top-down tree segment of servers, which minimizes
network hops for the short data transfer transactions common in a
Clusterpoint cluster; the resulting hardware and software configuration
can still guarantee sub-second query times for a very big database
split among thousands of servers;
Fast search:
sub-second query response times are standard and can be guaranteed to
our customers for all Internet-style ad hoc queries in large cluster
databases when the Information Ranking mechanism is activated; the core
Clusterpoint database engine is written in fast C/C++ code and
optimized for speed on modern hardware; it uses industry best practices
to speed up data retrieval, such as in-memory caching, predictive
read-ahead (speeding up subsequent transactions in multi-page
browsing), pre-sorted indexes, and automatic indexing of the entire
database during updates, including creating and updating the full-text
index;
Easy to integrate:
there is no need to install or pay for database vendor client software;
the Clusterpoint API is a fully open API based on simple XML and
HTTP/HTTPS web protocol messaging. Virtually any programming language
or application development framework can start "speaking" with a
Clusterpoint database server, using its built-in tools for web
messaging and XML parsing. Our customers do not need to learn a new
programming language or application framework: they can use their own
favorite or in-house software system to start developing or modifying
their applications to work with the Clusterpoint XML-only database
server. This simplicity also guarantees complete interoperability with
existing web applications: in most cases customers take advantage of
Clusterpoint functionality by using Clusterpoint XML servers in a
complementary way, alongside relational SQL database servers, and
integrating into their web applications only the functionality for
their fast-growing customer data sets, where SQL servers struggle to
perform;
Multi-lingual database:
handles data objects containing text in 160 languages, storing data in
UTF-8; provides fast server-based code page translation tools between
UTF-8 and national encodings; provides a language-specific,
customizable word stemming and inflection configuration facility, and a
facility to substitute synonyms in search queries from a customizable
vocabulary for each particular storage; also provides API options for
fast string matching by templates of word parts, such as letters,
combinations of letters and word endings, implemented as an ultra-fast
lookup into the actual database content through the full-text index
instead of a scan through all database objects;
In-memory caching:
uses all available free RAM to minimize disk usage;
Transaction logging:
as database server software, Clusterpoint Server writes all database
transaction logs and error logs;
Customizable runtime environment per database:
for each XML database (storage) you can specify customized run-time
performance attributes and configuration parameters in the Storage
Configuration file, which Clusterpoint Server applies to your
particular storage; examples include custom word delimiters for
full-text indexing, the maximum amount of RAM used for data and index
buffers (if it needs to be limited), the number of document records for
predictive read-ahead (for better caching of disk data), and many other
parameters. The Storage Configuration is itself a small XML file and
can be modified either from the Clusterpoint Manager administration
application or with a command-line text editor, directly in the file
system directory that has the same name as the database storage;
Customizable indexing policy per database:
for each XML / JSON database (storage) you can specify a customized,
application-specific configuration file of indexing and data sorting
preferences, called the Document Policy file, which is also a small XML
file and can be edited in the same way as the Storage Configuration
file, either from the Clusterpoint Manager administration application
or with a command-line text editor, directly in the file system
directory that has the same name as the database storage. The Document
Policy file describes which XML tag uniquely identifies internally
stored XML documents (it defines the Document ID tag that the server
software uses to store, find and access documents by the unique string
values you store in your own specified XML tag). The Document Policy
file can also specify which XML tags should be listed in search results
by default. Probably the most important feature of the Document Policy
file is that it describes your own custom rules for applying the
Information Ranking algorithm to the Clusterpoint Index, based on your
XML / JSON data structure. It provides the mechanism for specifying
your own database information ranking methods, delivering relevant,
sub-second searches even in massive databases when users search
Internet-style, using simple ad hoc query keywords or phrases. Please
see sections Indexing and Information Ranking for details;
Database mirroring:
same as Full Database Replication, please see above;
Database striping:
same as Distributed Database, please see above;
Supports binary data storage:
can store complete binary-encoded files as parts of a Clusterpoint
document in the storage and return them unmodified, exactly as saved;
Fast retrieval by Document identifier:
any Clusterpoint document can be retrieved using its known unique
document identifier (no Clusterpoint Index is needed); this can be
useful during full database reindexing;
Document ID or document identifiers:
can be any URL, unique file name, database primary key, custom sequence
number, unique registration code, or other unique string value. Used to
identify and retrieve the customer's original XML documents; also used
in search results to identify matching documents without reading XML
document content from disk;
Freely extendable Clusterpoint server functionality by customized Lua scripts:
through server-side Lua scripting (Lua is a free, open source
programming language) our customers can design and develop a wide range
of extra functionality for Clusterpoint Server without waiting for the
next Clusterpoint Server software release; Lua is simple to learn, is
one of the fastest scripting languages available, and compiles to
compact, embeddable byte code; any customer Lua script can be made part
of the Clusterpoint Server C/C++ core engine through configuration file
plug-ins (which we call hooks). Clusterpoint customers can hook into
the server code any Lua-driven custom functionality to be executed
server side before or after any Clusterpoint API command is processed
by the server; for example, they can implement custom database triggers
or stored procedures for their business applications, drive their own
asynchronous messaging and alerting systems, develop notifications
based on database search or update events, or even hook in their own
external applications (e.g., artificial intelligence and machine
learning, business analytics, reporting) as Lua plug-ins that are
invoked server side or directly through the Clusterpoint API. If you
have not heard about Lua before, please note that it has long been a
scripting language of choice for game developers and has been selected
by Wikipedia as the scripting language for its wiki templates. This
ability to extend Clusterpoint Server with user scripts is a very
powerful feature, yet please use it carefully and test any Lua scripts
extensively before use: you can easily crash Clusterpoint Server with
badly written Lua scripts. Please see User Scripting;
Please note that we also offer our customers development services for
custom C/C++ functionality plug-ins that do what customer Lua scripts
do. This requires our custom development service to translate the
customer's Lua script code into C/C++ source code, integrate it
properly with Clusterpoint Server transaction and error logging, test
the custom-ordered C/C++ modules extensively, and optimize the new
C/C++ server extension code to run at the maximum performance the
hardware, network and storage support; we then deliver a customized
Clusterpoint Server software release to the customer who ordered the
extensions; please see our Services. We therefore encourage our
customers to develop and prototype their required database server
extensions in Lua first; once these are tested and proven in operation,
customers can use our custom development services to convert all or
part of their Lua code into much faster C/C++ code, and we will deliver
a customized Clusterpoint Server version for production use, with the
required extra functionality built in and running at the maximum
possible execution speed, without the performance limitations of byte
code;
Data Storage and Update Features
Document insertion:
add new XML / JSON documents;
Auto increment of Document IDs:
option to automatically increment the Document ID for newly inserted
database objects;
Document replacement:
replace an existing XML / JSON document identified by a known Document
ID;
Partial updates:
modify only parts of an existing XML / JSON document, avoiding the need
to transfer full documents to the customer application software and
then write them back into the Clusterpoint database after modification;
Document updates:
add a new document or update an existing one;
Document partial modification:
replace specified XML / JSON parts of a document without rewriting the
whole document;
Deletion of documents:
delete a document from the database and index;
Deletes by search:
combine the versatile Clusterpoint 'search' command with a delete
operation, which is useful for database administration;
Document locking:
API-driven document locking is supported for multi-user database update
environments;
Database reindexing:
force full database reindexing without reloading all XML / JSON
documents, for example after hardware failures or Document Policy
changes that affect the index;
Database deletion:
delete a Clusterpoint storage and database permanently;
Document retrieval:
retrieves the customer's originally stored XML / JSON document as is;
Document retrieval with policy attributes:
retrieves the customer's originally stored document with the
information ranking policy attributes that apply to each specific XML
part shown as extra XML attributes on each tag;
Document lookup:
checks whether particular documents are present in the database without
reading their full content, which is useful for performance reasons;
Document listing:
retrieves document IDs only, without full document content, for
performance and database administration needs;
Flexible document storage workload distribution:
automatically distributes documents to the cluster nodes that are less
used and have more free storage;
Programmable document distribution control in cluster:
for applications requiring specific logic for distributing document
storage across cluster nodes, each cluster node can be addressed and
accessed separately through the Clusterpoint API;
Multiple-document updates:
send multiple concatenated XML / JSON documents to the server for
update in one API call (no separate HTTP request per document is
required, which saves time); for example, group all update transactions
for different objects in the database into a single HTTP request;
Fast document batch-uploads:
option to upload massive amounts of data as batch files, each
containing a string of concatenated XML documents, for background
loading and indexing on the server: the batch files are placed on the
database server's local file system (e.g., over FTP or from an external
storage device) and a Clusterpoint command-line utility feeds them
directly into Clusterpoint Server, without API transactions over the
slower HTTP protocol; there can be thousands or tens of thousands of
documents per batch file; each batch file is processed in the
background, and the API 'status' command can be used to monitor the
indexing status of a particular storage during such massive uploads
(which can take longer than transactional updates; batch-uploaded
documents are not available in the database until indexing has finished
processing each document in the batch file);
Database Search Features
Full-text search engine:
enterprise search functionality is a core part of the Clusterpoint
database server software; it is integrated into the database storage
and automatically builds a full-text search index for any XML / JSON
documents stored in a Clusterpoint storage;
Ultra-fast speed:
up to 18,000 search transactions per minute (up to 300 per second)
using memory-cached data, and up to 1,800 search transactions per
minute (up to 30 per second) with disk access;
Full RAM use:
uses all available RAM to minimize disk usage;
Fast ad hoc search:
sub-second response times using simple Internet-style database search
queries: enter just any known keyword as a query term and still get the
most relevant results from the database on the first web page of your
application; the feature works without performance loss in massively
distributed databases, providing low sub-second response times (<0.2
seconds with disk access, and <0.005 seconds with an in-memory
database) when customer Information Ranking is applied to Clusterpoint
database indexing. This is one of the most powerful features of
Clusterpoint database technology. Unlike some ranking systems, which
are closed and proprietary, Clusterpoint Server provides an open and
fully flexible mechanism to customize your own database information
ranking according to your own business rules; for more information
please see our web site section Information Ranking;
Phrase search:
use phrases as in Internet-style search; useful in many cases to
quickly narrow down search results within a large database, where the
phrase appears only within certain fields or parts of an XML / JSON
document;
Boolean search:
combine your search queries with the logical operations AND, OR and
NOT;
Multi-level parentheses:
combine ad hoc word and phrase search, structured search and Boolean
expressions to build and execute powerful search queries with complex
logic;
Wildcard support:
use of word wildcards in queries;
Wildcard tuning options:
option to configure wildcard expansion coverage for performance needs;
Stemming support:
word stemming for multi-lingual data is supported, with a customizable
module for programming word inflection rules for a particular language;
Stemming tuning options:
option to configure stemming expansion coverage for performance needs;
Proximity search:
search for terms within N words of another word, specifying the
relative distance N; also works within XML structure;
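The proximity condition itself, two terms no more than N words apart, can be written down in a few lines. The function `within_n_words` is a hypothetical name, and the sketch scans one text directly, whereas a real engine would answer this from word positions kept in its index.

```python
import re


def within_n_words(text: str, term_a: str, term_b: str, n: int) -> bool:
    """Return True if term_a and term_b occur at most n words apart."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Any pair of occurrences close enough satisfies the query.
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)


sentence = "fast full text search engine"
near = within_n_words(sentence, "fast", "search", 3)  # positions 0 and 3
far = within_n_words(sentence, "fast", "search", 2)
```

In this example "fast" and "search" are three words apart, so the first call matches and the second does not.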
Case support:
search only for properly capitalized names, or discriminate search
results based on case;
Stop words detection:
detects frequently used short words and automatically excludes them
unless explicitly required, for performance tuning on large data sets;
Customizable delimiters:
any special symbols can be used in search terms unless they are
specified as word separators;
Results grouping:
results can be grouped by domain or zone, returning only the first N
results with a link to the others;
Numeric and date search:
combine any integer, float or date values with full-text ad hoc
queries;
Geospatial search:
sorting of results by distance for GPS-based location search
applications, maps etc.;
Customizable ordering of results:
ascending or descending order for numeric range and date search;
Distributed search:
search is performed on all cluster nodes storing a cluster database,
and the results from the multiple parts of the same database are
merged;
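Merging per-node results can be sketched as below, under the assumption that every node returns its own hits already sorted by descending rank. The function `merge_node_results` and the `(rank, doc_id)` tuple layout are hypothetical and only illustrate the merge step, not how Clusterpoint nodes actually exchange results.

```python
import heapq
from itertools import islice


def merge_node_results(per_node_hits, limit):
    """Merge sorted (rank, doc_id) lists from several nodes into one list.

    Each input list must already be sorted by descending rank; heapq.merge
    then yields the combined order lazily, so only `limit` hits are read.
    """
    merged = heapq.merge(*per_node_hits, key=lambda hit: -hit[0])
    return list(islice(merged, limit))


node_1 = [(9, "a"), (4, "c")]
node_2 = [(7, "b"), (1, "d")]
top_three = merge_node_results([node_1, node_2], limit=3)
```

Because the inputs are pre-sorted, the coordinator never has to sort the full result set; it only interleaves the heads of the per-node lists.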
Scalability without search performance loss:
scales to search in hundreds of millions of documents (in cluster
mode);
Structured search in XML data fields or JSON attributes:
search within the data of a specific XML / JSON document tag;
Unstructured search:
search across all indexed data in all specified XML fields, using the
full-text index;
Faceted search feature:
narrow or expand search results by any categorized XML / JSON tags
(facets) defined for a particular storage; the actual facets can be
returned with the number of hits per facet and used for extra
navigation;
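Facet counting and narrowing can be illustrated with a short sketch over an in-memory result list. The functions `facet_counts` and `narrow`, and the `category` field, are hypothetical names chosen for this example; documents are plain dictionaries standing in for XML / JSON objects.

```python
from collections import Counter


def facet_counts(hits, facet_field):
    """Count how many hits fall under each value of the facet field."""
    return Counter(doc[facet_field] for doc in hits if facet_field in doc)


def narrow(hits, facet_field, value):
    """Keep only the hits whose facet field has the chosen value."""
    return [doc for doc in hits if doc.get(facet_field) == value]


hits = [
    {"id": "1", "category": "news"},
    {"id": "2", "category": "blog"},
    {"id": "3", "category": "news"},
]
counts = facet_counts(hits, "category")
news_only = narrow(hits, "category", "news")
```

The counts are what a search page would show next to each facet, and `narrow` corresponds to the user clicking one of them.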
Predictive hit calculation:
returns the approximate expected number of hits for huge databases,
using statistics, to let users know how large a result set to expect
after they receive the first web page of results;
Spell-check using actual database content:
detection and correction of misspellings and typing errors using
alternative words from the vocabulary, enabling "Did you mean ...?"
functionality for customers' corporate database web applications;
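A "Did you mean" suggestion drawn from the words actually present in the database can be approximated with Python's standard `difflib` module. This is a stand-in technique for illustration, not the algorithm Clusterpoint uses; the function `did_you_mean` and the sample vocabulary are hypothetical.

```python
import difflib


def did_you_mean(query_word, vocabulary, max_suggestions=3, cutoff=0.8):
    """Suggest close matches from the vocabulary for a possibly misspelled word.

    Returns an empty list when the word already exists in the vocabulary.
    """
    word = query_word.lower()
    if word in vocabulary:
        return []
    return difflib.get_close_matches(
        word, sorted(vocabulary), n=max_suggestions, cutoff=cutoff
    )


# The vocabulary stands for the set of words found in indexed documents.
vocabulary = {"database", "cluster", "ranking"}
suggestions = did_you_mean("databse", vocabulary)
```

Raising `cutoff` returns fewer, closer alternatives and lowering it returns more, which loosely mirrors the idea of tuning the level and coverage of spell-check alternatives.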
Option to configure spell-check:
modify the level and expansion coverage of alternatives;
Search using pre-sorted index:
fast ad hoc search results using document ranking or query relevance
pre-sorted at the index level and read directly from disk, eliminating
the need to sort data for each search query;
Flexible information relevancy definition: search results are ordered by
the relevance of data items, with a flexible relevance definition mechanism
for each database that uses relative information weights and document rank
values; this is the foundation of Clusterpoint database Information
Ranking, which is applied during indexing;
Interval
search: search within any interval of numeric
values or
dates;
Relevance
filtering: search only within document title, content or
other parts
defined by relevance ranking, effectively filtering out irrelevant or
less relevant data;
Text
snippets: returns
snippets (small fragments of text) for search query results around hit
terms, for required XML tags (for example, text articles);
Hit highlighting in snippets: search terms are highlighted in text
snippets;
Customized highlighting: option to configure highlighting with specific
start and end tags for better display;
XML / JSON formatted messaging: queries and search results are in either
XML or JSON, easy to parse and integrate;
Similar document search by content: for databases containing text, finds
other documents that statistically, with some probability, match a given
text;
Web-application oriented API for efficient search: makes it possible to
build user-friendly search interfaces; to define the maximum number of
documents per result page in every search query; to define the starting
document number for multi-page result sets in a search query; to calculate
and return the total number of hits on every page of a multi-page result
set; to return the total query search time spent by the engine; to restrict
the maximum number of documents in any search result for performance needs
etc.;
Extensively documented search API: provides a wide range of options for
customizing search results;
Trouble-shooting search API: search API commands can be run from the
Web-based administration tool in Clusterpoint Manager, without programming;
Alerts on
full content: alert events triggered by content
updates using
full text filtering expressions;
Customizable programming of alerts: support for the definition,
modification and removal of full content matching alert filters, using the
Clusterpoint API to set up, delete or execute alert filtering for
particular documents.
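The text snippet and customized highlighting features listed above can be illustrated with a minimal sketch. This is not Clusterpoint code: the function name `make_snippet`, its parameters and the `...` truncation marker are assumptions made for illustration only. It cuts a window around the first hit and wraps every hit in configurable start and end tags.

```python
import re

def make_snippet(text, terms, start_tag="<b>", end_tag="</b>", radius=30):
    """Return a fragment of `text` around the first hit, with each search
    term wrapped in start_tag/end_tag. Hypothetical helper, not a real API."""
    if not terms:
        return None
    # One case-insensitive pattern that matches any term as a whole word.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    first = pattern.search(text)
    if first is None:
        return None  # no hit, so there is nothing to show
    # Keep `radius` characters on each side of the first hit.
    lo = max(0, first.start() - radius)
    hi = min(len(text), first.end() + radius)
    fragment = text[lo:hi]
    # Highlight every hit inside the window, preserving the original casing.
    highlighted = pattern.sub(
        lambda m: start_tag + m.group(0) + end_tag, fragment
    )
    prefix = "..." if lo > 0 else ""
    suffix = "..." if hi < len(text) else ""
    return prefix + highlighted + suffix
```

A production implementation would likely snap the window to word boundaries so that a word cut at the edge of the fragment is never mistaken for a hit; the sketch leaves that out for brevity.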
Database Indexing Features
Clusterpoint Index: the core indexing mechanism of Clusterpoint Server
database software, which is a combination of an inverted (full text) index,
a B-trieve type index for storing numeric and date values, and a RAM-based
graph-database tree-like index accommodating all unique database elements,
including any strings, labels, numbers, dates, emails, relations,
references and anything else that can be partitioned into small elementary
string elements; we sometimes call it an "atomic" index; we decided that
there is no reason not to index everything possible, taking into account
the abundance and cost of storage space, CPU and memory;
Large-files protection: there is no limit on the maximum size of the
database or index (data is stored in 50MB container files). One storage
can span hundreds of gigabytes per computer. The database and index scale
to petabytes if the data is distributed among a cluster of N computers;
Pre-sorted index: the system builds and maintains pre-sorted document
ranking or query relevance at the index level and reads it directly from
disk, eliminating the need to sort data upon each search query and thus
achieving extremely fast search performance even with complex data ordering
and grouping rules;
Flexible information relevancy definition on the index level: the
Clusterpoint database indexing mechanism is based on a flexible and
customizable relevance definition mechanism for database content and parts
of XML / JSON documents, using relative information weights and document
rank values. This is the foundation for developing and deploying any custom
Information Ranking, which is applied to a Clusterpoint XML-/JSON-only
database during indexing;
Full text index: we automatically build the traditional full text index
known from the enterprise search world for every XML data item present in
the Clusterpoint database; it is used for ultra-fast search for anything
known to be present in the database, and even dates and numbers can be
found instantly using plain string notation;
XML/JSON structure indexes: we automatically build fielded (structured)
indexes for ALL tags in a custom XML / JSON document, enabling search
restricted to the XML / JSON tags specified in the search query;
Numeric and date indexes: we automatically build indexes for all fields
containing numeric and date values in any custom XML / JSON document,
enabling interval search similar to the SQL world;
Virtual meta data indexes: we support XML / JSON markup syntax for
creating special "virtual" tags, which are present in the Clusterpoint
Index and created from existing document XML / JSON field values, but are
not present in the customer's XML / JSON data structure; useful for
different technical needs, such as combined search across several fields or
types of objects etc.;
Hidden indexing: supports indexing of XML / JSON document parts as "hidden"
document content. The content of document-specific XML / JSON tags can be
stored as hidden: it can be searched for, but it is excluded from search
result text snippets and is not part of the original document content; this
is useful if customers create consolidated XML data fields for customized
search needs that they do not want included in snippets (e.g., technical
data that helps to find the content, such as substitutions, abbreviations
etc.);
Index exclusion: supports exclusion of XML tags from the search index.
Customers can store document-specific XML / JSON tag content for later
result formatting or other needs without having it indexed (it is then not
searchable); also useful for saving disk space by eliminating unnecessary
full text indexing;
Customizable delimiters: flexible configuration of the special symbols
that separate the smallest index elements in our "atomic" index structure;
Customizable performance per storage: a custom indexing cache size and
memory usage limits can be specified for each database for performance
tuning needs;
Trouble-shooting indexing: the Clusterpoint API commands 'update',
'insert', 'delete', 'replace' and 'reindex', which also modify the database
index, can be performed directly from the Web-based administration tool
using Clusterpoint Manager.
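The idea behind the pre-sorted index and customizable delimiters described above can be sketched in a few lines. This is a simplified in-memory illustration under stated assumptions, not the Clusterpoint on-disk format: the function names, the `ranks` mapping (higher value means more relevant) and the default delimiter set are all hypothetical. Posting lists are sorted by document rank once, at indexing time, so a query only walks the lists in order and stops after the first N hits.

```python
import re
from collections import defaultdict

def build_presorted_index(docs, ranks, delimiters=" ,.;:"):
    """docs: {doc_id: text}; ranks: {doc_id: rank, higher = more relevant}.
    Returns {term: [doc_id, ...]} with every posting list in rank order."""
    # The delimiter set is configurable; each run of delimiters splits terms.
    splitter = re.compile("[" + re.escape(delimiters) + "]+")
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in splitter.split(text.lower()):
            if term:
                postings[term].add(doc_id)
    # Sort once, at indexing time: highest rank first, doc_id breaks ties.
    return {
        term: sorted(ids, key=lambda d: (-ranks[d], d))
        for term, ids in postings.items()
    }

def search(index, terms, limit=10):
    """AND query: documents containing all terms, in pre-sorted rank order."""
    lists = [index.get(t.lower(), []) for t in terms]
    if not lists or any(not postings for postings in lists):
        return []
    others = [set(postings) for postings in lists[1:]]
    hits = []
    for doc_id in lists[0]:  # already in rank order
        if all(doc_id in s for s in others):
            hits.append(doc_id)
            if len(hits) == limit:  # stop early; no sort step is needed
                break
    return hits
```

Because the ordering is fixed when documents are indexed, the query cost depends on how far the lists must be scanned to collect `limit` hits rather than on sorting the full result set.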
Centralized Management Features
Clusterpoint Manager Application: a Web-server based administration,
configuration, monitoring and access security management utility with an
easy to use web user interface, which makes it possible to use any
standards-compliant Web browser to administer and control, remotely or
locally, all computers on which Clusterpoint Server is installed;
Centralized management: all Clusterpoint servers, databases and clusters
across the corporate network can be managed centrally, with a single
sign-on for authorized administrators;
Multiple administrator accounts: multiple password-protected administrator
accounts give access to Clusterpoint Manager at different levels of DBA or
sysadmin rights; access rights to work with Clusterpoint Manager can be
limited to viewing and administering only particular storages, effectively
partitioning access to databases based on security credentials, need to
know, separation of development access from production etc.;
Database
virtualization support:
management of multiple named databases (storages) per single hardware
server, for different customer applications, with different sets of
users for each one, running them in parallel on the same hardware, yet
administered and monitored by data center personnel through
Clusterpoint Manager application;
Management of cluster databases: supports management of clustered
configurations of named data storages (N database shards of the same name,
each storing 1/Nth of the total database content on one of N cluster nodes,
together making a single logical cluster storage), distributed among
multiple hardware servers in a networked cluster;
Management of full database mirroring: supports management of mirroring of
a same-name database into multiple copies running on different hardware
nodes in a cluster, with full database replication and updates performed
automatically by Clusterpoint Server; makes it possible to create
additional database mirror copies, remove them, synchronize them manually
etc.;
Remote control of database services: status control, startup and shutdown
of named storage servers; each Clusterpoint database storage is served by a
dedicated Clusterpoint Server instance, running in RAM and securely
separated from all other server instances servicing other storages;
User administration: multiple end-user API accounts with different access
rights to Clusterpoint server storages, organized in groups of access
rights, down to the API command level for a particular storage and user,
for example, restricting a user to database status monitoring or search
only;
Centralized
log file analysis and error handling:
for each storage all transaction log files and error files
are
centrally accessible through Clusterpoint Manager interface, can be
searched, viewed and inspected;
Storage Configuration management: makes it possible to manage the
configuration file options of each named document storage in a
user-friendly Web form that describes the meaning of the configuration
parameters;
Document Policy management: makes it possible to manage the Document Policy
configuration file options of each named document storage in a
user-friendly Web form that describes the meaning of the configuration
parameters: assigning relative relevancy weights to parts of a custom XML
document; defining the Document ID tag for identification and retrieval of
stored XML documents; assigning indexing defaults for specific
customer-defined XML tags etc.;
Cluster-wide configuration changes: enables one-click application of any
Storage Configuration or Document Policy changes to cluster storages
distributed among a large number of cluster nodes; automates the laborious
and repetitive tasks a DBA faces when changing configurations in a
massively clustered IT infrastructure environment;
Command line support: all configuration files for storages (databases) are
stored in an easy to edit XML format, in a directory with the same name as
the named storage; customers can use common command line tools to open and
modify the Clusterpoint Server configuration for any storage without using
Clusterpoint Manager, a way of system management that many administrators
still prefer;
Simplified location of all files: the configuration files, all stored
database documents and all indexes of each database (storage) are kept in
its own disk directory with the same name as the named storage, making
system administration easy and understandable, requiring just basic
sysadmin skills and knowledge of basic file system operations;
Built-in web interface module for running API commands: for executing
individual Clusterpoint API commands in any storage directly from the
Clusterpoint Manager Web interface, without programming; useful for DBAs to
quickly check data integrity, search functionality, database status and
many other things;
Statistics
Dashboard: for viewing and quick filtering of log files,
viewing totals of types of transactions etc.;
SNMP
management agent: for Clusterpoint Server status
checks using
common standards-based network
management systems.
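The cluster storage model described above (N shards, each holding roughly 1/Nth of the content, queried together as one logical storage) can be sketched as follows. The source does not say how documents are assigned to shards, so the CRC32-based placement here is purely an assumption, as are all function names; the merge step assumes each shard returns its hits already ordered by rank, highest first.

```python
import heapq
import zlib
from itertools import islice

def shard_for(doc_id, num_shards):
    """Map a document ID to a shard number; CRC32 is stable across runs.
    Hash-based placement is an assumption, not the documented method."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def distribute(doc_ids, num_shards):
    """Split document IDs into num_shards lists, one per cluster node."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards

def merge_shard_results(per_shard_hits, limit):
    """per_shard_hits: one list of (rank, doc_id) per shard, each sorted by
    rank descending. Returns the global top `limit` hits in rank order."""
    # heapq.merge needs ascending inputs, so the key negates the rank.
    merged = heapq.merge(*per_shard_hits, key=lambda hit: -hit[0])
    return list(islice(merged, limit))
```

Since every shard result is already ordered, the merge only has to look at the head of each list, so producing the first page of results does not require collecting and sorting all hits from all nodes.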
Security and User Administration
User
authorization: with user name and password;
Security partitioning: restrict user access to specific storages within
the corporate network;
Different access rights: each user's access to any storage can be limited
to specific Clusterpoint API commands;
Encryption: option
to encrypt traffic between client (application) and
server using SSL;
Transaction
Queue Protection: engine based filter for
blocking denial-of-service attacks; sustaining
heavy query workloads;
Audit log: all management operations performed by DBAs using Clusterpoint
Manager are logged;
Transaction
log: all search queries and indexing transaction results
and
errors are logged for each storage;
Trouble-shooting
support: use of unique identifiers and timestamps for
debugging and
tracking of transactions;
Automatic rotation of log files: to prevent data loss caused by
excessively large log files;
Chronological log
files: log files are organized by dates to ease backup,
debugging and administration tasks;
Database
integrity controls: built-in automatic data integrity
controls to prevent data
loss in case of an unexpected server shutdown.
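The per-storage, per-command access rights listed above amount to a lookup from user to storage to a set of permitted API commands. The sketch below shows that model only; the user names, storage names, the `ACCESS` table and the `is_allowed` helper are hypothetical, and the `search` and `status` command names are assumed for illustration alongside the commands named in this document.

```python
# Hypothetical permission table: user -> storage -> allowed API commands.
ACCESS = {
    "monitor": {"orders": {"status"}},
    "webapp": {"orders": {"search", "status"}, "catalog": {"search"}},
    "dba": {
        "orders": {
            "search", "status", "insert", "update",
            "delete", "replace", "reindex",
        },
    },
}

def is_allowed(access, user, storage, command):
    """Return True only if `user` may run `command` on `storage`.
    Unknown users and storages are denied by default."""
    return command in access.get(user, {}).get(storage, frozenset())
```

Denying by default means that a user who is not listed for a storage cannot run any command on it, which matches the security partitioning idea of restricting each user to specific storages.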
Documentation and Code Samples
Developers Guide: an interactive wiki-based resource;
Community Forum: a searchable resource for problem-solving tips, technical
support recommendations and answers to frequently asked questions;
Sample client
code for Clusterpoint API: for C,
Java, PHP, .NET, Perl.
Cross-platform Availability
Operating systems: installation package suitable for any Linux
distribution, tested on the most popular Linux distributions: RedHat, SuSE,
Slackware, Debian, Mandrake;
Customization: an optional custom installation service is available for
other Linux distributions;
Out-of-the-box
installation with OS: ISO image, i.e.,
Clusterpoint Server software installation package;
Free demo download: fully functional evaluation software for test-driving
Clusterpoint Server; a full-fledged cluster version of the software,
distributed under the Clusterpoint Enterprise Evaluation License with a
60-day free trial; afterwards it can be upgraded to a permanent Enterprise
license over the Internet, without re-installation, using license
activation keys supplied by us, which can be applied using the free
Clusterpoint Manager application;
Installation on customer hardware: turn-key installation solution on the
customer's hardware: the customer's choice of Linux distribution,
Clusterpoint Server software and our technical support, including remote
installation services, configuration and problem solving over the Internet;
please see section Technical Support;
Licensing of
the source code of the Clusterpoint Server: the core
Clusterpoint database engine was developed
in C and C++ for portability across
different operating systems and to achieve maximum speeds even on
low-end
hardware; can be ported to the requested operating system and custom
usage requirements;
Cross-platform database storage: Clusterpoint Server database storage
files (configuration, document storage, index and log files, together
making a complete named storage, all located in a directory of the same
name) are cross-platform compatible between different operating systems
(Linux, FreeBSD, MacOS or Windows) and do not require database migration or
re-indexing. All Clusterpoint XML-database storage files (data, index, log
and configuration files) can simply be copied onto a server running the new
operating system platform and served by Clusterpoint Server software
compiled for and run natively on that particular operating system; this
portability of database storage across platforms makes it possible to
adjust the hardware and system IT infrastructure to the solution that is
most productive and economical for the customer's Clusterpoint database
management;
Uniform cross-platform licensing terms: Clusterpoint Server is licensed
per server only, and the same license can be applied to any hardware and
any operating system; although we have software products for different
operating systems, it is not necessary to license Clusterpoint Server for a
specific operating system; you can use different Clusterpoint Server
software products for several OSes: for example, if you run Linux and MacOS
servers in parallel and need both Clusterpoint Server products (compiled
code for both OSes), you still pay only for the total number of servers; in
this way you can easily change the operating system without paying extra
for a database license for that particular operating system.
Hardware Requirements
Processor architecture: 64-bit; optionally, the Clusterpoint Server
database engine software can be compiled for 32-bit systems to run on older
hardware;
Minimum CPU
speed: 1GHz per server;
Recommended
entry level CPU speed: 2GHz, multi-core, per server;
Minimum RAM:
512MB RAM per server;
Recommended RAM for entry level databases (up to 50GB): 2GB RAM per
server;
Recommended RAM for large (>50GB) datasets: >4GB RAM per server;
Recommended disk subsystem: RAID 0/1 enabled SCSI or SATA hard disks or
SSD, min. 7200 rpm;
Recommended disk subsystem for high-speed database I/O: SSD or high-speed
HDD arrays;
Minimum
networking support: a single Ethernet 10/100Mbps
interface port;
Recommended
networking capacity: 2
x Ethernet 100/1000 Mbps ports (can be used to additionally partition
management and access security on different network segments, also
useful for call-in remote diagnostics);
Recommended database configuration for very large (>500GB) datasets: a
distributed search database running on multiple servers in a Clusterpoint
Server cluster configuration. N cluster nodes of inexpensive commodity
server hardware can then each run 1/Nth of the database, and the
Clusterpoint Server software installed on all N cluster nodes provides
consolidated use of all CPU power, RAM and disk storage for that database;
Supports virtualization: Clusterpoint Server can run in any virtualized
environment, and you can easily manage all your Clusterpoint Servers and
databases in a cluster, setting up and operating a complete Clusterpoint
database platform infrastructure on any cloud IT infrastructure based on
virtual machines;
Uninterruptible Power Supply: recommended for all hardware servers.
Clusterpoint Server has built-in database integrity control and in most
cases can recover automatically from equipment failures; however, the
Clusterpoint Server core software uses many modern database performance
optimization methods, such as buffering of data and index updates in RAM,
pre-emptive reading of data and index into RAM to speed up multi-page
browsing of databases in web applications, and other advanced memory
caching and disk buffering schemes; all database server instances servicing
their particular storages must therefore be started and shut down as
servers, and having a UPS helps to protect database integrity; we are no
different from any other database server system: you decide whether you
need extra protection with a UPS for your mission-critical database server
systems.