InfiniDB: Column Oriented Analytic Database
Calpont InfiniDB Enterprise is a column oriented database specifically designed for data warehousing, business intelligence, big data analytics and any other read-intensive application deployment. InfiniDB can be deployed on a single, stand-alone server with the ability to service several terabytes of data, or it can be deployed against multiple servers in a distributed, MPP-enabled, highly available fashion with unlimited scalability.
How Do Column Oriented Databases Work?
All the legacy relational databases currently being offered today were and are primarily designed to handle online transactional processing (OLTP) workloads. A transaction (e.g. an online order for a book through Amazon or another Web-based book dealer) typically maps to one or more rows in a relational database, and all traditional RDBMS designs are based on a per row paradigm. For transactional-based systems, this architecture is well-suited to handle the input of incoming data.
However, for applications that are very read intensive and selective in the information being requested, the OLTP database design isn’t a model that typically holds up well. Whereas transactions are row-based, most database queries are column-based. Inserting and deleting transactional data are well served by a row-based system, but selective queries that are only interested in a few columns of a table are handled much better by a column database architecture. On average, a row-based system does 5-10x the physical I/O that a column-based database does to retrieve the same information. Taking into account that physical I/O is typically the slowest part of a query, and that an analytical query typically touches significantly more rows of data that a typical transactional database operation, the performance gap between row-oriented architectures and column-oriented architecture oftentimes widens as the database grows.
To get around their selective query inefficiencies, row-based RDBMS’s utilize indexing, horizontal partitioning, materialized views, summary tables, and parallel processing, all of which can provide benefits for intensive queries, but each comes with their own set of drawbacks as well. For example, while indexing can certainly help queries complete faster in some cases, they also require more storage, impede insert/update/delete and bulk load operations (because the indexes must be maintained as well as the underlying table), and can actually degrade performance when they become heavily fragmented. Moreover, in business intelligence/analytic environments, the ad-hoc nature of such scenarios makes it nearly impossible to predict which columns will need indexing, so tables end up either being over-indexed (which causes load and maintenance issues) or not properly indexed and so many queries end up running much slower than desired.
Many IT professionals have already recognized the superiority of column oriented databases in read-intensive environments and have started using them. At the end of 2009, a TDWI research study found that fifty-one percent of its respondents already use a column-based database now, with that number projected to increase to 75% in three years.
Why such momentum for columnar technologies?
"IT professionals are finding out that, for business intelligence database activities, they simply work as advertised."
- Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
Those not familiar with a column oriented database might wonder exactly what they are and what actual benefits they deliver over a legacy RDBMS. It’s important to note that, on the surface, a columnar database appears exactly like a traditional relational database: the logical concepts of tables and rows are the same, SQL commands are used to interact with the system, and most other RDBMS paradigms (e.g. security, backup/recovery, etc.) remain unchanged."
Learn more about column oriented databases: