Big Data and Analytics -- the two belong together like two peas in a pod. So much of the focus on Big Data centers around the data management layer (Hadoop and the Data Warehouse) that it's easy to forget that the best data management strategy adds little value without meaningful analytics.
Some customers use InfiniDB to power Business Intelligence and Reporting applications. Other customers get jazzed about connecting InfiniDB with Hadoop. But we also receive interesting inquiries about advanced analytic applications. How can statistical functions be used alongside InfiniDB? How do Data Scientists put InfiniDB to work in their analytic and machine learning applications?
Data Scientists rejoice! We are pleased to announce our latest whitepaper, which details how to use InfiniDB alongside R, a popular open-source statistical analysis platform.
The new InfiniDB Quick Start for R guide showcases several examples of pairing InfiniDB with R. In these examples, InfiniDB becomes the data access layer, enabling Data Scientists to quickly work with large data volumes (e.g., multiple terabytes) in real time.
Statistical packages like R, SAS and SPSS primarily work in-memory, so they are limited to the available memory of the single server they are installed on. Adding InfiniDB as the data access layer lets a user work with terabytes of data: InfiniDB quickly produces slices of the data, and the statistical package pulls in each slice and runs its analytics within its available memory.
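The slice-then-analyze pattern described above can be sketched roughly as follows. This is a minimal illustration in Python rather than R, and it is not taken from the Quick Start guide: the standard-library sqlite3 module stands in for an InfiniDB connection, and the `sales` table, its columns, and the helper names `load_slice` and `summarize` are all hypothetical.

```python
import sqlite3
import statistics


def load_slice(conn, region):
    """Let the database filter the large table and return only the needed values."""
    rows = conn.execute(
        "SELECT amount FROM sales WHERE region = ? ORDER BY id", (region,)
    ).fetchall()
    return [row[0] for row in rows]


def summarize(values):
    """Run the statistics in local memory, as the statistical package would."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Assumption: an in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("east", 20.0), ("east", 30.0),
     ("west", 5.0), ("west", 15.0)],
)

# Only the "east" slice crosses into the analysis process.
east = load_slice(conn, "east")
summary = summarize(east)
print(summary)
```

The design point is that the filter runs where the data lives, so the memory needed on the analysis side scales with the size of the slice rather than the size of the full table.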
Although statisticians are accustomed to sampling a population for their analyses, using an InfiniDB layer allows the Data Scientist to easily run analyses across a company's entire Data Warehouse.
To learn more about connecting InfiniDB to R, please see the InfiniDB Quick Start Guide.
Look for InfiniDB v4.0, to be released in late 2013, which will add support for analytical functions such as ranking, standard deviations and percentiles, among other features. These functions can be run on an entire dataset or can be "windowed" to a subset of the data, allowing for fast calculations on complex queries.
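To make the difference between a whole-dataset calculation and a "windowed" one concrete, here is a small sketch using standard SQL window syntax. It runs against Python's built-in sqlite3 module (SQLite 3.25 or newer) purely as a stand-in; the exact syntax InfiniDB v4.0 will accept is an assumption here, and the `sales` table and its columns are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for InfiniDB; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "ann", 300.0), ("east", "bob", 100.0), ("east", "cat", 200.0),
     ("west", "dan", 50.0), ("west", "eve", 150.0)],
)

# rank_in_region is "windowed" to each region's rows;
# rank_overall is computed across the entire dataset.
WINDOWED_RANK = """
SELECT region, rep, amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region,
       RANK() OVER (ORDER BY amount DESC) AS rank_overall
FROM sales
ORDER BY region, rank_in_region
"""

ranked = conn.execute(WINDOWED_RANK).fetchall()
for row in ranked:
    print(row)
```

Both rankings come back in a single pass over the query result, which is the kind of calculation that would otherwise require pulling every row into the statistical package first.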