Thursday, August 4, 2016

Build an SSAS ROLAP Cube using EXASOL

For this post, we are building a real-time ROLAP cube using EXASOL.  For us, the concept of ROLAP, and the goal of a real time multi-dimensional data warehouse provided an ideal design. Use an analytics &  columnar based relational database for both the structured data warehouse and as the back end to our ROLAP cube.

As technology advances, real-time, or near real-time, analysis is becoming increasingly important. And while Hadoop, Big Data, and Data Science have captured most of the attention - cubes, or more specifically, multi-dimensional cubes are still a great tool. Upgrade the cube to real-time using ROLAP and it just gets better. On their website, EXASOL indicates support for use as the database backend to an Analysis Services (SSAS) Real-time ROLAP cube. We decided to take them up on it.

Overall, the process was straight forward.  First, implement a traditional star schema data warehouse in EXASOL. Load the data warehouse with SSIS, plus setup some test incremental data adds. Next, use SQL Server Data Tools to design and deploy a ROLAP cube. And finally, use EXCEL to connect with and test out our cube.

What we are looking for in this process:


  • Ease of data population, updates and data maintenance- using SSIS
  • Design and configuration of SSAS ROLAP cube using EXASOL
  • Usability - the Analyst experience


Data population, updates and data maintenance

Our primary focus is the ability to use an ETL tool to easily add supplemental records to the fact table. To simulate a near real-time data warehouse environment. And once added to the database, are the new records visible in our ROLAP cube? 

EXASOL supports both the legacy ODBC protocol as well as an ADO.NET connection. With both of these protocols, we were able to easily create and run SSIS ETL packages from our source data into the EXASOL data warehouse. For this exercise, we used the SQL Server Data Tools 2016 for Visual Studio 2015, without any problems. The version of the EXASOL ADO.NET Data Provider was for version 5.0.15.

As expected, using SSIS data loading preformed well, giving us the confidence that as a data warehouse, EXASOL could be easily maintained from a data management perspective. See - EXASOL - Using SSIS to load our Data Warehouse.

 Design and configuration of ROLAP cubes


Here we had our first speed bump.  The current version of SSDT 2016 for Visual Studio 2015 had multiple issues and would not properly connect to EXASOL  to create a new SSAS cube - so we pulled up an older version (1).  Using SQL Server Data Tools 2014 for Visual Studio 2013 - for an analysis services multi-dimensional project, we were able to easily setup a new Data Source connection to our EXASOL data warehouse using the ADO.NET data provider. Once connected, we could easily create a new Data Source View.

With our data source view in place and adjusted to our satisfaction, we could start cube design. We were pleased that the overall process flowed seamlessly. With no special adjustments required to support EXASOL. Since ours was a ROLAP cube, rather than a more common MOLAP, partitions required more attention.

(1) I have been informed by Mathias Golombek, CTO, that they are working on an updated ADO.NET driver to resolve these issues.

Real-time ROLAP, with Options

In Partition Storage Settings, we chose the standard Real-time ROLAP setting. Our only adjustment beyond this was to set up the Scheduled Polling settings in the Notifications section. Here, the polling interval was set to one (1) minute, with the polling query set to do a count against our fact table. One minute is good for near real time.  If needed, you can shorten the polling interval granularity to a matter of seconds. With our interval of one minute, our scheduled polling worked as expected. After running an incremental fact table update, the latest records were visible in the ROLAP cube after the interval wait time. For more detail on the partition settings used see: Partition Settings for our SSAS ROLAP cube using EXASOL.

Usability

Every analyst I know uses Excel. And as a free form tool, it's great. It also connects to SSAS cubes, so it makes a good tool to test a user's experience. Sure Tableau might be better, but at a very high cost.

First off, the overall analyst experience was good and really no different than if the tool was connected to an SSAS MOLAP cube. For this test we are working with EXASOL's community edition VM, which limits our configuration to a single node.  Our single node VM configuration has four (4) virtual 3.07GHz Xeon processors and 15 GB of memory running on a dedicated spindle. Our fact table has slightly more than 11 million records, plus five dimension tables.  While 11 million records is on the smaller side for a data warehouse, it's not an unreasonable size for medium sized companies and divisions.  I know quite a few companies that would love to have over a million sales transactions a year.

Warming the Indexes

Looking over the user manual, there are no discussions on setting up indexes. And that is by design. EXASOL manages all indexes and optimizations internally. As users query the database, the optimizer determines what indexes and statistics it needs. And we found this out with our single node VM. As we simulated an analyst querying the database via Excel, the response improved rapidly. Where a first query might take several seconds, the follow on queries dropped to milliseconds. Essentially, user demand drove the optimizer.

But I need my hands on the Optimizer control

I understand. DBA's need to have the ability to make adjustments. But since most of you work with SQL Server, you already benefit from some internal optimization features. That's why there are fewer controls compared to Oracle. It's not that they don't exist. It's just that behind the scenes, a few very bright database guru's worked out algorithms that respond in real time. That's progress. A 1960's Jaguar E-type needs a full time mechanic. It is a great museum piece, but your daily car is probably more modern.  With both cars and databases, less is often more. 

Summary

With advancements in columnar relational databases, ROLAP should be back on the list of tools that support day to day operational business intelligence, analytics and reporting.



3 comments:

elJay said...

Hi,

unfortunately the solution is not as performant as expected in our environment. SSAS will always select all measure columns from the exasol database what is in fact suboptimal for columnstore databases. Also it seems that sometimes somehow the filter clauses will not be set on the relational query which leeds to an enormously loss of speed.

Also I would like to hear if you use your dimensions in rolap or molap mode. On rolap we face some unexpected errors while drilling through the hierarchies.

The third thing I have problems with is that building the cube with the unknown member feature will lead to finding the measures empty in some conditions.

Maybe you have a sample configuration for me to sort this things out?

Best,
Lutz

elJay said...

Seems that can get rid of some of the problems buliding named queries on the fact table spliting the measures into several groups. Get you updateded.....

Andrew Peterson said...

Lutz, thanks for updates. At this time, we have only done some exploratory work. We used MOLAP dimensions for the first test. Based on what we have seen so far, we envision using ROLAP as a targeted data mart. Since SSAS will hold minimal data, that allows us to design multiple SSAS ROLAP cubes, each with a precise business need.