Blog – Data is Dimensional

Data is Dimensional

Dimensional Modelling for Advanced Data Analytics and Cloud Solutions

Nick Galemmo

May 7, 2025

Other Cloud Parallel Databases

In my earlier blog I discussed MPP style cloud databases. These databases imitate a physical MPP system on conventional hardware. An MPP requires very specific thoughts on how to arrange the data, so queries run efficiently. But almost all other databases claim they can handle huge amounts of data without ANY preparation. What gives?
Read more
Nick Galemmo

April 18, 2025

The Self-Correcting Dimension

Imagine you are faced with multiple sources of similar data and there is no industry standard to identify what the data refers to. Many professions have standard coding systems that are shared by all involved. Examples include diagnosis codes, ISO codes, state codes, ZIP codes, and so on. In this case, each data feed unambiguously…
Read more
Nick Galemmo

March 3, 2025

Eliminate Data Latency in Analytics

Natural keys can be complex. It may involve multiple columns making its use prone to error. A column may be omitted in the join causing run-away queries. It is not efficient as it requires significantly more work to compare two strings over two binary integers. A surrogate is simple and easy to understand: The key…
Read more
Nick Galemmo

February 15, 2025

Understanding Cloud MPP Databases

The optimal data schema for parallelization is a Star Schema. Normalized data models are very poor for such systems because all tables are based on a unique primary key. Vendors that encouraged such modeling (Teradata) included an extensive array of bizarre indexing strategies to overcome the issue. So, the machine required a lot of handholding…
Read more

Other Cloud Parallel Databases

The Self-Correcting Dimension

Eliminate Data Latency in Analytics

Understanding Cloud MPP Databases