Recent Posts

Nick Galemmo

September 17, 2025

Creating a Virtual Data Analytics Environment Pt 4

A Semantic Layer sits far along the Analytic Environment Maturity scale. Few organizations have succeeded in building one and are reaping the benefits of understanding and leveraging their data assets. This architecture recognizes that challenge and offers a mechanism for using third-party providers, such as professional organizations, to supply a large part of the content. In AI terms,…
Read more
Nick Galemmo

August 21, 2025

Creating a Virtual Data Analytics Environment Pt 3

In theory, a virtual data environment sounds perfect: a single interface to all your data. But how does it actually work? What happens when you drag “Sales” and “Inventory” into a query and click “run”? This post lifts the hood on the machine. We’ll trace the complete path of a query—from the moment an analyst…
Read more
Nick Galemmo

August 1, 2025

Creating a Virtual Data Analytics Environment Pt 2

What is Semantics? “Semantics, within the realm of data analytics, refers to the meaning and interpretation of data elements and their relationships. Instead of viewing data as mere numbers or strings, semantics provides context—explaining what the data represents, how it should be understood, and how different pieces of data relate to each other logically. By
Read more
Nick Galemmo

June 30, 2025

Creating a Virtual Analytics Environment

In most organizations, the most valuable data analysts—those hired to uncover mission-critical insights—spend up to 80% of their time not on analysis, but on the frustrating, manual labor of finding, cleaning, and reconciling data from a dozen different systems. This massive waste of talent and time is a direct result of a fragmented data landscape.…
Read more
Nick Galemmo

May 7, 2025

Other Cloud Parallel Databases

In my earlier post, I discussed MPP-style cloud databases. These databases imitate a physical MPP system on conventional hardware. An MPP system requires careful thought about how to arrange the data so that queries run efficiently. But almost all other databases claim they can handle huge amounts of data without ANY preparation. What gives?
Read more
Nick Galemmo

April 18, 2025

The Self-Correcting Dimension

Imagine you are faced with multiple sources of similar data and there is no industry standard to identify what the data refers to. Many professions have standard coding systems that are shared by all involved. Examples include diagnosis codes, ISO codes, state codes, ZIP codes, and so on. In this case, each data feed unambiguously…
Read more
Nick Galemmo

March 3, 2025

Eliminate Data Latency in Analytics

Natural keys can be complex. A natural key may involve multiple columns, making its use prone to error. A column may be omitted from the join, causing runaway queries. It is also less efficient, since comparing two strings takes significantly more work than comparing two binary integers. A surrogate is simple and easy to understand: The key…
Read more
Nick Galemmo

February 15, 2025

Understanding Cloud MPP Databases

The optimal data schema for parallelization is a Star Schema. Normalized data models are very poor for such systems because all tables are based on a unique primary key. Vendors that encouraged such modeling (Teradata) included an extensive array of bizarre indexing strategies to overcome the issue. So, the machine required a lot of handholding…
Read more

Welcome!

This site focuses on the use of Dimensional Modeling to create a modern, flexible, high-performance Analytic Repository.

The blog presents observations about the industry and the impact of Cloud services on achieving cost efficiency and performance improvement.

The Discussion Forum invites professionals involved in various aspects of databases for analytic systems to discuss their usage, challenges, and solutions.
