
Dimensional Modelling for Advanced Data Analytics and Cloud Solutions
Creating a Semantic Layer sits far along the Analytic Environment Maturity scale. Few organizations have succeeded in building one and are now benefiting from understanding and leveraging their data assets. This architecture addresses that challenge and offers a mechanism for using third-party providers, such as professional organizations, to supply a large part of the content. In AI terms,…
In theory, a virtual data environment sounds perfect: a single interface to all your data. But how does it actually work? What happens when you drag “Sales” and “Inventory” into a query and click “run”? This post lifts the hood on the machine. We’ll trace the complete path of a query—from the moment an analyst…
What is Semantics? “Semantics, within the realm of data analytics, refers to the meaning and interpretation of data elements and their relationships. Instead of viewing data as mere numbers or strings, semantics provides context—explaining what the data represents, how it should be understood, and how different pieces of data relate to each other logically. By
In most organizations, the most valuable data analysts—those hired to uncover mission-critical insights—spend up to 80% of their time not on analysis, but on the frustrating, manual labor of finding, cleaning, and reconciling data from a dozen different systems. This massive waste of talent and time is a direct result of a fragmented data landscape.…
In my earlier blog post, I discussed MPP-style cloud databases. These databases imitate a physical MPP system on conventional hardware. An MPP system requires careful thought about how to arrange the data so that queries run efficiently. But almost all other databases claim they can handle huge amounts of data without ANY preparation. What gives?
Imagine you are faced with multiple sources of similar data and there is no industry standard to identify what the data refers to. Many professions have standard coding systems that are shared by all involved. Examples include diagnosis codes, ISO codes, state codes, ZIP codes, and so on. In this case, each data feed unambiguously…
A natural key can be complex. It may involve multiple columns, which makes it prone to error: omit one column from a join and the result can be a runaway query. It is also inefficient, because comparing two strings takes significantly more work than comparing two binary integers. A surrogate is simple and easy to understand: The key…
The optimal data schema for parallelization is a Star Schema. Normalized data models are a poor fit for such systems because all tables are based on a unique primary key. Vendors that encouraged such modeling (Teradata) included an extensive array of bizarre indexing strategies to overcome the issue. So, the machine required a lot of handholding…