
CTO Update: Introducing our semantic layer

  • Mar 19
  • 6 min read

Henry Harrison, Kognia CTO and co-founder, shares an update on what we've been building at Kognia and the company's upcoming plans.



There's a persistent gap in football analytics: we have more data than ever, but getting it into actionable workflows is still high-friction. Analysts end up wrangling complex SQL schemas or maintaining fragile ETL pipelines just to answer basic tactical questions.


We've felt this internally too. Every tactic we detect during our analysis could inspire dozens of data visualisations, but we haven't had the chance to build them all into our web application — at least not yet. So while we keep working in that direction, we also want to open the door: give our customers the right tools and see what they come up with.


On the customer side, the story is similar. The most data-mature clubs already have Tableau or Power BI environments, internal data warehouses, Python notebooks. They don't want another platform to log into; they want our data where they already work (as put so eloquently by 1907 Como Group CTO Mo Dabbah earlier this week on LinkedIn). But until now, that's meant building and maintaining custom ETL pipelines against our API, which requires data engineering resources most clubs either don't have or don't want to commit.


That's why we're building what we call a Tactical Data Layer.


Kognia's tactical data: a quick primer

For those less familiar with us, Kognia combines tracking and event data to create tactical data. Where traditional data can tell you that a pass happened, Kognia captures the context around it: team shape, off-ball movements, defensive actions, pressing triggers. Each tactical concept also has a set of measurements (i.e. qualifiers) underneath it. Take runs in behind: you can break down each striker's unique movement patterns relative to the game situation, and evaluate how the defensive line responds according to your own football intuitions. That kind of tactical depth hasn't been available as structured data before.


We've built our reputation in the academy space, where clubs rely on us to process and analyse hundreds of games per season. Now we're expanding into first-team recruitment. That means a much larger data footprint, and a customer base with a different set of needs.


The problem with "just use the API"

As we've started working with more data-mature clubs, a pattern has emerged. The more advanced recruitment and analytics departments don't want access to our platform; they want the data itself. They already have internal data warehouses, Tableau or Power BI environments and Python workflows. What they need is for Kognia's tactical data to show up where they already work.


Our existing API goes some way toward this, but it introduces friction. You need to understand our schema, handle joins, manage authentication and build your own aggregation logic. And before any of that, you need to build and maintain ETL pipelines. That's a real data engineering commitment. Most clubs either don't have data engineers, or don't want to dedicate them to maintaining pipelines against a third-party schema that might change.


The thing is, the questions these clubs want to answer are often quite intuitive:


  • Which centre halves show the best recovery rates when tracking forward runs into the box?

  • Which central midfielders most frequently show for the ball to take pressure off their teammates?

  • Which wide forwards most effectively exploit structural gaps in opposition defences?


Those queries should be straightforward. We set out to make them so.


Introducing our semantic layer

We've built a semantic layer on top of our data infrastructure. Instead of exposing raw database tables and asking users to figure out the relationships, we expose data organised around the concepts that analysts actually think in. Here's how the pieces fit together:



Raw tracking and event data flows into our BigQuery warehouse both directly and via Kognia's tactical analysis engine, which enriches it with the spatial and tactical context that makes our data unique. From there, dbt handles the transformation layer: cleaning, joining, and modeling the data, with automated tests to catch issues before they reach users. dbt reads from and writes back to BigQuery, giving us version-controlled transformations and clear lineage showing how every metric is calculated from source data.
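To make the transformation step concrete, here is a minimal Python sketch of the kind of cleaning-and-aggregation logic a dbt model encodes. The event type, column names, and per-90 metric below are hypothetical illustrations; in practice this logic lives in SQL models inside dbt, not Python.

```python
# Conceptual sketch of a dbt-style transformation: count a tactical event
# per player and normalise it to a per-90 rate. All names are hypothetical.

def runs_in_behind_per_90(events, minutes_played):
    """events: list of dicts with "player_id" and "event_type" keys;
    minutes_played: dict mapping player_id -> total minutes across matches."""
    counts = {}
    for e in events:
        if e["event_type"] == "run_in_behind":
            counts[e["player_id"]] = counts.get(e["player_id"], 0) + 1
    return {
        pid: round(90 * n / minutes_played[pid], 2)
        for pid, n in counts.items()
        if minutes_played.get(pid, 0) > 0  # guard against division by zero
    }
```

A dbt test would assert the same invariants this sketch encodes, such as no metric rows for players with zero recorded minutes.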


On top of that, we've adopted Cube as our semantic layer technology. Cube sits between the warehouse and the end user. It's where we define the business logic: what "runs behind the defensive line per 90" actually means, how possession is adjusted, which aggregations are valid for which metrics. Once defined in Cube, those definitions are consistent everywhere, whether accessed via a native BI connector, a REST API, or a GraphQL API. Cube also handles caching and pre-aggregation, so queries come back fast without hitting the underlying database every time.
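To show what "consistent everywhere" looks like in practice, here is a sketch of building a query payload for Cube's REST API (`/cubejs-api/v1/load`). The cube, measure, and dimension names are hypothetical illustrations, not Kognia's actual schema.

```python
import json

# Build the JSON payload Cube's /v1/load endpoint expects.
# Cube member names ("RunsInBehind.per90" etc.) are hypothetical.

def build_cube_query(measures, dimensions, filters=None, limit=500):
    query = {"measures": measures, "dimensions": dimensions, "limit": limit}
    if filters:
        query["filters"] = filters
    return {"query": query}

payload = build_cube_query(
    measures=["RunsInBehind.per90"],
    dimensions=["Players.name", "Players.position"],
    filters=[{"member": "Players.position",
              "operator": "equals",
              "values": ["ST"]}],
)
body = json.dumps(payload)
```

An HTTP client would POST `body` to the Cube deployment with an auth token in the `Authorization` header; the same metric definitions back the BI connector and the GraphQL API, which is the point of defining them once in the semantic layer.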


What this means is that a club analyst can open Tableau, connect to Kognia, and start building dashboards against a clean, well-documented dataset. No ETL. No SQL. The metrics they see are the same ones we use internally, which are consistently defined, governed, and kept up to date as new matches are processed.


For clubs with their own data warehouses, we also support direct data ingestion, so they can pull Kognia data into their own infrastructure and blend it with their proprietary datasets.

And for clubs that want to get going without building anything themselves, we'll be offering a professional-services model. We build tailored dashboards and reports on their behalf, so they can get value from the data straight away while deciding how much they want to invest in their own tooling.


Why Cube?

The semantic layer is a relatively new concept in the data stack, and there are several ways to approach it. We evaluated a few alternatives before landing on Cube.


The simplest option was to skip the semantic layer entirely and expose well-structured dbt models directly to external consumers. That would've been faster to ship, but it wouldn't have solved the deeper problem: business logic scattered across dbt models and application code, with no stable contract for external consumers. Any change to the underlying data model would ripple out to every integration.


We also considered using dbt's own semantic layer (MetricFlow). Since we already use dbt for transformations, it was a natural fit for internal BI. But for a customer-facing data product, we needed robust multi-tenancy and multiple access methods. Cube offered all of those out of the box.


The trade-off we accepted is an additional layer of YAML-based modeling on top of dbt. But the upside is significant: Cube gives us a stable API contract that decouples our internal data infrastructure from what consumers see. That means we can refactor (or even replace) the underlying warehouse without breaking a single dashboard or integration. For a company scaling from academy data to numerous professional leagues, that flexibility is worth the added complexity.


Where we are today

We've completed our internal proof of concept and the feedback has been good. We're now moving toward first external deployments. The semantic layer serves clean, governed data to BI tools with solid performance. The remaining work is mostly expanding the available views based on what users need, validating the security setup, and putting the architecture through its paces.


Shortly, we'll be inviting select customers as beta users with BI connections. Later, we'll add full support for REST and GraphQL connections as well.


A glimpse at what's next: Sports World Models

The semantic layer is about making our existing data more accessible. We're also investing in something different: the ability to generate new data.


Our AI team has been developing what we call Sports World Models: generative AI models that, instead of generating words like an LLM, generate player trajectories. Think of it as a "what if" engine for football: What if the winger had stayed wide? What if we'd pressed higher? How would the play have unfolded?


Our latest model, JointDiff, was accepted at ICLR 2026. The research showed that by jointly modeling continuous player trajectories and discrete events (like ball possession changes) within a single diffusion framework, you can generate realistic, tactically coherent simulations of play. The most exciting part is the ability to add custom text constraints that the simulation will obey.


We're now building the engineering to make these models practical: model-serving APIs and interactive front-end experiences. It's early, but the directions are clear: coaches running "what if" scenarios on real game situations, analysts stress-testing defensive setups against thousands of simulated attacks, and better data quality where tracking data is incomplete or noisy.


There's a longer-term connection between these two workstreams. The semantic layer makes our data discoverable and queryable by machines, not just humans. Combine that with a world model that accepts text constraints, and you have the ingredients for agentic AI experiences. An analyst describes a tactical scenario in natural language, an agent queries the semantic layer for the relevant historical data, and the world model simulates how it plays out. That's where we're headed.
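The loop described above can be sketched in a few lines. This is purely illustrative: none of these interfaces exist yet, and both the query shape and the simulation call are stand-ins for systems still being built.

```python
# Illustrative sketch of the agentic flow: translate a scenario into a
# semantic-layer query, retrieve matching situations, then simulate under
# the text constraint. Every interface here is hypothetical.

def run_scenario(scenario_text, query_semantic_layer, simulate):
    """query_semantic_layer: callable mapping a query dict to rows;
    simulate: callable mapping (rows, text_constraint) to trajectories."""
    # A real agent would translate free text into a query; we hard-code
    # a trivial mapping here for illustration.
    query = {
        "measures": ["Situations.count"],
        "filters": [{"member": "Situations.type",
                     "operator": "equals",
                     "values": ["high_press"]}],
    }
    rows = query_semantic_layer(query)
    # Feed the retrieved situations to the world model as a
    # text-constrained simulation.
    return simulate(rows, scenario_text)
```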


We wrote more about JointDiff and our research direction in a previous post. More on the engineering side in future posts.


Growing the team

Finally, I’m excited to add that we're scaling up to support all of this. In particular, as I'm no longer able to be as hands-on with the team as I used to be, we'll soon be looking for our first dedicated engineering leadership hire (title TBD, but the role would involve managing individual contributors and reporting to the CTO). If building data and AI products for professional football sounds like your kind of challenge, I'd love to hear from you even before the listing is live. Follow us to keep up to date on this and any other future developments.


Henry Harrison is the CTO of Kognia Sports Intelligence. Connect with Henry on LinkedIn.








 
 
 
