How to create an Event Catalog?

I recently worked on a project where event-driven architecture was used to distribute events to multiple consumers. During the project we spent some time thinking about how to document events easily and how to create a centralized repository for event schemas. In this blog post I share some thoughts about event catalogs.

What is event driven architecture?

In event-driven architecture, components communicate with each other through events. An event is triggered when a significant change has happened in application state. Event-driven architecture enables near real-time processing, decoupling of components and component-specific scaling.

About terminology

Event Catalog

Event Catalog is a top-level term for the Schema Registry (repository), Schema Registry API and documentation site functionality.

Event

An event is created by a producer (publisher) and contains information about a change in state (e.g. FeedbackCreated). Interested consumers can subscribe to these events.

Event Schema

An event schema is a specification of the structure of an event. The schema defines all the fields in the event.
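To make the idea concrete, here is a minimal sketch in Python of a schema and a check of an event against it. The field names (feedbackId, text, createdAt) are hypothetical and the check is deliberately simplistic. A real project would more likely rely on an Avro or JSON Schema library.

```python
# Hypothetical schema for a FeedbackCreated event: field name -> expected type.
FEEDBACK_CREATED_SCHEMA = {
    "feedbackId": str,
    "text": str,
    "createdAt": str,  # assumed to hold an ISO 8601 timestamp
}


def validate_event(event: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the event matches."""
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

A publisher could run such a check before sending an event, and a consumer could run it on arrival, so both sides agree on the same structure.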

Schema Registry (repository)

A schema registry is a centralized repository for organizing and sharing schemas between publishers and consumers. Schema registries typically support grouping schemas into logical groups and take care of versioning.

Schema Registry API

The API makes it possible to share event schemas with publishers and consumers programmatically.

Documentation Site

A documentation site is an essential component in event-driven architecture. It helps developers understand and visualize the dependencies of services which publish or consume events. As an event-driven architecture evolves over time, it can be tricky to understand events and dependencies without proper documentation.

Typically a documentation site shows event schemas, event samples and dependency diagrams between event publisher and consumer services.

Open-source tool for Documentation Site

The open-source project EventCatalog is a free documentation tool for event-driven architecture. It has a comprehensive list of built-in features for showing domain boundaries, event schemas, event samples and dependencies. Markdown syntax and Mermaid diagrams are also supported, so you can enrich the event documentation as much as you want.

EventCatalog is an advanced event documentation tool, but it doesn't provide API capabilities for sharing event schemas with publishers and consumers. You need to create the API layer on your own.

You can find comprehensive instructions on how to install and configure EventCatalog here. In Azure you can host the EventCatalog static web site e.g. in Static Web Site, Blob Storage or App Service.

Some screenshots from EventCatalog

Events listing

Event Schema viewer

Event sample viewer

event-catalog-sample-small.webp

Dependency visualizer

event-catalog-diagram-small.webp

Thoughts about Event Catalog implementation in our project

Requirements

  • One centralized repository for event schemas is required
  • API or Client SDK is required to enable event publishers and consumers to fetch event schemas programmatically
  • Documentation site which can present event schemas, samples and diagrams of event publishers and consumers

Event documenting tool

The EventCatalog documentation tool provided so many great features and matched our needs so well that it was clear we would use it. Our plan is to host the EventCatalog site in Azure Blob Storage (Static Web Site). The capability to share event schemas via an API was the only thing missing, so we needed to consider other ways to fulfil this requirement.

Next I'll present some concepts that we considered for the implementation during the discovery work.

First idea

Event Hubs is a fundamental part of the event-driven architecture in our project. During the project we noticed that a schema registry (repository) is actually already part of Event Hubs. This was great because we only needed to solve how to update schemas automatically to EventCatalog (documentation site) when a schema was changed in the Event Hubs schema registry (master data source).

Note that the schema registry is available only in the Standard or higher pricing tiers of Event Hubs.

EventCatalog uses the folder structure below, so schema files should be fairly easy to update into this structure.

├── events
│   ├── FeedbackCreated
│   │   ├── versioned
│   │   │   └── 0.0.1
│   │   │       ├── Examples
│   │   │       │   └── example.json
│   │   │       ├── index.md
│   │   │       └── schema.json
│   │   ├── index.md
│   │   └── schema.json
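As a rough illustration of how an updater could place a schema into this structure, the sketch below writes schema.json both to the event's root folder and to its versioned folder. The function name write_schema is hypothetical, and writing the same content to both locations is an assumption made for brevity, not documented EventCatalog behaviour.

```python
import json
from pathlib import Path


def write_schema(catalog_root: Path, event_name: str, version: str, schema: dict) -> Path:
    """Write schema.json for an event and return the path of the root copy."""
    event_dir = catalog_root / "events" / event_name
    versioned_dir = event_dir / "versioned" / version
    # Creating the versioned folder also creates the event folder above it.
    versioned_dir.mkdir(parents=True, exist_ok=True)

    content = json.dumps(schema, indent=2)
    (versioned_dir / "schema.json").write_text(content, encoding="utf-8")
    latest = event_dir / "schema.json"
    latest.write_text(content, encoding="utf-8")
    return latest
```

Calling write_schema(Path("."), "FeedbackCreated", "0.0.1", schema) would produce the two schema.json files shown in the tree.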

Event Hubs has a great client SDK / API for the schema registry, which solves the requirement of fetching schemas programmatically. You can find samples of how to use it here.

The first thought was that perhaps the schema registry in Event Hubs supports Event Grid for distributing events, e.g. when a schema is updated. An Azure Function could then subscribe to those events, fetch the new schema from the schema registry via the client SDK and update it into the EventCatalog site's folder structure. Unfortunately CaptureFileCreated was the only supported event type.

So this wasn't a feasible solution.

Second iteration

In this second iteration the main goal was still to use the Event Hubs schema registry as the master source for schemas, but schemas were updated to EventCatalog (documentation site) periodically with an Azure Function. The Azure Function was responsible for fetching the schema data from the Schema Registry via the client SDK and then updating the schemas in Blob Storage, where EventCatalog (documentation site) is hosted.
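The core of such a periodic updater could look roughly like the Python sketch below. It is not the actual Azure Function: fetch_schema is a hypothetical stand-in for the Schema Registry client call, KNOWN_SCHEMAS is a hypothetical list of schema names, and a local folder stands in for the Blob Storage container.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical list of the schema names that should be synced.
KNOWN_SCHEMAS = ["FeedbackCreated", "FeedbackProcessed"]


def sync_schemas(fetch_schema: Callable[[str], dict], names: list, target_root: Path) -> list:
    """Fetch each named schema and write it to events/<name>/schema.json.

    Returns the names whose file was created or changed.
    """
    changed = []
    for name in names:
        content = json.dumps(fetch_schema(name), indent=2, sort_keys=True)
        path = target_root / "events" / name / "schema.json"
        if path.exists() and path.read_text(encoding="utf-8") == content:
            continue  # nothing new for this event
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        changed.append(name)
    return changed
```

Returning only the changed names keeps a timer-triggered run cheap, because unchanged schemas cause no writes to the documentation site.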

Pros

  • Schema Registry is a built-in feature of Event Hubs, so custom development is not required
  • No need to implement a separate schema registry API because the Event Hubs client SDK / API enables access to schemas programmatically
  • The schema reader / updater could potentially create event samples automatically while updating schemas

Cons

  • The Schema Registry client in the Azure.Data.SchemaRegistry NuGet package currently doesn't support fetching all schemas at once. You need to know the name of a schema to retrieve it. This is not a complete showstopper but requires some extra work.
  • Schema Registry supports the Avro schema format, but JSON support is still in preview.
  • If you don't use Event Hubs in your system architecture, you need to provision it separately to get access to Schema Registry, which generates some small extra costs per month.
  • The schema reader / updater Azure Function requires logic to determine which publisher / consumer services are using the event. This is defined in the event-specific index.md file.

This is a possible solution, but let's iterate a bit more.

Third iteration

Combining the Event Hubs schema registry and the open-source documentation site EventCatalog was a bit too complex, as stated in the previous iteration, so we decided to iterate a bit more.

In this approach the Event Hubs schema registry was removed and EventCatalog in Blob Storage is the master data source for event schema data. This means we can use the Blob Storage client SDK / API to fetch schema data from Blob Storage. To make fetching schemas as easy as possible for publishers and consumers, a custom library should be developed. We had different kinds of systems as publishers and consumers, so this would require extra work.

With this approach you can maintain event schemas and event samples in the source control of EventCatalog, and a CI/CD pipeline deploys everything to Blob Storage.
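A custom schema reader library could be as small as the following Python sketch. It assumes the schema files are readable over HTTP at the same relative paths as in the EventCatalog folder structure, which may not hold for every deployment. SchemaReader, schema_url and http_get are hypothetical names, and a reader built on the Blob Storage client SDK would replace http_get with its own fetch function.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen


def schema_url(base_url: str, event_name: str, version: Optional[str] = None) -> str:
    """Build the URL of schema.json for the latest or a specific version."""
    base = base_url.rstrip("/")
    if version is None:
        return f"{base}/events/{event_name}/schema.json"
    return f"{base}/events/{event_name}/versioned/{version}/schema.json"


def http_get(url: str) -> str:
    """Default fetcher: a plain HTTP GET that returns the body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class SchemaReader:
    """Reads schemas through a fetch function and caches them in memory."""

    def __init__(self, base_url: str, fetch: Callable[[str], str] = http_get):
        self._base_url = base_url
        self._fetch = fetch
        self._cache = {}

    def get_schema(self, event_name: str, version: Optional[str] = None) -> dict:
        url = schema_url(self._base_url, event_name, version)
        if url not in self._cache:
            self._cache[url] = json.loads(self._fetch(url))
        return self._cache[url]
```

Passing the fetch function in keeps the reader independent of the storage technology, which matters when publishers and consumers run on different kinds of systems.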

Sample event specific index.md file

---
name: FeedbackProcessed
version: 1.0.0
summary: |
  FeedbackProcessed event contains raw event data and enriched sentiment data provided by Azure Cognitive Service.

producers:
    - Feedback Processor
consumers:
    - Feedback Subscriber
---

Pros 

  • You can maintain event schemas and event samples in source control of EventCatalog
  • Simpler architecture. No need to replicate event schemas from another place because everything is in one place (Blob Storage)
  • You don't need Event Hub if it's not used in your architecture

Cons

  • Requires more development effort
  • Different types of publisher and consumer systems require their own Schema Reader libraries, which increases maintenance

Summary

It was pretty difficult to completely automate the Event Hubs Schema Registry to work with the EventCatalog documentation site. Of these options I would choose the solution presented in iteration three, where the schema registry (repository) is in Blob Storage, because the solution is simpler. If you don't need advanced event documentation, the Event Hubs Schema Registry is a good option for you.
