Data Storage Standards

S3 layer

Blotout creates the following six buckets in the AWS account for data management:

Bucket Name | Description
--- | ---
b-[org_name]-[env]-landing | Raw Layer - All the landing/raw data will be first pushed into this bucket
b-[org_name]-[env]-stg | Staging Layer - All the ELT data will be onboarded to this layer
b-[org_name]-[env]-processed | Processed Layer - All the processed data or reporting models will be pushed into this bucket
b-[org_name]-[env]-emr | EMR bucket to maintain Bootstrap files and EMR Logs
b-[org_name]-[env]-athena-logs | Bucket to store the temporary output generated by Athena
b-[org_name]-[env]-outbound | Bucket to store the data moving out of the lake

Note

[org_name] refers to the organization name and [env] refers to the environment type (for example, prod or sandbox).
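
The bucket naming pattern above can be sketched as a small helper. This is only an illustration: the function name `bucket_names` and the constant `BUCKET_SUFFIXES` are hypothetical and not part of any Blotout tooling; the suffixes themselves are taken from the bucket table.

```python
from typing import Dict

# Suffixes copied from the bucket table; the names below are illustrative only.
BUCKET_SUFFIXES = ("landing", "stg", "processed", "emr", "athena-logs", "outbound")


def bucket_names(org_name: str, env: str) -> Dict[str, str]:
    """Return a mapping of suffix -> bucket name following b-[org_name]-[env]-[suffix]."""
    return {suffix: f"b-{org_name}-{env}-{suffix}" for suffix in BUCKET_SUFFIXES}


if __name__ == "__main__":
    # Print the six bucket names for a hypothetical org "foo" in the "prod" environment.
    for name in bucket_names("foo", "prod").values():
        print(name)
```

For an organization named foo in the prod environment, the landing bucket would be b-foo-prod-landing.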

Schema Standards

Source Type | Athena Schema Name
--- | ---
ELT sources | [source_name]_[env]
Click stream, DBT-generated, and reporting models | [org_name]_[env]

Example

Suppose the organization name is foo, the environment is prod, and the organization has enabled two ELT pipelines, Shopify and Postgres. The following schemas are then created, and the respective data is onboarded under them:

  • DBT & click stream tables will be maintained under - foo_prod
  • Postgres tables will be maintained under - postgres_prod
  • Shopify tables will be maintained under - shopify_prod
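
The schema naming rule illustrated by this example can be sketched as follows. The helper `athena_schema` is hypothetical, and lowercasing the source name (Shopify to shopify) is an assumption inferred from the example above rather than a documented rule.

```python
from typing import Optional


def athena_schema(org_name: str, env: str, elt_source: Optional[str] = None) -> str:
    """Return the Athena schema name for a source (illustrative helper).

    ELT sources use [source_name]_[env]; click stream, DBT-generated, and
    reporting models use [org_name]_[env].
    """
    prefix = elt_source if elt_source is not None else org_name
    # Assumption: names are lowercased, as in shopify_prod and postgres_prod.
    return f"{prefix.lower()}_{env}"


if __name__ == "__main__":
    print(athena_schema("foo", "prod"))              # DBT and click stream tables
    print(athena_schema("foo", "prod", "Postgres"))  # Postgres ELT tables
    print(athena_schema("foo", "prod", "Shopify"))   # Shopify ELT tables
```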

Table Description

Table Name | Table Type | Description
--- | --- | ---
view_core_events | Online | Contains flattened, near-real-time events on user behaviour generated from the website
view_users | Online + Offline | Contains a single uniform view of each user. The system automatically unifies CRM profiles coming from multiple channels.
unified_events | Online + Offline | Single source of truth that unifies all time-series events together with stitched IDs; this data is used for segmentation
view_id_graph | Online + Offline | Maintains the ID graph between your cookies and Map IDs and associates them with the Global ID