Data Storage Standards

S3 Layer

Blotout creates the following six buckets in the AWS account to keep data properly organized:

Bucket Name	Description
b-[org_name]-[env]-landing	Raw layer - all landing (raw) data is pushed to this bucket first
b-[org_name]-[env]-stg	Staging layer - all ELT data is onboarded to this layer
b-[org_name]-[env]-processed	Processed layer - all processed data and reporting models are pushed to this bucket
b-[org_name]-[env]-emr	EMR bucket - holds bootstrap files and EMR logs
b-[org_name]-[env]-athena-logs	Stores the temporary output generated by Athena
b-[org_name]-[env]-outbound	Stores data moving out of the lake

Note

[org_name] refers to the organization name and [env] refers to the environment type (for example, prod or sandbox).
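
The naming pattern in the bucket table can be sketched in a few lines of Python. This is only an illustration: the helper name bucket_names and the BUCKET_SUFFIXES list are hypothetical and simply mirror the table; they are not part of any Blotout tooling.

```python
from typing import Dict, List

# Hypothetical list of layer suffixes, copied from the bucket table.
BUCKET_SUFFIXES: List[str] = [
    "landing",
    "stg",
    "processed",
    "emr",
    "athena-logs",
    "outbound",
]


def bucket_names(org_name: str, env: str) -> Dict[str, str]:
    """Map each layer suffix to its bucket name: b-[org_name]-[env]-[suffix]."""
    return {suffix: f"b-{org_name}-{env}-{suffix}" for suffix in BUCKET_SUFFIXES}


# Example values (assumed for illustration): organization "foo", environment "prod".
names = bucket_names("foo", "prod")
for suffix, name in names.items():
    print(f"{suffix:12} {name}")
```

For organization foo in prod, the landing bucket resolves to b-foo-prod-landing. Because S3 bucket names must be lowercase, [org_name] and [env] are presumably supplied in lowercase.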

Schema Standards

Source Type	Athena Schema Name
ELT Sources	[source_name]_[env]
Clickstream, DBT-generated, and reporting models	[org_name]_[env]

Example

Suppose the organization name is _foo_, the environment is _prod_, and the organization has enabled two ELT pipelines, Shopify and Postgres. The following schemas are then created, and the respective data is onboarded under each:

DBT and clickstream tables are maintained under foo_prod
Postgres tables are maintained under postgres_prod
Shopify tables are maintained under shopify_prod
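
The schema naming rules can be expressed as a small Python sketch. The function name athena_schemas is hypothetical, and lowercasing the source name is an assumption inferred from the example (Shopify becomes shopify_prod); the source does not state how source names are normalized.

```python
from typing import List


def athena_schemas(org_name: str, env: str, elt_sources: List[str]) -> List[str]:
    """Return the Athena schema names expected for an organization.

    The first entry is the shared schema for clickstream, DBT-generated,
    and reporting models; the rest are one schema per ELT source.
    """
    schemas = [f"{org_name}_{env}"]
    # Assumption: source names are lowercased, as in shopify_prod / postgres_prod.
    schemas += [f"{source.lower()}_{env}" for source in elt_sources]
    return schemas


# Reproduces the example: organization "foo", environment "prod".
print(athena_schemas("foo", "prod", ["Shopify", "Postgres"]))
# ['foo_prod', 'shopify_prod', 'postgres_prod']
```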

Table Description

Table Name	Table Type	Description
view_core_events	Online	Contains flattened, near-real-time events generated from the website that capture user behaviour
view_users	Online + Offline	Contains a single, uniform view of each user. The system automatically unifies CRM profiles coming from multiple channels.
unified_events	Online + Offline	Single source of truth that unifies all time-series events with stitched IDs; this data is used for segmentation
view_id_graph	Online + Offline	Maintains the ID graph linking your cookies and Map IDs and associates them with the Global ID