Data Storage Standards
S3 layer
Blotout creates the following six buckets in the AWS account to keep data properly managed:
| Bucket Name | Description |
|---|---|
| b-[org_name]-[env]-landing | Raw Layer - All the landing/raw data will be first pushed into this bucket |
| b-[org_name]-[env]-stg | Staging Layer - All the ELT data will be onboarded to this layer |
| b-[org_name]-[env]-processed | Processed Layer - All the processed data or reporting models will be pushed into this bucket |
| b-[org_name]-[env]-emr | EMR bucket to maintain Bootstrap files and EMR Logs |
| b-[org_name]-[env]-athena-logs | Bucket to store the temporary output generated by Athena |
| b-[org_name]-[env]-outbound | Bucket to store the data moving out of the lake |
Note
[org_name] refers to the organization name and [env] refers to the environment type (for example, prod or sandbox).
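The naming pattern in the bucket table can be sketched in a few lines of Python. This is only an illustration: the function name `bucket_names` and the `BUCKET_SUFFIXES` list are hypothetical helpers, not part of any Blotout tooling, and the sample values `foo` and `prod` are placeholders.

```python
from typing import Dict

# Suffixes copied from the bucket table; the helper itself is hypothetical.
BUCKET_SUFFIXES = ["landing", "stg", "processed", "emr", "athena-logs", "outbound"]


def bucket_names(org_name: str, env: str) -> Dict[str, str]:
    """Map each layer suffix to its bucket name: b-[org_name]-[env]-[suffix]."""
    return {suffix: f"b-{org_name}-{env}-{suffix}" for suffix in BUCKET_SUFFIXES}


if __name__ == "__main__":
    # Example values only: organization "foo", environment "prod".
    for suffix, name in bucket_names("foo", "prod").items():
        print(f"{suffix:12} {name}")
```

Keeping the suffixes in one list makes it easy to check that all six buckets exist for a given organization and environment.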
Schema Standards
| Source Type | Athena Schema Name |
|---|---|
| ELT Sources | [source_name]_[env] |
| Click Stream, DBT-Generated, and Reporting Models | [org_name]_[env] |
Example
Suppose the organization name is _foo_, the environment is _prod_, and the organization has enabled two ELT pipelines, Shopify and Postgres. The following schemas are then created, and the respective data is onboarded under them:
- DBT & click stream tables will be maintained under - foo_prod
- Postgres tables will be maintained under - postgres_prod
- Shopify tables will be maintained under - shopify_prod
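The schema naming rule can be expressed as a short Python sketch. The function `athena_schemas` is a hypothetical helper written for illustration; lowercasing the source name is an assumption inferred from the example above (Shopify becomes shopify_prod), not a documented rule.

```python
from typing import List


def athena_schemas(org_name: str, env: str, elt_sources: List[str]) -> List[str]:
    """Return the expected Athena schema names for an organization.

    One schema holds DBT, click stream, and reporting tables ([org_name]_[env]);
    each ELT source gets its own schema ([source_name]_[env]).
    """
    schemas = [f"{org_name}_{env}"]
    # Assumption: source names are lowercased, as in the example above.
    schemas += [f"{source.lower()}_{env}" for source in sorted(elt_sources)]
    return schemas


if __name__ == "__main__":
    print(athena_schemas("foo", "prod", ["Shopify", "Postgres"]))
    # ['foo_prod', 'postgres_prod', 'shopify_prod']
```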
Table Description
| Table Name | Table Type | Description |
|---|---|---|
| view_core_events | Online | Contains flattened, near-real-time events generated by user behaviour on the website |
| view_users | Online + Offline | Contains a single, uniform view of each user. The system automatically unifies CRM profiles coming from multiple channels. |
| unified_events | Online + Offline | Single source of truth that unifies all time-series events with stitched IDs. This data is used for segmentation. |
| view_id_graph | Online + Offline | Maintains the ID graph between your cookies and Map IDs and associates them with a Global ID |