Blotout uses Airflow for scheduling and monitoring workflows. Airflow is deployed within the Blotout cloud, which gives the organization complete access to visualize, manage, and monitor pipelines.
Airflow is available at the `/airflow` endpoint of the Blotout web application once the deployment step is completed. For example, if the organization name is `example` and the environment is `prod`, the Blotout web application will be hosted at https://example-ui-prod.blotout.io and Airflow at https://example-ui-prod.blotout.io/airflow.
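The URL pattern above can be expressed programmatically. A minimal sketch; the `-ui-` naming is taken from the example above, and `blotout_urls` is a hypothetical helper, not part of any Blotout SDK:

```python
def blotout_urls(org: str, env: str) -> tuple[str, str]:
    """Build the web-app and Airflow URLs from the organization name and environment."""
    base = f"https://{org}-ui-{env}.blotout.io"
    return base, f"{base}/airflow"

app_url, airflow_url = blotout_urls("example", "prod")
print(app_url)      # https://example-ui-prod.blotout.io
print(airflow_url)  # https://example-ui-prod.blotout.io/airflow
```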
## Obtaining credentials for Airflow
- Log in to the AWS console.
- Go to the `Secrets Manager` service. Make sure you are in the same region as your deployment.
- The secrets for your deployment will be listed. Click on the secret that holds the Airflow password.
- Click `Retrieve secret value` to retrieve the password.
- Log in to Airflow with the username `admin` and the password retrieved above.
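The same password can be fetched with the AWS CLI instead of the console. A sketch, assuming appropriate IAM permissions; the secret name below is a placeholder, so copy the exact name shown in the Secrets Manager console:

```shell
SECRET_ID="<airflow-secret-name>"  # placeholder: use the exact name from the console
REGION="us-east-1"                 # must match your deployment region

if command -v aws >/dev/null 2>&1; then
  # Print only the secret string (the Airflow admin password)
  aws secretsmanager get-secret-value \
    --secret-id "$SECRET_ID" \
    --region "$REGION" \
    --query SecretString \
    --output text
else
  echo "aws CLI not installed"
fi
```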
Below are the jobs configured in the system by default:
- Job that triggers at the initial launch of the infrastructure for initial setup
- Job that triggers a Spark job to process/flatten incremental clickstream data and stitch it into a unified table
- ID stitching job: stitches IDs between online and offline data and attaches them to `global_user_id`
- Job that triggers, step by step, the different DBT models for sessions, unique events, and transformed/refined models for reporting
- Job that triggers the Campaign DBT model for the attribution reporting view
- Job that triggers the Campaign DBT model for the campaign reporting view
- Job that sets up the different reporting views (deprecated in 0.21.0)
- Job that triggers the Retention DBT model for the reporting view
- Job to compress small Parquet files generated by the Spark job
- Dynamic job (enabled on Shopify ELT pipeline creation) to auto-sync predefined entities like funnels, segments, etc.
- Job to clean up temp tables created in the data lake
- Periodic job that releases idle database connections
- Monthly job that sends the hardware cost report to the configured email address
- Job to reconcile activation pipeline stats and maintain daily sync records per channel
- Automation job to schedule dashboards and send them by email
- Automation job to auto-sync newly added charts, dashboards, etc.
- Automation job to auto-sync all tables available in the data lake
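These jobs appear as DAGs in the Airflow UI. If the Airflow 2 stable REST API is enabled in your deployment (an assumption, not confirmed by this page), they can also be listed over HTTP with the admin credentials obtained above:

```shell
AIRFLOW_URL="https://example-ui-prod.blotout.io/airflow"  # adjust org/env for your deployment
AIRFLOW_PASSWORD="${AIRFLOW_PASSWORD:-changeme}"          # password retrieved from Secrets Manager

if command -v curl >/dev/null 2>&1; then
  # Basic auth with the admin user; --max-time keeps the call bounded
  curl -s --max-time 10 -u "admin:${AIRFLOW_PASSWORD}" \
    "${AIRFLOW_URL}/api/v1/dags" || true
fi
```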
Below are the variables that are present in Airflow. To check the variables, click on `Admin` and then `Variables`. To learn more, see Manage Airflow Variables.
| Value | Description |
| --- | --- |
| | Assumed start time for Airflow cron jobs |
| | AWS region of deployment |
| | EC2 instance type for EMR |
| `0 * * * *` | Cron time for clickstream data processing |
| `0 */4 * * *` | Cron time for the ID stitching job |
| | Subnet ID in which the infrastructure is running |
| `0 */3 * * *` | Cron time for the job that deletes idle DB connections |
| | DBT module Docker tag |
| | DBT module Docker tag |
| | Reverse EL (Activation) Docker tag |
| | Superset Automation Docker tag |
| `*/30 * * * *` | Cron time for Superset dashboard automation |
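The cron values in the table use standard five-field crontab syntax (minute, hour, day-of-month, month, day-of-week). A quick sanity check of the expressions listed above:

```python
# Cron schedules copied from the variables table above.
schedules = {
    "clickstream processing": "0 * * * *",        # top of every hour
    "ID stitching": "0 */4 * * *",                # every 4 hours
    "idle DB connection cleanup": "0 */3 * * *",  # every 3 hours
    "Superset dashboard automation": "*/30 * * * *",  # every 30 minutes
}

for name, expr in schedules.items():
    # Standard crontab expressions have exactly five whitespace-separated fields
    assert len(expr.split()) == 5, f"malformed cron for {name}: {expr!r}"
    print(f"{name}: {expr}")
```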
When a user adds a new ELT pipeline, Airflow automatically picks it up and creates the corresponding Airflow ELT pipeline.

Similarly, when a user adds a new activation channel (e.g. Klaviyo, Facebook Audience) for audience sync, Airflow automatically picks it up and creates the corresponding Airflow pipeline.
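This automatic pickup matches Airflow's common pattern of generating DAGs from configuration at parse time. A toy sketch with a stand-in `Dag` class so it runs without Airflow installed; the pipeline names and dag-id format are illustrative, not Blotout's actual naming:

```python
from dataclasses import dataclass

@dataclass
class Dag:
    """Stand-in for airflow.DAG, so this sketch runs without Airflow installed."""
    dag_id: str

def build_dags(pipelines: list[str]) -> dict[str, Dag]:
    """Create one DAG per configured ELT pipeline / activation channel."""
    dags = {}
    for name in pipelines:
        dag_id = f"elt_{name.lower().replace(' ', '_')}"
        dags[dag_id] = Dag(dag_id=dag_id)
    return dags

# When a new channel is added to this list, a new DAG appears on the next parse.
dags = build_dags(["Shopify", "Klaviyo", "Facebook Audience"])
print(sorted(dags))  # ['elt_facebook_audience', 'elt_klaviyo', 'elt_shopify']
```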