
Airflow

Blotout uses Airflow for scheduling and monitoring workflows. Airflow is deployed within the Blotout cloud, which gives the organization complete access to visualize, manage, and monitor pipelines.

Airflow pipelines

Logging in

Airflow is available at the /airflow endpoint of the Blotout web application once the deployment step is completed. For example, if the organization name is example and the environment is prod, the Blotout web application is hosted at https://example-ui-prod.blotout.io and Airflow at https://example-ui-prod.blotout.io/airflow.
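The URL convention above can be expressed as a small helper. This is only an illustrative sketch of the naming scheme; airflow_url is a hypothetical function, not part of any Blotout tooling.

```python
# Hypothetical helper illustrating the URL convention described above.
def airflow_url(org: str, env: str) -> str:
    return f"https://{org}-ui-{env}.blotout.io/airflow"

assert airflow_url("example", "prod") == "https://example-ui-prod.blotout.io/airflow"
```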

Airflow login

Obtaining credentials for Airflow

  1. Log in to the AWS console.
  2. Go to the Secrets Manager service. Make sure you are in the same region as your deployment.
  3. The following secrets will be available. Click on airflow_password.
  4. Click on Retrieve secret value to retrieve the password.
  5. Log in to Airflow with the username admin and the password retrieved above. (The same value can also be fetched programmatically; see the sketch below.)
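If you prefer the API over the console, the same secret can be read with boto3. A minimal sketch, assuming the deployment region is us-west-2 and the secret is named airflow_password as shown above:

```python
import boto3

# Read the Airflow admin password from AWS Secrets Manager.
# Assumes the secret "airflow_password" lives in the deployment region.
client = boto3.client("secretsmanager", region_name="us-west-2")
secret = client.get_secret_value(SecretId="airflow_password")

# SecretString holds the password shown in the console's
# "Retrieve secret value" view.
print(secret["SecretString"])
```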

Airflow Jobs

Below are the various jobs configured in the system by default:

| DAG Name | Category | Description |
| --- | --- | --- |
| one_time_setup | Analytics | Job that triggers at the initial launch of the infrastructure for initial setup |
| events_incremental | Analytics | Triggers a Spark job to process/flatten incremental clickstream data and stitch it into a unified table |
| id_stitching_incremental | ID Graph | ID stitching job; stitches IDs between online and offline data and attaches them to a global_user_id |
| events_unified_view | Analytics | Triggers, step by step, the different DBT models for session, unique_events, and transformed/refined models for reporting |
| attribution | Analytics | Triggers the Campaign DBT model for the attribution reporting view |
| campaign | Analytics | Triggers the Campaign DBT model for the campaign reporting view |
| derived_views | Analytics | Sets up the different reporting views (deprecated in 0.21.0) |
| retention | Analytics | Triggers the Retention DBT model for the reporting view |
| compression | Misc | Compresses small Parquet files generated by the Spark job |
| entity_creation | Misc | Dynamic job (enabled on Shopify ELT pipeline creation) to auto-sync predefined entities like funnel, segment, etc. |
| cleanup_temp_tables | Misc | Cleans up temp tables created in the data lake |
| delete_idle_db_connections | Misc | Periodic job that releases idle database connections |
| billing | Misc | Monthly job that sends the hardware cost report to the configured email address |
| activation_stats | Activation | Reconciles activation pipeline stats and maintains the daily sync records per channel |
| scan_report_runs_and_send_email | Superset | Automation job to schedule dashboards and send them by email |
| sync_superset_dashboard | Superset | Automation job to auto-sync newly added charts, dashboards, etc. |
| sync_superset_tables | Superset | Automation job to auto-sync all the tables available in the data lake |
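Any of these DAGs can also be triggered outside its schedule. Below is a minimal sketch using Airflow's stable REST API (Airflow 2.x) with the admin credentials from the previous section; whether this endpoint is reachable with basic auth depends on how your deployment's auth backend is configured.

```python
import requests

AIRFLOW_URL = "https://example-ui-prod.blotout.io/airflow"  # from "Logging in"

# Trigger an ad-hoc run of the events_incremental DAG.
resp = requests.post(
    f"{AIRFLOW_URL}/api/v1/dags/events_incremental/dagRuns",
    auth=("admin", "<airflow_password>"),  # password from Secrets Manager
    json={"conf": {}},
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])
```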

Airflow variables

Below are the variables that are present in Airflow. To view them, click on Admin and then Variables. To learn more, see Manage Airflow Variables. A sketch after the table shows how these variables are read inside a DAG.

| Name | Value (example) | Description |
| --- | --- | --- |
| AIRBYTE_URL | https://ORGNAME-ui-ENV.blotout.io | Airbyte URL |
| AIRFLOW_DAG_FAILED_EMAIL | alert@blotout.io | Email address notified when a DAG fails |
| AIRFLOW_START_DATE | 1977-10-01 00:00:00 | Assumed start time for Airflow cron jobs |
| AWS_REGION | us-west-2 | AWS region of the deployment |
| COMPUTATION_WINDOW | 90 | |
| EMR_EC2_INSTANCE_TYPE | m4.large | EC2 instance type for EMR |
| EVENTS_INCREMENTAL_SCHEDULE_INTERVAL | 0 * * * * | Cron schedule for clickstream data processing |
| ID_STITCHING_INCREMENTAL_SCHEDULE_INTERVAL | 0 */4 * * * | Cron schedule for the ID stitching job |
| PRIVATE_SUBNET | subnet-048ee3a00944bc2e0 | Subnet ID in which the infrastructure is running |
| SCHEDULE_INTERVAL_DELETE_IDLE_CONNECTIONS | 0 */3 * * * | Cron schedule for the job that deletes idle DB connections |
| SUPERSET_BASE_URL | https://ORGNAME-ui-ENV.blotout.io | Superset URL |
| SUPERSET_PASSWORD | | Superset password |
| SUPERSET_USERNAME | | Superset username |
| SUPERSET_USER_EMAIL | | Superset email |
| TAG_DBT_ANALYTICS | 0.20.0 | DBT module Docker tag |
| TAG_DBT_CODE_GENERATOR | 0.20.0 | DBT module Docker tag |
| TAG_DBT_REVERSE_EL | 0.20.0 | Reverse EL (Activation) Docker tag |
| TAG_SUPERSET_AUTOMATION | 0.20.0 | Superset Automation Docker tag |
| USER_REPORTING_SCHEDULE_INTERVAL | */30 * * * * | Cron schedule for Superset dashboard automation |
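Inside a DAG file, these variables are read through Airflow's Variable model. A minimal sketch of how a cron variable such as EVENTS_INCREMENTAL_SCHEDULE_INTERVAL might drive a schedule; the DAG below is illustrative only, not Blotout's actual events_incremental definition.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.empty import EmptyOperator  # Airflow 2.3+

# Read the cron expression and region from Airflow Variables.
schedule = Variable.get("EVENTS_INCREMENTAL_SCHEDULE_INTERVAL")  # e.g. "0 * * * *"
region = Variable.get("AWS_REGION")  # e.g. "us-west-2"

with DAG(
    dag_id="events_incremental_example",  # illustrative name only
    start_date=datetime(1977, 10, 1),     # cf. AIRFLOW_START_DATE
    schedule_interval=schedule,
    catchup=False,
) as dag:
    EmptyOperator(task_id="placeholder")
```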

ELT Pipeline

When the user adds a new ELT pipeline, Airflow automatically picks it up and creates the corresponding Airflow ELT pipeline.

Activation Pipeline

When the user adds a new Activation channel such as Klaviyo or Facebook Audience for audience sync, Airflow automatically picks it up and creates the corresponding Airflow activation pipeline. A sketch of the underlying pattern follows.
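Blotout's generator code isn't shown in these docs, but both behaviors above match Airflow's standard dynamic DAG generation pattern, in which a single Python file registers one DAG per configured pipeline. A minimal sketch under assumed names (get_configured_pipelines is hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator


def get_configured_pipelines():
    """Hypothetical stand-in for reading the ELT/Activation pipelines
    the user configured (e.g. from a metadata store)."""
    return ["elt_shopify", "activation_klaviyo"]


# Register one DAG per configured pipeline; Airflow discovers any
# DAG object bound to a module-level (global) name.
for name in get_configured_pipelines():
    dag = DAG(
        dag_id=f"pipeline_{name}",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    )
    with dag:
        EmptyOperator(task_id="sync")
    globals()[dag.dag_id] = dag
```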