2017 Hackies Entry: DIY marketing data warehouse recipe
March 13, 2017 by Scott Brinker
This article is a guest post by Alex Sirota of NewPath Consulting. It was entered into The Hackies essay contest for the upcoming MarTech conference. Like it? You can register your vote in the contest by sharing it on social media, especially LinkedIn, Facebook, and Twitter.
Let’s use Amazon Web Services Relational Database Service (AWS RDS) to create a cloud based relational database system!
AWS RDS is a cloud platform that makes it easier to set up, operate, and scale a relational database in the cloud. AWS RDS provides cost-efficient, resizeable capacity for an industry-standard relational database and manages common database administration tasks.
AWS RDS currently supports these databases:
- Oracle on Amazon RDS
- MySQL on Amazon RDS
- Microsoft SQL Server on Amazon RDS
- PostgreSQL on Amazon RDS
- Aurora on Amazon RDS
- MariaDB on Amazon RDS
Below is a recipe for getting going with your own data warehouse to start building powerful business dashboards with tools like Looker.
- Amazon Web Services RDS — free for the first 12 months (certain restrictions apply, see the RDS pricing page).
- Setup a trial account at Fivetran.com.
- Install pgAdmin or SQL Workbench/J (with JDBC drivers for your operating system).
- Need some working knowledge of SQL to grant permissions on your data warehouse to Fivetran.
- Setup an Amazon RDS instance using PostgreSQL. Amazon Redshift and Snowflake are higher-end enterprise tools that are also potential data warehouses, but we haven’t tried them out yet. You can use your amazon.com account to login to AWS and setup PostgreSQL on AWS RDS.
- Once AWS RDS is setup you will be able to connect to the instance from your machine only (the default access control for a new AWS RDS).
- Modify access control groups for AWS RDS to enable access for Fivetran to access your database.
- Create necessary user for Fivetran using your administrative account and grant privileges for your database to the Fivetran account (SQL). Make sure you commit your changes!
- Setup data sources in Fivetran, authenticate and start the data pipeline (Google Analytics and Stripe are interesting ones to start with).
- You will need to setup a schema (a collection of tables).
- You will need to specify which target table will be populated.
- Use your SQL Workbench or pgAdmin to verify tables are being created in your data warehouse.
- The data pipeline will continue to populate data as new data is created in the data sources. Any changes in the schema will be reflected in your data warehouse but no data will ever be deleted. Changes will be marked in the target tables.
Time to bake:
Probably 3-5 hours depending on how quickly you can click the UIs!
Alternate prep techniques:
The functionality of tools like Zapier should not be confused with tools like Fivetran. Zapier is great at integrating two or more cloud apps that do not have built in integrations.
So, for example, using Zapier you can have a form filled out in Formstack, trigger an event in a Slack channel. Formstack does not support integration with Slack, but Zapier bridges this gap so that data can trigger an event in a remote cloud app. Zapier is sort of a “crazy glue” to connect disparate cloud apps that don’t have a native integration.
Have fun integrating data and building data pipelines!
What did you think of this article as an entry in The Hackies essay contest for the upcoming MarTech conference? If you liked it, you can register your vote in the contest by sharing it on LinkedIn, Facebook, and Twitter.
Have a marketing/technology/management “hack” that you want to share with the world? Consider entering The Hackies yourself — we’d love to learn from your experience and insight!