The data warehouse enables you to visualize and query data from external sources with PostHog's tools.
Visualizing source data with trends
After linking your source, you can use its data in a trend like you would event or person data.
To do this, create a new insight, click the series, go to the Data warehouse tables tab, hover over the table you want, and click Select.
The main difference from events is that you must identify 3 fields from the data warehouse data to visualize it:
- ID: A field that corresponds to the ID of the element.
- Distinct ID: A field that corresponds to an ID representing the user associated with the element. If none, can match ID.
- Timestamp: A timestamp field representing when the element was created.
It's infeasible for us to map these fields. We pre-fill with a best guess, but allow you to edit it.
Filters and breakdowns
When using data warehouse tables in insights, you can use properties from those tables filter and breakdown as you would with any other insight. For example, you could filter zendesk_tickets
for ones where the status
is open
or hubspot_companies
where lifecyclestage
is lead
.
Want to create insights with a combination of PostHog and external data? Check out our docs on joining tables and persons.
Querying sources with SQL
For more complicated queries, you can use SQL insights.
To start, either create a new SQL insight or go to the data warehouse tab. Here you can see the schemas all available external and PostHog tables as well as saved views
To see available sources, go to the data pipeline page. This page shows the external and PostHog tables as well as saved views you can visualize with trends or query through SQL insights.
Here you can:
- Write your SQL query using your table like
SELECT * FROM hubspot_contacts
- Click Update and run to see the results.
- Modify your query using SQL commands as needed to get the data you want like
SELECT email FROM hubspot_contacts WHERE city = 'Brisbane'
. - Save your query.
Querying multiple sources together
Much of the power of the data warehouse comes from combining multiple sources, like your Stripe or Hubspot data with your product analytics data. Two of the easiest ways of doing this are WHERE IN
and JOIN
SQL commands.
For example, to get a count of events for your Hubspot contacts you can filter events.distinct_id
by email FROM hubspot_contacts
like this:
SELECT COUNT() AS event_count, distinct_idFROM eventsWHERE distinct_id IN (SELECT email FROM hubspot_contacts)GROUP BY distinct_idORDER BY event_count DESC
You can also use a JOIN
such as INNER JOIN
or LEFT JOIN
to combine data. For example, to get a count of events for your Stripe customers you can INNER JOIN
on distinct_id
and email
like this:
SELECT events.distinct_id, COUNT() AS event_countFROM eventsINNER JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.emailGROUP BY events.distinct_idORDER BY event_count DESC