from ibis.interactive import *
= ibis.connect("postgres://postgres:postgres@localhost/postgres") pg_con
- 1
- Import Ibis for maximum productivity in interactive analysis.
Phillip Cloud
March 6, 2024
We’ve blogged about Snowflake IO before, in the context of getting local files into Snowflake as fast as possible.
In this post, we’ll show how to insert query results from another system into Snowflake, using Ibis.
We’ll connect to a postgres database running locally in a container. You should be able to swap in your own connection details as needed.
from ibis.interactive import *
pg_con = ibis.connect("postgres://postgres:postgres@localhost/postgres")
We’ll use a test dataset that contains some baseball batting statistics.
Ibis provides that example data, so we can dump that into postgres.
Let’s build an Ibis expression based on the batting
table in our postgres database.
┏━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓ ┃ player_id ┃ year_id ┃ stint ┃ team_id ┃ lg_id ┃ g ┃ ab ┃ r ┃ h ┃ x2b ┃ x3b ┃ hr ┃ rbi ┃ sb ┃ cs ┃ bb ┃ so ┃ ibb ┃ hbp ┃ sh ┃ sf ┃ gidp ┃ ┡━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩ │ string │ int64 │ int64 │ string │ string │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ float64 │ float64 │ float64 │ int64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ ├───────────┼─────────┼───────┼─────────┼────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼─────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤ │ abercda01 │ 1871 │ 1 │ TRO │ NA │ 1 │ 4 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0.0 │ 0.0 │ 0.0 │ 0 │ 0.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ addybo01 │ 1871 │ 1 │ RC1 │ NA │ 25 │ 118 │ 30 │ 32 │ 6 │ 0 │ 0 │ 13.0 │ 8.0 │ 1.0 │ 4 │ 0.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ allisar01 │ 1871 │ 1 │ CL1 │ NA │ 29 │ 137 │ 28 │ 40 │ 4 │ 5 │ 0 │ 19.0 │ 3.0 │ 1.0 │ 2 │ 5.0 │ nan │ nan │ nan │ nan │ 1.0 │ │ allisdo01 │ 1871 │ 1 │ WS3 │ NA │ 27 │ 133 │ 28 │ 44 │ 10 │ 2 │ 2 │ 27.0 │ 1.0 │ 1.0 │ 0 │ 2.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ ansonca01 │ 1871 │ 1 │ RC1 │ NA │ 25 │ 120 │ 29 │ 39 │ 11 │ 3 │ 0 │ 16.0 │ 6.0 │ 2.0 │ 2 │ 1.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ armstbo01 │ 1871 │ 1 │ FW1 │ NA │ 12 │ 49 │ 9 │ 11 │ 2 │ 1 │ 0 │ 5.0 │ 0.0 │ 1.0 │ 0 │ 1.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ barkeal01 │ 1871 │ 1 │ RC1 │ NA │ 1 │ 4 │ 0 │ 1 │ 0 │ 0 │ 0 │ 2.0 │ 0.0 │ 0.0 │ 1 │ 0.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ barnero01 │ 1871 │ 1 │ BS1 │ NA │ 31 │ 157 │ 66 │ 63 │ 10 │ 9 │ 0 │ 34.0 │ 11.0 │ 6.0 │ 13 │ 1.0 │ nan │ nan │ nan │ nan │ 1.0 │ │ barrebi01 │ 1871 │ 1 │ FW1 │ NA │ 1 │ 5 │ 1 │ 1 │ 1 │ 0 │ 0 │ 1.0 │ 0.0 │ 0.0 │ 0 │ 0.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ barrofr01 │ 1871 │ 1 │ BS1 │ NA │ 18 │ 86 │ 13 │ 13 │ 2 │ 1 │ 0 │ 11.0 │ 1.0 │ 0.0 │ 0 │ 0.0 │ nan │ nan │ nan │ nan │ 0.0 │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └───────────┴─────────┴───────┴─────────┴────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴─────────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
We can compute the average RBI per year per team.
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓ ┃ year_id ┃ team_id ┃ avg_rbi ┃ ┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩ │ int64 │ string │ float64 │ ├─────────┼─────────┼───────────┤ │ 1891 │ PIT │ 22.782609 │ │ 1895 │ BSN │ 34.363636 │ │ 1940 │ SLA │ 22.343750 │ │ 1981 │ HOU │ 9.972973 │ │ 1913 │ CLE │ 13.512821 │ │ 1971 │ MON │ 17.181818 │ │ 2008 │ PIT │ 15.000000 │ │ 1895 │ WAS │ 23.096774 │ │ 2011 │ KCA │ 16.785714 │ │ 2007 │ MIL │ 19.350000 │ │ … │ … │ … │ └─────────┴─────────┴───────────┘
We can also rename columns to be more consistent with typical Snowflake usage.
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓ ┃ YEAR_ID ┃ TEAM_ID ┃ AVG_RBI ┃ ┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩ │ int64 │ string │ float64 │ ├─────────┼─────────┼───────────┤ │ 1891 │ PIT │ 22.782609 │ │ 1895 │ BSN │ 34.363636 │ │ 1940 │ SLA │ 22.343750 │ │ 1981 │ HOU │ 9.972973 │ │ 1913 │ CLE │ 13.512821 │ │ 1971 │ MON │ 17.181818 │ │ 2008 │ PIT │ 15.000000 │ │ 1895 │ WAS │ 23.096774 │ │ 2011 │ KCA │ 16.785714 │ │ 2007 │ MIL │ 19.350000 │ │ … │ … │ … │ └─────────┴─────────┴───────────┘
Let’s show how many rows we have in the result.
Because all Ibis backends implement the to_pyarrow()
method, we can get data out of another system and into Snowflake with a few lines of code.
First we’ll create a table in Snowflake to hold the data.
Ibis helps here by providing an API to access the schema from the postgres-based expression, and automatically translates postgres types into Snowflake types.
We’ll show that the table is empty to sanity check ourselves.
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓ ┃ YEAR_ID ┃ TEAM_ID ┃ AVG_RBI ┃ ┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩ │ int64 │ string │ float64 │ └─────────┴─────────┴─────────┘
Insert the expression’s result table into Snowflake.
To sanity check what we’ve done let’s peek at the table.
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓ ┃ YEAR_ID ┃ TEAM_ID ┃ AVG_RBI ┃ ┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩ │ int64 │ string │ float64 │ ├─────────┼─────────┼───────────┤ │ 1891 │ PIT │ 22.782609 │ │ 1895 │ BSN │ 34.363636 │ │ 1940 │ SLA │ 22.343750 │ │ 1981 │ HOU │ 9.972973 │ │ 1913 │ CLE │ 13.512821 │ │ 1971 │ MON │ 17.181818 │ │ 2008 │ PIT │ 15.000000 │ │ 1895 │ WAS │ 23.096774 │ │ 2011 │ KCA │ 16.785714 │ │ 2007 │ MIL │ 19.350000 │ │ … │ … │ … │ └─────────┴─────────┴───────────┘
We’ll count them too, to be extra sure.
In this post we show how easy it is to move data from one backend into Snowflake using Ibis.
Please try it out and get in touch on Zulip or GitHub, we’d love to hear from you!