Setting Up Your Local Machine for dbt Core: A Comprehensive Guide
dbt™ is an awesome SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation.
With dbt, anyone on a data team can safely contribute to production-grade data pipelines seamlessly.
Getting Started: Setting Up for Success
This article is written primarily for Windows, but most of it applies just as well to macOS and Linux.
I've also added helpful links throughout this article; hold down the CTRL key when you click so that each link opens in a new tab. At the time of this writing, it does not look like authors are able to add the HTML "_blank" target attribute to links.
Both Python and Git for Windows will need to be installed before you get started.
If you do not have Python installed on your local workstation, you can download and install it from the python.org downloads page.
Check that Python is installed, and which version, with the following command in your command prompt or terminal:
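On most systems this looks like the following (depending on how Python was installed, the launcher may instead be `py` or `python3`):

```shell
# Print the installed Python version
# (try "py --version" or "python3 --version" if "python" is not on your PATH)
python --version
```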
If you don't already have Git SCM installed, now would be a good time to download and set it up. Check out the Git online Reference Manual for the official and comprehensive man pages included in the Git package itself.
While you're at it, a good source-code editor like VS Code would probably be good to have installed as well.
Streamlining Your Workflow: Mastering Package Installation for Efficiency
First, you'll need to install dbt Core on your Windows system. You can do this using pip, the Python package manager.
pip is a package-management system written in Python, and it is the preferred installer program. Starting with Python 3.4, pip is included by default with the Python binary installers.
Install dbt Core by running the following in your command prompt or terminal:
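A typical invocation looks like the following; the Postgres adapter here is just an illustration, so substitute the adapter for your own warehouse:

```shell
# Install dbt Core itself
pip install dbt-core

# Install the adapter for your data platform, e.g. Postgres
# (substitute dbt-snowflake, dbt-bigquery, dbt-sqlserver, etc.)
pip install dbt-postgres
```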
Install only the dbt-&lt;adapter&gt; package(s) you need for your data platform(s), or install them all.
If prompted to upgrade pip, execute the following command:
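The standard way to do that is to invoke pip through Python itself:

```shell
# Upgrade pip to the latest release
python -m pip install --upgrade pip
```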
I recommend verifying and keeping track of package versions, so that you are aware of what you're working with. Keeping track of software versions can prove to be extremely helpful when troubleshooting and working with various packages.
You can also use pip to upgrade dbt-core to the latest version from the terminal.
Check the current version of dbt Core by running:
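This command reports dbt Core's version along with any installed adapter plugins:

```shell
# Show the installed versions of dbt Core and adapter plugins
dbt --version
```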
All adapters build on top of dbt-core. Some also depend on other adapters: for example, dbt-redshift builds on top of dbt-postgres. In that case, you would see those adapters included by your specific installation, too.
To upgrade dbt Core to the latest version, use pip:
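The upgrade is a single pip command:

```shell
# Upgrade dbt Core in place to the latest release
pip install --upgrade dbt-core
```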
After the installation completes, you can verify that dbt-core has been upgraded to the latest version by running the dbt --version command.
Note that you may also have to upgrade the dbt-&lt;adapter&gt; packages you have installed, since adapters pin specific dbt-core versions and you can otherwise run into dependency conflicts, as can happen with the SQL Server adapter.
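For example, if you hit a conflict with the SQL Server adapter, upgrading it alongside dbt-core usually resolves it (the package name here is just an illustration; use whichever adapter you installed):

```shell
# Upgrade the adapter to a release compatible with your dbt-core version
pip install --upgrade dbt-sqlserver
```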
Navigate to your user directory (e.g. C:\Users\YOUR-USERNAME\ on Windows), and create a .dbt folder (i.e. ~/.dbt/). This is where you will create and store your profiles.yml file.
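You can create the folder from the command line as well:

```shell
# Create the .dbt folder in your home directory if it doesn't exist
# (in cmd.exe: mkdir %USERPROFILE%\.dbt)
mkdir -p ~/.dbt
```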
Connection profiles
When developing locally, dbt connects to your Data Warehouse (DW) using a profile, which is a YAML file with all of the connection details to your DW.
To use dbt Core, you'll need a profiles.yml file that contains the connection details for your data platform.
In your profiles.yml file, you can store as many profiles as you need. Typically, you would have one profile for each warehouse you use. Most organizations only have one profile.
Note that dbt uses YAML in a few different places. If you're new to YAML, it would be worth learning how arrays, dictionaries, and strings are represented.
This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control, but profiles.yml can be safely checked in when using environment variables to load sensitive credentials.
A profiles.yml file defines one or more named profiles, each with a default target and the connection options your adapter requires.
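As an illustration, here is a minimal profiles.yml sketch for a Postgres warehouse; the profile name, host, credentials, and schema are placeholders, and the exact fields vary by adapter:

```yaml
# Hypothetical profiles.yml; field names depend on your dbt-<adapter>
my_dbt_project:          # must match the "profile" in dbt_project.yml
  target: dev            # default target to use
  outputs:
    dev:
      type: postgres
      host: localhost
      user: my_user
      password: "{{ env_var('DBT_PASSWORD') }}"  # load secret from env
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4
```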
You may want to have your profiles.yml file stored in a different directory than ~/.dbt/ – for example, if you are using environment variables to load your credentials, you might choose to include this file in the root directory of your dbt project.
Note that the file always needs to be called profiles.yml, regardless of which directory it is in.
Unlocking Project Configurations: Optimizing Your Workflow with dbt
Every dbt project needs a dbt_project.yml file — this is how dbt knows a directory is a dbt project. It also contains important information that tells dbt how to operate your project.
By default, dbt will look for dbt_project.yml in your current working directory and its parents, but you can set a different directory using the --project-dir flag or the DBT_PROJECT_DIR environment variable.
The dbt_project.yml file supports many configurations: the project name, the profile to connect with, file paths, and model-level settings, among others.
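A minimal sketch of a dbt_project.yml, with placeholder names, looks something like this:

```yaml
# Hypothetical dbt_project.yml; names are placeholders
name: my_dbt_project
version: "1.0.0"
profile: my_dbt_project   # must match a profile in profiles.yml

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]

target-path: "target"                      # compiled artifacts go here
clean-targets: ["target", "dbt_packages"]  # removed by `dbt clean`

models:
  my_dbt_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
```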
When you run dbt Core from the command line, it reads your dbt_project.yml file to find the profile name, and then looks for a profile with the same name in your profiles.yml file. This profile contains all the information dbt needs to connect to your data platform.
Later, you will see how you can utilize the dbt init command to easily generate both profiles.yml and dbt_project.yml files.
Creating The Repository
Create a repository in your github.com account. Alternatively, you can use the GitHub Desktop App if you prefer a GUI, want to simplify your development workflow, and/or find yourself fighting with Git on a regular basis.
Navigate to your Documents folder (or your preferred working directory for cloning repositories), and create a GitHub folder.
You can also create your working directory with the following command:
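For example:

```shell
# Create a GitHub working folder under Documents if it doesn't exist
# (in cmd.exe: mkdir "%USERPROFILE%\Documents\GitHub")
mkdir -p ~/Documents/GitHub
```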
Clone the repository you created earlier to your GitHub working directory.
To clone your repository locally over HTTPS using the command line, copy the URL displayed on GitHub with the HTTPS option selected.
In your CLI, type git clone, and then paste the URL you copied earlier. Your command will look similar to the following example:
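The username and repository name below are placeholders; paste your own URL in their place:

```shell
# Clone over HTTPS; replace the URL with the one you copied from GitHub
git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git
```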
Now that you have cloned your GitHub repository to your local machine, you are ready to start structuring your project.
Crafting Successful dbt Projects: Best Practices and Strategies
Analytics engineering, at its core, is about helping groups of human beings collaborate on better decisions at scale.
Humans generally have limited bandwidth for making decisions.
Humans also, as a cooperative social species, rely on systems and patterns to optimize collaboration with others.
This combination of traits means that for collaborative projects it's crucial to establish consistent and comprehensible norms such that a team’s limited bandwidth for decision making can be spent on unique and difficult problems, not deciding where folders should go or how to name files.
Building a great dbt project is an inherently collaborative endeavor, bringing together domain knowledge from every department to map the goals and narratives of the entire company.
As such, it is especially important to establish a deep and broad set of patterns to ensure as many people as possible are empowered to leverage their particular expertise in a positive way, and to ensure that the project remains approachable and maintainable as your organization scales.
Steve Jobs and the Analogous Brilliance of dbt
Famously, Steve Jobs wore the same outfit every day to reduce decision fatigue. You can think of this overview similarly, as a black turtleneck and New Balance sneakers for your company’s dbt project.
A dbt project’s power outfit, or more accurately its structure, is composed not of material fabric but of files, folders, naming conventions, and programming patterns.
How you label things, group them, split them up, or bring them together — the system you use to organize the data transformations encoded in your dbt project — this is your project’s structure.
Structuring dbt projects
We structure the files, folders, and models around three primary layers in the models directory, each building on the last:
Staging — creating our atoms, our initial modular building blocks, from source data
Intermediate — stacking layers of logic with clear and specific purposes to prepare our staging models to join into the entities we want
Marts — bringing together our modular pieces into a wide, rich vision of the entities our organization cares about
Below is a complete file tree of the jaffle_shop sample project from dbt Docs, and a nice example of the default monolithic dbt structure:
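An abbreviated sketch of that layout follows; the individual model file names are illustrative, so see the dbt Docs for the full tree:

```
jaffle_shop
├── README.md
├── dbt_project.yml
├── packages.yml
├── models
│   ├── staging
│   │   └── jaffle_shop
│   │       ├── stg_jaffle_shop__customers.sql
│   │       └── stg_jaffle_shop__orders.sql
│   ├── intermediate
│   │   └── int_orders_pivoted.sql
│   └── marts
│       ├── customers.sql
│       └── orders.sql
├── seeds
├── macros
└── tests
```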
Getting Started: Creating Your New dbt Project
The quickest and easiest way to get started with a new dbt project is with the init command. The dbt init command will begin setting up your profile and prompt for inputs.
Then, it will create a new folder with your project name and sample files, and create a connection profile on your local machine.
To create a new dbt project, execute the dbt init command in your terminal:
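The project name below is a placeholder; dbt will create a folder with whatever name you supply:

```shell
# Scaffold a new dbt project and walk through the connection prompts
dbt init my_dbt_project
```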
If you've already cloned or downloaded an existing dbt project, dbt init can still help you set up your connection profile so that you can start working quickly.
The command will prompt you for connection information, as above, and add a profile (using the profile name from the project) to your local profiles.yml, or create the file if it doesn't already exist.
Harnessing Git: Best Practices for Committing Changes
When finished, it's important to commit your changes, so that the repository contains the latest code.
To commit your changes, link to the GitHub repository you created for your dbt project by running the following commands in Terminal:
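A typical sequence looks like this; the commit message and branch name are placeholders, and your default branch may be named differently:

```shell
# Stage everything in the project directory
git add .

# Record the snapshot locally
git commit -m "Initial dbt project setup"

# Publish the commit to GitHub
git push origin main
```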
Return to your GitHub repository to verify your new files have been added.
Now that you set up your dbt project, you can get to the fun part — building models!
Thanks for reading. Your feedback and suggestions are always welcome. Stay tuned for more engaging content ahead!