Mage AI Data Pipeline

Art Krisada
4 min read · Jul 12, 2024

A little experiment with Mage AI while I look for an alternative to Apache Airflow.

It's small and works well for data pipelines. I tried it with Docker. I might add a note on Kubernetes installation later.

In this note, I run Mage AI locally, connect it to a GitLab repo, and use Git Sync for the pipeline.

You must read this first.

To connect to GitLab, first add an OAuth application for Mage. Go to gitlab.com, click your avatar, then Edit profile.

Click Applications in the left menu, then click the Add new application button.

Fill in the form. I tested on localhost, so the Redirect URI is as follows.

Mage will request read_user, write_repository, and api scopes from GitLab in order to perform the necessary git actions.

Copy the Application ID and Secret. You will need them later.

Below is how to configure it, from the GitLab documentation.

Mage AI can use PostgreSQL to keep pipeline and schedule data. I use my external PostgreSQL DB. You can pass it as an environment variable to docker run.

docker run --name mageai -d -it -p 6789:6789 -v "$(pwd)":/home/src \
-e MAGE_DATABASE_CONNECTION_URL=postgresql+psycopg2://YOUR_POSTGRES_USER:YOUR_POSTGRES_PASSWORD@YOUR_POSTGRES_HOST:5432/YOUR_MAGE_AI_DATABASE \
-e USER_CODE_PATH=/home/src/your_project \
-e GITLAB_CLIENT_ID=YOUR_GITLAB_APPLICATION_ID \
-e GITLAB_CLIENT_SECRET=YOUR_GITLAB_APPLICATION_SECRET \
-e GITLAB_HOST=https://gitlab.com \
mageai/mageai \
/app/run_app.sh mage start your_project

Replace YOUR_POSTGRES_USER, YOUR_POSTGRES_PASSWORD, YOUR_POSTGRES_HOST, and YOUR_MAGE_AI_DATABASE with your Postgres credentials.

Replace your_project with your intended project name.

Replace YOUR_GITLAB_APPLICATION_ID with the GitLab Application ID you created earlier.

Replace YOUR_GITLAB_APPLICATION_SECRET with the GitLab Application Secret you created earlier.
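
One thing to watch in the connection URL: a Postgres user or password containing characters such as @, : or / has to be percent-encoded, otherwise the URL gets parsed at the wrong place. Below is a minimal Python sketch for building the value of MAGE_DATABASE_CONNECTION_URL. The function name build_connection_url and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_connection_url(user, password, host, database, port=5432):
    """Return a SQLAlchemy-style URL with percent-encoded credentials."""
    # quote_plus escapes reserved characters such as "@", ":" and "/"
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials, only to show the encoding
url = build_connection_url("mage", "p@ss/word", "db.example.com", "mage")
print(url)
```

Paste the printed value after MAGE_DATABASE_CONNECTION_URL= in the docker run command.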

Open your browser and go to http://localhost:6789

Click the Version Control menu. You will see GitLab.

Click Authenticate, then save the Initialize Git directory setting.

Under Add Remote, name your remote and paste your GitLab project URL.

Next, fetch from the remote.

Mage AI still uses the master branch, so it might be a bit confusing. Next, go to the Commit tab and commit the new files generated by Mage AI.

Push, then create a pull request to merge into main. Go to Merge requests in your GitLab project.

These steps are for developing on a local Mage AI.

###############################################################

For Mage AI on a server to run your pipeline, you need Git Sync to fetch the latest code before it runs. Go to Settings > Git settings.

Set your repository and preferences. You also need a Git deploy token.

You can create a deploy token in your GitLab repo.
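
To check a deploy token before pasting it into Mage, you can try it against the repo from any machine with git installed. This is a minimal Python sketch, assuming an HTTPS project URL. The helper names, the username and the token value are placeholders, and this is only a sanity check outside Mage, not a description of how Git Sync works internally.

```python
import os
import subprocess
from urllib.parse import quote, urlsplit, urlunsplit


def with_deploy_token(repo_url, username, token):
    """Embed deploy token credentials into an HTTPS repo URL."""
    parts = urlsplit(repo_url)
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def token_can_read(repo_url, username, token):
    """Return True if `git ls-remote` succeeds with the given deploy token."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", with_deploy_token(repo_url, username, token)],
        capture_output=True,
        text=True,
        # Fail instead of asking for a password when the token is rejected
        env={**os.environ, "GIT_TERMINAL_PROMPT": "0"},
    )
    return result.returncode == 0


# Placeholder values; nothing is contacted until token_can_read() is called
authed = with_deploy_token(
    "https://gitlab.com/your_group/your_project.git",
    "deploy-user",
    "YOUR_DEPLOY_TOKEN",
)
```

Calling token_can_read(...) with the real project URL, token username and token should return True when the token has read_repository access.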

Last, a little note on how to import your Python code for use in a pipeline.
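
As a rough illustration of the import mechanics, and as an assumption rather than something taken from the Mage docs: with the docker run above, /home/src holds your_project, so if that folder is on Python's import path, a helper module inside the project can be imported by its package path. The sketch below recreates that layout in a temporary folder so it runs anywhere. The utils/helpers.py module and the clean_name function are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_project(root, project="your_project"):
    """Create a tiny stand-in for a project folder with a utils package."""
    utils_dir = root / project / "utils"
    utils_dir.mkdir(parents=True)
    (root / project / "__init__.py").write_text("")
    (utils_dir / "__init__.py").write_text("")
    (utils_dir / "helpers.py").write_text(
        "def clean_name(value):\n"
        "    return value.strip().lower()\n"
    )


# The temporary folder plays the role of /home/src in the container
demo_root = Path(tempfile.mkdtemp())
make_demo_project(demo_root)

# Make the parent folder importable, then import by package path
sys.path.insert(0, str(demo_root))
importlib.invalidate_caches()

from your_project.utils.helpers import clean_name

cleaned = clean_name("  Mage AI ")
print(cleaned)
```

In a real project, only the import line would go at the top of a pipeline block, with your_project swapped for your own project name.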

Last again, a note on adding Python packages: just add them to requirements.txt, then restart the Docker container.
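
To confirm after the restart that the packages really got installed, something like the following can be run inside the container. It is a minimal sketch that only handles simple lines such as name==1.0. The function name missing_packages and the sample package name are invented for the example.

```python
import re
from importlib import metadata


def missing_packages(requirements_text):
    """List package names from requirements.txt that are not installed."""
    missing = []
    for line in requirements_text.splitlines():
        # Drop comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        # Skip blanks and pip options such as "-r other.txt"
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Invented package name, expected to be reported as missing
sample = "# extra packages\nsurely-not-installed-pkg-xyz==1.0\n"
print(missing_packages(sample))
```

Pass in the text of your own requirements.txt instead of sample. An empty list means every listed package was found.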
