Mage AI Data Pipeline
A little experiment with Mage AI while I look for an alternative to Apache Airflow.
It's small and works well for data pipelines. I tried it with Docker. I might add a note on K8s installation later.
For this note, I will run Mage AI locally, connect it to a GitLab repo, and use Git Sync for the pipeline.
You must read this first.
To connect to GitLab, first add a Mage OAuth application. Go to GitLab.com and click your avatar > Edit profile.
Click Applications in the left menu, then click the Add new application button.
Fill in the form. I tested on localhost, so the Redirect URI is as follows.
Copy the Application ID and Secret. You will need them later.
Below is how to configure it, from the GitLab documentation.
Mage AI can keep its pipeline and schedule data in PostgreSQL. I use my own external PostgreSQL database. You can pass the connection URL as an environment variable in docker run.
docker run --name mageai -d -it -p 6789:6789 -v $(pwd):/home/src \
-e MAGE_DATABASE_CONNECTION_URL=postgresql+psycopg2://YOUR_POSTGRES_USER:YOUR_POSTGRES_PASSWORD@YOUR_POSTGRES_HOST:5432/YOUR_MAGE_AI_DATABASE \
-e USER_CODE_PATH=/home/src/your_project \
-e GITLAB_CLIENT_ID=YOUR_GITLAB_APPLICATION_ID \
-e GITLAB_CLIENT_SECRET=YOUR_GITLAB_APPLICATION_SECRET \
-e GITLAB_HOST=https://gitlab.com \
mageai/mageai \
/app/run_app.sh mage start your_project
Replace YOUR_POSTGRES_USER, YOUR_POSTGRES_PASSWORD, YOUR_POSTGRES_HOST, and YOUR_MAGE_AI_DATABASE with your Postgres credentials.
Replace your_project with your intended project name.
Replace YOUR_GITLAB_APPLICATION_ID with the GitLab Application ID you created earlier.
Replace YOUR_GITLAB_APPLICATION_SECRET with the GitLab Application Secret you created earlier.
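If you would rather not keep the secrets in your shell history, the same variables can go into a file and be passed with Docker's --env-file flag. Below is a minimal sketch of that idea, not something taken from the Mage docs. The file name mage.env and the function names are placeholders I made up; the variable names and the image are the same as in the docker run command above.

```shell
# Sketch only: mage.env and these function names are placeholders.

# Build the connection URL for MAGE_DATABASE_CONNECTION_URL.
# Arguments: user password host database
# Note: a password containing characters such as @ or / must be URL-encoded first.
mage_db_url() {
  printf 'postgresql+psycopg2://%s:%s@%s:5432/%s\n' "$1" "$2" "$3" "$4"
}

# Write the variables used by the docker run command into an env file.
# Arguments: env_file db_url project gitlab_app_id gitlab_app_secret
write_mage_env() {
  cat > "$1" <<EOF
MAGE_DATABASE_CONNECTION_URL=$2
USER_CODE_PATH=/home/src/$3
GITLAB_CLIENT_ID=$4
GITLAB_CLIENT_SECRET=$5
GITLAB_HOST=https://gitlab.com
EOF
  chmod 600 "$1"   # the file holds secrets, so restrict it to the owner
}

# Start the container with the env file instead of a list of -e flags.
# Arguments: env_file project
start_mage() {
  docker run --name mageai -d -it -p 6789:6789 -v "$(pwd)":/home/src \
    --env-file "$1" \
    mageai/mageai \
    /app/run_app.sh mage start "$2"
}

# Example usage (values are placeholders):
#   write_mage_env mage.env "$(mage_db_url me secret db.example.com mage)" \
#     your_project YOUR_GITLAB_APPLICATION_ID YOUR_GITLAB_APPLICATION_SECRET
#   start_mage mage.env your_project
```

Remember to keep mage.env out of the repo, since it ends up next to your project files.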
Open your browser and go to http://localhost:6789
Click the Version Control menu. You will see GitLab.
Click Authenticate, then save under Initialize Git directory.
Add Remote > name your remote and paste your GitLab project URL.
Next, fetch from the remote.
Mage AI still uses the master branch, so it might be a bit confusing. Next, go to the COMMIT tab and commit the new files generated by Mage AI.
Push, then create a pull request to merge into main. Go to Merge requests in your GitLab project.
These steps are for developing on a local Mage AI.
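For reference, the same flow with plain git on the command line looks roughly like the sketch below. This is my assumption of the equivalent commands, not what Mage runs internally, and the function name push_mage_project is made up. It expects a fresh project directory with at least one file to commit, and it leaves the merge into main to be done in GitLab.

```shell
# Sketch only: rough command-line equivalent of the Version Control steps.
# Arguments: project_dir remote_name remote_url
push_mage_project() {
  (
    cd "$1" || exit 1
    git init -q &&                                  # initialize the Git directory
    git symbolic-ref HEAD refs/heads/master &&      # work on master, like Mage does
    git remote add "$2" "$3" &&                     # add the GitLab project as a remote
    git fetch -q "$2" &&                            # fetch from the remote
    git add -A &&                                   # stage the files generated by Mage AI
    git commit -q -m "Add files generated by Mage AI" &&
    git push -q -u "$2" master                      # push master, then merge into main in GitLab
  )
}

# Example usage (URL is a placeholder):
#   push_mage_project ./your_project origin https://gitlab.com/your_group/your_repo.git
```

The subshell keeps the cd from changing your current directory, and the && chain stops at the first step that fails.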
###############################################################
For Mage AI on a server to run your pipeline, you need to set up Git Sync so it fetches the latest code and runs it. Go to Settings > Git settings.
Set your repository and preferences. You also need a Git deploy token.
You can create a deploy token in your GitLab repo.
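Before pasting the deploy token into Mage, it can save time to check that it actually works. The sketch below builds the HTTPS URL in the username:token form that GitLab accepts for deploy tokens and lists the remote refs with git ls-remote. The function names are placeholders, it assumes the repo lives on gitlab.com, and the token needs the read_repository scope.

```shell
# Sketch only: function names are placeholders, host is assumed to be gitlab.com.

# Build an HTTPS clone URL that embeds the deploy token.
# Arguments: token_username token repo_path (for example your_group/your_repo)
deploy_token_url() {
  printf 'https://%s:%s@gitlab.com/%s.git\n' "$1" "$2" "$3"
}

# List the remote branches and tags; this only succeeds if the token can read the repo.
# Arguments: token_username token repo_path
check_deploy_token() {
  git ls-remote "$(deploy_token_url "$1" "$2" "$3")"
}

# Example usage (values are placeholders):
#   check_deploy_token YOUR_TOKEN_USERNAME YOUR_DEPLOY_TOKEN your_group/your_repo
```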
Last, a little note on how to import your Python code for use in a pipeline.
Last again, a note on adding Python packages: just add them to requirements.txt, then restart the Docker container.
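The package step can be sketched as below. The function names are made up, and the path to requirements.txt is passed in as an argument because I am only assuming yours sits inside the project folder. The container name mageai is the one from the docker run command earlier.

```shell
# Sketch only: function names are placeholders.

# Append a package line to requirements.txt unless it is already listed.
# Arguments: requirements_file package_spec
add_requirement() {
  touch "$1"
  # Make sure the file ends with a newline so the new entry gets its own line
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
  # -x matches the whole line, -F treats the spec as a fixed string
  grep -qxF "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Restart the container started with --name mageai.
restart_mage() {
  docker restart mageai
}

# Example usage (package and path are placeholders):
#   add_requirement your_project/requirements.txt requests
#   restart_mage
```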