A Comprehensive Guide to Extracting Data from MySQL Using Singer ETL


In this guide, we'll walk through extracting data from MySQL with tap-mysql, a Singer tap, and loading it into target-jsonl, a Singer target that writes JSON Lines files. The pipeline streams records from the database into flat files while preserving each table's schema.

Step 1: Enable a Python Virtual Environment (venv)

Even without administrative access, you can set up a Python virtual environment. Here's how:

  • Create a new virtual environment:

      python3 -m venv <env_name>

  • Activate it. On Windows, run the activation script inside the environment's Scripts directory:

      <env_name>\Scripts\activate.bat

    On macOS/Linux, source the script in the bin directory instead:

      source <env_name>/bin/activate

Now your virtual environment is active and ready for use.

Step 2: Install tap-mysql and target-jsonl

Utilize pip to install the necessary packages:

pip install tap-mysql target-jsonl
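
Note: Singer taps and targets often pin conflicting dependency versions. If the single-environment install above fails, a common convention is to install each package in its own virtual environment and call the executables by path (the environment names here are just examples):

python3 -m venv tap-env
python3 -m venv target-env
tap-env/bin/pip install tap-mysql
target-env/bin/pip install target-jsonl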

Step 3: Prepare Configuration Files

tap-mysql requires two input files: config.json, which holds the database connection settings, and properties.json, which describes the streams (tables) you want to replicate. A complete properties.json for the sakila actor_info view is shown below; Step 4 explains how to generate it.

config.json:

{
    "host": "127.0.0.1",
    "port": "3306",
    "user": "root",
    "password": "root"
}
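
Before invoking the tap, it can be worth sanity-checking these credentials with a quick connection test. This is a minimal sketch using PyMySQL (installed as a dependency of tap-mysql); it assumes the MySQL server from config.json is reachable and the sakila sample database is loaded:

import pymysql

# Reuse the same connection settings as config.json.
conn = pymysql.connect(host="127.0.0.1", port=3306, user="root", password="root")
with conn.cursor() as cur:
    cur.execute("SHOW DATABASES")
    print([row[0] for row in cur.fetchall()])  # "sakila" should appear here
conn.close()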

properties.json:

{
    "streams": [
        {
            "tap_stream_id": "sakila-actor_info",
            "table_name": "actor_info",
            "schema": {
                "properties": {
                    "actor_id": {
                        "inclusion": "available",
                        "minimum": 0,
                        "maximum": 65535,
                        "type": [
                            "null",
                            "integer"
                        ]
                    },
                    "first_name": {
                        "inclusion": "available",
                        "maxLength": 45,
                        "type": [
                            "null",
                            "string"
                        ]
                    },
                    "last_name": {
                        "inclusion": "available",
                        "maxLength": 45,
                        "type": [
                            "null",
                            "string"
                        ]
                    },
                    "film_info": {
                        "inclusion": "available",
                        "maxLength": 65535,
                        "type": [
                            "null",
                            "string"
                        ]
                    }
                },
                "type": "object"
            },
            "stream": "actor_info",
            "metadata": [
                {
                    "breadcrumb": [],
                    "metadata": {
                        "selected": true,
                        "replication-method": "FULL_TABLE",
                        "selected-by-default": false,
                        "database-name": "sakila",
                        "is-view": true
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "actor_id"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "smallint unsigned"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "first_name"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "varchar(45)"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "last_name"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "varchar(45)"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "film_info"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "text"
                    }
                }
            ]
        }
    ]
}

Step 4: Generate properties.json

Run tap-mysql in discover mode to generate catalog.json, a listing of every table the tap can see:

tap-mysql --config config.json --discover > catalog.json

After running the command, locate the JSON entry for the desired table in catalog.json and copy it into a new file named properties.json, wrapped in a streams array:

{
    "streams": [
        ...paste the stream's JSON entry here...
    ]
}

This ensures that only the selected table is included for extraction.

Discover mode marks nothing as selected, so next add the following two keys to the metadata entry whose breadcrumb is the empty list ([]). This marks the table for extraction and sets its replication method:

"selected": true,
"replication-method": "FULL_TABLE"

Step 5: Run tap-mysql

Execute the following command:

tap-mysql --config config.json --catalog properties.json

Congratulations! You've successfully extracted data from MySQL using tap-mysql.
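
tap-mysql writes Singer messages to stdout. Per the Singer specification, you should see SCHEMA, RECORD, and STATE messages; the lines below are an illustrative sketch of that output for the sakila example, with long values elided:

{"type": "SCHEMA", "stream": "actor_info", "schema": {...}, "key_properties": []}
{"type": "RECORD", "stream": "actor_info", "record": {"actor_id": 1, "first_name": "PENELOPE", "last_name": "GUINESS", "film_info": "..."}}
{"type": "STATE", "value": {...}}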

Step 6: Send Data to jsonl Target

Run the following command to send the data to jsonl target:

tap-mysql --config config.json --catalog properties.json | target-jsonl

target-jsonl writes one file per stream, named after the table with a .jsonl extension (actor_info.jsonl in this example), containing one JSON object per row.
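
Unlike the raw tap output above, target-jsonl strips the Singer envelope and keeps only the record payload, so each line of the file is a plain JSON object:

{"actor_id": 1, "first_name": "PENELOPE", "last_name": "GUINESS", "film_info": "..."}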

Step 7: Convert Output to a DataFrame

Here's an example of how to convert the output to a DataFrame using Python, reading actor_info.jsonl from our example:

import pandas as pd
import json

# Parse the JSON Lines file: one JSON object per line.
data = []
with open("actor_info.jsonl", "r") as f:
    for line in f:
        data.append(json.loads(line))

# Each parsed object becomes one row of the DataFrame.
df = pd.DataFrame(data)
print(df.columns)
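
Alternatively, pandas can parse JSON Lines directly:

df = pd.read_json("actor_info.jsonl", lines=True)

This one-liner produces the same DataFrame without the manual loop.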

This step allows for further analysis and manipulation of the extracted data.

By following these steps, you've successfully extracted and transformed data from MySQL into a structured format, ready for analysis and insights.