Photo by Carlos Muza on Unsplash
A Comprehensive Guide to Extracting Data from MySQL Using Singer ETL
Singer ETL
In this guide, we'll walk through the process of extracting data from MySQL using tap-mysql and loading it into JSON Lines files with target-jsonl. This pipeline transfers data efficiently while maintaining its structure.
Step 1: Enable Python Virtual Environment (venv)
Even without administrative access, you can set up a Python virtual environment. Here's how:
Create a new virtual environment:
python3 -m venv <file_name>
Navigate to the Scripts directory inside the newly created folder and activate the virtual environment:
.\activate.bat
On Linux/macOS, activate it with source <file_name>/bin/activate instead.
Now, your virtual environment is active and ready for use.
Step 2: Install tap-mysql and target-jsonl
Utilize pip to install the necessary packages:
pip install tap-mysql target-jsonl
Step 3: Prepare Configuration Files
tap-mysql requires two input files: config.json and properties.json.
config.json:
{
"host": "127.0.0.1",
"port": "3306",
"user": "root",
"password": "root"
}
properties.json:
{
"streams": [
{
"tap_stream_id": "sakila-actor_info",
"table_name": "actor_info",
"schema": {
"properties": {
"actor_id": {
"inclusion": "available",
"minimum": 0,
"maximum": 65535,
"type": [
"null",
"integer"
]
},
"first_name": {
"inclusion": "available",
"maxLength": 45,
"type": [
"null",
"string"
]
},
"last_name": {
"inclusion": "available",
"maxLength": 45,
"type": [
"null",
"string"
]
},
"film_info": {
"inclusion": "available",
"maxLength": 65535,
"type": [
"null",
"string"
]
}
},
"type": "object"
},
"stream": "actor_info",
"metadata": [
{
"breadcrumb": [],
"metadata": {
"selected": true,
"replication-method": "FULL_TABLE",
"selected-by-default": false,
"database-name": "sakila",
"is-view": true
}
},
{
"breadcrumb": [
"properties",
"actor_id"
],
"metadata": {
"selected-by-default": true,
"sql-datatype": "smallint unsigned"
}
},
{
"breadcrumb": [
"properties",
"first_name"
],
"metadata": {
"selected-by-default": true,
"sql-datatype": "varchar(45)"
}
},
{
"breadcrumb": [
"properties",
"last_name"
],
"metadata": {
"selected-by-default": true,
"sql-datatype": "varchar(45)"
}
},
{
"breadcrumb": [
"properties",
"film_info"
],
"metadata": {
"selected-by-default": true,
"sql-datatype": "text"
}
}
]
}
]
}
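In a Singer catalog like the one above, the top-level metadata entry (the one with an empty breadcrumb) is what controls whether a table is replicated. As a rough sketch, the helper below (our own function, not part of tap-mysql) shows how you could inspect which streams a catalog has selected:

```python
import json

# A trimmed-down stand-in for the properties.json shown above; only the
# fields relevant to stream selection are kept.
catalog = json.loads("""
{
  "streams": [
    {
      "tap_stream_id": "sakila-actor_info",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {"selected": true, "replication-method": "FULL_TABLE"}
        }
      ]
    }
  ]
}
""")

def selected_streams(catalog):
    # A stream is selected when its empty-breadcrumb metadata entry
    # carries "selected": true.
    return [
        s["tap_stream_id"]
        for s in catalog["streams"]
        for m in s.get("metadata", [])
        if m["breadcrumb"] == [] and m["metadata"].get("selected")
    ]

print(selected_streams(catalog))  # ['sakila-actor_info']
```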
Step 4: Generate properties.json via Discovery
Run tap-mysql in discover mode to generate catalog.json:
tap-mysql --config config.json --discover > catalog.json
Once catalog.json has been generated, locate the JSON object for the desired table and copy it into a new file named selected_table.json.
In selected_table.json, wrap the copied content in a streams array:
{
  "streams": [
    <paste the copied table object here>
  ]
}
This ensures that only the selected table is included for extraction.
By following these steps, you can specify exactly which table to extract from MySQL using tap-mysql.
To select the table, add the following two entries to the top-level 'metadata' object (the one with the empty breadcrumb) in selected_table.json:
"selected": true,
"replication-method": "FULL_TABLE"
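Editing the catalog by hand works, but the selection step can also be scripted. Here's a minimal sketch (the select_table helper and the stand-in catalog are our own, not part of tap-mysql) that copies one table's stream out of a discovered catalog and marks it selected:

```python
import json

def select_table(catalog, table_name, method="FULL_TABLE"):
    # Keep only the stream for table_name and mark its top-level
    # (empty breadcrumb) metadata entry as selected.
    streams = []
    for stream in catalog["streams"]:
        if stream["table_name"] != table_name:
            continue
        for entry in stream["metadata"]:
            if entry["breadcrumb"] == []:
                entry["metadata"]["selected"] = True
                entry["metadata"]["replication-method"] = method
        streams.append(stream)
    return {"streams": streams}

# Example with a tiny stand-in catalog; in practice you would load
# catalog.json and dump the result to selected_table.json instead.
catalog = {
    "streams": [
        {
            "tap_stream_id": "sakila-actor_info",
            "table_name": "actor_info",
            "metadata": [{"breadcrumb": [], "metadata": {"selected-by-default": False}}],
        },
        {
            "tap_stream_id": "sakila-film",
            "table_name": "film",
            "metadata": [{"breadcrumb": [], "metadata": {"selected-by-default": False}}],
        },
    ]
}
selected = select_table(catalog, "actor_info")
print(json.dumps(selected, indent=2))
```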
Step 5: Run tap-mysql
Execute the following command (pass selected_table.json instead of properties.json if you created one in Step 4):
tap-mysql --config config.json --catalog properties.json
Congratulations! You've successfully extracted data from MySQL using tap-mysql.
Step 6: Send Data to the jsonl Target
Run the following command to pipe the data into target-jsonl:
tap-mysql --config config.json --catalog properties.json | target-jsonl
A .jsonl file with the same name as the table will be created.
Step 7: Convert Output to a DataFrame
Here's an example of how to convert the output to a pandas DataFrame:
import pandas as pd
import json

data = []
with open("<file_name>.jsonl", "r") as f:
    for line in f:
        data.append(json.loads(line))

df = pd.DataFrame(data)
print(df.columns)
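Alternatively, pandas can read JSON Lines files directly with read_json(..., lines=True). In this sketch, io.StringIO and the two sample rows stand in for the .jsonl file produced in Step 6:

```python
import io
import pandas as pd

# Two illustrative records standing in for "<file_name>.jsonl".
jsonl = io.StringIO(
    '{"actor_id": 1, "first_name": "PENELOPE", "last_name": "GUINESS"}\n'
    '{"actor_id": 2, "first_name": "NICK", "last_name": "WAHLBERG"}\n'
)
df = pd.read_json(jsonl, lines=True)
print(df.shape)  # (2, 3)
```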
This step allows for further analysis and manipulation of the extracted data.
By following these steps, you've successfully extracted and transformed data from MySQL into a structured format, ready for analysis and insights.