"Dockerized Web Scraping: Using Python and MySQL to Extract and Store Data"
In the ever-evolving world of data-driven applications, web scraping is a vital tool for extracting and analyzing information from websites. Integrating Docker, Python, and MySQL provides an efficient, scalable, and portable setup for creating and deploying web scraper applications. This blog will guide you through building a web scraper application using these technologies.
Why Use Docker, Python, and MySQL?
Docker: Ensures your application runs in isolated environments, making it easy to manage dependencies and scale.
Python: Offers powerful libraries like BeautifulSoup and Requests for web scraping.
MySQL: A robust database system to store and manage scraped data.
By combining these tools, you can build a portable and scalable application that performs seamlessly across various environments.
Project Overview
We will:
Set up the MySQL database container and initialize the database.
Create a Python web scraper to extract data from a website.
Attach the Python application to the MySQL database.
Build a Docker image for the Python application.
Use the image to scrape data and store it in the MySQL database.
Step 1: Setting Up the MySQL Database Container
The first step is to set up the MySQL container. We'll use the docker run command to create the container and configure the database. We'll also initialize the database with a table to store the scraped data.
Run a command along these lines to start the MySQL container (each flag is explained below):
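```bash
docker run -dit --name mydb \
  -e MYSQL_ROOT_PASSWORD=redhat \
  -e MYSQL_DATABASE=scraper_db \
  -p 3306:3306 \
  mysql:latest
```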
The command runs a Docker container with the following settings:
-dit: Run the container in detached mode, interactively, and allocate a pseudo-terminal.
--name mydb: Assign the name "mydb" to the container.
-e MYSQL_ROOT_PASSWORD=redhat: Set the root password for MySQL to "redhat".
-e MYSQL_DATABASE=scraper_db: Create a database named "scraper_db" in MySQL.
-p 3306:3306: Map the container's port 3306 (MySQL's default) to port 3306 on the host machine.
mysql:latest: Use the latest MySQL image from Docker Hub.
Next, connect to the MySQL container to create the necessary table:
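```bash
docker exec -it mydb mysql -u root -p
# enter the root password ("redhat") when prompted
```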
Once inside the MySQL shell, run the following SQL commands to set up the table:
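Here we name the table quotes (the name is our choice; just keep it consistent with the scraper later on):

```sql
USE scraper_db;

-- "quotes" is the table name used throughout this post
CREATE TABLE quotes (
    id INT AUTO_INCREMENT PRIMARY KEY,
    quote TEXT,
    author VARCHAR(255)
);
```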
This will create a table named quotes to store the scraped data (quote text and author).
Step 2: Creating the Python Web Scraper
Now that the database is ready, let's create the Python application that will scrape data from the website.
Create a directory for your project:
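```bash
mkdir web-scraper && cd web-scraper   # the directory name is arbitrary
```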
Install the necessary Python libraries:
Create a requirements.txt file listing the libraries this application needs. A single pip command then installs everything in the file, as shown below.
Python packages: requests, beautifulsoup4 (which provides BeautifulSoup), and mysql-connector-python.
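```text
# requirements.txt
requests
beautifulsoup4
mysql-connector-python
```

```bash
pip install -r requirements.txt
```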
Create a Python file (we'll call it scraper.py here) with the following code:
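The sketch below assumes the settings from Step 1 (root password "redhat", database "scraper_db") and the quotes table created earlier. It uses the container name "mydb" as the database host, which works when both containers share a Docker network or are linked (see Step 4):

```python
import time

import mysql.connector
import requests
from bs4 import BeautifulSoup

# Connection settings from Step 1; "mydb" resolves only if the two
# containers share a Docker network or are linked (see Step 4).
DB_CONFIG = {
    "host": "mydb",
    "user": "root",
    "password": "redhat",
    "database": "scraper_db",
}

URL = "http://quotes.toscrape.com"


def wait_for_db(retries=10, delay=5):
    """Retry until the MySQL container is ready to accept connections."""
    for attempt in range(1, retries + 1):
        try:
            return mysql.connector.connect(**DB_CONFIG)
        except mysql.connector.Error:
            print(f"Database not ready (attempt {attempt}), retrying...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to MySQL")


def scrape_quotes():
    """Fetch the first page of Quotes to Scrape and yield (quote, author) pairs."""
    response = requests.get(URL, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    for block in soup.select("div.quote"):
        text = block.select_one("span.text").get_text(strip=True)
        author = block.select_one("small.author").get_text(strip=True)
        yield text, author


def main():
    conn = wait_for_db()
    cursor = conn.cursor()
    for text, author in scrape_quotes():
        cursor.execute(
            "INSERT INTO quotes (quote, author) VALUES (%s, %s)",
            (text, author),
        )
    conn.commit()
    cursor.close()
    conn.close()
    print("Scraping complete.")


if __name__ == "__main__":
    main()
```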
This script:
Waits for the MySQL database to be ready.
Scrapes quotes from the Quotes to Scrape website.
Saves the quotes and authors to the quotes table in the MySQL database.
Step 3: Creating the Dockerfile
Create a Dockerfile to containerize the Python application.
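A minimal Dockerfile along these lines matches the description that follows:

```dockerfile
# Python 3.9 base image
FROM python:3.9

WORKDIR /app

# Install the libraries listed in requirements.txt
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the scraper and run it on container start
COPY scraper.py .
CMD ["python", "scraper.py"]
```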
This Dockerfile:
Uses a Python 3.9 base image.
Installs the necessary Python libraries.
Runs the scraper.py file when the container starts.
Step 4: Building and Running the Application
Now, build and run the application using the Dockerfile.
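```bash
docker build -t scraper-app .   # "scraper-app" is an arbitrary image tag
```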
This command will:
Build the Docker image for the Python application.
After that run the image:
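One way to let the scraper reach the database by the name "mydb" is a legacy container link (a user-defined Docker network achieves the same):

```bash
docker run -it --name scraper --link mydb:mydb scraper-app
```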
This command will:
Connect the scraper to the MySQL database container already running from Step 1.
Run the scraper, which scrapes data and saves it to the database.
Step 5: Verifying the Results
You can connect to the MySQL database to verify the scraped data.
Run the following query:
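```sql
-- inside the MySQL shell (e.g., docker exec -it mydb mysql -u root -p scraper_db)
SELECT * FROM quotes;
```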
You should see the scraped quotes stored in the quotes table.
Conclusion
By combining Docker, Python, and MySQL, you’ve created a scalable and portable web scraper application. This setup ensures that your application runs consistently across different environments, making it ideal for deployment. You can extend this by exploring advanced topics like scaling the scraper, integrating APIs, or visualizing the data.
Happy coding!