"Dockerized Web Scraping: Using Python and MySQL to Extract and Store Data"
In the ever-evolving world of data-driven applications, web scraping is a vital tool for extracting and analyzing information from websites. Integrating Docker, Python, and MySQL provides an efficient, scalable, and portable setup for creating and deploying web scraper applications. This blog will guide you through building a web scraper application using these technologies.
Why Use Docker, Python, and MySQL?
Docker: Ensures your application runs in isolated environments, making it easy to manage dependencies and scale.
Python: Offers powerful libraries like BeautifulSoup and Requests for web scraping.
MySQL: A robust database system to store and manage scraped data.
By combining these tools, you can build a portable and scalable application that performs seamlessly across various environments.
Project Overview
We will:
Set up the MySQL database container and initialize the database.
Create a Python web scraper to extract data from a website.
Attach the Python application to the MySQL database.
Build a Docker image for the Python application.
Use the image to scrape data and store it in the MySQL database.
Step 1: Setting Up the MySQL Database Container
The first step is to set up the MySQL container. We'll use the docker run command to create the container and configure the database. We'll also initialize the database with a table to store the scraped data.
Run a command along these lines to start the MySQL container (each flag is explained below):
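```bash
docker run -dit --name mydb \
  -e MYSQL_ROOT_PASSWORD=redhat \
  -e MYSQL_DATABASE=scraper_db \
  -p 3306:3306 \
  mysql:latest
```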
The command runs a Docker container with the following settings:
-dit: Run the container in detached mode, interactively, and allocate a pseudo-terminal.
--name mydb: Assign the name "mydb" to the container.
-e MYSQL_ROOT_PASSWORD=redhat: Set the root password for MySQL to "redhat".
-e MYSQL_DATABASE=scraper_db: Create a database named "scraper_db" in MySQL.
-p 3306:3306: Map the container's port 3306 (MySQL's default) to port 3306 on the host machine.
mysql:latest: Use the latest MySQL image from Docker Hub.
Next, connect to the MySQL container to create the necessary table:
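```bash
docker exec -it mydb mysql -u root -p
# enter the root password ("redhat") when prompted
```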
Once inside the MySQL shell, run the following SQL commands to set up the table:
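Here we name the table quotes (the name is our choice; just keep it consistent with the scraper later on):

```sql
USE scraper_db;

-- "quotes" is the table name used throughout this post
CREATE TABLE quotes (
    id INT AUTO_INCREMENT PRIMARY KEY,
    quote TEXT,
    author VARCHAR(255)
);
```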
This will create a table named quotes to store the scraped data (quote text and author).
Step 2: Creating the Python Web Scraper
Now that the database is ready, let's create the Python application that will scrape data from the website.
Create a directory for your project:
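```bash
mkdir web-scraper && cd web-scraper   # the directory name is arbitrary
```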
Install the necessary Python libraries:
Create a requirements.txt file listing the libraries this application needs. A single pip command then installs everything in the file, as shown below.
Python packages: requests, beautifulsoup4 (which provides BeautifulSoup), and mysql-connector-python.
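```text
# requirements.txt
requests
beautifulsoup4
mysql-connector-python
```

```bash
pip install -r requirements.txt
```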
Create a Python file (we'll call it scraper.py here) with the following code:
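The sketch below assumes the settings from Step 1 (root password "redhat", database "scraper_db") and the quotes table created earlier. It uses the container name "mydb" as the database host, which works when both containers share a Docker network or are linked (see Step 4):

```python
import time

import mysql.connector
import requests
from bs4 import BeautifulSoup

# Connection settings from Step 1; "mydb" resolves only if the two
# containers share a Docker network or are linked (see Step 4).
DB_CONFIG = {
    "host": "mydb",
    "user": "root",
    "password": "redhat",
    "database": "scraper_db",
}

URL = "http://quotes.toscrape.com"


def wait_for_db(retries=10, delay=5):
    """Retry until the MySQL container is ready to accept connections."""
    for attempt in range(1, retries + 1):
        try:
            return mysql.connector.connect(**DB_CONFIG)
        except mysql.connector.Error:
            print(f"Database not ready (attempt {attempt}), retrying...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to MySQL")


def scrape_quotes():
    """Fetch the first page of Quotes to Scrape and yield (quote, author) pairs."""
    response = requests.get(URL, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    for block in soup.select("div.quote"):
        text = block.select_one("span.text").get_text(strip=True)
        author = block.select_one("small.author").get_text(strip=True)
        yield text, author


def main():
    conn = wait_for_db()
    cursor = conn.cursor()
    for text, author in scrape_quotes():
        cursor.execute(
            "INSERT INTO quotes (quote, author) VALUES (%s, %s)",
            (text, author),
        )
    conn.commit()
    cursor.close()
    conn.close()
    print("Scraping complete.")


if __name__ == "__main__":
    main()
```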
This script:
Waits for the MySQL database to be ready.
Scrapes quotes from the Quotes to Scrape website.
Saves the quotes and authors to the quotes table in the MySQL database.
Step 3: Creating the Dockerfile
Create a Dockerfile to containerize the Python application.
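A minimal Dockerfile along these lines matches the description that follows:

```dockerfile
# Python 3.9 base image
FROM python:3.9

WORKDIR /app

# Install the libraries listed in requirements.txt
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the scraper and run it on container start
COPY scraper.py .
CMD ["python", "scraper.py"]
```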
This Dockerfile:
Uses a Python 3.9 base image.
Installs the necessary Python libraries.
Runs the scraper.py file when the container starts.
Step 4: Building and Running the Application
Now, build and run the application using the Dockerfile.
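```bash
docker build -t scraper-app .   # "scraper-app" is an arbitrary image tag
```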
This command will:
Build the Docker image for the Python application.
After that run the image:
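One way to let the scraper reach the database by the name "mydb" is a legacy container link (a user-defined Docker network achieves the same):

```bash
docker run -it --name scraper --link mydb:mydb scraper-app
```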
This command will:
Connect the scraper to the MySQL database container already running from Step 1.
Run the scraper, which scrapes data and saves it to the database.
Step 5: Verifying the Results
You can connect to the MySQL database to verify the scraped data.
Run the following query:
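```sql
-- inside the MySQL shell (e.g., docker exec -it mydb mysql -u root -p scraper_db)
SELECT * FROM quotes;
```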
You should see the scraped quotes stored in the quotes table.
Conclusion
By combining Docker, Python, and MySQL, you’ve created a scalable and portable web scraper application. This setup ensures that your application runs consistently across different environments, making it ideal for deployment. You can extend this by exploring advanced topics like scaling the scraper, integrating APIs, or visualizing the data.
Happy coding!