TOOLS OF THE TRADE:
Automated Database Sanitization with AWS
Dee Wilcox
Nashville PHP Monthly Meetup
April 11, 2017
About Me
● Senior Software Developer at NASBA.
● Not a Nashville native, but it’s been home for 7 years.
● Married for 14 years, 2 daughters ages 6 and 1.
● Teaching myself to code since 2002.
● Passionate about the maker movement, mentoring and
empowering women in tech, and building healthy teams.
Full Stack Dev Tools of the Trade
Webserver
● Linux
● Apache
● Easy to create development environments with AWS EC2 instances or Lightsail, Docker, and a host of other providers.
Database
● MySQL or Postgres
○ Integrated LAMP webserver
● Dedicated database server
● AWS RDS instance
Application Layer
● Application code runs on the webserver and connects out to the database
● Environments are managed through application configuration files
The Problem:
How do we cleanly reload production data to a development or testing environment without compromising security?
Automated Database Sanitization
Option 1:
Web-Based, On-Demand Solution
One option is to create a simple web application designed to retrieve, sanitize, and store sanitized MySQL dump files so that they are easily accessible.
Benefits of an On-Demand Solution
Easy Maintenance
● PHP application code is easy to maintain
● The development team can modify and improve it
Easy on the Database
● Dump files are only created as needed
● Better for storage management
Easy Tracking
● Recent requests are easy to track
● Helps eliminate duplicate requests
Easy Storage and Retrieval
● A common storage and retrieval mechanism streamlines processes for Ops & Development
In Practice
Failing Ungracefully
● Failures to find or execute the sanitization routines were not captured or returned to the user. As a result, data dumps were either not created at all or were created while still containing sensitive data.
Room for improvement:
● Capturing all types of MySQL errors
● Logging and notification controls for successful and unsuccessful processing
Too Tightly Coupled
● Tightly coupling the data-layer sanitization code to the application code made it difficult to maintain a separation of concerns.
● In an environment where development, operations, and database administration are separate roles, this made the process more cumbersome.
Room for improvement:
● Separate the application and data layers.
Option 2:
Automated Solution
Another option is to create a simple shell script (or two) that runs nightly on a utility server with read-only access to the production database (or a production replica).
Benefits of an Automated Solution
Easier Maintenance
● Bash code is easy for both Operations and Development to read and maintain
Easier on the Database
● Dump files are created when the databases are least used
Easier Tracking
● No request management: dump files are delivered for all databases nightly
Easy Storage and Retrieval
● A common storage and retrieval mechanism streamlines processes for Ops & Development
In Practice
Boring Bash
● Bash shell scripting has its place, but it’s not exactly a beloved language in the PHP development community.
● Luckily, it’s still easy to work with.
Room for improvement:
● The bash script is long and procedural. Organizing it into functions would make it easier for most PHP developers to follow.
Tightly Defined Sanitization Parameters
● The current sanitization scripts apply standardized sanitization to table columns known to contain sensitive data.
● They do not use regular expressions to scan the tables for additional PII to identify or mask.
Room for improvement:
● Smarter sanitization logic.
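One possible direction, sketched under assumptions: scan a plain-text dump with extended regular expressions and mask values that look like email addresses or U.S. Social Security numbers, then check that none remain. The function names and both patterns are illustrative only. Regex masking is a heuristic that can miss PII or alter values that merely resemble it, so it would complement, not replace, the column-level routines.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for regex-based PII masking of text on stdin
# (for example, an uncompressed mysqldump). Patterns are illustrative.

# Replace email-like and SSN-like strings with fixed placeholders.
mask_pii() {
  sed -E \
    -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/masked@example.com/g' \
    -e 's/(^|[^0-9])[0-9]{3}-[0-9]{2}-[0-9]{4}([^0-9]|$)/\1XXX-XX-XXXX\2/g'
}

# Succeeds (exit status 0) if any SSN-like value is still present.
contains_ssn() {
  grep -qE '(^|[^0-9])[0-9]{3}-[0-9]{2}-[0-9]{4}([^0-9]|$)'
}
```

For example, `bunzip2 -c dump.sql.bz2 | mask_pii | bzip2 > masked.sql.bz2` (file names hypothetical), and `contains_ssn` can act as a final gate that refuses to upload a dump that still matches.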
Leveraging AWS
● Webserver: Lightsail or EC2
● Database: MySQL on RDS, plus a MySQL replica
● Storage & Retrieval: S3 and Glacier
Utility Scripts
● Automated sanitization and storage
● Automated retrieval
Step 1:
Set up the Environment
Key dependencies:
● Linux webserver
● AWS credentials for S3
● AWS CLI
● Database credentials
○ Username and password, OR
○ Login path
● Repository for sanitizer_db
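Before the nightly run, the script can fail fast when a dependency is missing. A minimal sketch with a hypothetical function name; the login path itself would be created once with MySQL's `mysql_config_editor set --login-path=local --host=localhost --user=<user> --password`, where the user name is yours to choose.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: print each required command that is not on
# PATH and return non-zero if any are missing.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}

# Commands the nightly script relies on.
required=(mysql mysqldump bzip2 aws)
```

Near the top of the script, `missing_commands "${required[@]}" >&2 || exit 1` stops the run before any database is touched.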
Setting up Sanitizer DB
Create Statements
● Define the databases that need to be sanitized.
● Include specific, accurate create statements that match the production configuration of these databases.
Grants and Definers
● Make sure the new database user has read-only access to the source databases and write access to create and drop the new sanitization databases.
Sanitization Routines
● Clearly define the data to be sanitized.
● Use queries or stored routines, whatever fits your environment best.
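The grants described above might look like the following sketch, which prints GRANT statements for review rather than running them. The account, host, and database names are hypothetical, and the account is assumed to exist already (MySQL 8.0 no longer creates accounts implicitly through GRANT). The `sanitize\_%` pattern uses MySQL's database-level wildcard, with the underscore escaped so it matches literally. Depending on the MySQL version and mysqldump options, dumping may need more than SELECT and SHOW VIEW (for example, TRIGGER to dump triggers).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the grants for a sanitizer account.
#   $1 = user, $2 = host, remaining arguments = source databases to read.
grant_statements() {
  local account="'$1'@'$2'" db
  shift 2
  for db in "$@"; do
    # Read-only access to each source database
    printf 'GRANT SELECT, SHOW VIEW ON `%s`.* TO %s;\n' "$db" "$account"
  done
  # Full control over the sanitize_* working databases only: \_ matches a
  # literal underscore and % is the database-name wildcard.
  printf 'GRANT ALL PRIVILEGES ON `sanitize\\_%%`.* TO %s;\n' "$account"
}
```

After review, the output could be piped into `mysql` under an administrative login path.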
Step 2:
Set up the Database
# Create the databases and definers
mysql --login-path=local < /data/sites/sanitizer_db/databases/create.sql
mysql --login-path=local < /data/sites/sanitizer_db/databases/definers.sql
# Loop through the sanitization routines
cd /data/sites/sanitizer_db/routines || exit 1
for routine in sanitize*.sql
do
# Strip the .sql extension
routine_name=${routine%.sql}
Step 3:
Compile the Sanitization Routines and Empty Databases
# Derive the database name: strip .sql, the sanitize_ prefix, and any _noop marker
database=$(echo "$routine" | sed 's/\.sql$// ; s/^sanitize_// ; s/_noop//')
# $filename is not defined on these slides; presumably a file-name suffix set earlier in the script
database_filename=$database$filename
# Drop the database in the sanitizer if it already exists
mysql --login-path=local -e "drop database if exists sanitize_$database;"
# Create an empty database
mysql --login-path=local -e "create database sanitize_$database default character set utf8;"
# Compile the stored routine for sanitization
mysql --login-path=local < "$routine"
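Two details of the loop in Steps 2 and 3 are easy to miss. If no file matches `sanitize*.sql`, bash runs the loop once with the literal pattern as the file name unless `nullglob` is set; and an unescaped `.` in a sed pattern such as `s/.sql//` matches any character. The standalone sketch below, with a hypothetical function name, lists the database names for a routines directory using `nullglob` and parameter expansion instead of sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one database name per sanitize*.sql routine in
# the given directory. Prints nothing if there are no routines.
routine_databases() {
  local dir=$1 routine name restore
  restore=$(shopt -p nullglob) || true   # remember the caller's setting
  shopt -s nullglob                      # unmatched globs expand to nothing
  for routine in "$dir"/sanitize*.sql; do
    name=${routine##*/}                  # drop the directory
    name=${name%.sql}                    # drop the extension
    name=${name#sanitize_}               # drop the prefix
    printf '%s\n' "${name%_noop}"        # drop a trailing _noop marker
  done
  eval "$restore"                        # restore the caller's nullglob setting
}
```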
Step 4:
Load and Sanitize
# Copy each source database into its sanitize_ counterpart (streamed, no intermediate file)
mysqldump --login-path=local --lock-tables=false "$database" | mysql --login-path=local "sanitize_$database"
# Run sanitization and capture the output
sanitized=$(echo "call sanitize_$database.sanitize_$database(1);" | mysql --login-path=local)
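The `mysqldump | mysql` line in Step 4 has a subtle failure mode: by default, a pipeline's exit status is that of its last command, so a dump that fails partway through can go unnoticed if the load succeeds. This standalone demonstration uses hypothetical stand-in functions instead of the real commands.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: the "dump" fails after writing partial output (as a
# mysqldump that loses its connection might), while the "load" succeeds.
fake_dump() { echo "-- partial dump"; return 2; }
fake_load() { cat > /dev/null; }

# By default a pipeline's status is that of its last command only.
fake_dump | fake_load
echo "default status: $?"                      # 0: the failed dump is hidden

# PIPESTATUS records the status of every command in the last pipeline.
fake_dump | fake_load
echo "per-command status: ${PIPESTATUS[*]}"    # 2 0

# With pipefail, the pipeline fails if any command in it fails.
set -o pipefail
fake_dump | fake_load
echo "pipefail status: $?"                     # 2
```

Applied to Step 4, either `set -o pipefail` near the top of the script or a check of `${PIPESTATUS[0]}` immediately after the load line would surface a failed dump.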
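Step 5:
Catch Errors
# $? still holds the exit status of the mysql call from Step 4, so check it first
if [ "$?" -ne 0 ]
then
echo "There was a problem executing the stored routine for $database." >&2
# Treat any output as unreliable so the database is recorded as a failure
sanitized=""
fi
if [ -z "$sanitized" ]
then
sanitized_fail+=("$database")
fi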
Step 6:
Dump and Compress Sanitized Data
if [ -n "$sanitized" ]
then
# Add an entry to the sanitized success array
sanitized_success+=("$database")
# Remove any existing sanitized file
rm -f "$local_directory/$database_filename"
# Create a compressed dump of the sanitized copy (sanitize_$database), not the source database
mysqldump --login-path=local --lock-tables=false --no-create-info --skip-triggers "sanitize_$database" | bzip2 > "$local_directory/$database_filename"
# Send to S3
/usr/local/bin/aws --profile "$s3_profile" s3 mv "$local_directory/$database_filename" "$s3_url/$database_filename" --region "$s3_region"
fi
Step 7:
Clean Up the Environment
# Drop the sanitized database
mysql --login-path=local -e "drop database if exists sanitize_$database;"
# Remove the SQL file if it still exists
rm -f "${database}_sanitized.sql"
done
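Storage and Retrieval in S3
Using the AWS CLI
● Credentials for the named profile must be defined in ~/.aws/credentials (or ~/.aws/config)
● Pass the --region and --profile parameters
● Encryption: objects stored with S3-managed or KMS-managed keys are decrypted transparently on retrieval, while customer-provided keys (--sse-c) must be supplied again when the file is retrieved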
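Automated retrieval can stream an object from S3 straight into a development database. A minimal sketch with a hypothetical function name: it assumes the same $s3_profile, $s3_url, and $s3_region variables as Step 6 and that `aws` is on PATH. It relies on `aws s3 cp` writing to stdout when the destination is `-`. Because the dumps are made with `--no-create-info`, it also assumes the target database already has its tables, for example from create.sql.

```shell
#!/usr/bin/env bash
# Hypothetical retrieval helper.
#   $1 = target database (tables must already exist)
#   $2 = object name in S3, e.g. "$database$filename" from Step 6
set -o pipefail   # fail if the download, decompression, or load fails

restore_sanitized() {
  local target_db=$1 object=$2
  aws --profile "$s3_profile" s3 cp "$s3_url/$object" - --region "$s3_region" |
    bunzip2 |
    mysql --login-path=local "$target_db"
}
```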
Using a Scheduler
● Use a simple crontab entry to schedule the job.
● Alternatively, schedule the work in AWS, for example with a scheduled CloudWatch Events rule that invokes AWS Lambda.
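A crontab entry for the nightly run could look like this config fragment, installed with `crontab -e`. The script and log paths are placeholders. Cron runs jobs with a minimal PATH, which is one reason to call the AWS CLI by its full path, as Step 6 does.

```shell
# Hypothetical crontab entry: run the sanitizer nightly at 2:00 AM server time
# and append all output (stdout and stderr) to a log file.
0 2 * * * /data/sites/sanitizer_db/sanitize.sh >> /var/log/sanitizer_db.log 2>&1
```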
Audit Controls
● Use the $sanitized_fail and $sanitized_success arrays to track successes and failures.
● Use logging and notifications to meet audit requirements and notify users of any issues immediately.
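A minimal sketch of turning the two arrays into an audit record: append a timestamped summary to a log file and return a non-zero status when anything failed, so cron, a wrapper script, or whatever notification channel your team uses can react. The function name and log format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper. Reads the sanitized_success and sanitized_fail
# arrays populated by the nightly loop; $1 is the log file to append to.
report_results() {
  local log_file=$1 timestamp
  timestamp=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
  {
    echo "$timestamp success(${#sanitized_success[@]}): ${sanitized_success[*]}"
    echo "$timestamp failed(${#sanitized_fail[@]}): ${sanitized_fail[*]}"
  } >> "$log_file"
  # Non-zero exit status when any database failed, for alerting.
  [ "${#sanitized_fail[@]}" -eq 0 ]
}
```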
Let’s Discuss!
What have you tried in
your environment?
What is working?
Not working?
Where to Find Me
Twitter
https://twitter.com/dee_wilcox
LinkedIn
https://www.linkedin.com/in/deewilcox
Google
https://plus.google.com/+DeeWilcoxOnline
GitHub
https://github.com/deewilcox
