SlideShare a Scribd company logo
An Introduction To Software
Development Using Python
Spring Semester, 2015
Class #23:
Working With Data
Data Formatting
• In the real world, data comes in many different
shapes, sizes, and encodings.
• This means that you have to know how to
manipulate and transform it into a common format
that will permit efficient processing, sorting, and
storage.
• Python has the tools that will allow you to do all of
this…
Image Credit: publicdomainvectors.org
Your Programming Challenge
• The Florida Polytechnic track team has just been
formed.
• The coach really wants the team to win the state
competition in its first year.
• He’s been recording their training results from the
600m run.
• Now he wants to know the top three fastest times
for each team member.
Image Credit: animals.phillipmartin.info
Here’s What The Data
Looks Like
• James
2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
• Julie
2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21
• Mike
2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38
• Sara
2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55
Image Credit: www.dreamstime.com
1st Step: We Need To Get The Data
• Let’s begin by reading the data from each of
the files into its own list.
• Write a short program to process each file,
creating a list for each athlete’s data, and
display the lists on screen.
• Hint: Try splitting the data on the commas,
and don’t forget to strip any unwanted
whitespace.
1 Image Credit: www.clipartillustration.com
New Python Ideas
• data.strip().split(',')
This is called method chaining.
The first method, strip() , is applied to the line in data, which
removes any unwanted whitespace from the string.
Then, the results of the stripping are processed by the second
method, split(',') , creating a list.
The resulting list is then saved in the variable. In this way, the
methods are chained together to produce the required result.
It helps if you read method chains from left to right.
Image Credit: www.clipartpanda.com
Time To Do Some Sorting
• In-Place Sorting
– Takes your data, arranges it in the order you specify, and then replaces your
original data with the sorted version.
– The original ordering is lost. With lists, the sort() method provides in-place
sorting
– Example - original list: [1,3,4,6,2,5]
list after sorting: [1,2,3,4,5,6]
• Copy Sorting
– Takes your data, arranges it in the order you specify, and then returns a sorted
copy of your original data.
– Your original data’s ordering is maintained and only the copy is sorted. In
Python, the sorted() method supports copied sorting.
– Example - original list: [1,3,4,6,2,5]
list after sorting: [1,3,4,6,2,5]
new list: [1,2,3,4,5,6]
2 Image Credit: www.picturesof.net
What’s Our Problem?
• “-”, “.”, and “:” all have different ASCII values.
• This means that they are screwing up our
sort.
• Sara’s data:
['2:58', '2.58', '2:39’, '2-25', '2-55', '2:54’, '2.18', '2:55', '2:55']
• Python sorts the strings, and when it comes
to strings, a dash comes before a period,
which itself comes before a colon.
• Nonuniformity in the coach’s data is causing
the sort to fail.
Fixing The Coach’s Mistakes
• Let’s create a function called sanitize() , which
takes as input a string from each of the
athlete’s lists.
• The function then processes the string to
replace any dashes or colons found with a
period and returns the sanitized string.
• Note: if the string already contains a
period, there’s no need to sanitize it.
3 Image Credit: www.dreamstime.com
Code Problem: Lots and Lots of
Duplication
• Your code creates four lists to hold the data as read
from the data files.
• Then your code creates another four lists to hold the
sanitized data.
• And, of course, you’re stepping through lists all over
the place…
• There has to be a better way to write code like this.
Image Credit: www.canstockphoto.com
Transforming Lists
• Transforming lists is such a common requirement
that Python provides a tool to make the
transformation as painless as possible.
• This tool goes by the rather unwieldy name of
list comprehension.
• List comprehensions are designed to reduce the
amount of code you need to write when
transforming one list into another.
Image Credit: www.fotosearch.com
Steps In Transforming A List
• Consider what you need to do when you transform one list
into another. Four things have to happen. You need to:
1. Create a new list to hold the transformed data.
2. Iterate each data item in the original list.
3. With each iteration, perform the transformation.
4. Append the transformed data to the new list.
clean_sarah = []
for runTime in sarah:
clean_sarah.append(sanitize(runTime))
❶
❷ ❸
❹
Image Credit: www.cakechooser.com
List Comprehension
• Here’s the same functionality as a list comprehension, which
involves creating a new list by specifying the transformation
that is to be applied to each of the data items within an
existing list.
clean_sarah = [sanitize(runTime) for runTime in sarah]
Create new list
… by applying
a transformation
… to each
data item
… within an
existing list
Note: that the transformation has been reduced to a single line
of code. Additionally, there’s no need to specify the use of the append()
method as this action is implied within the list comprehension
4 Image Credit: www.clipartpanda.com
Congratulations!
• You’ve written a program that reads the
Coach’s data from his data files, stores his raw
data in lists, sanitizes the data to a uniform
format, and then sorts and displays the
coach’s data on screen. And all in ~25 lines of
code.
• It’s probably safe to show
the coach your output now.
Image Credit: vector-magz.com
Ooops – Forgot Why We Were
Doing All Of This: Top 3 Times
• We forgot to worry about what we were
actually supposed to be doing: producing the
three fastest times for each athlete.
• Oh, of course, there’s no place for any
duplicated times in our output.
Image Credit: www.clipartpanda.com
Two Ways To Access The
Time Values That We Want
• Standard Notation
– Specify each list item individually
• sara[0]
• sara[1]
• sara[2]
• List Slice
– sara[0:3]
– Access list items up to, but not including, item 3.
Image Credit: www.canstockphoto.com
The Problem With Duplicates
• Do we have a duplicate problem?
• Processing a list to remove duplicates is one area where a list
comprehension can’t help you, because duplicate removal is not a
transformation; it’s more of a filter.
• And a duplicate removal filter needs to examine the list being created as it
is being created, which is not possible with a list comprehension.
• To meet this new requirement, you’ll need to revert to regular list iteration
code.
James
2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
5 Image Credit: www.mycutegraphics.com
Remove Duplicates With Sets
• The overriding characteristics of sets in Python are that the data items in a
set are unordered and duplicates are not allowed.
• If you try to add a data item to a set that already contains the data item,
Python simply ignores it.
• It is also possible to create and populate a set in one step. You can provide
a list of data values between curly braces or specify an existing list as an
argument to the set()
• Any duplicates in the james list will be ignored:
distances = set(james)
distances = {10.6, 11, 8, 10.6, "two", 7}
Duplicates will be ignored
Image Credit: www.pinterest.com
What Do We Do Now?
• To extract the data you need, replace all of
that list iteration code in your current program
with four calls to:
sorted(set(...))[0:3]
6 Image Credit: www.fotosearch.com
What’s In Your Python Toolbox?
print() math strings I/O IF/Else elif While For
DictionaryLists And/Or/Not Functions Files ExceptionSets
What We Covered Today
1. Read in data
2. Sorted it
3. Fixed coach’s mistakes
4. Transformed the list
5. Used List Comprehension
6. Used sets to get rid of
duplicates
Image Credit: http://www.tswdj.com/blog/2011/05/17/the-grooms-checklist/
What We’ll Be Covering Next Time
1. External Libraries
2. Data wrangling
Image Credit: http://merchantblog.thefind.com/2011/01/merchant-newsletter/resolve-to-take-advantage-of-these-5-e-commerce-trends/attachment/crystal-ball-fullsize/

More Related Content

PPTX
An Introduction To Python - Lists, Part 1
PPT
Scalable Data Analysis in R -- Lee Edlefsen
PPTX
PDF
Introduction to data analysis using R
PPSX
Algorithm and Programming (Searching)
PPTX
Data Structure
An Introduction To Python - Lists, Part 1
Scalable Data Analysis in R -- Lee Edlefsen
Introduction to data analysis using R
Algorithm and Programming (Searching)
Data Structure

What's hot (19)

PDF
Text Analysis: Latent Topics and Annotated Documents
PDF
R - the language
PPTX
R programming Fundamentals
PPT
Introduction of data structure
PPTX
R Programming Language
PPTX
Searching Techniques and Analysis
ODP
Introduction to the language R
PPTX
Data structure
PDF
R programming & Machine Learning
KEY
Presentation R basic teaching module
PPTX
Datastructures using c++
PDF
ICDE 2015 - LDV: Light-weight Database Virtualization
PDF
Data structure lecture 2 (pdf)
PDF
Data Wrangling and Visualization Using Python
PPTX
DOC
Data structure lecture 2
PDF
Tree representation in map reduce world
PDF
RDataMining slides-text-mining-with-r
PDF
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
Text Analysis: Latent Topics and Annotated Documents
R - the language
R programming Fundamentals
Introduction of data structure
R Programming Language
Searching Techniques and Analysis
Introduction to the language R
Data structure
R programming & Machine Learning
Presentation R basic teaching module
Datastructures using c++
ICDE 2015 - LDV: Light-weight Database Virtualization
Data structure lecture 2 (pdf)
Data Wrangling and Visualization Using Python
Data structure lecture 2
Tree representation in map reduce world
RDataMining slides-text-mining-with-r
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
Ad

Viewers also liked (12)

DOCX
Bubble sorting lab manual
PDF
Matlab
DOCX
PANCASILA (makalah falsafah pancasila)
PDF
Bin Sorting And Bubble Sort By Luisito G. Trinidad
PDF
Sorting bubble-sort anim
PPSX
Matlab basic and image
PDF
PKI dan Kekejaman Terhadap Ulama
PDF
PPT
Matlab Basic Tutorial
PPTX
Sorting algorithms
PPT
Sorting Algorithms
PDF
How to Become a Thought Leader in Your Niche
Bubble sorting lab manual
Matlab
PANCASILA (makalah falsafah pancasila)
Bin Sorting And Bubble Sort By Luisito G. Trinidad
Sorting bubble-sort anim
Matlab basic and image
PKI dan Kekejaman Terhadap Ulama
Matlab Basic Tutorial
Sorting algorithms
Sorting Algorithms
How to Become a Thought Leader in Your Niche
Ad

Similar to An Introduction To Python - Working With Data (20)

PDF
advanced python for those who have beginner level experience with python
PDF
Advanced Python after beginner python for intermediate learners
PPTX
Lists.pptx
PDF
"Automata Basics and Python Applications"
PPTX
An Introduction To Python - Lists, Part 2
PPTX
python lists with examples and explanation python lists with examples and ex...
PDF
GE3151_PSPP_UNIT_4_Notes
PPTX
Pythonlearn-08-Lists.pptx
PPTX
updated_list.pptx
PPTX
Python _dataStructures_ List, Tuples, its functions
PPTX
List and Dictionary in python
PPTX
An Introduction To Python - Tables, List Algorithms
PPTX
Pythonlearn-08-Lists (1).pptxaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
PDF
justbasics.pdf
PPTX
Python for the data science most in cse.pptx
PDF
Processing data with Python, using standard library modules you (probably) ne...
PPTX
11 Introduction to lists.pptx
PPTX
UNIT-3 python and data structure alo.pptx
PPTX
Brixon Library Technology Initiative
advanced python for those who have beginner level experience with python
Advanced Python after beginner python for intermediate learners
Lists.pptx
"Automata Basics and Python Applications"
An Introduction To Python - Lists, Part 2
python lists with examples and explanation python lists with examples and ex...
GE3151_PSPP_UNIT_4_Notes
Pythonlearn-08-Lists.pptx
updated_list.pptx
Python _dataStructures_ List, Tuples, its functions
List and Dictionary in python
An Introduction To Python - Tables, List Algorithms
Pythonlearn-08-Lists (1).pptxaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
justbasics.pdf
Python for the data science most in cse.pptx
Processing data with Python, using standard library modules you (probably) ne...
11 Introduction to lists.pptx
UNIT-3 python and data structure alo.pptx
Brixon Library Technology Initiative

Recently uploaded (20)

PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Basic Mud Logging Guide for educational purpose
PDF
Business Ethics Teaching Materials for college
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
Pharma ospi slides which help in ospi learning
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
master seminar digital applications in india
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Pre independence Education in Inndia.pdf
PDF
Classroom Observation Tools for Teachers
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
TR - Agricultural Crops Production NC III.pdf
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Microbial disease of the cardiovascular and lymphatic systems
Basic Mud Logging Guide for educational purpose
Business Ethics Teaching Materials for college
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPH.pptx obstetrics and gynecology in nursing
human mycosis Human fungal infections are called human mycosis..pptx
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
Supply Chain Operations Speaking Notes -ICLT Program
Pharma ospi slides which help in ospi learning
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Microbial diseases, their pathogenesis and prophylaxis
master seminar digital applications in india
Final Presentation General Medicine 03-08-2024.pptx
Pre independence Education in Inndia.pdf
Classroom Observation Tools for Teachers
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
TR - Agricultural Crops Production NC III.pdf

An Introduction To Python - Working With Data

  • 1. An Introduction To Software Development Using Python Spring Semester, 2015 Class #23: Working With Data
  • 2. Data Formatting • In the real world, data comes in many different shapes, sizes, and encodings. • This means that you have to know how to manipulate and transform it into a common format that will permit efficient processing, sorting, and storage. • Python has the tools that will allow you to do all of this… Image Credit: publicdomainvectors.org
  • 3. Your Programming Challenge • The Florida Polytechnic track team has just been formed. • The coach really wants the team to win the state competition in its first year. • He’s been recording their training results from the 600m run. • Now he wants to know the top three fastest times for each team member. Image Credit: animals.phillipmartin.info
  • 4. Here’s What The Data Looks Like • James 2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22 • Julie 2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21 • Mike 2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38 • Sara 2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55 Image Credit: www.dreamstime.com
  • 5. 1st Step: We Need To Get The Data • Let’s begin by reading the data from each of the files into its own list. • Write a short program to process each file, creating a list for each athlete’s data, and display the lists on screen. • Hint: Try splitting the data on the commas, and don’t forget to strip any unwanted whitespace. 1 Image Credit: www.clipartillustration.com
  • 6. New Python Ideas • data.strip().split(',') This is called method chaining. The first method, strip() , is applied to the line in data, which removes any unwanted whitespace from the string. Then, the results of the stripping are processed by the second method, split(',') , creating a list. The resulting list is then saved in the variable. In this way, the methods are chained together to produce the required result. It helps if you read method chains from left to right. Image Credit: www.clipartpanda.com
  • 7. Time To Do Some Sorting • In-Place Sorting – Takes your data, arranges it in the order you specify, and then replaces your original data with the sorted version. – The original ordering is lost. With lists, the sort() method provides in-place sorting – Example - original list: [1,3,4,6,2,5] list after sorting: [1,2,3,4,5,6] • Copy Sorting – Takes your data, arranges it in the order you specify, and then returns a sorted copy of your original data. – Your original data’s ordering is maintained and only the copy is sorted. In Python, the sorted() method supports copied sorting. – Example - original list: [1,3,4,6,2,5] list after sorting: [1,3,4,6,2,5] new list: [1,2,3,4,5,6] 2 Image Credit: www.picturesof.net
  • 8. What’s Our Problem? • “-”, “.”, and “:” all have different ASCII values. • This means that they are screwing up our sort. • Sara’s data: ['2:58', '2.58', '2:39’, '2-25', '2-55', '2:54’, '2.18', '2:55', '2:55'] • Python sorts the strings, and when it comes to strings, a dash comes before a period, which itself comes before a colon. • Nonuniformity in the coach’s data is causing the sort to fail.
  • 9. Fixing The Coach’s Mistakes • Let’s create a function called sanitize() , which takes as input a string from each of the athlete’s lists. • The function then processes the string to replace any dashes or colons found with a period and returns the sanitized string. • Note: if the string already contains a period, there’s no need to sanitize it. 3 Image Credit: www.dreamstime.com
  • 10. Code Problem: Lots and Lots of Duplication • Your code creates four lists to hold the data as read from the data files. • Then your code creates another four lists to hold the sanitized data. • And, of course, you’re stepping through lists all over the place… • There has to be a better way to write code like this. Image Credit: www.canstockphoto.com
  • 11. Transforming Lists • Transforming lists is such a common requirement that Python provides a tool to make the transformation as painless as possible. • This tool goes by the rather unwieldy name of list comprehension. • List comprehensions are designed to reduce the amount of code you need to write when transforming one list into another. Image Credit: www.fotosearch.com
  • 12. Steps In Transforming A List • Consider what you need to do when you transform one list into another. Four things have to happen. You need to: 1. Create a new list to hold the transformed data. 2. Iterate each data item in the original list. 3. With each iteration, perform the transformation. 4. Append the transformed data to the new list. clean_sarah = [] for runTime in sarah: clean_sarah.append(sanitize(runTime)) ❶ ❷ ❸ ❹ Image Credit: www.cakechooser.com
  • 13. List Comprehension • Here’s the same functionality as a list comprehension, which involves creating a new list by specifying the transformation that is to be applied to each of the data items within an existing list. clean_sarah = [sanitize(runTime) for runTime in sarah] Create new list … by applying a transformation … to each data item … within an existing list Note: that the transformation has been reduced to a single line of code. Additionally, there’s no need to specify the use of the append() method as this action is implied within the list comprehension 4 Image Credit: www.clipartpanda.com
  • 14. Congratulations! • You’ve written a program that reads the Coach’s data from his data files, stores his raw data in lists, sanitizes the data to a uniform format, and then sorts and displays the coach’s data on screen. And all in ~25 lines of code. • It’s probably safe to show the coach your output now. Image Credit: vector-magz.com
  • 15. Ooops – Forgot Why We Were Doing All Of This: Top 3 Times • We forgot to worry about what we were actually supposed to be doing: producing the three fastest times for each athlete. • Oh, of course, there’s no place for any duplicated times in our output. Image Credit: www.clipartpanda.com
  • 16. Two Ways To Access The Time Values That We Want • Standard Notation – Specify each list item individually • sara[0] • sara[1] • sara[2] • List Slice – sara[0:3] – Access list items up to, but not including, item 3. Image Credit: www.canstockphoto.com
  • 17. The Problem With Duplicates • Do we have a duplicate problem? • Processing a list to remove duplicates is one area where a list comprehension can’t help you, because duplicate removal is not a transformation; it’s more of a filter. • And a duplicate removal filter needs to examine the list being created as it is being created, which is not possible with a list comprehension. • To meet this new requirement, you’ll need to revert to regular list iteration code. James 2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22 5 Image Credit: www.mycutegraphics.com
  • 18. Remove Duplicates With Sets • The overriding characteristics of sets in Python are that the data items in a set are unordered and duplicates are not allowed. • If you try to add a data item to a set that already contains the data item, Python simply ignores it. • It is also possible to create and populate a set in one step. You can provide a list of data values between curly braces or specify an existing list as an argument to the set() • Any duplicates in the james list will be ignored: distances = set(james) distances = {10.6, 11, 8, 10.6, "two", 7} Duplicates will be ignored Image Credit: www.pinterest.com
  • 19. What Do We Do Now? • To extract the data you need, replace all of that list iteration code in your current program with four calls to: sorted(set(...))[0:3] 6 Image Credit: www.fotosearch.com
  • 20. What’s In Your Python Toolbox? print() math strings I/O IF/Else elif While For DictionaryLists And/Or/Not Functions Files ExceptionSets
  • 21. What We Covered Today 1. Read in data 2. Sorted it 3. Fixed coach’s mistakes 4. Transformed the list 5. Used List Comprehension 6. Used sets to get rid of duplicates Image Credit: http://www.tswdj.com/blog/2011/05/17/the-grooms-checklist/
  • 22. What We’ll Be Covering Next Time 1. External Libraries 2. Data wrangling Image Credit: http://merchantblog.thefind.com/2011/01/merchant-newsletter/resolve-to-take-advantage-of-these-5-e-commerce-trends/attachment/crystal-ball-fullsize/

Editor's Notes

  • #2: New name for the class I know what this means Technical professionals are who get hired This means much more than just having a narrow vertical knowledge of some subject area. It means that you know how to produce an outcome that I value. I’m willing to pay you to do that.